How Attackers Drain Your Cloud Budget
You add a paid feature to your product. It might summarize text, send a verification code, transcode a video, or OCR a document. The HTTP request looks ordinary, but the route fans out into model tokens, SMS charges, or minutes of worker time, and a weekend of unexpected traffic turns into a bill you did not plan for.
What makes these endpoints dangerous is that the expensive part usually sits behind your controller. The request is cheap to send, but costly to serve. That gap creates its own attack surface, which is the same framing that matters in Threat Modelling for Builders. The rest of this post walks through five ways cost-triggering features get abused in practice, and the controls that keep each one bounded.
Directly expensive routes
Some endpoints are expensive one request at a time. LLM summarization, image generation, transcription, and geocoding all fit this pattern. The attacker does not need an exploit in the usual sense. They only need a cheap way to keep reaching the paid operation.
import concurrent.futures
If the downstream model call costs one cent, 2,000 requests cost you $20. That single burst is not the real problem. The real problem is that a cheap script can keep running, spread across accounts, and keep charging your account while you’re asleep.
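To put rough numbers on that, here is a back-of-the-envelope sketch. Only the one-cent unit cost and the 2,000-request burst come from the example above; the sustained rate and the 48-hour duration are illustrative assumptions, not measurements.

```python
from decimal import Decimal

ONE_CENT = Decimal("0.01")  # per-call cost from the example above


def projected_spend(cost_per_call: Decimal, calls_per_minute: int, hours: int) -> Decimal:
    """Estimated spend if a script sustains a steady request rate."""
    return cost_per_call * calls_per_minute * 60 * hours


# The burst from the text: 2,000 calls at one cent each.
burst = ONE_CENT * 2000

# Hypothetical slow drip: 30 calls per minute across a 48-hour weekend.
weekend = projected_spend(ONE_CENT, calls_per_minute=30, hours=48)

print(f"burst: ${burst}, weekend: ${weekend}")
```

Run as written, this prints `burst: $20.00, weekend: $864.00`. The slow drip costs far more than the burst, and it is the kind of traffic a per-minute rate limit may never notice.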
The fix belongs at the endpoint boundary. Reject oversized input first, rate-limit bursts, consume quota before the paid call, and emit cost telemetry early enough that cloud or application alerts fire before one bad hour becomes an all-weekend incident.
def summarize(request, account):
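As a minimal sketch of that ordering, the checks might look like the following. The limits, the `Account` shape, the in-memory `COST_EVENTS` list, and the `call_model` stand-in are all assumptions made for illustration; a real service would back them with a shared store and a real telemetry pipeline.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

# All limits below are illustrative assumptions, not recommendations.
MAX_INPUT_CHARS = 20_000
BURST_LIMIT = 5                # calls allowed per window
BURST_WINDOW_SECONDS = 60
EST_COST_PER_CALL_USD = 0.01   # the one-cent figure used in the text

COST_EVENTS: List[dict] = []   # stand-in for a real telemetry pipeline


class Rejected(Exception):
    """Raised before any paid work starts."""


@dataclass
class Account:
    account_id: str
    quota_remaining: int
    recent_calls: List[float] = field(default_factory=list)


def call_model(text: str) -> str:
    """Hypothetical stand-in for the paid downstream call."""
    return text[:100]


def summarize(request: dict, account: Account, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    text = request["text"]

    # 1. Reject oversized input first.
    if len(text) > MAX_INPUT_CHARS:
        raise Rejected("input too large")

    # 2. Rate-limit bursts with a sliding window of recent call times.
    account.recent_calls = [
        t for t in account.recent_calls if now - t < BURST_WINDOW_SECONDS
    ]
    if len(account.recent_calls) >= BURST_LIMIT:
        raise Rejected("rate limit exceeded")

    # 3. Consume quota before the paid call, not after it.
    if account.quota_remaining <= 0:
        raise Rejected("quota exhausted")
    account.quota_remaining -= 1
    account.recent_calls.append(now)

    # 4. Record the estimated cost before the paid call runs.
    COST_EVENTS.append({
        "account_id": account.account_id,
        "feature": "summarize",
        "estimated_cost_usd": EST_COST_PER_CALL_USD,
        "ts": now,
    })

    return call_model(text)
```

Every rejection happens before `call_model` runs, so a rejected request costs you a few comparisons instead of a vendor charge.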
Authentication helps, but it does not solve this by itself. Free accounts, trial accounts, and stolen credentials all reach the same paid code path. The control that matters is the allowance attached to the feature.
Free-tier farming
Free tiers create a different problem from burst abuse. The attacker does not need high request volume and does not need to bypass authentication. They create an account, use the expensive feature up to the allowance, discard the account, and repeat.
Rate limits do not solve that pattern because the attacker can stay well under them. What matters here is a hard allowance on the paid feature, especially for new accounts.
def summarization_limit_for(account):
The practical rule is to start small and raise the ceiling only when the account has earned more trust. A payment method, a history of normal product usage, and a plan upgrade are all stronger signals than a low-friction signup step. Rate limits still matter, but free-tier abuse is mostly an allowance problem.
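One way to express that rule is a tiered allowance keyed on trust signals. This is a sketch under stated assumptions: the `Account` fields, the tier order, and every number are placeholders, and only the function name `summarization_limit_for` comes from this post.

```python
from dataclasses import dataclass


@dataclass
class Account:
    plan: str                 # "free" or "paid" (assumed plan names)
    age_days: int
    has_payment_method: bool


def summarization_limit_for(account: Account) -> int:
    """Monthly summarization allowance. All numbers are placeholders."""
    if account.plan == "paid":
        return 2000
    if account.has_payment_method:
        return 200
    if account.age_days >= 30:
        return 50
    return 10  # brand-new free account: the smallest ceiling


def can_summarize(account: Account, used_this_month: int) -> bool:
    """Hard allowance check, independent of any rate limit."""
    return used_this_month < summarization_limit_for(account)
```

With these placeholder tiers, a farmed account gets at most ten paid calls before it has to show a stronger signal, so the attacker's yield per signup stays small.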
Messaging fraud
Verification and notification routes look harmless because the payload is tiny. A phone number goes in, a code goes out, and the request finishes quickly. Behind that route, though, each request can trigger a telecom charge. Twilio publishes SMS pricing by region (Twilio SMS pricing), and Lime describes cutting SMS pumping costs after mitigation (Twilio customer story: Lime).
The attack pattern is usually called SMS pumping: automated traffic repeatedly triggers paid messages, often against new accounts and verification flows. If your product only serves a narrow geography or does not need repeated sends right after signup, those constraints should exist in code.
ALLOWED_COUNTRIES = {"US", "CA"}
This route stays simple on purpose. It does not try to predict fraud perfectly. It limits what a new account can spend, restricts regions if the product only operates in a few of them, and forces repeated sends to stop before the telecom bill grows.
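A sketch of those three constraints in one guard function follows. `ALLOWED_COUNTRIES` matches the set used in this post; the caps, the window, the in-memory stores, and the assumption that `country` has already been resolved by a phone-number parsing step are all hypothetical.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ALLOWED_COUNTRIES = {"US", "CA"}
MAX_SENDS_PER_NUMBER = 3        # assumed cap per window
MAX_SENDS_PER_ACCOUNT = 5       # assumed cap per window
WINDOW_SECONDS = 3600
MIN_SECONDS_BETWEEN_SENDS = 30  # assumed resend cooldown

# In-memory stand-ins for a shared store such as a database or cache.
_sends_by_number: Dict[str, List[float]] = defaultdict(list)
_sends_by_account: Dict[str, List[float]] = defaultdict(list)


def _recent(timestamps: List[float], now: float) -> List[float]:
    """Keep only the sends that fall inside the current window."""
    return [t for t in timestamps if now - t < WINDOW_SECONDS]


def may_send_code(account_id: str, phone: str, country: str,
                  now: Optional[float] = None) -> Tuple[bool, str]:
    """Decide whether a verification SMS may be sent, and record it if so."""
    now = time.time() if now is None else now

    # Restrict regions to the ones the product actually serves.
    if country not in ALLOWED_COUNTRIES:
        return False, "region not supported"

    by_number = _recent(_sends_by_number[phone], now)
    by_account = _recent(_sends_by_account[account_id], now)

    # Force repeated sends to slow down, then to stop.
    if by_number and now - by_number[-1] < MIN_SECONDS_BETWEEN_SENDS:
        return False, "resend too soon"
    if len(by_number) >= MAX_SENDS_PER_NUMBER:
        return False, "number limit reached"

    # Cap what one account can spend across all numbers.
    if len(by_account) >= MAX_SENDS_PER_ACCOUNT:
        return False, "account limit reached"

    _sends_by_number[phone] = by_number + [now]
    _sends_by_account[account_id] = by_account + [now]
    return True, "ok"
```

The function only answers whether a send is allowed. The actual provider call would run after it returns `True`, so every refusal costs nothing on the telecom side.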
Replays and duplicate jobs
Cost-triggering work often starts at a webhook or in a queue instead of a public product endpoint. A payment event arrives, your system generates an invoice PDF, emails a customer, and writes records. If the event can be replayed, or if the job retries without idempotency, the same paid work happens more than once. For the broader webhook boundary, sender verification, and replay problem, see A Practical Security Audit for Builders.
Stripe recommends verifying webhook signatures before you trust the event (Stripe webhook signatures). That is only part of the fix. You also need an idempotency check before any expensive side effect starts.
def handle_webhook(request):
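The two steps can be sketched with the standard library alone. This is a generic HMAC-SHA256 check, not Stripe's actual signature format, so use your provider's official verification helper in production. The secret, the event shape, the in-memory set, and the `generate_invoice` stand-in are assumptions for illustration.

```python
import hashlib
import hmac
import json
from typing import List, Set

WEBHOOK_SECRET = b"test-secret"         # hypothetical shared secret
_processed_event_ids: Set[str] = set()  # stand-in for a durable store with a unique key
INVOICES: List[str] = []                # records which events triggered paid work


def sign(body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()


def generate_invoice(event: dict) -> None:
    """Hypothetical stand-in for the expensive side effect."""
    INVOICES.append(event["id"])


def handle_webhook(body: bytes, signature: str) -> int:
    """Return an HTTP status code for the webhook request."""
    # 1. Verify the sender before trusting anything in the payload.
    expected = sign(body)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return 400

    event = json.loads(body)

    # 2. Idempotency check before any paid side effect starts.
    if event["id"] in _processed_event_ids:
        return 200  # acknowledge the duplicate without repeating the work
    _processed_event_ids.add(event["id"])

    generate_invoice(event)
    return 200
```

A replayed event still gets a 200, so the sender stops retrying, but `generate_invoice` runs once. This sketch marks the event as processed before the side effect, which means a failed side effect is not retried. A real handler would record the claim and the result in the same transaction, or release the claim on failure.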
The same rule applies in workers. Deduplicate jobs, cap retries, and stop treating every retry as permission to repeat paid side effects.
def process_invoice_job(job):
Without those checks, a retry storm or replayed event stops being only an operational problem and starts spending money.
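For the worker side, a minimal sketch of deduplication plus a retry cap might look like this. The `Job` shape, the `MAX_ATTEMPTS` value, the status strings, and the injectable `render` function are assumptions; the completed-key set stands in for a durable store.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

MAX_ATTEMPTS = 3  # assumed retry cap


class TransientError(Exception):
    """A failure that is worth retrying, up to the cap."""


@dataclass
class Job:
    job_id: str
    idempotency_key: str  # e.g. derived from the triggering event id
    attempts: int = 0


_completed_keys: Set[str] = set()  # stand-in for a durable store
RENDERED: List[str] = []           # records which keys triggered paid work


def render_invoice_pdf(job: Job) -> None:
    """Hypothetical stand-in for the paid side effect."""
    RENDERED.append(job.idempotency_key)


def process_invoice_job(job: Job,
                        render: Callable[[Job], None] = render_invoice_pdf) -> str:
    # Deduplicate: the same logical work never runs twice.
    if job.idempotency_key in _completed_keys:
        return "skipped_duplicate"

    # Cap retries: stop spending once the job has had its chances.
    if job.attempts >= MAX_ATTEMPTS:
        return "dead_lettered"

    job.attempts += 1
    try:
        render(job)
    except TransientError:
        return "retry"

    _completed_keys.add(job.idempotency_key)
    return "done"
```

Two jobs with different `job_id` values but the same `idempotency_key` produce one render, and a job that keeps failing is parked after three attempts instead of retrying indefinitely.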
Small inputs that become large jobs
Some routes are cheap on the wire but expensive after parsing. A scanned PDF can trigger OCR on hundreds of pages, a short uploaded video can fan out into multiple renditions and thumbnails, and a user request that looks small at the edge can turn into a long worker job after it enters your pipeline. This is the same pattern as a zip bomb in upload security, but expressed as spend instead of memory pressure, so the thing to budget is the expanded work, not just the upload size. The upload version of that failure mode is covered in How to Not Get Hacked Through File Uploads.
def convert_pdf(request, account):
Size limits still matter, but they are not enough here. The right question is how much downstream work the file creates after acceptance, because that is what your workers and vendors will bill you for.
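A sketch of budgeting the expanded work might look like the following. It assumes the page count has already been read cheaply from the file's metadata by whatever PDF library you use. The limits, the `Account` shape, and the explicit `size_bytes` and `page_count` parameters are illustrative, not the signature used elsewhere in this post.

```python
from dataclasses import dataclass

# Illustrative limits, not recommendations.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024
MAX_PAGES_PER_JOB = 50


class Rejected(Exception):
    """Raised before any OCR work is queued."""


@dataclass
class Account:
    account_id: str
    pages_remaining: int  # allowance counted in pages, not uploads


def convert_pdf(size_bytes: int, page_count: int, account: Account) -> int:
    """Reserve OCR pages for one job and return the number reserved."""
    # The size limit still applies, but it is only the first gate.
    if size_bytes > MAX_UPLOAD_BYTES:
        raise Rejected("file too large")

    # Budget the expanded work: pages drive the bill, not bytes.
    if page_count > MAX_PAGES_PER_JOB:
        raise Rejected("too many pages")
    if page_count > account.pages_remaining:
        raise Rejected("page allowance exceeded")

    # Reserve the pages before the job is queued.
    account.pages_remaining -= page_count
    return page_count
```

Counting the allowance in pages means a small file with hundreds of pages is charged for what it will actually cost to process.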
Track spend before billing does
Billing alerts are useful, but they are late. By the time the cloud invoice moves, the abuse has already happened. You need feature-level cost telemetry attached to the actor that caused it: account, tenant, API key, or webhook source. AWS Budgets supports automated budget actions (AWS budget actions), and GCP documents programmatic budget notifications (GCP budget notifications). Those are good backstops, but the application should see the problem earlier.
{"account_id":"acct_142","plan":"free","feature":"summarize","tokens":4100,"estimated_cost_usd":0.012,"ts":"2026-03-20T10:14:00Z"}
Those fields are enough to drive action. Page the team when a new free account burns through its allowance immediately. Alert when one verification flow starts spending outside its normal range. Surface worker queues where one tenant suddenly owns most of the paid jobs.
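As a rough sketch of how those events could drive alerts, the function below aggregates spend per account and flags two of the conditions described above. It assumes the caller has already filtered the events to one time window, such as the last hour. Both thresholds and the alert labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FREE_WINDOW_BUDGET_USD = 0.05  # assumed per-window budget for a free account
DOMINANT_SHARE = 0.5           # assumed share of total spend worth flagging


def spend_by_account(events: List[dict]) -> Dict[str, float]:
    """Sum estimated cost per account for one window of events."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[event["account_id"]] += event["estimated_cost_usd"]
    return dict(totals)


def alerts(events: List[dict]) -> List[Tuple[str, str]]:
    """Return (account_id, reason) pairs for accounts that need attention."""
    totals = spend_by_account(events)
    plans = {event["account_id"]: event["plan"] for event in events}
    grand_total = sum(totals.values())

    found: List[Tuple[str, str]] = []
    for account_id, spent in sorted(totals.items()):
        # A free account that burns through its budget inside one window.
        if plans[account_id] == "free" and spent > FREE_WINDOW_BUDGET_USD:
            found.append((account_id, "free account over budget"))
        # One actor that suddenly owns most of the paid work.
        if grand_total > 0 and spent / grand_total > DOMINANT_SHARE:
            found.append((account_id, "dominant share of spend"))
    return found
```

The same aggregation works for any actor key in the event, such as tenant, API key, or webhook source, by swapping `account_id` for that field.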
Putting it together
Cost-triggering endpoints fail in a few repeatable ways. Some are expensive on every call. Some get farmed through free tiers. Some hide telecom spend behind tiny requests. Some replay the same paid side effect, and some turn small uploads into large jobs. The controls are repetitive in a good way: bound the work before you do it, cap how much one actor can spend, deduplicate retries, and alert early when feature spend leaves its normal range.
Cost defense checklist:
- Paid routes identified explicitly in code or config
- Input size or job size checked before expensive work starts
- Per-account quotas enforced on every paid feature
- Rate limits used for bursts, quotas used for total allowance
- New and free accounts get much smaller paid-feature limits
- Messaging routes restricted to the regions and volumes the product actually needs
- Webhook handlers verify signatures and enforce idempotency before paid side effects
- Workers deduplicate jobs and cap retries
- Feature-level spend telemetry recorded per actor
- Provider-side budget alerts and budget backstops configured for expensive features
For the broader context on how cost boundaries fit into a security audit, see A Practical Security Audit for Builders. For understanding who is likely attacking your system and what they are after, see Threat Modelling for Builders.