The first time I watched a founder handle a failed payment live — in an Ahmedabad office, a webhook alert lighting up, a scramble in Slack, someone drafting a message by hand — it looked like panic. A failed payment shouldn't create panic. It should create a workflow.
That sounds obvious, yet many early teams still handle failures as scattered events. A webhook arrives. Someone gets an alert. The gateway retries. A support person maybe sends a message. At month end the founder checks MRR and wonders where the gap came from. Recovery improves the moment it stops being a set of reminders and becomes a stack.
Indian SaaS needs the same discipline as the best Western retention systems, but wired to different rails: Razorpay and Cashfree events, UPI AutoPay, card mandates, eNACH, the RBI 2026 framework, and WhatsApp as a first-class recovery channel.
Six layers
| Layer | Purpose | Example |
|---|---|---|
| Event capture | Know immediately what happened | Failed-payment webhook from the gateway |
| Classification | Understand the failure state | Soft decline, mandate broken, AFA needed, revoked, expired, infrastructure, unknown |
| Retry decision | Attempt recovery where it makes sense | Timed retry for a soft failure, but not on a queued peak-window UPI event |
| Customer reach | The right message in the right channel | Brand-first WhatsApp for action, email for the record |
| Resolution path | A way for the customer to actually fix it | Payment link, mandate re-auth, plan change, support hand-off |
| Learning loop | Improve the next recovery | Track recovered MRR, unresolved MRR, and reason patterns |
Most teams have layer one and part of layer three: they know an event happened and that the gateway may retry. The value usually goes missing in layers two, four, five, and six: diagnosis, reach, resolution, and learning.
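As a rough illustration of layer two, the sketch below maps a normalized reason code onto the failure states named in the table. The reason codes are made-up placeholders, not real Razorpay or Cashfree values; an actual integration would need its own mapping from each gateway's payload.

```python
from enum import Enum
from typing import Optional


class FailureState(Enum):
    SOFT_DECLINE = "soft_decline"
    MANDATE_BROKEN = "mandate_broken"
    AFA_NEEDED = "afa_needed"
    REVOKED = "revoked"
    EXPIRED = "expired"
    INFRASTRUCTURE = "infrastructure"
    UNKNOWN = "unknown"


# Hypothetical normalized reason codes (assumption): real gateway payloads
# use their own fields and values and need a translation step first.
REASON_TO_STATE = {
    "insufficient_funds": FailureState.SOFT_DECLINE,
    "mandate_inactive": FailureState.MANDATE_BROKEN,
    "afa_required": FailureState.AFA_NEEDED,
    "mandate_revoked": FailureState.REVOKED,
    "instrument_expired": FailureState.EXPIRED,
    "gateway_timeout": FailureState.INFRASTRUCTURE,
}


def classify(reason_code: Optional[str]) -> FailureState:
    """Return the failure state for a reason code, or UNKNOWN if unmapped."""
    if not reason_code:
        return FailureState.UNKNOWN
    return REASON_TO_STATE.get(reason_code.strip().lower(), FailureState.UNKNOWN)
```

Keeping an explicit UNKNOWN state means an unrecognised failure is surfaced for review rather than silently treated as retryable.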
The part most teams skip: suppression rules
A real stack doesn't just send; it knows when not to. These checks should run before every single message:
Confirm the failure is real. A UPI AutoPay debit "failing" inside NPCI's peak windows (10:00–13:00, 17:00–21:30 IST) may be queued, not failed. Don't message yet.
Suppress during downtime. If the gateway reports method/issuer downtime, queue everything until it resolves — messaging customers about a failure that's actually an outage destroys trust.
Respect quiet hours. If a non-urgent message would land outside roughly 9am–9pm IST, hold it until the next morning.
Cap the volume. A hard ceiling per failure event, and a cross-customer daily cap, so no one feels chased.
Stop on recovery. The instant the gateway reports success, cancel every queued message and send one confirmation.
That last point matters more than it looks: the recovery-confirmation message ("your plan with [Brand] is fully active again, nothing more needed") legitimises the whole sequence and cuts spam reports. With the 2026 framework now requiring banks to send a post-debit notification, a clean merchant confirmation also helps the customer reconcile the charge.
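The checks above can be sketched as a single gate that runs before each send. This is a minimal sketch under stated assumptions: the `SendContext` fields, the `"upi_autopay"` label, and the cap values are hypothetical, and the daily cap is interpreted here as a per-customer limit across failure events. The peak windows and quiet hours are the ones quoted in the text.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone
from enum import Enum

IST = timezone(timedelta(hours=5, minutes=30))
PEAK_WINDOWS = [(time(10, 0), time(13, 0)), (time(17, 0), time(21, 30))]
SEND_WINDOW = (time(9, 0), time(21, 0))  # "roughly 9am-9pm IST"


class Decision(Enum):
    SEND = "send"
    HOLD = "hold"                              # re-evaluate later
    SKIP = "skip"                              # per-event ceiling reached
    CANCEL_AND_CONFIRM = "cancel_and_confirm"  # recovered: drop queue, confirm once


@dataclass
class SendContext:
    now: datetime               # must be timezone-aware
    method: str                 # hypothetical label, e.g. "upi_autopay" or "card"
    recovered: bool
    gateway_downtime: bool
    urgent: bool
    sent_for_event: int
    sent_to_customer_today: int


def _within(t: time, start: time, end: time) -> bool:
    return start <= t < end


def decide(ctx: SendContext, max_per_event: int = 3, max_per_day: int = 2) -> Decision:
    local = ctx.now.astimezone(IST).time()
    if ctx.recovered:
        return Decision.CANCEL_AND_CONFIRM
    if ctx.sent_for_event >= max_per_event:
        return Decision.SKIP
    # A UPI AutoPay "failure" inside a peak window may only be queued.
    if ctx.method == "upi_autopay" and any(_within(local, s, e) for s, e in PEAK_WINDOWS):
        return Decision.HOLD
    if ctx.gateway_downtime:
        return Decision.HOLD
    if not ctx.urgent and not _within(local, *SEND_WINDOW):
        return Decision.HOLD
    if ctx.sent_to_customer_today >= max_per_day:
        return Decision.HOLD
    return Decision.SEND
```

Checking recovery first means a payment that has already succeeded never triggers another reminder, whatever the other conditions say.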
The stack behaves differently per failure
| Failure state | Stack response |
|---|---|
| Soft / insufficient funds | Retry on a sensible window, then notify if unresolved |
| Mandate inactive | Ask the customer to re-authorise (fresh AFA) |
| AFA required (≥ ₹15,000) | Explain the approval step before the next attempt |
| UPI AutoPay issue | UPI-aware message + alternate link, sent only after confirming it actually failed |
| Customer revoked mandate | Move to a retention conversation, not a payment chase |
| Repeated non-response | Escalate to support or mark likely churn |
| High-value account | Human review before suspension |
This is where voluntary and involuntary churn meet. A revoked mandate might be an exit or just auto-debit nerves. Three missed messages might mean "busy" or "gone." A failed payment might be a payment issue or the first visible sign of a product issue. The stack shouldn't pretend to know — it should create enough context for the next action to be precise and respectful.
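One way to keep these responses explicit is a small dispatch table with the two override rules checked first. The sketch below is illustrative only: the state keys and action names are invented labels, the non-response limit of three simply echoes the "three missed messages" example, and sending unknown states to human review is an assumption rather than something the table prescribes.

```python
from dataclasses import dataclass

# Responses paraphrased from the table; keys and values are illustrative labels.
RESPONSES = {
    "soft_decline": "retry_then_notify",
    "mandate_inactive": "request_reauthorisation",
    "afa_required": "explain_approval_step",
    "upi_autopay_issue": "confirm_failure_then_send_alternate_link",
    "mandate_revoked": "retention_conversation",
}


@dataclass
class Failure:
    state: str
    unanswered_messages: int = 0
    high_value: bool = False
    suspension_pending: bool = False


def next_action(failure: Failure, non_response_limit: int = 3) -> str:
    # High-value accounts get a human look before anything is suspended.
    if failure.high_value and failure.suspension_pending:
        return "human_review"
    # Repeated silence is ambiguous ("busy" or "gone"), so hand it to a person.
    if failure.unanswered_messages >= non_response_limit:
        return "escalate_to_support_or_mark_likely_churn"
    # Unknown states fall back to human review (assumption).
    return RESPONSES.get(failure.state, "human_review")
```

For example, `next_action(Failure("mandate_revoked"))` returns `"retention_conversation"`, so a revoked mandate never enters the payment-chase path.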
Start with one table
The first version needn't be complex. A single internal table changes the conversation:
| Customer | Plan | Gateway | Method | Failure reason | Channel | Outcome | Time-to-recover |
|---|---|---|---|---|---|---|---|
| A | ₹4,999/mo | Razorpay | UPI AutoPay | Mandate re-auth needed | | Recovered | 2 days |
| B | ₹14,999/mo | Cashfree | Card mandate | Soft / retryable | Email + WhatsApp | Pending | - |
| C | ₹24,999/mo | Razorpay | Card | AFA required (≥ ₹15,000) | Human note | Recovered | 1 day |
With a table like this, the team stops saying "payments failed" and starts saying "UPI mandate re-auth is most of our unresolved MRR this week," or "WhatsApp links recover smaller plans faster than email alone," or "plans over ₹15,000 need a pre-renewal authentication nudge." Those are decisions, not vibes.
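Statements like these fall out of very simple aggregation. As a minimal sketch, the snippet below loads the three example rows and totals MRR by failure reason for a given outcome. The `RecoveryRow` field names are assumptions, and customer A's channel is left as `None` because the table does not state it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RecoveryRow:
    customer: str
    plan_mrr_inr: int
    gateway: str
    method: str
    failure_reason: str
    channel: Optional[str]
    outcome: str                    # "Recovered" or "Pending"
    days_to_recover: Optional[int]  # None while unresolved


ROWS = [
    RecoveryRow("A", 4999, "Razorpay", "UPI AutoPay", "Mandate re-auth needed", None, "Recovered", 2),
    RecoveryRow("B", 14999, "Cashfree", "Card mandate", "Soft / retryable", "Email + WhatsApp", "Pending", None),
    RecoveryRow("C", 24999, "Razorpay", "Card", "AFA required", "Human note", "Recovered", 1),
]


def mrr_by_reason(rows: List[RecoveryRow], outcome: str) -> Dict[str, int]:
    """Sum plan MRR per failure reason for rows with the given outcome."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        if row.outcome == outcome:
            totals[row.failure_reason] += row.plan_mrr_inr
    return dict(totals)


def average_days_to_recover(rows: List[RecoveryRow]) -> Optional[float]:
    """Mean recovery time over recovered rows, or None if there are none."""
    days = [r.days_to_recover for r in rows if r.days_to_recover is not None]
    return sum(days) / len(days) if days else None


unresolved = mrr_by_reason(ROWS, "Pending")
recovered = mrr_by_reason(ROWS, "Recovered")
```

On the example rows, `unresolved` contains only the soft/retryable failure at ₹14,999, while `recovered` sums to ₹29,998 across two reasons.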
Where SubsShield fits
SubsShield is built around exactly this workflow. It captures the Razorpay or Cashfree event, classifies the failure, makes the retry/suppress decision, reaches the customer brand-first on WhatsApp and email, hands them a working path, and closes the loop with recovered-vs-saved reporting. The point isn't to add noise after a failure — it's to convert a failure event into the right next action before it hardens into churn.
If a payment failed five minutes ago, does your company already know the next best action — or is the customer now sitting inside a generic retry queue?
References
Razorpay Subscriptions state machine & webhooks — https://razorpay.com/docs/payments/subscriptions/
Cashfree Subscriptions webhooks — https://docs.cashfree.com/
NPCI UPI AutoPay execution-window circular UPI-OC-No-215-A-FY-2025-26 (eff. 1 Aug 2025)
RBI, Digital Payments – E-mandate Framework (2026)


