L2 Support Engineer · Fintech · Week 6
Week 6 Day 5

Callback Issue RCA

A callback is a reply that travels back to the sender after a transaction is processed. When it fails, goes to the wrong place, or arrives with mismatched data, the transaction can appear stuck even though it has completed. Today you learn to find and fix exactly that.

Retry Logic · Status Mismatch · Callback Failure · Dead Letter · Reprocess
01 The Simple Idea First
Real-life Analogy

Think of a callback like a courier delivering a package and getting a signed receipt back. The courier delivers the parcel (transaction processed). The receiver signs the receipt (callback sent). The signature travels back to the sender (callback delivered).

Now imagine the signature gets lost on the way back. The sender has no receipt. They do not know if the package arrived. They mark it as undelivered — even though the customer has their parcel.

A callback failure is exactly that. The transaction completed at SBP or the destination bank, but the confirmation never made it back to your system. Your system still shows it as PENDING. The client says "payment failed." The money has already moved.

What is a Callback?

After your system sends a transaction to SBP or a downstream service, that service sends back a callback — a notification saying "here is the result of what you sent me." It travels back to a specific endpoint (URL) on your system.

If the callback is received correctly, the transaction status is updated and the client is notified. If it never reaches your system, the transaction stays PENDING until someone intervenes, even if the money has already moved.

This is why callback failures are dangerous. The status in your system says FAILED or PENDING. The client's bank statement shows the money gone. Those two things do not match — and that is the mismatch you need to resolve.

02 How a Callback Should Work — Happy Path

📨 Normal Callback Flow — Everything Working Correctly

🏦
Step 1 — Your System
Transaction sent to SBP or downstream bank
Your system sends the payment request and immediately records the transaction as PENDING in the database. It now waits for the callback reply.
DB: TXN-501 → STATUS = PENDING
🏛️
Step 2 — SBP / Destination
SBP processes the transaction and fires a callback
SBP settles the payment. It then sends a callback to your system's registered endpoint — a POST request containing the transaction result (ACSP or RJCT) and the reference ID.
POST https://yourapp.com/callback → {txn_ref: TXN-501, status: ACSP}
🔌
Step 3 — Your System Receives
Callback received — status updated in DB
Your callback handler receives the POST, validates the reference ID, and updates the transaction status from PENDING to SUCCESS or FAILED. Client is notified.
DB: TXN-501 → STATUS = SUCCESS | Client notified ✅
📬
Step 4 — Notification Delivered
Client receives confirmation
The notification service sends the final status to the client's system or user's phone. Everything matches. Transaction is closed.
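To make Step 3 concrete, here is a minimal Python sketch of a callback handler using the standard-library sqlite3 module. It is an illustration under assumptions, not your production handler: the `transactions` table, its columns, and `handle_callback` are hypothetical, and the ACSP → SUCCESS / RJCT → FAILED mapping follows the flow above.

```python
import sqlite3

# Hypothetical mapping from the status code in the callback to our DB status
STATUS_MAP = {"ACSP": "SUCCESS", "RJCT": "FAILED"}


def handle_callback(conn: sqlite3.Connection, payload: dict) -> str:
    """Apply one callback payload such as {"txn_ref": "TXN-501", "status": "ACSP"}."""
    txn_ref = payload.get("txn_ref")
    new_status = STATUS_MAP.get(payload.get("status"))
    if new_status is None:
        return "REJECTED: unknown status code"

    # Validate the reference ID against the DB before touching anything
    row = conn.execute(
        "SELECT status FROM transactions WHERE txn_ref = ?", (txn_ref,)
    ).fetchone()
    if row is None:
        return "REJECTED: unknown reference"

    # Only move a transaction out of PENDING; never overwrite a final status
    cur = conn.execute(
        "UPDATE transactions SET status = ? WHERE txn_ref = ? AND status = 'PENDING'",
        (new_status, txn_ref),
    )
    conn.commit()
    if cur.rowcount == 0:
        return "IGNORED: already " + row[0]
    return "UPDATED: " + new_status  # the client notification would be triggered here


# Hypothetical schema for the demo only; your real table will differ
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_ref TEXT PRIMARY KEY, status TEXT, amount INTEGER)")
conn.execute("INSERT INTO transactions VALUES ('TXN-501', 'PENDING', 15000)")
print(handle_callback(conn, {"txn_ref": "TXN-501", "status": "ACSP"}))  # UPDATED: SUCCESS
print(handle_callback(conn, {"txn_ref": "TXN-501", "status": "ACSP"}))  # IGNORED: already SUCCESS
print(handle_callback(conn, {"txn_ref": "TXN-999", "status": "ACSP"}))  # REJECTED: unknown reference
```

Guarding the UPDATE with `status = 'PENDING'` means a repeated callback cannot overwrite a final status; Scenario 04 in section 08 explains why that matters.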
03 Where Callbacks Break — The 4 Failure Types
Delivery Failures

Callback never arrives

  • Your endpoint URL is wrong or outdated
  • Your server is down when SBP sends it
  • Firewall blocking SBP's IP address
  • SSL/TLS certificate error on your endpoint
  • Network issue between SBP and your server
Data Mismatches

Callback arrives but data is wrong

  • TXN reference ID in callback does not match DB
  • Amount in callback differs from original request
  • Status code format not what system expected
  • Duplicate callback sent — already processed
  • Callback for wrong client or wrong environment
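The data-mismatch checks above can run as a validation step before any status update. A minimal sketch, assuming hypothetical field names (`txn_ref`, `status`, `amount`, `client_id`, `env`) and a plain dict standing in for the DB lookup; a real handler would load the original transaction from the database.

```python
EXPECTED_STATUS_CODES = {"ACSP", "RJCT"}


def find_mismatches(callback: dict, db: dict, expected_env: str) -> list:
    """Return a list of problems; an empty list means the callback looks safe to apply."""
    original = db.get(callback.get("txn_ref"))
    if original is None:
        return ["TXN reference not found in DB"]
    problems = []
    if callback.get("status") not in EXPECTED_STATUS_CODES:
        problems.append("unexpected status code format")
    if callback.get("amount") != original["amount"]:
        problems.append("amount differs from original request")
    if callback.get("client_id") != original["client_id"]:
        problems.append("callback for wrong client")
    if callback.get("env") != expected_env:
        problems.append("callback for wrong environment")
    if original["status"] != "PENDING":
        problems.append("duplicate: transaction already processed")
    return problems


# Hypothetical records: db maps TXN_REF -> original transaction
db = {"TXN-501": {"amount": 15000, "client_id": "CLIENT-A", "status": "PENDING"}}
good = {"txn_ref": "TXN-501", "status": "ACSP", "amount": 15000,
        "client_id": "CLIENT-A", "env": "PROD"}
bad = {"txn_ref": "TXN-501", "status": "ACCEPTED", "amount": 1500,
       "client_id": "CLIENT-A", "env": "UAT"}
print(find_mismatches(good, db, "PROD"))  # []
print(find_mismatches(bad, db, "PROD"))
# ['unexpected status code format', 'amount differs from original request', 'callback for wrong environment']
```

Note that a reference pointing at a different, existing transaction (Scenario 03 in section 08) is caught only indirectly, by the amount, client, or duplicate checks.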
04 Retry Logic — How the System Handles Failed Callbacks

What is Retry Logic?

When a callback delivery fails (for example, because your endpoint was unavailable), most senders automatically retry it after a delay. A typical schedule: first retry after 1 minute, second after 5 minutes, third after 30 minutes, fourth after 2 hours.

If all retries are exhausted and the callback still cannot be delivered, it is moved to a dead letter queue, where it sits waiting for manual investigation and reprocessing.
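As a rough model of the schedule above, this sketch computes when each retry would fire and when the callback would end up in the dead letter queue. It assumes each delay counts from the previous attempt (some senders count from the first failure instead), and `deliver` is a hypothetical stand-in for the real HTTP POST.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

# 1 min, 5 min, 30 min, 2 h, as in the schedule above
RETRY_DELAYS = [timedelta(minutes=1), timedelta(minutes=5),
                timedelta(minutes=30), timedelta(hours=2)]


def run_delivery(first_attempt: datetime,
                 deliver: Callable[[datetime], bool]) -> Tuple[str, List[datetime]]:
    """Try the first delivery, then each retry; return the final state and attempt times."""
    attempts = [first_attempt]
    if deliver(first_attempt):
        return "DELIVERED", attempts
    when = first_attempt
    for delay in RETRY_DELAYS:
        when += delay  # assumption: delay counted from the previous attempt
        attempts.append(when)
        if deliver(when):
            return "DELIVERED", attempts
    return "DEAD_LETTER", attempts  # all retries exhausted


start = datetime(2024, 1, 1, 10, 0, 2)
state, times = run_delivery(start, lambda t: False)  # endpoint down the whole time
print(state)  # DEAD_LETTER
print([t.strftime("%H:%M:%S") for t in times])
# ['10:00:02', '10:01:02', '10:06:02', '10:36:02', '12:36:02']
```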

🔄 Retry States — What Each One Means
State | What it means | L2 action
PENDING | Callback not yet received. Still within the timeout window. | Monitor: give it time and check again after the retry interval.
RETRYING | First delivery failed. The system is retrying automatically. | Watch: check whether the endpoint is back up. Retries will continue.
MAX_RETRY_EXCEEDED | All retry attempts failed. No more automatic retries. | Act now: fix the endpoint issue, then manually trigger a reprocess.
DEAD_LETTER | Callback moved to the dead letter queue after all retries failed. | Investigate: find the root cause of the delivery failure, then reprocess.
DUPLICATE | Callback arrived, but this TXN was already processed. | Safe to ignore: the system should reject duplicates. Verify that it did.
MISMATCH | Callback arrived, but its data does not match the original transaction. | Escalate: the reconciliation team must manually verify and resolve.
05 Status Mismatch — The Most Critical Scenario

What is a Status Mismatch?

A status mismatch is when the status in your system does not match the actual outcome at SBP. This is the most dangerous callback issue because it involves real money.

Example 1 — Money moved, system shows FAILED: SBP processed and settled the payment. Your system never received the callback. Your system shows TXN FAILED. The customer's account was debited. This is a reconciliation emergency — money is gone from the customer but your system says it didn't happen.

Example 2 — Money not moved, system shows SUCCESS: Your system received a callback but the reference ID was wrong — it accidentally updated a different transaction. Your system shows SUCCESS for a transaction that SBP actually rejected.

🚨 Rule: Any mismatch where money has moved but the system does not reflect it — or vice versa — is an immediate P1. Escalate to L3, reconciliation team, and your manager simultaneously. Do not wait.
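A reconciliation pass applies this rule mechanically: compare your status with SBP's outcome for each transaction and flag disagreements. A minimal sketch, assuming both sides are already available as dicts keyed by TXN_REF and normalised to SUCCESS / FAILED / PENDING (how you obtain SBP's side is out of scope here); `reconcile` and TXN-503/504 are hypothetical.

```python
def reconcile(system: dict, sbp: dict) -> dict:
    """Return {txn_ref: finding} for every transaction whose two sides disagree."""
    findings = {}
    for txn_ref, our_status in system.items():
        their_status = sbp.get(txn_ref)
        if their_status is None:
            findings[txn_ref] = "NO_RECORD_AT_SBP"
        elif their_status == our_status:
            continue  # both sides agree
        elif their_status == "SUCCESS" or our_status == "SUCCESS":
            # Money moved but we don't show it, or we show it but it didn't: P1
            findings[txn_ref] = "P1"
        else:
            findings[txn_ref] = "MISMATCH"
    return findings


system = {"TXN-501": "PENDING", "TXN-502": "SUCCESS", "TXN-503": "SUCCESS", "TXN-504": "PENDING"}
sbp = {"TXN-501": "SUCCESS", "TXN-502": "SUCCESS", "TXN-503": "FAILED", "TXN-504": "FAILED"}
print(reconcile(system, sbp))
# {'TXN-501': 'P1', 'TXN-503': 'P1', 'TXN-504': 'MISMATCH'}
```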
06 DB Tables to Check During Callback Investigation
🗄️ Which Table Tells You What
Table | What to check | What it reveals
TRANSACTIONS_LOG | TXN_REF, STATUS, UPDATED_AT | Current status of the transaction in your system
MX_MESSAGE | DIRECTION=INBOUND, MESSAGE_TYPE=PACS002 | Whether SBP's callback was actually received
TRANSACTIONS_LOG_REQ_RESP | RESPONSE_BODY for the TXN | What the callback message actually said
CALLBACK_RETRY_LOG | RETRY_COUNT, STATUS, LAST_RETRY_AT | How many retries were attempted and when
DEAD_LETTER_QUEUE | TXN_REF, FAILURE_REASON | Callbacks that exhausted all retries
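These tables can be checked together in one query. The sketch below builds tiny in-memory SQLite copies so it runs anywhere; the column subsets, the sample rows, and matching on TXN_REF across tables are assumptions for illustration, so adapt names and types to your real schema before running anything similar against production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical minimal copies of the tables above
CREATE TABLE TRANSACTIONS_LOG (TXN_REF TEXT, STATUS TEXT, UPDATED_AT TEXT);
CREATE TABLE MX_MESSAGE (TXN_REF TEXT, DIRECTION TEXT, MESSAGE_TYPE TEXT);
CREATE TABLE CALLBACK_RETRY_LOG (TXN_REF TEXT, RETRY_COUNT INTEGER, STATUS TEXT);
CREATE TABLE DEAD_LETTER_QUEUE (TXN_REF TEXT, FAILURE_REASON TEXT);
-- Hypothetical sample rows mirroring the lab scenario
INSERT INTO TRANSACTIONS_LOG VALUES ('TXN-501', 'PENDING', '10:00:02'),
                                    ('TXN-502', 'SUCCESS', '10:10:00');
INSERT INTO MX_MESSAGE VALUES ('TXN-502', 'INBOUND', 'PACS002');
INSERT INTO CALLBACK_RETRY_LOG VALUES ('TXN-501', 4, 'DEAD_LETTER');
INSERT INTO DEAD_LETTER_QUEUE VALUES ('TXN-501', 'CALLBACK_ENDPOINT_UNREACHABLE');
""")

# One row per PENDING transaction: did a PACS.002 arrive, how many retries, is it dead-lettered?
TRIAGE_SQL = """
SELECT t.TXN_REF,
       t.STATUS,
       EXISTS (SELECT 1 FROM MX_MESSAGE m
               WHERE m.TXN_REF = t.TXN_REF
                 AND m.DIRECTION = 'INBOUND'
                 AND m.MESSAGE_TYPE = 'PACS002') AS has_inbound_pacs002,
       (SELECT MAX(r.RETRY_COUNT) FROM CALLBACK_RETRY_LOG r
         WHERE r.TXN_REF = t.TXN_REF) AS retries,
       (SELECT d.FAILURE_REASON FROM DEAD_LETTER_QUEUE d
         WHERE d.TXN_REF = t.TXN_REF LIMIT 1) AS dead_letter_reason
FROM TRANSACTIONS_LOG t
WHERE t.STATUS = 'PENDING'
ORDER BY t.TXN_REF
"""
for row in conn.execute(TRIAGE_SQL):
    print(row)
# ('TXN-501', 'PENDING', 0, 4, 'CALLBACK_ENDPOINT_UNREACHABLE')
```

Subqueries are used instead of joins so that several retry or message rows per transaction cannot multiply each other. Reading the result: `has_inbound_pacs002 = 0` means SBP's reply never reached you; `1` on a still-PENDING row means it arrived but was not processed.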
07 Hands-on Lab — Investigate a Callback Issue

Scenario for this lab

Client reports: "We sent a payment 2 hours ago. Our customer's account is debited but our system shows the transaction as PENDING." This is a classic callback failure — SBP processed it, the money moved, but your system never received the confirmation back.

🔬 Lab: Trace and Resolve a Callback Failure

Log + SQL Investigation
01
Create the simulated callback failure log
This log shows a transaction that was sent to SBP but the callback never arrived back.
terminal
cat > ~/callback-issue.log << 'EOF'
[10:00:00] [INFO ] TXN-501 received from client. Amount: 15000
[10:00:01] [INFO ] TXN-501 validated. Sending to SBP via PACS.008
[10:00:02] [INFO ] TXN-501 sent to SBP. Status set to PENDING.
[10:00:05] [INFO ] TXN-502 received from client. Amount: 8000
[10:00:06] [INFO ] TXN-502 sent to SBP. Status set to PENDING.
[10:05:00] [WARN ] No callback received for TXN-501. Elapsed: 5 minutes.
[10:10:00] [WARN ] No callback received for TXN-501. Retry attempt 1.
[10:10:00] [INFO ] Callback received for TXN-502. Status: ACSP. Updated.
[10:15:00] [WARN ] No callback received for TXN-501. Retry attempt 2.
[10:30:00] [ERROR] Callback delivery failed for TXN-501. Retry attempt 3.
[10:30:00] [ERROR] CALLBACK_ENDPOINT_UNREACHABLE: Connection refused.
[11:00:00] [ERROR] Callback delivery failed for TXN-501. Retry attempt 4.
[11:00:00] [ERROR] TXN-501 moved to DEAD_LETTER after 4 failed retries.
[12:00:00] [INFO ] TXN-501 still PENDING in DB. No update received.
EOF
→ Log created. TXN-501 stuck in PENDING. TXN-502 processed fine. Callback endpoint was unreachable.
02
Step 1 — Confirm the transaction is stuck and when it started
Read the log to understand the full timeline.
terminal
# Full history of TXN-501
grep "TXN-501" ~/callback-issue.log

# How many retries happened?
grep "Retry" ~/callback-issue.log

# What was the error on the last attempt?
grep -E "CALLBACK|DEAD_LETTER" ~/callback-issue.log
→ TXN-501 sent at 10:00:02. 4 retries between 10:10 and 11:00. Moved to DEAD_LETTER at 11:00. Error: CALLBACK_ENDPOINT_UNREACHABLE.
03
Step 2 — Check the current status in DB (SQLite simulation)
What does your system currently say about TXN-501?
terminal — SQLite
sqlite3 ~/fintech_lab.db << 'EOF'
-- Check current status of all PENDING transactions
SELECT txn_id, status, amount
FROM transactions
WHERE status = 'PENDING';

-- Check which transactions completed (SUCCESS)
SELECT txn_id, status, amount
FROM transactions
WHERE status = 'SUCCESS';
EOF
→ Confirms PENDING transactions exist in DB. In a real system TXN-501 would be here — status PENDING, money already debited by SBP.
04
Step 3 — Identify the root cause of the callback failure
Why did the callback fail? The log tells you: CALLBACK_ENDPOINT_UNREACHABLE.
terminal — confirm root cause
# Find the exact error
grep "ERROR" ~/callback-issue.log

# Compare TXN-501 (failed) with TXN-502 (success)
grep "TXN-502" ~/callback-issue.log

# When did TXN-502 succeed vs TXN-501 fail?
grep -E "TXN-501|TXN-502" ~/callback-issue.log | grep -i "callback"
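→ TXN-502's callback arrived at 10:10; TXN-501's never did. By 10:30 the endpoint was refusing connections (CALLBACK_ENDPOINT_UNREACHABLE). Since TXN-502 got through at 10:10, the endpoint was not down the whole time: this points to an intermittent outage during TXN-501's retry window.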
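Step 3 compares two transactions by eye. When many are involved (as in Scenario 02 in section 08), a short script can list every TXN that was sent to SBP but never got a "Callback received" line. A minimal Python sketch; it assumes the exact log line formats used in this lab and is not a general log parser.

```python
import os
import re

SENT = re.compile(r"(TXN-\d+) sent to SBP")
# Case-sensitive on purpose: "No callback received for ..." lines must not count
RECEIVED = re.compile(r"Callback received for (TXN-\d+)")


def find_missing_callbacks(lines):
    """Return TXN refs that were sent to SBP but have no 'Callback received' line."""
    sent, received = [], set()
    for line in lines:
        m = SENT.search(line)
        if m:
            sent.append(m.group(1))
        m = RECEIVED.search(line)
        if m:
            received.add(m.group(1))
    # dict.fromkeys keeps first-seen order and drops repeats
    return [ref for ref in dict.fromkeys(sent) if ref not in received]


SAMPLE = """\
[10:00:02] [INFO ] TXN-501 sent to SBP. Status set to PENDING.
[10:00:06] [INFO ] TXN-502 sent to SBP. Status set to PENDING.
[10:10:00] [WARN ] No callback received for TXN-501. Retry attempt 1.
[10:10:00] [INFO ] Callback received for TXN-502. Status: ACSP. Updated.
"""
print(find_missing_callbacks(SAMPLE.splitlines()))  # ['TXN-501']

# Against the lab file created in step 01, if it exists
path = os.path.expanduser("~/callback-issue.log")
if os.path.exists(path):
    with open(path) as f:
        print(find_missing_callbacks(f))
```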
05
Step 4 — Verify at SBP level if TXN-501 actually succeeded
Before updating anything, you must confirm what SBP actually did. Check MX_MESSAGE for the SBP response — or raise a PACS.028 status enquiry to SBP directly.
SQL — check what SBP replied (real system)
-- Check if SBP sent a PACS.002 for TXN-501
SELECT MESSAGE_TYPE, DIRECTION, MESSAGE_BODY, CREATED_AT
FROM MX_MESSAGE
WHERE TXN_REF = 'TXN-501'
AND DIRECTION = 'INBOUND';

-- If no INBOUND record: SBP reply never arrived
-- If INBOUND exists: reply arrived but was not processed

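-- Check callback retry log for TXN-501
SELECT TXN_REF, RETRY_COUNT, STATUS, LAST_RETRY_AT
FROM CALLBACK_RETRY_LOG
WHERE TXN_REF = 'TXN-501';

-- If retries are exhausted, check why it was dead-lettered
SELECT TXN_REF, FAILURE_REASON
FROM DEAD_LETTER_QUEUE
WHERE TXN_REF = 'TXN-501';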
⚠️ Two outcomes possible: If MX_MESSAGE has an INBOUND PACS.002 → SBP did reply, your system failed to process it. If no INBOUND record → SBP reply never reached your system. These are two completely different fixes.
06
Step 5 — Resolve the mismatch
Once you have confirmed SBP's actual result, follow the correct resolution path.
Resolution decision tree
IF SBP result = SUCCESS and your DB = PENDING:
→ Manually update status to SUCCESS (with L3 approval)
→ Trigger notification to client
→ Log the manual override in Jira

IF SBP result = FAILED and your DB = PENDING:
→ Update status to FAILED
→ Trigger refund process if money was taken
→ Notify client of failure and reason

IF SBP has no record of TXN-501:
→ Transaction never reached SBP
→ Escalate to L3 — possible to resubmit
→ Check PACS.008 outbound log

IF callback is in DEAD_LETTER:
→ Fix the endpoint issue first
→ Then manually trigger reprocess of dead letter
→ Resolution path identified. Always fix root cause before reprocessing. Never manually update status without verifying SBP's actual result first. ✅
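The decision tree can be encoded as a small function so the same inputs always produce the same checklist. A sketch only: `resolution_steps` and its input values (`"SUCCESS"`, `"FAILED"`, or `None` when SBP has no record) are assumptions, the actions are returned as text rather than executed, and, as in the tree above, the dead-letter branch is evaluated independently of the others.

```python
from typing import List, Optional


def resolution_steps(sbp_result: Optional[str], db_status: str,
                     in_dead_letter: bool) -> List[str]:
    """Map SBP's confirmed result and our DB state to the resolution steps above."""
    steps = []
    if in_dead_letter:
        # Root cause first, reprocess second
        steps += ["Fix the endpoint issue first",
                  "Manually trigger reprocess of the dead letter entry"]
    if sbp_result is None:
        steps += ["Transaction never reached SBP",
                  "Escalate to L3 (resubmission may be possible)",
                  "Check the PACS.008 outbound log"]
    elif db_status == "PENDING" and sbp_result == "SUCCESS":
        steps += ["Update status to SUCCESS (with L3 approval)",
                  "Trigger notification to client",
                  "Log the manual override in Jira"]
    elif db_status == "PENDING" and sbp_result == "FAILED":
        steps += ["Update status to FAILED",
                  "Trigger refund process if money was taken",
                  "Notify client of failure and reason"]
    elif sbp_result != db_status:
        # Any other disagreement is a status mismatch (see the P1 rule in section 05)
        steps.append("Status mismatch: escalate as P1 (L3, reconciliation team, manager)")
    return steps


for step in resolution_steps("SUCCESS", "PENDING", in_dead_letter=True):
    print("-", step)
```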
07
Write the Jira RCA
Document everything clearly — what happened, what the mismatch was, what was done to resolve it.
Jira RCA — Callback Failure
Incident : Callback Failure — TXN-501 stuck in PENDING
Reported : Client — payment debited but system shows PENDING
Time : 10:00:02 (sent) — 11:00 (dead letter)
Root Cause : Callback endpoint was unreachable during retry window.
Error: CALLBACK_ENDPOINT_UNREACHABLE (4 retries failed)
Mismatch : SBP confirmed SUCCESS. DB shows PENDING.
Money debited from customer. Confirmation not received.
Resolution : Endpoint restored. Dead letter entry reprocessed.
TXN-501 status manually updated to SUCCESS (L3 approval).
Client notification sent. Reconciliation confirmed.
Next Steps : Investigate why endpoint was unreachable between
10:00 and 11:00. Add endpoint health monitoring.
→ Professional, complete RCA. Paste directly into Jira. ✅
08 Real L2 Scenarios
01

Client says: "Customer paid but transaction shows failed in our app." You check MX_MESSAGE: SBP's PACS.002 with ACSP is there (INBOUND), but your DB shows PENDING. The callback arrived but was not processed, which points to a bug in your callback handler. L3 fixes the handler and approves a manual status update to SUCCESS.

02

You find 50 transactions in the dead letter queue — all from the same 30-minute window. Your callback endpoint was down during maintenance. Once the endpoint is restored — you reprocess all 50 from the dead letter queue. They all update correctly within minutes.
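Reprocessing a batch like this can be sketched as: confirm the endpoint is healthy, then replay each dead-letter entry and keep only the ones that still fail. A minimal sketch; `endpoint_healthy` and `apply_callback` are hypothetical stand-ins for your health check and callback handler, the queue is a plain list rather than a real DLQ, and TXN-700 to TXN-749 are made-up references.

```python
from typing import Callable, List, Tuple


def reprocess_dead_letters(queue: List[dict],
                           endpoint_healthy: Callable[[], bool],
                           apply_callback: Callable[[dict], bool]) -> Tuple[int, List[dict]]:
    """Replay dead-lettered callbacks; return (count applied, entries still failing)."""
    if not endpoint_healthy():
        # Fix the root cause before reprocessing: nothing is replayed
        return 0, list(queue)
    applied, still_failing = 0, []
    for entry in queue:
        if apply_callback(entry):
            applied += 1
        else:
            still_failing.append(entry)  # stays in the DLQ for investigation
    return applied, still_failing


queue = [{"txn_ref": f"TXN-{n}", "status": "ACSP"} for n in range(700, 750)]  # 50 entries
applied, left = reprocess_dead_letters(queue, lambda: True, lambda e: True)
print(applied, len(left))  # 50 0
```

Because a dead-lettered callback may be replayed for a transaction that was already updated some other way, `apply_callback` should reject duplicates (see Scenario 04).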

03

A callback arrives for TXN-601, but the TXN_REF in the callback body says TXN-600: the reference ID does not match. Your system ignored it because TXN-600 was already SUCCESS, so TXN-601 stays PENDING. This is a data mismatch; the reconciliation team needs to match it manually and update the status.

04

A duplicate callback arrives for TXN-501, which was already processed. Your system must reject duplicates safely: if it processed both, it might double-credit the account. Check the callback handler logic. It should check whether STATUS is already SUCCESS before processing; if it does not, this is a code bug for L3.
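Besides checking STATUS first, one common way to make duplicate rejection reliable is an idempotency table: record each callback's unique ID under a PRIMARY KEY in the same database transaction as the update, so a second copy fails to insert and nothing else changes. A sketch assuming a hypothetical `processed_callbacks` table and a `msg_id` per callback; whether your callbacks carry such an ID depends on the sender.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for the demo
CREATE TABLE transactions (txn_ref TEXT PRIMARY KEY, status TEXT, balance_credited INTEGER);
CREATE TABLE processed_callbacks (msg_id TEXT PRIMARY KEY, txn_ref TEXT);
INSERT INTO transactions VALUES ('TXN-501', 'PENDING', 0);
""")


def process_once(conn: sqlite3.Connection, msg_id: str, txn_ref: str, amount: int) -> bool:
    """Apply a success callback exactly once; return False for a duplicate."""
    try:
        with conn:  # one transaction: dedupe record and credit commit or roll back together
            conn.execute("INSERT INTO processed_callbacks VALUES (?, ?)", (msg_id, txn_ref))
            conn.execute(
                "UPDATE transactions SET status = 'SUCCESS', "
                "balance_credited = balance_credited + ? WHERE txn_ref = ?",
                (amount, txn_ref))
    except sqlite3.IntegrityError:
        return False  # msg_id already recorded: duplicate callback, nothing changed
    return True


print(process_once(conn, "MSG-1", "TXN-501", 15000))  # True
print(process_once(conn, "MSG-1", "TXN-501", 15000))  # False
print(conn.execute("SELECT balance_credited FROM transactions").fetchone()[0])  # 15000
```

Because the database enforces uniqueness, this also holds when two copies of the same callback arrive at nearly the same moment, where a separate read-then-write status check in application code could let both through.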

✅ Week 6 · Day 5 Outcomes