Skip to content

Parallelize Esplora tx_sync confirmed/unconfirmed lookups#22

Open
martinsaposnic wants to merge 1 commit into
lsp-0.2.0_accept-underpaying-htlcs_with_timing_logsfrom
fix/parallel-esplora-tx-sync
Open

Parallelize Esplora tx_sync confirmed/unconfirmed lookups#22
martinsaposnic wants to merge 1 commit into
lsp-0.2.0_accept-underpaying-htlcs_with_timing_logsfrom
fix/parallel-esplora-tx-sync

Conversation

@martinsaposnic
Copy link
Copy Markdown

Problem

EsploraSyncClient::sync re-confirms every watched transaction and output on each pass — one or more HTTP round-trips each — strictly serially. Against a remote Esplora, sync wall-time scales with watched_set × round_trip_latency, which exceeds the LDK wallet-sync timeout on any wallet with real channel history (measured: a 4-monitor wallet = 13 watched txs + 13 watched outputs → ~21s lightning sync, blowing a 10s timeout → TxSyncTimeout).

Change

Fan out the per-item lookups with a bounded buffer_unordered, async client only (the blocking client keeps the serial path):

  • get_confirmed_transactions — the watched_transactions phase and the watched_outputs spend-status phase (status fan-out → sequential consistency post-processing → dependent-tx fan-out).
  • get_unconfirmed_transactions — the per-block get_block_status checks.

const ESPLORA_SYNC_CONCURRENCY = 4. Deliberately low: against a shared, rate-limited Esplora the effective request rate is the fleet aggregate (all nodes hit the endpoint via one ALB IP), so a small per-node bound stays under the server's per-client limit while still removing the strictly-serial latency floor.

Correctness (the part that matters for a parallel change)

  • Ordering preserved: results are still sorted by (block_height, in-block pos) before being handed to the Confirm interface — buffer_unordered's arbitrary completion order is re-sorted.
  • Consistency checks preserved: the "previously-confirmed tx now unconfirmed → Err(Inconsistency)" check runs in a sequential post-processing pass, against the fully-merged phase-A results — identical to serial.
  • Error propagation preserved: first error in the collected results aborts via ?.
  • The reorg/tip-recheck logic in sync() is untouched (outside the parallel blocks).

Test (before/after, self-contained — no bitcoind/electrs)

tests/parallel_sync_timing.rs: a mock Esplora that injects 100ms latency on the /merkleblock-proof fan-out (404 → Ok(None)), N=40 watched txs.

sync wall-time result
serial base (f56f47fe) 4133ms fails the bound
this branch (C=4) 1040ms passes

ceil(40/4) × 100ms as expected. Confirmation/ordering correctness remains covered by the electrs-backed test_esplora_syncs integration test (runs in CI).

Follow-up (not in this PR)

Consuming this requires bumping the rust-lightning rev pin in ldk-node (separate PR, also raises LDK_WALLET_SYNC_TIMEOUT_SECS 10→30), then a lightning-js release. Should land after the Esplora server-side rate-limit raise, else higher concurrency just trips 429-backoff.

EsploraSyncClient::sync re-confirms every watched transaction and output on
each pass, one or more HTTP round-trips each, issued strictly serially. Against
a remote Esplora the sync wall-time scales with watched_set * round_trip_latency
and exceeds a wallet-sync timeout on any wallet with real channel history.

Fan out the per-tx / per-output lookups in get_confirmed_transactions (both the
watched_transactions phase and the watched_outputs spend-status phase) and the
per-block checks in get_unconfirmed_transactions with a bounded
buffer_unordered(4), async client only; the blocking client keeps the serial
path. Ordering is preserved (results are still sorted by block height then
in-block position before being handed to Confirm), and every consistency check
the serial version performed is preserved in a sequential post-processing pass.

Concurrency is deliberately low: against a shared, rate-limited Esplora the
effective request rate is the fleet aggregate, so a small per-node bound stays
under the server's per-client limit while removing the serial latency floor.

Adds a self-contained timing test (mock Esplora with injected per-request
latency, no bitcoind/electrs): N=40 watched txs at 100ms/lookup sync in ~1.0s
parallel vs ~4.1s serial. Confirmation/ordering correctness remains covered by
the electrs-backed test_esplora_syncs integration test.
@martinsaposnic martinsaposnic requested a review from amackillop June 1, 2026 22:07
.await;
for r in results {
if let Some(confirmed_tx) = r? {
if !confirmed_txs.iter().any(|ctx| ctx.txid == confirmed_tx.txid) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If watched transactions is a set, is it necessary to guard against inserting the same tx twice here? I see it was there before so probably best to keep it just looks strange at first glance.

Comment on lines +370 to +395
// B2: post-process sequentially, preserving every consistency check
// the sequential version performed, and build the to-fetch list.
let mut to_fetch: Vec<(Txid, Option<BlockHash>, Option<u32>)> = Vec::new();
for status_res in status_results {
let output_status = match status_res? {
Some(s) => s,
None => continue,
};
if let Some(spending_txid) = output_status.txid {
if let Some(spending_tx_status) = output_status.status {
if confirmed_txs.iter().any(|ctx| ctx.txid == spending_txid) {
if spending_tx_status.confirmed {
continue;
} else {
log_trace!(self.logger, "Inconsistency: Detected previously-confirmed Tx {} as unconfirmed", spending_txid);
return Err(InternalError::Inconsistency);
}
}
to_fetch.push((
spending_txid,
spending_tx_status.block_hash,
spending_tx_status.block_height,
));
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// B2: post-process sequentially, preserving every consistency check
// the sequential version performed, and build the to-fetch list.
let mut to_fetch: Vec<(Txid, Option<BlockHash>, Option<u32>)> = Vec::new();
for status_res in status_results {
let output_status = match status_res? {
Some(s) => s,
None => continue,
};
if let Some(spending_txid) = output_status.txid {
if let Some(spending_tx_status) = output_status.status {
if confirmed_txs.iter().any(|ctx| ctx.txid == spending_txid) {
if spending_tx_status.confirmed {
continue;
} else {
log_trace!(self.logger, "Inconsistency: Detected previously-confirmed Tx {} as unconfirmed", spending_txid);
return Err(InternalError::Inconsistency);
}
}
to_fetch.push((
spending_txid,
spending_tx_status.block_hash,
spending_tx_status.block_height,
));
}
}
}
// B2: post-process sequentially, preserving every consistency check
// the sequential version performed, and build the to-fetch list.
let phase_a_txids: HashSet<Txid> =
confirmed_txs.iter().map(|ctx| ctx.txid).collect();
let mut to_fetch: Vec<(Txid, Option<BlockHash>, Option<u32>)> = Vec::new();
// `transpose` drops outputs with no status while still surfacing any
// lookup error through the `?` below.
for status_res in status_results.into_iter().filter_map(Result::transpose) {
let output_status = status_res?;
let (Some(spending_txid), Some(spending_tx_status)) =
(output_status.txid, output_status.status)
else {
continue;
};
if phase_a_txids.contains(&spending_txid) {
// Phase A already resolved this spend as confirmed; the server
// flipping it back to unconfirmed is an inconsistency.
if !spending_tx_status.confirmed {
log_trace!(self.logger, "Inconsistency: Detected previously-confirmed Tx {} as unconfirmed", spending_txid);
return Err(InternalError::Inconsistency);
}
continue;
}
to_fetch.push((
spending_txid,
spending_tx_status.block_hash,
spending_tx_status.block_height,
));
}

nit: readability improvement

Copy link
Copy Markdown

@amackillop amackillop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good suggested a refactor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants