Only splice when needed#14
Open
amackillop wants to merge 2 commits intoaustin_defer-usability-checkfrom
Open
Conversation
htlc_intercepted only persisted HTLCs when a liquidity action (splice/open) was needed. This prevents calculate_htlc_actions from returning empty actions (i.e. "do nothing, let the timer handle it") because the HTLC would never make it into the store and the timer would never see it. Move the insert before calculate_htlc_actions_for_peer so every intercepted HTLC is in the store before we decide what to do with it. execute_htlc_actions already removes the HTLC on successful forward, so the store entry is short-lived on the happy path. The cost is two extra KV store round-trips (S3 write + delete) per HTLC on the forward-only path, where previously there were none. This hits every payment since LSPS4 intercepts all of them. Unfortunately unavoidable: the insert must happen before action calculation because we don't know yet whether the result will be "forward now" or "do nothing and defer to the timer." If we wait until after and the result is "do nothing," we've already lost the HTLC. Replace the unconditional unwrap on insert with explicit error handling. On persistence failure: - Forward-only: proceed. If the forward succeeds the store entry was redundant. If persistence failed and the channel becomes unusable between calculate and execute, the HTLC is orphaned (not in store for timer retry, not forwarded) until LDK's CLTV timeout cleans it up. Requires both S3 failure and a channel state change between two adjacent calls. - Liquidity action needed: fail the HTLC back to sender. The splice/open is async and the timer needs the stored HTLC to forward once the channel is ready. Without it, we'd spend on-chain fees on a splice that never results in a forward.
When the LSP intercepts an HTLC for an offline peer, the peer reconnects and peer_connected fires before channel_reestablish completes. All channels report is_usable=false at this point. calculate_htlc_actions_for_peer builds its capacity map from is_usable channels (empty set), finds zero capacity, then falls through to the splice candidate filter. That filter used is_channel_ready, which matches mid-reestablish channels, so it emits a splice. Once reestablish finishes, the HTLC gets forwarded through the existing channel. The splice was wasted, and the 30s liquidity cooldown it sets blocks all subsequent liquidity actions for that peer. Two fixes in calculate_htlc_actions_for_peer: 1. Early return when channels exist but none are usable. The capacity map is empty so any decision would be wrong. The HTLC stays in the store (persisted by the prior commit) and the 1Hz timer retries once channels become usable. 2. Splice candidate filter changed from is_channel_ready to is_usable. A mid-reestablish channel maye already have sufficient capacity but it's just not visible yet. Splicing into it adds unnecessary on-chain cost and a 30s cooldown for capacity that was never actually insufficient. splice_channel() would also fail if the channel does become usable in time.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Every time a peer reconnects with pending HTLCs, the LSP triggers a splice that turns out to be unnecessary.
The sequence: LSP intercepts HTLC for offline peer, sends webhook, peer reconnects.
peer_connectedfires beforechannel_reestablishcompletes, so all channels areis_usable=false.calculate_htlc_actions_for_peerbuilds its capacity map from usable channels (empty), finds zero capacity, falls through to the splice filter which matches onis_channel_ready(finds the mid-reestablish channel), and emits a splice. ~7 seconds later the splice completes. Meanwhile reestablish finished in ~100-500ms and the HTLC was forwarded through the existing channel that had plenty of capacity all along.The wasted splice also sets a 30s liquidity cooldown, blocking all subsequent liquidity actions for that peer.
Fix
Two commits, split because the first is a prerequisite for the second.
1. Persist HTLCs before action calculation
htlc_interceptedonly persisted HTLCs when a liquidity action was needed. This preventscalculate_htlc_actions_for_peerfrom ever returning "do nothing" (empty actions) because the HTLC would be lost — not in the store for the timer, not forwarded.Move the insert before action calculation. The cost is two extra S3 round-trips (write + delete) per HTLC on the forward path, where previously there were none. This hits every payment. Unavoidable because we don't know the action result before calculating it.
Also replaces the
unwrap()on insert with explicit error handling: best-effort for forwards, fail-back for liquidity actions.2. Prevent spurious splice on reconnect
Two changes in
calculate_htlc_actions_for_peer:Early return when channels exist but none are usable. The capacity map is empty so any decision (splice, open, forward) would be wrong. The HTLC stays in the store and the 1Hz timer retries once reestablish completes.
Splice candidate filter changed from
is_channel_readytois_usable. A mid-reestablish channel may already have sufficient capacity that just isn't visible yet. Splicing adds unnecessary on-chain cost for capacity that was never actually insufficient.Verification
cargo test -p lightning-liquidity --lib— 65/65 pass