Add interop integration test harness for LND, CLN, and Eclair #839
febyeji wants to merge 5 commits into lightningdevkit:main
🎉 This PR is now ready for review!
8c1e94f to a50c9d0
🔔 1st Reminder Hey @tnull! This PR has been waiting for your review.
a50c9d0 to 433515c
tnull left a comment
Thanks, this already looks great on the first pass! I think we should get some minor things out of the way and then move forward with this so we can take advantage of the additional test coverage for our upcoming release already.
Some comments def. can be addressed in a follow-up (e.g. adding BOLT12 test coverage would be great).
/// Upstream LDK fix needed: skip payment_secret verification when a valid
/// `keysend_preimage` TLV is present.
#[tokio::test(flavor = "multi_thread", worker_threads = 1)]
#[ignore = "CLN v24.08+ sends payment_secret in keysend — LDK rejects (upstream fix needed)"]
Have yet to explore this further.
…setup
Address review feedback on PR lightningdevkit#839:
- Replace reqwest dependency with bitreq for Eclair REST client
- Replace curl shell calls with bitreq async HTTP requests
- Remove per-test docker container recreation (reuse single Eclair instance, unlock UTXOs between tests instead)
- Fix chmod -R 755 to u+rwX,go+rX in CLN/LND CI workflows
- Add --fail flag to curl readiness check in Eclair CI
433515c to 2613a3f
- Shared scenarios in tests/common/scenarios/ generic over ExternalNode
- Test entry points for LND, CLN, and Eclair
- Combo orchestrator (combo.rs) with interop_combo_tests! macro generating 16 tests per implementation (phase × disconnect side × close type × initiator)
- Building blocks use no test_ prefix; full scenarios include setup internally
- #[ignore] annotations for known interop failures: CLN keysend (payment_secret issue), Eclair keysend (InvalidOnionPayload), CLN/Eclair splice (version requirements)
- Add docker-compose configs for CLN (bitcoind 29.1), LND (bitcoind 29.1), and Eclair (bitcoind 30.2 required by Eclair latest)
- Add CI workflows: cln-integration.yml, lnd-integration.yml, eclair-integration.yml
- Bump corepc-node feature to 29_0 and update download_bitcoind_electrs.sh
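As an illustration of where the 16 combo tests per implementation come from, the four two-valued axes named in the commit message (phase × disconnect side × close type × initiator) can be enumerated as below. This is only a sketch: the enum and function names (`Phase`, `Side`, `CloseType`, `all_combos`) are hypothetical and are not taken from `combo.rs` or the `interop_combo_tests!` macro.

```rust
// Hypothetical types standing in for the four axes of the combo matrix.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Payment,
    Idle,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Ldk,
    Ext,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum CloseType {
    Coop,
    Force,
}

/// Enumerate every (phase, disconnect side, close type, close initiator) tuple.
/// Four axes with two values each yield 2 * 2 * 2 * 2 = 16 combinations.
fn all_combos() -> Vec<(Phase, Side, CloseType, Side)> {
    let mut combos = Vec::new();
    for phase in [Phase::Payment, Phase::Idle] {
        for disconnect in [Side::Ldk, Side::Ext] {
            for close in [CloseType::Coop, CloseType::Force] {
                for initiator in [Side::Ldk, Side::Ext] {
                    combos.push((phase, disconnect, close, initiator));
                }
            }
        }
    }
    combos
}
```

A macro-based generator would presumably expand one `#[tokio::test]` function per tuple rather than building a vector at runtime; the vector form is used here only to make the count easy to check.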
2613a3f to 9c2fbdc
9c2fbdc to 7420893
Thanks for the review! Pushed an update addressing the feedback. Will address in follow-ups:

Thanks! Mind opening tracking issues for these so we don't forget?
tnull left a comment
CI is failing right now:
thread 'test_eclair_cooperative_close_after_fee_change' (16127) panicked at tests/common/scenarios/channel.rs:275:55:
called `Result::unwrap()` on an `Err` value: ChannelClosingFailed
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
test_eclair_cooperative_close_after_fee_change
…ange test
Add a 2s delay after the post-fee-change payment to let LDK's internal monitor updates settle, and use the cooperative_close_by_ldk helper.
@tnull Fixed the
tnull left a comment
Thanks for addressing the feedback! Here are some more comments.
What stands out to me is that:
- This PR/the code is very verbose. It would be good to simplify and DRY up where possible to improve maintainability going forward.
- We currently ignore a lot of test cases, often for reasons that aren't conclusive to me (or seem outdated). Can we double-check whether we really can't make them work?
sudo apt-get install -y socat
docker compose -f docker-compose-cln.yml exec bitcoin bitcoin-cli -regtest -rpcuser=user -rpcpassword=pass createwallet miner
ADDR=$(docker compose -f docker-compose-cln.yml exec bitcoin bitcoin-cli -regtest -rpcuser=user -rpcpassword=pass -rpcwallet=miner getnewaddress)
docker compose -f docker-compose-cln.yml exec bitcoin bitcoin-cli -regtest -rpcuser=user -rpcpassword=pass generatetoaddress 1 "$ADDR"
What are we using the miner wallet for? Note that coinbase funds only mature after 100 blocks, so they can only be spent then (which is why you usually see generatetoaddress 101 on setups like this).
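To make the maturity point concrete, here is a minimal Rust sketch of how many coinbase rewards a miner wallet can spend after mining N blocks to itself on a fresh regtest chain. The helper name `spendable_coinbases` is hypothetical, and the rule it encodes is an assumption about Bitcoin Core's wallet behavior (a coinbase counts as spendable once it has `COINBASE_MATURITY + 1` confirmations), not something taken from this PR.

```rust
/// Bitcoin's coinbase maturity constant, in blocks.
const COINBASE_MATURITY: u32 = 100;

/// Hypothetical helper: number of coinbase outputs the wallet would report as
/// spendable after `blocks_mined` blocks were mined to it on a fresh chain.
/// Assumption: a coinbase becomes spendable at COINBASE_MATURITY + 1
/// confirmations, so the first reward matures when block 101 is mined.
fn spendable_coinbases(blocks_mined: u32) -> u32 {
    blocks_mined.saturating_sub(COINBASE_MATURITY)
}
```

Under that assumption, `generatetoaddress 1` leaves the miner wallet with nothing spendable, while `generatetoaddress 101` makes exactly one block reward available.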
  with:
    path: bin/bitcoind-${{ runner.os }}-${{ runner.arch }}
-   key: bitcoind-${{ runner.os }}-${{ runner.arch }}
+   key: bitcoind-29.0-${{ runner.os }}-${{ runner.arch }}
It's a bit odd to only adjust the cache key but not the path here. What's the rationale for this?
    node.next_event_async(),
)
.await
.unwrap_or_else(|_| panic!("{} timed out waiting for ChannelClosed event", node.node_id()));

    .try_into()
    .map_err(|_| self.make_error(format!("capacity_sat overflow: {}", capacity_sat)))?;
let push_sat: i64 = push_msat
    .map(|m| (m / 1000).try_into())
Hmm, seems this could lead to surprises in the future, if we simply forget about any sub-sat amounts and then want to assert them? Maybe this means the entire trait needs to take sat denomination? But I guess we have similar issues with other LND APIs, such as list_channels?
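One way to avoid silently dropping sub-sat amounts would be to make the conversion fail loudly when the msat value is not a whole number of sats. The sketch below is illustrative only: `msat_to_sat_exact` is a hypothetical helper, not part of the PR or of any LND client API, and it returns a plain `String` error rather than the harness's `TestFailure` type.

```rust
/// Hypothetical helper: convert millisatoshis to whole satoshis for an API
/// that only accepts sat amounts, rejecting values that would lose precision
/// instead of truncating them.
fn msat_to_sat_exact(msat: u64) -> Result<i64, String> {
    if msat % 1000 != 0 {
        return Err(format!("{msat} msat is not a whole number of sats"));
    }
    // u64::MAX / 1000 always fits in an i64, so this conversion cannot fail in
    // practice; the error path is kept explicit rather than using an `as` cast.
    i64::try_from(msat / 1000).map_err(|_| format!("{msat} msat does not fit in i64 sats"))
}
```

With this shape, a test that pushes 5_001 msat would fail at the call site instead of later asserting against a balance that is 1 msat short.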
    tokio::task::spawn_blocking(move || f(&*client)).await.expect("CLN RPC task panicked")
}

fn make_error(&self, detail: String) -> TestFailure {
nit: It seems this can just be inlined everywhere, rather than duplicating it a bunch of times for each ExternalNode?
  run: |
    source ./scripts/download_bitcoind_electrs.sh
-   mkdir bin
+   mkdir -p bin
Why is this suddenly necessary?
}

#[tokio::test(flavor = "multi_thread", worker_threads = 1)]
#[ignore = "CLN splicing requires --experimental-splicing flag and CLN v25+"]
Yes, why can't we enable this? AFAICT CLN v25+ should be available? E.g., https://hub.docker.com/layers/elementsproject/lightningd/v25.12.1/images/sha256-fb5c956ff969cf7bb3a9ee46240daa9fdc03a9d7bc35931a2f8a888343fd16d2
}

/// CLN v24.08+ includes a `payment_secret` in outbound keysend HTLCs.
/// LDK treats any inbound HTLC with `payment_secret` as a BOLT11 payment and
Hmm, I'm not positive this is true? Could you un-ignore this so we can see what the error actually is? Is it related to the checks here?: https://github.com/lightningdevkit/rust-lightning/blob/db42ad6ba053d54f98f360f05afad5fee896ed69/lightning/src/ln/channelmanager.rs#L8479
/// Eclair 0.8.0 rejects LDK keysend with `InvalidOnionPayload(8,0)` — LDK includes
/// `payment_data` (TLV type 8) in keysend onions, which Eclair considers invalid for
/// spontaneous payments. Eclair 0.14.0+ may handle this differently.
How do you arrive at this conclusion? Are they aware of this incompatibility? Can you link the relevant issues when including such remarks? Given we have Eclair v0.14-SNAPSHOT, can't we un-ignore this then?
}

#[tokio::test(flavor = "multi_thread", worker_threads = 1)]
#[ignore = "Eclair 0.8.0 does not support splicing (introduced in v0.10.0+)"]
Why is this ignored? We should have Eclair v0.14-SNAPSHOT available, i.e., we can test splicing, no?
Update test infrastructure to Bitcoin Core 29.0:
- corepc-node feature: 27_2 → 29_0
- electrsd feature: corepc-node_27_2 → corepc-node_29_0
- download script: bitcoind 27.2 → 29.0 with updated SHA256 hashes
Prerequisite for lightningdevkit#839 (interop test harness) which needs bitcoind 29.0 for compatibility with updated Docker images.
Update test infrastructure to Bitcoin Core 29.0:
- corepc-node: 0.10.0 → 0.10.1, feature 27_2 → 29_0
- electrsd feature: corepc-node_27_2 → corepc-node_29_0
- download script: bitcoind 27.2 → 29.0 with updated SHA256 hashes
Prerequisite for lightningdevkit#839 (interop test harness).
Summary
Closes #766
Add an interop test harness for testing LDK-Node against external Lightning implementations (LND, CLN, Eclair).
Test Coverage
Each implementation runs the same shared scenarios:
- LND: `lightninglabs/lnd` v0.18.5-beta
- CLN: `elementsproject/lightningd` v24.08.2
- Eclair: `acinq/eclair` latest (currently 0.14.0-SNAPSHOT)
- Combo tests: {payment, idle} × {ldk, ext} × {coop, force} × {ldk, ext}, covering disconnect/reconnect timing, close type, and close initiator

Design Decisions
Feedback on the design trade-offs would be especially appreciated! Here are the first few decisions I've made in this draft:
bitcoind versions. Eclair requires Bitcoin Core 30+ (`bitcoind:30.2`); CLN also uses 30.2. LND stays on 29.1 due to intermittent `GetInfo` gRPC hangs under 30.2.

`ExternalNode::wait_for_block_sync()`. Eclair's API responds before block indexing completes, so it needs explicit sync-waiting. LND's gRPC hangs if forced to wait. Solution: trait default no-op, Eclair overrides.

Eclair container recreation. Stale channel state causes SIGSEGV in secp256k1-jni (eclair#3275). Each test recreates the container; `setup_clients()` calls `lockunspent` to release UTXOs from prior force-closes.

Eclair request timeout + settlement polling. 60s reqwest timeout prevents indefinite hangs. `pay_invoice`/`send_keysend` poll `/getsentinfo` so failures surface immediately.

Issues Found

CLN → LDK keysend fails.
- CLN includes `payment_secret` in keysend HTLCs, which appears to cause LDK to reject the payment during `inbound_payment::verify()`, likely because no invoice exists for spontaneous payments
- `test_receive_keysend_payment` is skipped for CLN; needs further investigation
- `send_keysend` now polls `/getsentinfo` for settlement (matching `pay_invoice`)
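For reviewers weighing the `wait_for_block_sync()` decision above (trait default no-op, Eclair overrides), the pattern could look roughly like the following. This is a simplified, synchronous sketch under stated assumptions: the trait here is a stand-in and not the PR's actual `ExternalNode` definition, the `name` method, the `Lnd`/`Eclair` structs, and the `reported_heights` field are hypothetical, and a real override would poll the node's API with a delay and a deadline rather than iterate over canned values.

```rust
/// Simplified stand-in for the harness trait (hypothetical signature).
trait ExternalNode {
    fn name(&self) -> &'static str;

    /// Default: do nothing. Only implementations whose API can answer before
    /// the chain is fully indexed need to override this.
    fn wait_for_block_sync(&self, _target_height: u32) -> Result<(), String> {
        Ok(())
    }
}

/// Relies on the default no-op, so it is never forced to wait.
struct Lnd;

/// Carries the heights the node would report on successive polls
/// (canned data standing in for real API calls).
struct Eclair {
    reported_heights: Vec<u32>,
}

impl ExternalNode for Lnd {
    fn name(&self) -> &'static str {
        "lnd"
    }
}

impl ExternalNode for Eclair {
    fn name(&self) -> &'static str {
        "eclair"
    }

    /// Succeeds as soon as one polled height reaches the target; fails once
    /// the polls are exhausted, which stands in for a timeout.
    fn wait_for_block_sync(&self, target_height: u32) -> Result<(), String> {
        for height in &self.reported_heights {
            if *height >= target_height {
                return Ok(());
            }
        }
        Err(format!("{} did not reach height {target_height}", self.name()))
    }
}
```

The trade-off of a default no-op is that a new implementation with the same indexing lag as Eclair would compile and run without waiting, so the need for an override has to be caught by flaky tests rather than by the compiler.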