Add graph-storage crate#4198
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the graph-storage crate, which provides a delta-based graph representation for the Graphite file format, including CRDT/delta computation, conversion utilities, and round-trip tests. It also cleans up legacy migrations in graph-craft and updates MemoHash to compare values instead of hashes. The review feedback highlights a potential DoS/OOM vulnerability when resizing the exports vector with unvalidated slot indices, suggests optimizing the O(N^2) pairwise comparison in order_consistent to O(N log N) by sorting, and recommends using into_iter().next() instead of drain(..).next() for idiomatic element extraction.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
21ab02a to
2350854
Compare
2350854 to
6dda108
Compare
6811f03 to
9b01dd2
Compare
|
@cubic-ai-dev |
@TrueDoctor I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
9 issues found across 26 files
Confidence score: 2/5
- There are multiple high-confidence, user-impacting regressions (sev 7–8) across hashing, graph diffing, and serialization, so merge risk is currently high rather than routine.
- Most severe:
node-graph/libraries/core-types/src/memo.rsmakesMemoHashequality ignorehashwhileHashstill includes it, which can violate hash collection invariants and cause incorrect map/set behavior. - Several storage/runtime paths risk dropped or corrupted data: missed network/export diffs in
document/graph-storage/src/delta.rs, omitted export-only resources indocument/graph-storage/src/from_runtime.rs, and silent truncation indocument/graph-storage/src/to_runtime.rswheninputslengths differ. - Pay close attention to
node-graph/libraries/core-types/src/memo.rs,document/graph-storage/src/delta.rs,document/graph-storage/src/from_runtime.rs,document/graph-storage/src/to_runtime.rs, andnode-graph/libraries/resources/src/lib.rs- they contain the highest-likelihood correctness and round-trip integrity risks.
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
49f9dee to
05848ee
Compare
There was a problem hiding this comment.
11 issues found across 19 files
Confidence score: 2/5
- Multiple high-severity, high-confidence data-integrity risks make this PR risky to merge as-is, especially in core graph storage/CRDT paths (
document/graph-storage/src/attributes.rs,document/graph-storage/src/to_runtime.rs,document/graph-storage/src/crdt.rs,document/graph-storage/src/ids.rs). - The most severe issue is inconsistent
Priorityequality/ordering/hashing indocument/graph-storage/src/attributes.rs, which can breakHashMap/BTree*behavior and lead to hard-to-debug corruption-like behavior. - There are additional concrete regression risks affecting user-visible state correctness: duplicate node IDs can silently collapse nodes during runtime reconstruction, deletes can be resurrected without tombstones, and non-canonical
Revhashing can make identical deltas produce different IDs. - Pay close attention to
document/graph-storage/src/attributes.rs,document/graph-storage/src/to_runtime.rs,document/graph-storage/src/crdt.rs,document/graph-storage/src/ids.rs,document/graph-storage/src/session.rs,document/graph-storage/src/delta.rs,document/graph-storage/src/model.rs- core invariants and history/diff behavior can diverge from expected graph state.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="document/graph-storage/src/session.rs">
<violation number="1" location="document/graph-storage/src/session.rs:162">
P2: Redo history is cleared even when no commit is produced. A no-op retire can silently disable redo after undo.</violation>
</file>
<file name="document/graph-storage/src/attributes.rs">
<violation number="1" location="document/graph-storage/src/attributes.rs:126">
P1: `Priority` has inconsistent `PartialEq`/`Eq`/`Ord`/`Hash` semantics. This can corrupt behavior in `HashMap`/`BTree*` when equal values hash or order differently.</violation>
</file>
<file name="document/graph-storage/src/to_runtime.rs">
<violation number="1" location="document/graph-storage/src/to_runtime.rs:118">
P1: Duplicate `ORIGINAL_NODE_ID` values in one network are not detected, causing silent node loss when collecting into `FxHashMap`. This can corrupt reconstructed graphs by collapsing distinct storage nodes onto one runtime node id.</violation>
</file>
<file name="document/graph-storage/src/delta.rs">
<violation number="1" location="document/graph-storage/src/delta.rs:88">
P2: Export-list truncation is encoded as `SetExport(None)` but that op never shrinks `net.exports`, so diff-apply cannot fully reach target network shape when `to.exports.len() < from.exports.len()`. This leaves persistent trailing slots and causes value-level drift.</violation>
</file>
<file name="document/graph-storage/src/model.rs">
<violation number="1" location="document/graph-storage/src/model.rs:9">
P2: Parallel `inputs`/`inputs_attributes` vectors make length mismatches representable. That invalid state is already handled as an error during runtime conversion, so enforcing the invariant in the model would prevent late failures.</violation>
<violation number="2" location="document/graph-storage/src/model.rs:75">
P2: `Reflection` stores required metadata outside the enum, so required state is not atomic. A `ChangeNodeInput` to `Reflection` can leave the node in a conversion-failing state until separate attribute writes happen.</violation>
</file>
<file name="document/graph-storage/src/crdt.rs">
<violation number="1" location="document/graph-storage/src/crdt.rs:186">
P1: Attribute deletes are not tombstoned, so older adds can resurrect deleted keys. This breaks LWW behavior when ops arrive out of order.</violation>
</file>
<file name="document/graph-storage/src/ids.rs">
<violation number="1" location="document/graph-storage/src/ids.rs:86">
P1: `compute_rev` hashes non-canonical serialization of `RegistryDelta` that includes `HashMap` fields. Identical deltas can receive different `Rev` IDs, breaking stable content-addressed identity assumptions.</violation>
</file>
<file name="document/graph-storage/Cargo.toml">
<violation number="1" location="document/graph-storage/Cargo.toml:15">
P3: Use `{ workspace = true }` instead of an inline version — blake3 is already managed in `[workspace.dependencies]` at line 178 of the root `Cargo.toml`. An inline version bypasses centralised version management.</violation>
<violation number="2" location="document/graph-storage/Cargo.toml:17">
P3: Use `{ workspace = true }` instead of an inline version — rustc-hash is already managed in `[workspace.dependencies]` at line 102 of the root `Cargo.toml`. An inline version bypasses centralised version management.</violation>
<violation number="3" location="document/graph-storage/Cargo.toml:18">
P3: Use `{ workspace = true }` instead of an inline version — thiserror is already managed in `[workspace.dependencies]` at line 120 of the root `Cargo.toml`. An inline version bypasses centralised version management.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| /// a value strictly between two neighbors, so concurrent insertions elsewhere never collide; an | ||
| /// exact tie between two peers inserting at the same gap is broken by `PeerId` in [`SourceKey`]. | ||
| /// `f64` precision is ample for the short fallback chains resources carry in practice. | ||
| #[derive(Copy, Clone, Debug, PartialEq, Serialize, Deserialize)] |
There was a problem hiding this comment.
P1: Priority has inconsistent PartialEq/Eq/Ord/Hash semantics. This can corrupt behavior in HashMap/BTree* when equal values hash or order differently.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/attributes.rs, line 126:
<comment>`Priority` has inconsistent `PartialEq`/`Eq`/`Ord`/`Hash` semantics. This can corrupt behavior in `HashMap`/`BTree*` when equal values hash or order differently.</comment>
<file context>
@@ -0,0 +1,150 @@
+/// a value strictly between two neighbors, so concurrent insertions elsewhere never collide; an
+/// exact tie between two peers inserting at the same gap is broken by `PeerId` in [`SourceKey`].
+/// `f64` precision is ample for the short fallback chains resources carry in practice.
+#[derive(Copy, Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub struct Priority(pub f64);
+
</file context>
| collector.push(entry); | ||
| } | ||
|
|
||
| convert_node(registry, declarations, node, metadata_path, runtime_id, node_collector, network_collector).map(|doc_node| (runtime_id, doc_node)) |
There was a problem hiding this comment.
P1: Duplicate ORIGINAL_NODE_ID values in one network are not detected, causing silent node loss when collecting into FxHashMap. This can corrupt reconstructed graphs by collapsing distinct storage nodes onto one runtime node id.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/to_runtime.rs, line 118:
<comment>Duplicate `ORIGINAL_NODE_ID` values in one network are not detected, causing silent node loss when collecting into `FxHashMap`. This can corrupt reconstructed graphs by collapsing distinct storage nodes onto one runtime node id.</comment>
<file context>
@@ -0,0 +1,270 @@
+ collector.push(entry);
+ }
+
+ convert_node(registry, declarations, node, metadata_path, runtime_id, node_collector, network_collector).map(|doc_node| (runtime_id, doc_node))
+ })
+ .collect::<Result<FxHashMap<_, _>, _>>()?;
</file context>
| Some(value) => match attributes.entry(key) { | ||
| std::collections::hash_map::Entry::Occupied(mut entry) => { | ||
| if force || timestamp > entry.get().timestamp { | ||
| entry.insert(Value { value, timestamp }); |
There was a problem hiding this comment.
P1: Attribute deletes are not tombstoned, so older adds can resurrect deleted keys. This breaks LWW behavior when ops arrive out of order.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/crdt.rs, line 186:
<comment>Attribute deletes are not tombstoned, so older adds can resurrect deleted keys. This breaks LWW behavior when ops arrive out of order.</comment>
<file context>
@@ -0,0 +1,200 @@
+ Some(value) => match attributes.entry(key) {
+ std::collections::hash_map::Entry::Occupied(mut entry) => {
+ if force || timestamp > entry.get().timestamp {
+ entry.insert(Value { value, timestamp });
+ }
+ }
</file context>
| /// Hash the identity-bearing fields of a `Delta` with blake3 and truncate to 128 bits. | ||
| pub(crate) fn compute_rev(parents: &[Rev], author: PeerId, timestamp: TimeStamp, delta_type: &RegistryDelta) -> Rev { | ||
| let mut hasher = blake3::Hasher::new(); | ||
| let bytes = rmp_serde::to_vec(&(parents, author, timestamp, delta_type)).expect("Delta identity fields must serialize"); |
There was a problem hiding this comment.
P1: compute_rev hashes non-canonical serialization of RegistryDelta that includes HashMap fields. Identical deltas can receive different Rev IDs, breaking stable content-addressed identity assumptions.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/ids.rs, line 86:
<comment>`compute_rev` hashes non-canonical serialization of `RegistryDelta` that includes `HashMap` fields. Identical deltas can receive different `Rev` IDs, breaking stable content-addressed identity assumptions.</comment>
<file context>
@@ -0,0 +1,92 @@
+/// Hash the identity-bearing fields of a `Delta` with blake3 and truncate to 128 bits.
+pub(crate) fn compute_rev(parents: &[Rev], author: PeerId, timestamp: TimeStamp, delta_type: &RegistryDelta) -> Rev {
+ let mut hasher = blake3::Hasher::new();
+ let bytes = rmp_serde::to_vec(&(parents, author, timestamp, delta_type)).expect("Delta identity fields must serialize");
+ hasher.update(&bytes);
+ let digest = hasher.finalize();
</file context>
|
|
||
| // A new edit abandons any undone-forward branch: those revs stay in the DAG but are no longer | ||
| // reachable via redo. (Mirrors the legacy editor clearing its redo history on commit.) | ||
| self.document.redo_stack.clear(); |
There was a problem hiding this comment.
P2: Redo history is cleared even when no commit is produced. A no-op retire can silently disable redo after undo.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/session.rs, line 162:
<comment>Redo history is cleared even when no commit is produced. A no-op retire can silently disable redo after undo.</comment>
<file context>
@@ -0,0 +1,512 @@
+
+ // A new edit abandons any undone-forward branch: those revs stay in the DAG but are no longer
+ // reachable via redo. (Mirrors the legacy editor clearing its redo history on commit.)
+ self.document.redo_stack.clear();
+ for op in ops {
+ let reverse = self.document.compute_reverse_delta(target, &op)?;
</file context>
| pub struct Node { | ||
| pub(crate) implementation: Implementation, | ||
| pub(crate) inputs: Vec<InputSlot>, | ||
| pub(crate) inputs_attributes: Vec<Attributes>, |
There was a problem hiding this comment.
P2: Parallel inputs/inputs_attributes vectors make length mismatches representable. That invalid state is already handled as an error during runtime conversion, so enforcing the invariant in the model would prevent late failures.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/model.rs, line 9:
<comment>Parallel `inputs`/`inputs_attributes` vectors make length mismatches representable. That invalid state is already handled as an error during runtime conversion, so enforcing the invariant in the model would prevent late failures.</comment>
<file context>
@@ -0,0 +1,124 @@
+pub struct Node {
+ pub(crate) implementation: Implementation,
+ pub(crate) inputs: Vec<InputSlot>,
+ pub(crate) inputs_attributes: Vec<Attributes>,
+ pub(crate) attributes: Attributes,
+ pub(crate) network: NetworkId,
</file context>
| import_idx: usize, | ||
| }, | ||
| /// Marker; the `DocumentNodeMetadata` lives in `inputs_attributes`. | ||
| Reflection, |
There was a problem hiding this comment.
P2: Reflection stores required metadata outside the enum, so required state is not atomic. A ChangeNodeInput to Reflection can leave the node in a conversion-failing state until separate attribute writes happen.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/model.rs, line 75:
<comment>`Reflection` stores required metadata outside the enum, so required state is not atomic. A `ChangeNodeInput` to `Reflection` can leave the node in a conversion-failing state until separate attribute writes happen.</comment>
<file context>
@@ -0,0 +1,124 @@
+ import_idx: usize,
+ },
+ /// Marker; the `DocumentNodeMetadata` lives in `inputs_attributes`.
+ Reflection,
+}
+
</file context>
| graph-craft = { workspace = true } | ||
| graphene-resource = { workspace = true } | ||
| core-types = { workspace = true } | ||
| blake3 = "1.5" |
There was a problem hiding this comment.
P3: Use { workspace = true } instead of an inline version — blake3 is already managed in [workspace.dependencies] at line 178 of the root Cargo.toml. An inline version bypasses centralised version management.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/Cargo.toml, line 15:
<comment>Use `{ workspace = true }` instead of an inline version — blake3 is already managed in `[workspace.dependencies]` at line 178 of the root `Cargo.toml`. An inline version bypasses centralised version management.</comment>
<file context>
@@ -0,0 +1,21 @@
+graph-craft = { workspace = true }
+graphene-resource = { workspace = true }
+core-types = { workspace = true }
+blake3 = "1.5"
+rmp-serde = "1.3"
+rustc-hash = "2.0"
</file context>
| blake3 = "1.5" | |
| blake3 = { workspace = true } |
| core-types = { workspace = true } | ||
| blake3 = "1.5" | ||
| rmp-serde = "1.3" | ||
| rustc-hash = "2.0" |
There was a problem hiding this comment.
P3: Use { workspace = true } instead of an inline version — rustc-hash is already managed in [workspace.dependencies] at line 102 of the root Cargo.toml. An inline version bypasses centralised version management.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/Cargo.toml, line 17:
<comment>Use `{ workspace = true }` instead of an inline version — rustc-hash is already managed in `[workspace.dependencies]` at line 102 of the root `Cargo.toml`. An inline version bypasses centralised version management.</comment>
<file context>
@@ -0,0 +1,21 @@
+core-types = { workspace = true }
+blake3 = "1.5"
+rmp-serde = "1.3"
+rustc-hash = "2.0"
+thiserror = "2.0"
+
</file context>
| rustc-hash = "2.0" | |
| rustc-hash = { workspace = true } |
| blake3 = "1.5" | ||
| rmp-serde = "1.3" | ||
| rustc-hash = "2.0" | ||
| thiserror = "2.0" |
There was a problem hiding this comment.
P3: Use { workspace = true } instead of an inline version — thiserror is already managed in [workspace.dependencies] at line 120 of the root Cargo.toml. An inline version bypasses centralised version management.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/Cargo.toml, line 18:
<comment>Use `{ workspace = true }` instead of an inline version — thiserror is already managed in `[workspace.dependencies]` at line 120 of the root `Cargo.toml`. An inline version bypasses centralised version management.</comment>
<file context>
@@ -0,0 +1,21 @@
+blake3 = "1.5"
+rmp-serde = "1.3"
+rustc-hash = "2.0"
+thiserror = "2.0"
+
+[dev-dependencies]
</file context>
| thiserror = "2.0" | |
| thiserror = { workspace = true } |
No description provided.