
Same user's pins split across two user_ids (object vs forest/bare pins) breaks user-scoped queries #26

@ehsan6sha


Summary

For a single user, the object pins and the encrypted-forest pins (the bare manifest / manifest-page / HAMT-node blocks) are stored in the pins table under two different user_id values. Any user-scoped maintenance query keyed on the object owner's hash silently misses the forest pins.

Observed (production, one user, documents bucket)

  • Object pins object:documents/<key> -> pins.user_id = 1492a9c9... (the object owner hash)
  • Forest pins (bare flat keys: manifest Qmfd462..., manifest pages, HAMT nodes) -> pins.user_id = 2d2dfffd... (a different hash)
  • Both are the same user's documents data; the forest pins decrypt with the same forest_dek and fully reconstruct the bucket (17 files recovered).

How the object owner hash is derived

The object owner is session.hashed_user_id = hash_user_id(user_id) (BLAKE3), computed at crates/fula-cli/src/handlers/admin.rs:1041 and used as the S3 object owner (crates/fula-cli/src/handlers/bucket.rs:26; handlers/object.rs via with_owner(&session.hashed_user_id)). The object pins' user_id matches this value. The forest/bare pins carry a different value, that is, they were stamped from a different underlying user_id.
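One way to settle which input produced each owner hash (follow-up step 1) is to hash every candidate identifier and compare the result against the prefixes observed in pins.user_id. This is a hypothetical sketch: match_candidates is not part of fula-cli, it assumes the stored user_id is a hex string (as the truncated values suggest), and it takes the hash function as a closure so the real hash_user_id can be plugged in without assuming its exact signature.

```rust
/// Hypothetical diagnostic helper (not part of fula-cli).
/// For each candidate raw user identifier, compute its owner hash with the
/// supplied function and record every observed `pins.user_id` prefix it matches.
/// Assumes the stored user_id is a hex string, so prefix matching is meaningful.
pub fn match_candidates<F>(
    candidates: &[&str],
    observed_prefixes: &[&str],
    hash: F,
) -> Vec<(String, String)>
where
    F: Fn(&str) -> String,
{
    let mut matches = Vec::new();
    for &raw in candidates {
        // In real use, `hash` would wrap the BLAKE3-based hash_user_id.
        let digest = hash(raw);
        for &prefix in observed_prefixes {
            if digest.starts_with(prefix) {
                // (candidate input, observed prefix it produces)
                matches.push((raw.to_string(), prefix.to_string()));
            }
        }
    }
    matches
}
```

The candidates would be whatever identifiers the user could have had over time, for example the values before and after the client userKey_v2 derivation change. A candidate that matches neither prefix points instead at a different stamping path for bare/forest pins.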

Impact

  • User-scoped queries miss the forest. The index-rebuild bare-key scan, the users-index publisher (UserBucketsIndex/GlobalUsersIndex), and any GC/replication accounting filtered on WHERE user_id = <object hash> will not see the forest blocks. In a real recovery, a user_id-scoped bare export returned zero forest blocks (all under the second id), forcing a per-CID on-demand fallback.
  • Forest blocks risk being mis-attributed or orphaned by any per-user lifecycle operation (for example, GC scoped to the object owner's hash).
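The failure mode above reduces to an exact-match filter on user_id. This minimal sketch models the pins table in memory with a hypothetical, simplified Pin row (only key and user_id; the real schema likely has more columns) and a pins_for_user helper standing in for WHERE user_id = <object hash>. Placeholder strings replace the real hashes.

```rust
/// Hypothetical, simplified model of a row in the `pins` table.
#[derive(Debug, Clone)]
pub struct Pin {
    /// e.g. "object:documents/<key>" for object pins, or a bare CID for forest pins
    pub key: String,
    /// owner hash stamped when the pin was written
    pub user_id: String,
}

/// Equivalent of `WHERE user_id = <owner>`: keeps exact matches only,
/// so pins stamped with any other owner hash are silently skipped.
pub fn pins_for_user<'a>(pins: &'a [Pin], owner: &str) -> Vec<&'a Pin> {
    pins.iter().filter(|p| p.user_id == owner).collect()
}

/// Object pins carry the "object:" prefix; in this sketch every other key
/// (manifest, manifest pages, HAMT nodes) is treated as a bare forest pin.
pub fn is_object_pin(pin: &Pin) -> bool {
    pin.key.starts_with("object:")
}
```

With one object pin under the object owner's hash and two bare pins under the other hash, pins_for_user returns only the object pin and none of the forest pins, which matches the zero-forest-block export seen during recovery.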

Leading hypothesis

The forest likely predates an identity/userKey change: the underlying user_id that hashes to 2d2dfffd... is older, while recent object writes hash to 1492a9c9... (possibly related to the client userKey_v2 derivation change). If so, the forest pins are legacy data.

Proposed follow-up (post-recovery; not urgent)

  1. Determine which input produces 2d2dfffd... vs 1492a9c9... (old vs new user_id, or a different stamping path for bare/forest pins).
  2. Decide whether to backfill/migrate forest pins to the canonical hashed_user_id.
  3. Make user-scoped maintenance queries forest-aware, or unify the user_id at write time so object and forest pins always share one owner.
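Steps 2 and 3 can be prototyped on the same in-memory model. This sketch assumes a hypothetical alias map from the legacy owner hash to the canonical hashed_user_id; that mapping would have to come from step 1 rather than be guessed. pins_for_canonical_user resolves aliases at query time (forest-aware reads without rewriting rows), while backfill rewrites aliased rows so object and forest pins share one owner. Neither function exists in fula-cli; the names and the simplified Pin row are illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified `pins` row (see the real schema for all columns).
#[derive(Debug, Clone, PartialEq)]
pub struct Pin {
    pub key: String,
    pub user_id: String,
}

/// Hypothetical alias table: legacy owner hash -> canonical hashed_user_id.
pub type Aliases = HashMap<String, String>;

/// Map an owner hash to its canonical form; non-aliased ids map to themselves.
/// Resolves a single level of aliasing only.
fn canonical<'a>(aliases: &'a Aliases, id: &'a str) -> &'a str {
    aliases.get(id).map(String::as_str).unwrap_or(id)
}

/// Forest-aware variant of `WHERE user_id = <owner>`: a pin matches when its
/// user_id resolves to the same canonical owner as `owner`.
pub fn pins_for_canonical_user<'a>(pins: &'a [Pin], aliases: &Aliases, owner: &str) -> Vec<&'a Pin> {
    let target = canonical(aliases, owner);
    pins.iter()
        .filter(|p| canonical(aliases, &p.user_id) == target)
        .collect()
}

/// Backfill: rewrite every aliased user_id to its canonical value in place.
/// Returns the number of rows changed.
pub fn backfill(pins: &mut [Pin], aliases: &Aliases) -> usize {
    let mut changed = 0;
    for pin in pins.iter_mut() {
        if let Some(canon) = aliases.get(&pin.user_id) {
            pin.user_id = canon.clone();
            changed += 1;
        }
    }
    changed
}
```

Query-time resolution is reversible and could ship first; the backfill is the permanent unification but should run only after the alias is confirmed, since a wrong mapping would attach another user's blocks to this owner.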

Context

Surfaced during the post-ipfs repo gc recovery (index-node-loss / safe-gc work). Filed for documentation; to be addressed after recovery completes.
