
feat(artifact): turn a querychat session into a standalone, runnable artifact (Python)#245

Draft
cpsievert wants to merge 3 commits into main from feat/artifact-feature



cpsievert commented Jun 3, 2026

Contributor

What this adds

querychat is great for exploring data conversationally — but today that exploration is trapped in the session. Close the tab and the queries and charts you built are gone. This PR lets a user turn the work they did during a querychat session into a standalone, runnable, downloadable artifact: a Quarto dashboard, a Shiny app, a Marimo or Jupyter notebook, or a freeform format they describe in words.

In short: chat your way to the right queries and visualizations, give the session a prompt, and walk away with a real file you can run, share, version-control, or drop into a report — no copy-pasting SQL out of the UI.

Why it's useful

  • Bridges exploration → deliverable. Insights found by asking questions become a reproducible asset.
  • Meets people where they work. Pick the output format (dashboard / app / notebook) and language (R or Python) that fits the team.
  • Self-contained. Small datasets are bundled as CSV so the artifact runs as-is; larger / DB-backed sources get a clearly-marked data-setup TODO.
  • Iterative. Revise with AI, step back and forth through versions, and download a zip (source + README + data).

ℹ️ Python-only for now (pkg-py). See Limitations.

See it in action

Upon requesting a new artifact:

[Screenshot: 01-modal]

After generating the artifact:

[Screenshot: 02-panel]

How it works (user's-eye view)

  1. Open the Create Artifact modal — type /artifact, or the assistant offers it when it senses you're ready.
  2. The modal shows a gallery of the session's queries & charts, an output-format picker, a language picker, and a directions box. An LLM recommendation pre-selects sensible defaults while it opens.
  3. Hit Generate — a side panel slides open and the artifact source streams into a read-only editor; a pill is added to the chat linking back to it.
  4. From the panel: revise with AI (pushes a new version), step through versions, and download the zip.
  5. State survives bookmarking, so a restored session re-shows its artifacts.

Approach & architecture (for reviewers)

Built around a few deliberate constraints (full write-up in memory-bank/artifact-feature.md):

  • A hard reactivity boundary. ArtifactOrchestrator is plain async business logic — reads no reactive.Value, defines no effects, never touches input.*. All reactivity lives in artifact_server(). That's what lets every flow be unit-tested with plain fakes.
  • One owner per concern.
    • ArtifactView — the only place server→client output happens (custom messages, modal, chat pill).
    • ArtifactChat — the only place chatlas is touched; deep-copies the live chat before each call, so generate/revise/recommend never pollute the visible conversation.
    • ArtifactStore — the only place artifacts are held (LRU) and serialized.
    • active_artifact_id — a single reactive value, the sole source of truth for panel visibility.
  • State is pydantic; bookmarks stay small. ArtifactState / ArtifactVersion are pydantic models; bundled data + data instructions are exclude=True and regenerated from the live data source on restore (mirroring how viz widgets re-run ggsql).
  • Tool names are a shared contract centralized in _tool_names.py, so tool registration and the gallery that mines chat turns can't silently drift.
  • Two entry paths, one modal-opening path. The /artifact command and the querychat_request_artifact tool converge on the same flow (the tool fires mid-stream where no reactive context exists, so it relays through a status-gated effect).

Browser (artifact-core.ts)
   ▲  querychat-artifact-* messages        │ inputs / setInputValue
   │                                        ▼
artifact_server()   ← all reactivity, active_artifact_id
   │ drives
   ▼
ArtifactOrchestrator   ← no reactive state
   ├── ArtifactChat   (chatlas fork + streaming)
   ├── ArtifactView   (server→client, modal, pill)
   ├── ArtifactStore  (LRU + bookmark)
   └── pure helpers   (data · prompt · gallery · readme)
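
To make the reactivity boundary concrete, here is a minimal sketch of an orchestrator that depends only on injected collaborators and forks the chat before each call. It is an illustration under assumptions, not the PR's code: `ChatLike`, `ViewLike`, `FakeChat`, `FakeView`, `OrchestratorSketch`, and the `ask` / `show_source` method names are all hypothetical, and `copy.deepcopy` stands in for however ArtifactChat actually copies the live chat.

```python
import asyncio
import copy
from dataclasses import dataclass, field
from typing import Protocol


class ChatLike(Protocol):
    """Hypothetical stand-in for the chat client that ArtifactChat wraps."""

    turns: list[str]

    async def ask(self, prompt: str) -> str: ...


class ViewLike(Protocol):
    """Hypothetical stand-in for the server-to-client output owner."""

    async def show_source(self, artifact_id: str, source: str) -> None: ...


@dataclass
class FakeChat:
    turns: list[str] = field(default_factory=list)

    async def ask(self, prompt: str) -> str:
        self.turns.append(prompt)
        return f"# generated from: {prompt}"


@dataclass
class FakeView:
    shown: list[tuple[str, str]] = field(default_factory=list)

    async def show_source(self, artifact_id: str, source: str) -> None:
        self.shown.append((artifact_id, source))


class OrchestratorSketch:
    """Plain async logic: no reactive values, no effects, no input access."""

    def __init__(self, live_chat: ChatLike, view: ViewLike) -> None:
        self._live_chat = live_chat
        self._view = view

    async def generate(self, artifact_id: str, directions: str) -> str:
        # Work on a deep copy so the visible conversation is never mutated.
        fork = copy.deepcopy(self._live_chat)
        source = await fork.ask(directions)
        await self._view.show_source(artifact_id, source)
        return source


async def demo() -> tuple[FakeChat, FakeView, str]:
    chat = FakeChat(turns=["show sales by region"])
    view = FakeView()
    source = await OrchestratorSketch(chat, view).generate("a1", "make a dashboard")
    return chat, view, source


chat, view, source = asyncio.run(demo())
```

Because nothing here touches Shiny, a test can drive `generate` with `asyncio.run` and inspect the fakes directly; the live chat's `turns` list is unchanged after the call.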

Limitations & things to know

  • Depends on unreleased shiny / shinychat. Relies on shinychat slash commands + input_submit_textarea / input_code_editor and a recent shiny; pyproject.toml currently pins git refs for both (shinychat → worktree-feat+slash-commands). These must move to released versions before any release.
  • Python only. No R counterpart yet; the layering is a reasonable template if/when parity is pursued.
  • Always-on, no opt-out. The tool and /artifact command register regardless of the tools= config, and the panel UI is always injected. Deliberate by omission — worth revisiting.
  • The Py↔JS wire contract is hand-duplicated on both sides with no shared schema; a rename in one language silently no-ops in the other. Known maintenance hazard.
  • Generated front-end assets are committed. static/js/artifact.js / static/css/artifact.css are build outputs of js/src/* — edit the source and rebuild, never the generated files.
  • Stored artifacts are LRU-capped (25/session); reopening an evicted artifact's pill no-ops.
  • Data bundling threshold: DataFrame sources ≤ 5 MB are bundled as CSV; larger or DB-backed sources get a data-setup TODO.
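
The LRU cap and the "evicted pill no-ops" behavior can be sketched as follows. This is a simplified illustration, assuming an `OrderedDict`-backed container; `LruStoreSketch` and `reopen` are hypothetical names and the real ArtifactStore may be structured differently.

```python
from collections import OrderedDict
from typing import Optional


class LruStoreSketch:
    """Keeps at most `capacity` artifacts, evicting the least recently used."""

    def __init__(self, capacity: int = 25) -> None:
        self._capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def put(self, artifact_id: str, source: str) -> None:
        self._items[artifact_id] = source
        self._items.move_to_end(artifact_id)  # mark as most recently used
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def get(self, artifact_id: str) -> Optional[str]:
        if artifact_id not in self._items:
            return None
        self._items.move_to_end(artifact_id)
        return self._items[artifact_id]


def reopen(store: LruStoreSketch, artifact_id: str) -> bool:
    """Return True if the panel could be shown; False models the no-op."""
    return store.get(artifact_id) is not None


store = LruStoreSketch(capacity=2)  # tiny cap to make eviction visible
store.put("a", "source a")
store.put("b", "source b")
store.get("a")  # touching "a" makes "b" the least recently used
store.put("c", "source c")  # exceeds the cap, so "b" is evicted
```

With the cap at 2, clicking the pill for "b" after this sequence finds nothing in the store, which is the silent no-op described above.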

Testing

  • Unit tests per module (tests/test_artifact_*.py); the orchestrator suite runs on plain fakes thanks to the no-reactive-state rule.
  • Playwright integration tests (test_13_artifact*, test_14_artifact_bookmark) covering the modal, generation, panel, pill navigation, and bookmark restore.
  • Full Python unit suite: 523 passed locally; artifact e2e (modal → generate → panel → pill) verified against a live LLM.

Review comment thread on pyproject.toml (outdated)
Turn the work done during a querychat session — the SQL queries and ggsql
visualizations produced by asking questions — into a standalone, runnable,
downloadable artifact: a Quarto dashboard, Shiny app, Marimo or Jupyter
notebook, or a freeform format the LLM describes on demand. Python-only
(pkg-py); no R counterpart yet.

What the user sees:
- Open a "Create Artifact" modal via the `/artifact` slash command or when the
  LLM calls the `querychat_request_artifact` tool. Both paths converge on a
  single modal-opening flow.
- Pick from a gallery of the session's queries/visualizations plus an output
  format, language (R/Python), and directions; an LLM recommendation
  pre-selects sensible defaults.
- On Generate, a side panel opens and the artifact source streams into a
  read-only editor while a pill is appended to the chat.
- Revise with AI (pushing new versions), step through versions, and download a
  zip (source + README.md + bundled data). Artifacts survive bookmarking.
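
A rough sketch of how the downloadable zip might be assembled, using only the standard library. The file name `data.csv`, the function `build_zip`, the TODO wording, and measuring the 5 MB threshold on the encoded CSV text are all assumptions made for illustration; the PR only states that DataFrame sources up to 5 MB are bundled as CSV and that other sources get a data-setup TODO.

```python
import io
import zipfile
from typing import Optional

# Assumed interpretation of the 5 MB bundling threshold.
BUNDLE_LIMIT_BYTES = 5 * 1024 * 1024


def build_zip(
    source_name: str, source: str, readme: str, csv_text: Optional[str]
) -> bytes:
    """Pack source + README, plus the data when it is small enough to bundle."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(source_name, source)
        small_enough = (
            csv_text is not None
            and len(csv_text.encode("utf-8")) <= BUNDLE_LIMIT_BYTES
        )
        if small_enough:
            zf.writestr("data.csv", csv_text)
        else:
            # Large or DB-backed source: leave a marked TODO instead of data.
            readme += "\n\nTODO: data setup. Point the artifact at your data source.\n"
        zf.writestr("README.md", readme)
    return buf.getvalue()


bundled = build_zip("app.py", "print('hi')\n", "# My artifact", "x,y\n1,2\n")
db_backed = build_zip("app.py", "print('hi')\n", "# My artifact", None)
bundled_names = zipfile.ZipFile(io.BytesIO(bundled)).namelist()
db_backed_names = zipfile.ZipFile(io.BytesIO(db_backed)).namelist()
```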

Architecture (see memory-bank/artifact-feature.md):
- artifact_server (_artifact_server.py) owns all reactivity; active_artifact_id
  is the single source of truth for panel visibility.
- ArtifactOrchestrator (_artifact_orchestrator.py) is non-reactive business
  logic for every flow, testable with plain fakes.
- One owner per concern: ArtifactView (server→client wire protocol, modal,
  pill), ArtifactChat (chatlas transport; forks the chat so generation never
  pollutes the visible history), ArtifactStore (LRU container + bookmark
  serialization).
- ArtifactState/ArtifactVersion and the request/type models are pydantic;
  bundled data and data instructions are excluded from bookmarks and
  regenerated from the live data source on restore.
- Tool names are centralized in _tool_names.py so tool registration and gallery
  extraction can't drift.
- Front-end artifact-core.ts / artifact.css are built to static/ (mirroring the
  viz feature); the Py↔JS message contract is hand-duplicated on both sides.
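
The "excluded from bookmarks, regenerated on restore" idea can be illustrated with pydantic's `Field(exclude=True)`. This is a sketch assuming pydantic v2; `StateSketch`, `VersionSketch`, `bundled_csv`, and `restore` are hypothetical names, not the fields of the real ArtifactState / ArtifactVersion models.

```python
from typing import Callable, Optional

from pydantic import BaseModel, Field


class VersionSketch(BaseModel):
    source: str
    directions: str = ""


class StateSketch(BaseModel):
    id: str
    versions: list[VersionSketch] = []
    # exclude=True keeps this field out of model_dump / model_dump_json,
    # so it never ends up in the serialized bookmark payload.
    bundled_csv: Optional[str] = Field(default=None, exclude=True)


def restore(payload: str, load_csv: Callable[[], str]) -> StateSketch:
    """Rebuild state from a bookmark, then refill the excluded data."""
    state = StateSketch.model_validate_json(payload)
    state.bundled_csv = load_csv()  # regenerated from the live data source
    return state


state = StateSketch(
    id="a1",
    versions=[VersionSketch(source="print(1)")],
    bundled_csv="x\n1\n",
)
payload = state.model_dump_json()
restored = restore(payload, lambda: "x\n1\n")
```

The serialized payload carries only the id and versions, so bookmark size does not grow with the dataset.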

Includes unit tests per module (orchestrator suite uses plain fakes) and
Playwright integration tests, plus a memory-bank entry documenting the design.
cpsievert force-pushed the feat/artifact-feature branch from a18c0a7 to cdbed9f on June 12, 2026 15:37
shinychat switched from a <textarea> to a TipTap contenteditable div on
@main, breaking two categories of e2e tests:

1. `test_chat_input_visible` expected a native `placeholder` attribute,
   but TipTap uses `data-placeholder` instead.

2. Artifact gallery/generation tests relied on `to_be_editable()` to
   detect stream completion, but the contenteditable div is always
   editable (its attribute never flips to "false"). The correct signal
   is the container's `disabled` CSS class, which shinychat adds/removes
   when streaming starts/ends.
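
The two fixes described above might look roughly like this with Playwright's Python assertions. The helper names are hypothetical, and the locators are passed in rather than hard-coded because the actual shinychat selectors are not given here; the sketch assumes the class token is literally `disabled`, as stated above.

```python
import re
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for type hints
    from playwright.sync_api import Locator

# Matches "disabled" as a whole class token, not as part of another name.
DISABLED_CLASS = re.compile(r"(^|\s)disabled(\s|$)")


def expect_placeholder(editor: "Locator", text: str) -> None:
    """Check the TipTap-style placeholder on whichever element carries it."""
    from playwright.sync_api import expect  # imported lazily in this sketch

    expect(editor).to_have_attribute("data-placeholder", text)


def wait_for_stream_end(input_container: "Locator", timeout_ms: int = 30_000) -> None:
    """Wait until the container no longer has the `disabled` class."""
    from playwright.sync_api import expect

    expect(input_container).not_to_have_class(DISABLED_CLASS, timeout=timeout_ms)
```

Using a regular expression here matters because `to_have_class` / `not_to_have_class` compare against the full class attribute string, so a plain `"disabled"` would only match when it is the sole class.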