Large surface area: ingests YouTube / script–related data for CPPA workflows, with a primary run_cppa_youtube_script_tracker command and many options (sources, dry-run, batching). Prefer reading the command module docstring and --help before changing defaults.
The command runs in phases: queued metadata JSON → YouTube Data API video discovery → transcript download (VTT/text) → optional Pinecone upsert. Service details: docs/service_api/cppa_youtube_script_tracker.md. See docs/Architecture_data_flow.md.
YouTube Data API for channel/video metadata within the configured time window, plus transcript providers for caption text. Local queue JSON under the workspace is processed before network phases when present.
Videos, transcripts, channels, and related linkage rows are persisted in this app’s models. Raw transcript files and metadata snapshots are also stored under WORKSPACE_DIR for auditing and replays. References: docs/Schema.md, section 10 — CPPA YouTube Script Tracker · models.py · docs/service_api/cppa_youtube_script_tracker.md.
Not applicable for the main collector. There is no Markdown repo publishing step in run_cppa_youtube_script_tracker.
After successful ingest, the collector can shell out to run_cppa_pinecone_sync using --pinecone-app-id and --pinecone-namespace (defaults from environment—see --help). Failures are surfaced in logs; detailed retry state lives in cppa_pinecone_sync models. See docs/Pinecone_preprocess_guideline.md.
python manage.py run_cppa_youtube_script_tracker --help- Run targeted tests:
python -m pytest cppa_youtube_script_tracker/tests/ -v(see the Tests section below).
Phases: (1) process queued metadata JSONs, (2) YouTube Data API video fetch for the time window, (3) transcript download—then optional Pinecone.
| Option | Description |
|---|---|
--start-time |
ISO datetime; only videos published after this time. Default: latest published_at in DB (or YOUTUBE_DEFAULT_PUBLISHED_AFTER if empty). |
--end-time |
ISO datetime upper bound; default now. |
--channel-title |
Restrict to one channel title (must match configured channel map / search). |
--dry-run |
Skip DB writes and API calls. |
--skip-transcript |
Skip phase 3 (transcripts). |
--pinecone-app-id |
App id passed through to run_cppa_pinecone_sync (default youtube). |
--pinecone-namespace |
Namespace for Pinecone (default from env, see command --help). |
- Django app label:
cppa_youtube_script_tracker - Path (from repo root):
cppa_youtube_script_tracker/ - Registration: Listed under
INSTALLED_APPSinconfig/settings.pyascppa_youtube_script_tracker.
| Command | Description |
|---|---|
run_cppa_youtube_script_tracker |
Management command: run_cppa_youtube_script_tracker |
Run python manage.py <command> --help for options.
Typical invocation (from repo root, after README prerequisites):
python -m pytest cppa_youtube_script_tracker/tests/ -v