# Global Memory Layer (GML) — Codename: REMNANT

Version: 0.1.0 | Status: FORGE READY | Owner: Joe Panetta / Prometheus Workforce Analytics

## 0. FORGE DIRECTIVE

This PRD is structured for consumption by the Autonomous Software Factory (ASF). Each section maps directly to an ASF agent task. Agents should treat each `## MODULE` as an independent deployable unit. Dependency chains are explicit. All modules should be buildable in isolation and composable at the platform level.

## 1. PRODUCT OVERVIEW

### 1.1 Vision

A decentralized, AI-powered oral history platform that captures, transcribes, indexes, and permanently preserves the stories, languages, and knowledge of endangered cultures worldwide. Communities own their data. The AI surfaces connections across civilizations.

### 1.2 Mission Statement

Before the last speaker of every dying language falls silent: capture the voice. Index the wisdom. Connect the threads. Let nothing be forgotten.

### 1.3 Core Thesis

Every two weeks, a language goes extinct. Every day, elders die with 80+ years of irreplaceable knowledge. REMNANT is infrastructure for the species, not a startup or a product. A responsibility.

### 1.4 Target Users

| User Type | Description |
| --- | --- |
| Story Contributors | Elders, community members, and oral historians submitting recordings |
| Community Archivists | Local coordinators managing their community's vault |
| Researchers | Linguists, anthropologists, and historians querying the archive |
| Listeners | General public discovering connected human stories |
| Forge Operators | ASF engineers building and extending platform modules |
## 2. SYSTEM ARCHITECTURE OVERVIEW

```
[CAPTURE LAYER]       → Mobile PWA / offline-first recorder
        ↓
[INGESTION LAYER]     → Upload queue → IPFS / Filecoin storage
        ↓
[TRANSCRIPTION LAYER] → Deepgram STT → multilingual transcript
        ↓
[AI INDEXING LAYER]   → Multi-agent pipeline (tags, themes, entities, emotions)
        ↓
[VECTOR LAYER]        → Embeddings → Pinecone/Weaviate semantic search
        ↓
[CONNECTION LAYER]    → Cross-cultural thread finder agent
        ↓
[DISCOVERY LAYER]     → Web portal — browse, search, explore connections
        ↓
[SOVEREIGNTY LAYER]   → Community access controls — who sees what
```

## 3. MODULES (FORGE BUILD TARGETS)

## MODULE 01 — CAPTURE APP

**ASF Prompt Target:** "Build a mobile-first PWA audio recorder with offline support, multilingual UI, and IPFS upload pipeline"

**Stack:** Next.js 14, TypeScript, Tailwind CSS, IndexedDB (offline queue), Deepgram, IPFS/web3.storage

**Core Features:**

- Record audio (WAV/MP3) directly in the browser
- Offline-first: queue recordings when there is no connection; auto-upload on reconnect
- Pre-recording metadata form: contributor name, language spoken, community, location (optional), story type tag
- Upload to IPFS on submit; return the CID for permanent reference
- Basic waveform visualization during recording
- Multilingual UI (EN, ES, FR; extensible)
- Works on low-end Android/iOS devices

**Acceptance Criteria:**

- Record → stop → preview → submit flow works fully offline
- Upload retries on reconnect without user intervention
- IPFS CID returned and stored locally
- Metadata stored alongside audio

## MODULE 02 — INGESTION SERVICE

**ASF Prompt Target:** "Build a Node.js microservice that listens for new IPFS uploads, downloads audio, sends it to Deepgram for transcription, and stores results in Postgres"

**Stack:** Node.js, Express, Deepgram SDK, IPFS HTTP client, Postgres, BullMQ (job queue)

**Core Features:**

- Webhook receiver for new upload events
- Download audio from IPFS by CID
- Submit to Deepgram Nova-2 (multilingual, with speaker diarization)
- Store raw transcript + metadata in Postgres
- Emit an event to the AI Indexing Layer on completion
- Retry
logic for failed transcriptions
- Job status API endpoint

**Acceptance Criteria:**

- Full pipeline: CID in → transcript stored in under 3 minutes
- Failed jobs retry 3x with exponential backoff
- Language detection logged with a confidence score

## MODULE 03 — AI INDEXING AGENT PIPELINE

**ASF Prompt Target:** "Build a multi-agent pipeline that takes an oral history transcript and extracts: named entities, themes, emotions, cultural markers, time period references, geographic references, and generates a 3-sentence summary"

**Stack:** Node.js, Anthropic SDK (Claude Sonnet), LangChain or direct API, Postgres, Redis

**Agents:**

| Agent | Role |
| --- | --- |
| EntityExtractor | Named people, places, organizations, dates |
| ThemeTagger | Cultural themes (family, land, conflict, ritual, memory, etc.) |
| EmotionAnalyzer | Dominant emotional tones of the narrative |
| TimePeriodClassifier | Estimated historical era references |
| GeographicMapper | Referenced locations → lat/lng via geocoding |
| SummaryAgent | 3-sentence plain-language summary |
| LanguageProfiler | Dialect markers, code-switching detection |

**Output Schema:**

```json
{
  "story_id": "uuid",
  "entities": { "people": [], "places": [], "organizations": [] },
  "themes": ["family", "displacement", "land"],
  "emotions": ["grief", "pride", "nostalgia"],
  "time_period": "mid-20th century",
  "geographic_refs": [{ "name": "Oaxaca", "lat": 17.06, "lng": -96.72 }],
  "summary": "...",
  "language_profile": { "primary": "Zapotec", "code_switch": "Spanish" }
}
```

**Acceptance Criteria:**

- All 7 agents run per transcript
- Output stored as JSONB in Postgres
- Pipeline completes in under 60 seconds per story
- Hallucination guard: entities must appear verbatim in the transcript

## MODULE 04 — VECTOR EMBEDDING + SEMANTIC SEARCH

**ASF Prompt Target:** "Build a service that embeds oral history summaries and tags using OpenAI text-embedding-3-small, stores them in Pinecone, and exposes a semantic search API"

**Stack:** Node.js, OpenAI Embeddings API, Pinecone, Express

**Core Features:**

- Embed story summary + tags on indexing completion
- Store the vector in Pinecone with a full metadata payload
- Search
endpoint: `/api/search?q=...&lang=...&theme=...`
- Filter by language, theme, region, time period
- Return top-N results with relevance scores

**Acceptance Criteria:**

- Semantic search returns results for natural-language queries
- Filters are composable (AND logic)
- Response under 500 ms for top-10 results

## MODULE 05 — CONNECTION ENGINE

**ASF Prompt Target:** "Build an agent that finds thematic and cultural connections between oral history stories from different communities and generates a human-readable 'connection report'"

**Stack:** Node.js, Claude API, Pinecone, Postgres

**Logic:**

- Nightly batch: for each new story, find the top-5 semantically similar stories from different communities
- Feed pairs to the Connection Agent: "These two stories come from different cultures. What threads do they share?"
- Store connection pairs + explanation in Postgres
- Surface connections in the Discovery Layer

**Acceptance Criteria:**

- At least 1 connection surfaced per new story (once the archive exceeds 50 stories)
- Connection explanation is 2-4 sentences and human-readable
- No intra-community connections flagged (cross-cultural only)

## MODULE 06 — DISCOVERY PORTAL

**ASF Prompt Target:** "Build a Next.js web portal where users can browse, search, and explore oral history stories with an interactive world map, semantic search, and connection visualization"

**Stack:** Next.js 14, TypeScript, Tailwind CSS, Mapbox GL, D3.js (connection graph), Postgres, Pinecone

**Core Features:**

- World map with story pins; click a pin to open the story
- Story detail page: audio player, transcript, tags, summary, connections
- Semantic search bar
- Filter panel: language, theme, region, time period
- Connection visualization: thread map between related stories
- Community page: all stories from one community
- Language page: all stories in a given language

**Acceptance Criteria:**

- Map loads with pins in under 2 seconds
- Story detail page fully rendered via SSR
- Connection graph renders for any story with >= 1 connection
- Accessible (WCAG AA)

## MODULE 07 — SOVEREIGNTY LAYER

**ASF Prompt Target:** "Build a community access
control system where each community has an admin who can set visibility (public/researchers-only/community-only) for any story in their vault"

**Stack:** Next.js API routes, Postgres, NextAuth.js, role-based middleware

**Roles:**

| Role | Permissions |
| --- | --- |
| Contributor | Submit stories to their community vault |
| Community Admin | Set visibility, manage contributors, delete stories |
| Researcher | Access researcher-only stories (verified accounts) |
| Public | Access public stories only |
| Platform Admin | Full access |

**Acceptance Criteria:**

- Community-only stories never returned in any public API response
- Researcher-only stories gated behind a verified-account flag
- Community Admins can toggle visibility per story
- Audit log for all visibility changes

## 4. DATA MODELS

`stories`

```sql
id UUID PRIMARY KEY
cid TEXT              -- IPFS content ID
contributor_name TEXT
community_id UUID FK
language TEXT
location_name TEXT
lat FLOAT
lng FLOAT
story_type TEXT       -- ['personal', 'historical', 'ceremonial', 'practical', 'myth']
visibility TEXT       -- ['public', 'researchers', 'community']
audio_url TEXT
transcript TEXT
index_data JSONB      -- Module 03 output
created_at TIMESTAMP
```

`communities`

```sql
id UUID PRIMARY KEY
name TEXT
region TEXT
country TEXT
primary_language TEXT
admin_user_id UUID FK
created_at TIMESTAMP
```

`connections`

```sql
id UUID PRIMARY KEY
story_a_id UUID FK
story_b_id UUID FK
explanation TEXT
similarity_score FLOAT
created_at TIMESTAMP
```

## 5. FORGE BUILD ORDER

| Phase | Modules | Dependency |
| --- | --- | --- |
| Phase 1 | Module 02 (Ingestion), Module 03 (Indexing) | None |
| Phase 2 | Module 04 (Vectors), Module 05 (Connections) | Phase 1 |
| Phase 3 | Module 01 (Capture App), Module 07 (Sovereignty) | Phase 1 |
| Phase 4 | Module 06 (Discovery Portal) | All |

## 6. NON-FUNCTIONAL REQUIREMENTS

| Requirement | Target |
| --- | --- |
| Offline capability | Capture app works with zero connectivity |
| Permanent storage | IPFS/Filecoin — no single point of failure |
| Latency — search | < 500 ms |
| Latency — transcription | < 3 min end-to-end |
| Accessibility | WCAG AA on Discovery Portal |
| Data sovereignty | Communities control their data, period |
| Open source | Full platform MIT licensed |
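The sovereignty rules above lend themselves to a single choke-point check before any story leaves the API. Below is a minimal TypeScript sketch of such a filter, assuming illustrative `Story` and `Viewer` shapes and a `canView` helper; none of these names are prescribed by this PRD.

```typescript
// Hypothetical sketch of the Module 07 visibility check.
// Type shapes and function names are illustrative assumptions.
type Visibility = "public" | "researchers" | "community";
type Role = "public" | "researcher" | "contributor" | "community_admin" | "platform_admin";

interface Viewer {
  role: Role;
  verifiedResearcher: boolean;
  communityId?: string; // community membership, if any
}

interface Story {
  id: string;
  communityId: string;
  visibility: Visibility;
}

// Module 07 rules: public stories for everyone, researcher-only stories
// behind the verified-account flag, community-only stories restricted to
// members of the story's own community.
export function canView(story: Story, viewer: Viewer): boolean {
  if (viewer.role === "platform_admin") return true;
  if (story.visibility === "public") return true;
  if (story.visibility === "researchers") return viewer.verifiedResearcher;
  // "community": only members of the owning community
  return viewer.communityId === story.communityId;
}

// Applied as a filter so community-only stories never leak into responses.
export function filterVisible(stories: Story[], viewer: Viewer): Story[] {
  return stories.filter((s) => canView(s, viewer));
}
```

Running every read path through one function like this makes the "community-only stories never returned in any public API response" criterion auditable in a single place.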
## 7. SUCCESS METRICS (6-MONTH TARGETS)

| Metric | Target |
| --- | --- |
| Stories captured | 500+ |
| Languages represented | 25+ |
| Communities onboarded | 10+ |
| Cross-cultural connections surfaced | 200+ |
| Researcher accounts | 50+ |

## 8. FORGE CONSUMPTION NOTES

- Each Module section is a standalone ASF job
- Feed the Module header + Core Features + Acceptance Criteria as the Forge prompt
- Stack hints should be passed as constraints
- Modules can be parallelized within Phase 1 and within Phase 2
- The Discovery Portal (Module 06) is the final integration target
- Use the ASF intervention log to capture any manual overrides during the build
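As a rough illustration of the note about feeding the Module header, Core Features, and Acceptance Criteria to the Forge, a prompt-assembly helper might look like the sketch below. The `ModuleSpec` shape, `buildForgePrompt` name, and prompt layout are assumptions for illustration, not part of any ASF interface.

```typescript
// Hypothetical helper for assembling an ASF job prompt from a module spec.
// The exact ASF prompt format is not specified in this PRD.
interface ModuleSpec {
  header: string;          // e.g. "MODULE 02 — INGESTION SERVICE"
  promptTarget: string;    // the quoted ASF Prompt Target sentence
  stack: string[];         // stack hints, passed as constraints
  coreFeatures: string[];
  acceptanceCriteria: string[];
}

export function buildForgePrompt(spec: ModuleSpec): string {
  return [
    `# ${spec.header}`,
    spec.promptTarget,
    "",
    "## Constraints (stack hints)",
    ...spec.stack.map((s) => `- ${s}`),
    "",
    "## Core Features",
    ...spec.coreFeatures.map((f) => `- ${f}`),
    "",
    "## Acceptance Criteria",
    ...spec.acceptanceCriteria.map((c) => `- ${c}`),
  ].join("\n");
}
```

Keeping the spec as structured data rather than prose also lets the same object drive both the Forge prompt and the verification checklist for that module.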
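Module 03's hallucination guard (extracted entities must appear verbatim in the transcript) can be enforced as a small post-processing step rather than trusting the model. A sketch, with illustrative names and a deliberately strict, case-sensitive substring check:

```typescript
// Hypothetical post-processing guard for Module 03: drop any extracted
// entity string that does not appear verbatim in the source transcript.
// Shape mirrors the "entities" field of the Module 03 output schema.
interface Entities {
  people: string[];
  places: string[];
  organizations: string[];
}

export function enforceVerbatimEntities(entities: Entities, transcript: string): Entities {
  // Strict case-sensitive containment; looser matching would weaken the guard.
  const keep = (names: string[]) => names.filter((n) => transcript.includes(n));
  return {
    people: keep(entities.people),
    places: keep(entities.places),
    organizations: keep(entities.organizations),
  };
}
```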