How We Build AI Into Memoir

Memoir is not trying to ship a toy chatbot. The product is a private family archive, so the AI system has to act more like a careful preservation assistant than a novelty text generator. Our agent is Mimi. Mimi helps people turn raw family material into drafts: memories, recipes, profile details, photo album structure, and guided conversation history. The important part is that Mimi does not become the source of truth. Convex does. Mimi helps create and refine draft artifacts; humans still review the family record before it becomes durable published history.

One naming footnote: the repository still carries some older internal folder and table names, such as src/app/api/mimi, src/modules/mimi, and convex/schema/mimi.ts. In the product and in this article, the agent is Mimi; those path names are implementation details, not the user-facing name.

The architecture is built around one rule: any AI action that matters has to pass through a typed tool, a server-side runtime policy, a Convex write boundary, and an observable trail. That sounds heavier than calling generateText(), but it is exactly what makes AI usable in a product where the output might become part of someone’s family history.

Mimi Is a Full-Stack System, Not a Component

Mimi spans the browser, Next.js route handlers, the AI SDK, OpenRouter, Convex, YAML-backed config, and PostHog. The user experiences this as a streaming assistant. The system sees a much more concrete pipeline: validate the request, resolve auth, load profile context, select a model, run a bounded tool loop, stream UI parts, execute server tools, persist the assistant message, persist tool events, update draft references, consume credits, and emit telemetry.

The agent core lives in src/ai/agents/mimi/agent.ts, the tool definitions live in src/ai/agents/mimi/tools/*, and the model provider wrapper lives in src/ai/provider.ts. The HTTP runtime lives mostly in src/app/api/mimi/_lib/run-mode.ts, with tool execution split through tool-runtime.ts, tool-executors/*, and persistence.ts. The client surface lives in src/modules/mimi/hooks/use-streaming-chat.ts and src/modules/mimi/lib/contracts.ts. Durable state is modeled in convex/schema/mimi.ts and implemented across convex/mimi/*.ts.

That split is not accidental. The browser owns interaction. The route handler owns orchestration. Convex owns durable data and access-sensitive behavior. Mimi gets tool privileges only through server code.

The Request Path Is Boring on Purpose

The public route at src/app/api/mimi/chat/route.ts is almost empty. It forwards the request to runMimiModeRequest("chat", request). That is the shape you want in a serious Next.js app: routes are boundaries, not junk drawers.
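A minimal sketch of that shape, assuming the App Router convention of exporting an HTTP-method handler from route.ts. The runMimiModeRequest("chat", request) call comes from the description above. The MimiMode type and the stub body, which only echoes the mode and method, are hypothetical stand-ins for the real orchestrator in src/app/api/mimi/_lib/run-mode.ts, so the example runs on its own.

```typescript
// Hypothetical stand-in for the orchestrator in _lib/run-mode.ts. The real one
// validates, authenticates, rate-limits, runs the agent, and streams the reply.
type MimiMode = "chat";

async function runMimiModeRequest(mode: MimiMode, request: Request): Promise<Response> {
  return new Response(JSON.stringify({ mode, method: request.method }), {
    headers: { "content-type": "application/json" },
  });
}

// Sketch of src/app/api/mimi/chat/route.ts: the route is only a boundary
// that hands the request to shared runtime code.
export async function POST(request: Request): Promise<Response> {
  return runMimiModeRequest("chat", request);
}
```

Keeping the route this thin would let another mode reuse the same validation, auth, billing, and persistence path by passing a different mode string.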

Inside run-mode.ts, one Mimi turn performs a deterministic sequence. It parses the request with MimiChatRequestSchema, resolves Clerk and Convex auth through resolveServerConvexAuthState, rate-limits the caller with SlidingWindowRateLimiter, loads the active conversation and profile, checks billing access, normalizes UI messages, chooses the configured model through createAIModel, builds the Mimi agent with createMimiAgent, streams the response with createAgentUIStreamResponse, and persists the final assistant turn through persistMimiResponseFinish.

The numbers matter here. The route caps message history with MAX_MIMI_REQUEST_MESSAGES = 20, caps agent loop depth with MAX_STEPS = 10, and rate-limits each caller to MIMI_RATE_LIMIT_MAX = 10 requests per 60-second sliding window. In production, the stream timeout is set to 300 seconds. That matches Vercel’s current Fluid Compute default duration of 300 seconds across plans, with Pro and Enterprise able to extend to 800 seconds. This is the difference between “AI endpoint” and “AI endpoint that survives real users sending long messy family stories.”
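The history cap and rate limit can be illustrated with a small sliding-window-log limiter. The constants MAX_MIMI_REQUEST_MESSAGES and MIMI_RATE_LIMIT_MAX come from the route as described. MIMI_RATE_LIMIT_WINDOW_MS, the SlidingWindowLimiterSketch class, and capHistory are hypothetical names, and this is not the repository’s SlidingWindowRateLimiter. The in-memory Map is also an assumption: it only limits callers within a single process.

```typescript
const MAX_MIMI_REQUEST_MESSAGES = 20;
const MIMI_RATE_LIMIT_MAX = 10;
const MIMI_RATE_LIMIT_WINDOW_MS = 60_000; // hypothetical name for the 60-second window

// Sliding-window log: keep each caller's recent request timestamps and admit a
// request only if fewer than `max` of them fall inside the trailing window.
class SlidingWindowLimiterSketch {
  private readonly hits = new Map<string, number[]>();
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  tryAcquire(callerId: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Forget timestamps that have slid out of the window.
    const recent = (this.hits.get(callerId) ?? []).filter((t) => t > windowStart);
    const allowed = recent.length < this.max;
    if (allowed) recent.push(now);
    this.hits.set(callerId, recent);
    return allowed;
  }
}

// Keep only the newest messages so a single turn never carries unbounded history.
function capHistory<T>(messages: readonly T[], max: number = MAX_MIMI_REQUEST_MESSAGES): T[] {
  return messages.slice(-max);
}
```

A sliding log is exact about the trailing 60 seconds, unlike a fixed window that resets on the minute and can admit bursts across the boundary.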

The Tool Loop Has Product-Specific Brakes

Mimi uses AI SDK 6’s ToolLoopAgent, which is the right abstraction for a multi-step assistant that can stream text and call tools. But we do not let the loop run like a free agent. In src/ai/agents/mimi/agent.ts, stopWhen: stepCountIs(config.maxSteps ?? 10) sets the hard loop cap. On top of that, the prepareStep hook dynamically removes tools after they hit their per-turn budget.

The important policy is around durable writes. Content draft tools are terminal for a turn: once Mimi creates or updates a memory, recipe, or photo album draft, the rest of the response is text-only. Other tools get per-turn call budgets. The default is two calls, and extraction and context tools are stricter at one call each, so extract_memory is single-shot and get_profile_context can run at most once per turn.

That policy exists because these are not harmless toy tools. A weather lookup can be repeated. A draft creation tool can create duplicate durable state if you let the loop thrash. The engineering move is to make the model useful while keeping the state machine boring.
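One way to express those budgets is a pure function over the steps completed so far. Assuming the prepareStep hook sees those steps and can return an activeTools list, as the AI SDK’s prepareStep option allows, the hook could return { activeTools: selectActiveTools(steps) }, while stopWhen: stepCountIs(config.maxSteps ?? 10) stays the outer cap. extract_memory and get_profile_context are named in the article; the other tool names and the StepLike shape are hypothetical.

```typescript
// Minimal structural view of a completed step; only tool call names matter here.
interface StepLike {
  toolCalls: ReadonlyArray<{ toolName: string }>;
}

// extract_memory and get_profile_context come from the article; the rest are placeholders.
const ALL_TOOLS = [
  "extract_memory",
  "extract_profile_fields",
  "get_profile_context",
  "create_memory_draft",
  "patch_memory_draft",
  "create_recipe_draft",
  "create_photo_album_draft",
] as const;
type ToolName = (typeof ALL_TOOLS)[number];

const DEFAULT_BUDGET = 2;
const BUDGETS: Partial<Record<ToolName, number>> = {
  extract_memory: 1,
  extract_profile_fields: 1,
  get_profile_context: 1,
};

// Draft tools create durable state, so calling one ends tool use for the turn.
const TERMINAL_TOOLS: ReadonlySet<string> = new Set([
  "create_memory_draft",
  "patch_memory_draft",
  "create_recipe_draft",
  "create_photo_album_draft",
]);

function selectActiveTools(steps: ReadonlyArray<StepLike>): ToolName[] {
  const used = new Map<string, number>();
  for (const step of steps) {
    for (const call of step.toolCalls) {
      if (TERMINAL_TOOLS.has(call.toolName)) return []; // text-only from here on
      used.set(call.toolName, (used.get(call.toolName) ?? 0) + 1);
    }
  }
  // Keep only tools that still have budget left this turn.
  return ALL_TOOLS.filter((name) => (used.get(name) ?? 0) < (BUDGETS[name] ?? DEFAULT_BUDGET));
}
```

Because draft tools are terminal, the default two-call budget in this sketch only applies to tools that are neither drafts nor extraction or context tools.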

Tools Are Adapters Around Product Workflows

The tool layer is split into declarations and executors. Tool schemas and definitions live under src/ai/agents/mimi/tools. Runtime execution is assembled in src/app/api/mimi/_lib/tool-runtime.ts. Domain logic runs through src/app/api/mimi/_lib/tool-executors/*. Prompt wiring for tools lives in config/ai/tools.yaml.

That architecture means Mimi does not “write to the database.” Mimi asks for a typed operation. The server validates the operation, resolves the relevant profile and draft context, and calls Convex. For a family-memory product, that is the difference between assisted drafting and uncontrolled mutation.

The current tool families cover memory drafts, memory draft patching, memory extraction, profile field extraction, recipe drafting, photo album drafting, and profile context retrieval. Notice what is missing: arbitrary SQL, arbitrary Convex calls, arbitrary file access. Mimi’s tool surface mirrors the product workflows where guided capture is valuable.
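The split between declarations, executors, and an allowlisted dispatcher can be sketched without any library. Everything here is hypothetical: the create_memory_draft name, the hand-written parse function (the real declarations may well use a schema library), and the saveMemoryDraft callback standing in for a Convex mutation. What it illustrates is that the model can only name a registered operation and supply input; the profile scope comes from server context, and unknown tools or malformed input are rejected before anything is written.

```typescript
// Hypothetical input and result types for one draft operation.
interface MemoryDraftInput {
  title: string;
  body: string;
}
interface ToolResult {
  draftId: string;
}

// Server-side context: identity and profile scope come from auth, not from the model.
interface ServerContext {
  profileId: string;
  // Stand-in for a Convex mutation invoked from the server.
  saveMemoryDraft: (profileId: string, title: string, body: string) => string;
}

// Declaration: a name plus a validator that throws on malformed model input.
const createMemoryDraftDecl = {
  name: "create_memory_draft",
  parse(input: unknown): MemoryDraftInput {
    const o = input as Partial<MemoryDraftInput> | null;
    if (!o || typeof o.title !== "string" || typeof o.body !== "string" || o.title.trim() === "") {
      throw new Error("invalid create_memory_draft input");
    }
    return { title: o.title, body: o.body };
  },
};

// Executor: domain logic that performs the durable write.
function executeCreateMemoryDraft(input: MemoryDraftInput, ctx: ServerContext): ToolResult {
  return { draftId: ctx.saveMemoryDraft(ctx.profileId, input.title, input.body) };
}

// Allowlist: only registered operations exist; there is no generic "run a query" tool.
const registry = new Map<string, (raw: unknown, ctx: ServerContext) => ToolResult>([
  [createMemoryDraftDecl.name, (raw, ctx) => executeCreateMemoryDraft(createMemoryDraftDecl.parse(raw), ctx)],
]);

function runTool(name: string, raw: unknown, ctx: ServerContext): ToolResult {
  const run = registry.get(name);
  if (!run) throw new Error(`unknown tool: ${name}`);
  return run(raw, ctx);
}
```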

Mimi’s State Is Durable

The durable AI model in convex/schema/mimi.ts is explicit. mimiConversations stores profile-scoped conversation state, active mode, draft reference, summary, and last-message cursor. mimiMessages stores user and assistant messages with prompt version, latency, token metadata, attachments, UI parts, model ID, and finish reason. mimiToolEvents records tool call ID, tool name, input, output, summary, and timestamp. mimiConversationDrafts links conversations to memory, recipe, and photo album drafts.
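For readers who think in types, the rows described above map roughly onto shapes like these. The field names are illustrative and abbreviated (attachments and UI parts are omitted); they are not copied from convex/schema/mimi.ts. The toolTimeline helper shows the kind of question durable tool events can answer.

```typescript
// Illustrative, abbreviated row shapes; not the actual Convex schema.
type DraftType = "memory" | "recipe" | "photo_album";

interface MimiConversation {
  profileId: string;
  activeMode: string;
  draftReference?: { type: DraftType; draftId: string };
  summary?: string;
  lastMessageAt: number;
}

interface MimiMessage {
  conversationId: string;
  role: "user" | "assistant";
  promptVersion?: string;
  modelId?: string;
  latencyMs?: number;
  inputTokens?: number;
  outputTokens?: number;
  finishReason?: string;
  createdAt: number;
}

interface MimiToolEvent {
  conversationId: string;
  toolCallId: string;
  toolName: string;
  input: unknown;
  output: unknown;
  summary?: string;
  timestamp: number;
}

interface MimiConversationDraft {
  conversationId: string;
  draftType: DraftType;
  draftId: string;
}

// Answers "which tools ran in this conversation, and in what order?"
function toolTimeline(events: readonly MimiToolEvent[], conversationId: string): string[] {
  return events
    .filter((e) => e.conversationId === conversationId)
    .sort((a, b) => a.timestamp - b.timestamp)
    .map((e) => `${e.toolName} (${e.toolCallId})`);
}
```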

This matters because Mimi is not only a streaming UI. A family member can leave, come back, and continue a conversation around a draft. A developer can inspect which tool ran. Analytics can answer whether a slow turn came from model latency, tool execution, or persistence. A future migration can reason about actual rows instead of opaque blobs in browser state.

The indexes follow the access pattern: by_profileId_status, by_profileId_lastMessageAt, by_draftReference, by_conversationId_createdAt, by_conversationId_and_messageId, and by_conversationId_toolCallId. Convex’s indexing docs state that indexed query performance depends on the number of documents in the index range, while unbounded scans grow with the table. Mimi’s data is naturally profile-scoped and conversation-scoped, so those indexes line up with how the product actually reads.
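In a Convex query function, a conversation’s messages would be read with something like ctx.db.query("mimiMessages").withIndex("by_conversationId_createdAt", (q) => q.eq("conversationId", conversationId)). The toy model below is not Convex code. It only illustrates why such a read scales with the size of the range rather than the table: entries stay sorted by (conversationId, createdAt), so a binary search finds where one conversation’s entries start and end, and the read touches only those.

```typescript
// Toy model of an index on (conversationId, createdAt). Not Convex code.
interface IndexedMessage {
  conversationId: string;
  createdAt: number;
  text: string;
}

function compareKey(row: IndexedMessage, conversationId: string, createdAt: number): number {
  if (row.conversationId !== conversationId) return row.conversationId < conversationId ? -1 : 1;
  return row.createdAt - createdAt;
}

class ToyIndex {
  private readonly rows: IndexedMessage[] = [];

  // First position whose key is >= (conversationId, createdAt).
  private lowerBound(conversationId: string, createdAt: number): number {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (compareKey(this.rows[mid], conversationId, createdAt) < 0) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Keeps rows sorted by key (the toy ignores the cost of array insertion).
  insert(row: IndexedMessage): void {
    this.rows.splice(this.lowerBound(row.conversationId, row.createdAt), 0, row);
  }

  // Two binary searches locate the range; only rows inside it are copied.
  rangeFor(conversationId: string): IndexedMessage[] {
    const start = this.lowerBound(conversationId, -Infinity);
    const end = this.lowerBound(conversationId, Infinity);
    return this.rows.slice(start, end);
  }
}
```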

The Client Streams, But Does Not Own Truth

The client hook src/modules/mimi/hooks/use-streaming-chat.ts uses useChat from @ai-sdk/react with DefaultChatTransport. It creates optimistic user messages, sends the real request body, merges streamed assistant parts, and listens to tool-call states so the UI can say “Saving draft”, “Draft saved”, or “We couldn’t save the draft” while the model is still streaming.
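The status copy can be derived from the tool part’s state. The four states below are the core states the AI SDK uses for tool parts on UI messages; the draftStatusLabel function and the choice to show “Saving draft” for both input states are hypothetical, not code from use-streaming-chat.ts.

```typescript
// Core tool-part states on AI SDK UI messages.
type ToolPartState = "input-streaming" | "input-available" | "output-available" | "output-error";

// Hypothetical mapping from a draft tool's state to the copy shown while streaming.
function draftStatusLabel(state: ToolPartState): string {
  switch (state) {
    case "input-streaming":
    case "input-available":
      return "Saving draft"; // the model is still producing or has sent the tool input
    case "output-available":
      return "Draft saved"; // the server executor returned a result
    case "output-error":
      return "We couldn’t save the draft"; // the executor or its write failed
  }
}
```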

That gives users the speed they expect from modern AI tools without pretending the browser is authoritative. The final assistant message still gets recorded in Convex. Tool events still get recorded in Convex. Credit consumption happens after a non-aborted response is persisted. The UI can be optimistic, but the record is durable.
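The ordering guarantee can be sketched as a finish handler with injected persistence functions. FinishDeps, TurnResult, and finishTurn are hypothetical names, and recording aborted turns before skipping the charge is an assumption of this sketch. The grounded part is the order: persist first, then consume credits only for a non-aborted, persisted response.

```typescript
// Hypothetical dependencies; in the article these steps live in persistence.ts.
interface FinishDeps {
  persistAssistantMessage: (text: string) => Promise<string>; // resolves to a message ID
  persistToolEvents: (messageId: string, toolCallIds: string[]) => Promise<void>;
  consumeCredits: (messageId: string) => Promise<void>;
}

interface TurnResult {
  text: string;
  toolCallIds: string[];
  aborted: boolean;
}

// The durable record is written first; credits are consumed only after a
// non-aborted response has been persisted.
async function finishTurn(result: TurnResult, deps: FinishDeps): Promise<{ charged: boolean }> {
  const messageId = await deps.persistAssistantMessage(result.text);
  await deps.persistToolEvents(messageId, result.toolCallIds);
  if (result.aborted) return { charged: false };
  await deps.consumeCredits(messageId);
  return { charged: true };
}
```

If either persistence call rejects, finishTurn rejects before consumeCredits runs, so a failed write never charges the user.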

Configuration Is Treated Like Runtime Policy

Mimi’s model layer is configured through files instead of being scattered through route handlers. config/ai/models.yaml defines logical model IDs such as gpt-5-4-mini, gpt-5-4-nano, claude-4-5-haiku, grok-4-fast, and glm-5. src/server/config/ai-models.ts loads that catalog. src/ai/provider.ts creates OpenRouter models and wraps them with PostHog tracing.
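A sketch of how a loaded catalog might be used: application code asks for a logical ID, and only the catalog knows which provider model backs it. The ModelEntry shape, the default ID, and the providerModel values are placeholders, not the contents of config/ai/models.yaml; the logical IDs are the ones listed above.

```typescript
// Hypothetical shape of one catalog entry after loading the YAML.
interface ModelEntry {
  id: string; // logical ID used by application code
  providerModel: string; // provider-side model slug (placeholders below)
}

const catalog: readonly ModelEntry[] = [
  { id: "gpt-5-4-mini", providerModel: "<provider-slug-1>" },
  { id: "gpt-5-4-nano", providerModel: "<provider-slug-2>" },
  { id: "claude-4-5-haiku", providerModel: "<provider-slug-3>" },
  { id: "grok-4-fast", providerModel: "<provider-slug-4>" },
  { id: "glm-5", providerModel: "<provider-slug-5>" },
];

const DEFAULT_MODEL_ID = "gpt-5-4-mini"; // hypothetical default

// Fail loudly on unknown IDs: a config typo should not silently select another model.
function resolveModel(entries: readonly ModelEntry[], logicalId: string = DEFAULT_MODEL_ID): ModelEntry {
  const entry = entries.find((m) => m.id === logicalId);
  if (!entry) throw new Error(`Unknown model ID: ${logicalId}`);
  return entry;
}
```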

Prompt and tool wiring follows the same pattern. The repo uses config/ai/mimi/prompts/* and config/ai/tools.yaml, then syncs those sources into generated TypeScript snapshots. next.config.ts explicitly traces the AI config files into the relevant route bundles. That gives us reviewable prompts, reproducible runtime behavior, and deploys that do not depend on loose files being accidentally present.
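The tracing step relies on Next.js’s outputFileTracingIncludes option, which adds files to the traced output of matching routes. The route glob and file pattern below are illustrative assumptions, not the contents of Memoir’s next.config.ts.

```typescript
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Ship the AI config sources with the Mimi route bundles so they are present
  // at runtime even though no import statement references them.
  outputFileTracingIncludes: {
    "/api/mimi/**": ["./config/ai/**/*"],
  },
};

export default nextConfig;
```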

The Stack Is Chosen for Measurable Reasons

Bun matters in this repo because it shortens the local and CI feedback loop. Bun’s official docs claim bun install is up to 25x faster than npm install, and Bun’s current product page claims installs up to 30x faster than npm plus startup around 3x faster than Node.js. Third-party 2026 install benchmarks land in the same ballpark: for one 847-package Next.js app, npm install took 32.1 seconds, pnpm 8.4 seconds, and a cold Bun install 1.2 seconds. For a 2,341-package monorepo, npm took 89.4 seconds, pnpm 21.7 seconds, and Bun 4.1 seconds.
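Dividing the quoted install times shows how the third-party numbers compare with Bun’s own “up to 25x” and “up to 30x” claims:

```latex
\begin{aligned}
\text{847 packages:}\quad & \frac{32.1\ \text{s}}{1.2\ \text{s}} \approx 26.8\times \ \text{(npm/Bun)}, \qquad \frac{8.4\ \text{s}}{1.2\ \text{s}} = 7.0\times \ \text{(pnpm/Bun)} \\
\text{2,341 packages:}\quad & \frac{89.4\ \text{s}}{4.1\ \text{s}} \approx 21.8\times \ \text{(npm/Bun)}, \qquad \frac{21.7\ \text{s}}{4.1\ \text{s}} \approx 5.3\times \ \text{(pnpm/Bun)}
\end{aligned}
```

Against npm, both measurements land between roughly 22x and 27x, consistent with the product page’s “up to 30x” figure; against pnpm, the gap is closer to 5-7x.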

Those numbers do not mean Memoir’s AI calls are faster because of Bun. Mimi’s latency is dominated by model time, streaming duration, Convex reads/writes, and media handling. But Bun absolutely matters for the developer system around the product: config sync, tests, scripts, local startup, and CI all run through Bun. When the project has TypeScript, Convex codegen, prompt generation, design snapshot generation, and large test trees, shaving tens of seconds off repeated tooling steps is real engineering leverage.

Next.js 16 and Turbopack matter for the same reason. Vercel’s Next.js 16 release says Turbopack gives 2-5x faster production builds and up to 10x faster Fast Refresh. The 16.2 Turbopack notes cite 67-100% faster application refresh and 400-900% faster compile time in real-world apps. Memoir is a module-heavy App Router app, so faster rebuilds are not vanity metrics. They directly change how quickly you can work on Mimi, profiles, recipes, albums, and the app shell.

Convex matters because Mimi needs durable realtime state without a custom WebSocket layer. Convex’s realtime docs state that query functions automatically track dependencies and update subscribed clients. Its reactive database docs cite typical update pushes within 50-100ms after data changes. For a chat-like drafting workflow, that means conversation state, draft state, and related UI can stay synchronized without hand-rolled pub/sub plumbing.

Observability Is Part of the AI Product

src/ai/provider.ts wraps the model with traceLanguageModel, and src/app/api/mimi/_lib/persistence.ts captures latency, streamed characters, tool event count, model ID, input tokens, output tokens, draft type, finish reason, and abort state. PostHog’s LLM analytics tracks generations, token counts, latency, cost, traces, spans, and tool calls. That maps cleanly to what we need to debug Mimi.
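A sketch of a per-turn event built from those fields. The event name, the property names, and the split of latency into model, tool, and persistence phases are hypothetical: the article lists a single latency measurement, and splitting it is one way to answer a question like “why did this turn take 18 seconds?” directly. PostHog’s own LLM analytics events use reserved property names, which this sketch does not reproduce.

```typescript
interface TurnTimings {
  modelMs: number;
  toolMs: number;
  persistenceMs: number;
}

interface TurnTelemetryInput {
  modelId: string;
  inputTokens: number;
  outputTokens: number;
  streamedChars: number;
  toolEventCount: number;
  draftType: "memory" | "recipe" | "photo_album" | null;
  finishReason: string;
  aborted: boolean;
  timings: TurnTimings;
}

// Hypothetical event shape; names are illustrative, not the app's real schema.
function buildTurnEvent(input: TurnTelemetryInput) {
  const { modelMs, toolMs, persistenceMs } = input.timings;
  const phases: Array<[string, number]> = [
    ["model", modelMs],
    ["tools", toolMs],
    ["persistence", persistenceMs],
  ];
  // Record which phase dominated so slow turns can be grouped by cause.
  const slowestPhase = phases.reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
  return {
    event: "mimi_turn_completed",
    properties: {
      model_id: input.modelId,
      input_tokens: input.inputTokens,
      output_tokens: input.outputTokens,
      streamed_chars: input.streamedChars,
      tool_event_count: input.toolEventCount,
      draft_type: input.draftType,
      finish_reason: input.finishReason,
      aborted: input.aborted,
      latency_ms: modelMs + toolMs + persistenceMs,
      slowest_phase: slowestPhase,
    },
  };
}
```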

The useful question in production is rarely “did the model respond?” It is “why did this turn take 18 seconds?”, “which model generated this draft?”, “did the recipe helper run?”, “did the Convex mutation fail?”, “how many tokens did this profile context add?”, and “was the assistant response persisted before credits were consumed?”

The Engineering Thesis

Mimi is built as a constrained agent because Memoir is a high-trust product. The architecture combines a fast streaming user experience with durable Convex records, bounded tool execution, YAML-backed model and prompt policy, and LLM observability. The result is an AI system that can help families capture messy human material without turning the family archive into an unreviewed model output dump.

The important engineering decision is not “Memoir uses AI.” The important decision is that Mimi’s output moves through typed tools, server policy, Convex persistence, observability, and human-reviewable drafts before it becomes part of the archive.

References

Vercel AI SDK 6 ToolLoopAgent: https://ai-sdk.dev/docs/reference/ai-sdk-core/tool-loop-agent
Vercel AI SDK agents overview: https://sdk.vercel.ai/docs/agents/overview
OpenRouter provider for AI SDK: https://ai-sdk.dev/providers/community-providers/openrouter
OpenRouter package metadata: https://www.npmjs.com/package/@openrouter/ai-sdk-provider
Vercel AI SDK timeout guidance for Vercel: https://ai-sdk.dev/docs/troubleshooting/timeout-on-vercel
Vercel Fluid Compute duration defaults: https://vercel.com/docs/functions/configuring-functions/concurrency
PostHog LLM analytics: https://posthog.com/docs/llm-analytics
PostHog generations and traces: https://posthog.com/docs/llm-analytics/generations
Convex realtime docs: https://docs.convex.dev/realtime
Convex indexes and query performance: https://docs.convex.dev/database/reading-data/indexes/indexes-and-query-perf
Bun install docs: https://bun.com/docs/pm/cli/install
Bun product benchmarks: https://bun.com/
Bun vs Node.js install benchmark context: https://www.pkgpulse.com/blog/bun-vs-nodejs-npm-runtime-speed-2026
Next.js 16 release: https://nextjs.org/blog/next-16
Next.js 16.2 Turbopack notes: https://nextjs.org/blog/next-16-2-turbopack