Building the Context Platform: making my life machine-readable for agents
So I have been building this thing I call the Life Command Center, a personal dashboard that pulls together to-dos, milestones, projects, reminders, AI agents, the whole deal. And at some point I realized the most important piece was not any of those features. It was the thing underneath them: context.
Not context in the vague "knowledge management" sense. I mean a structured virtualization of my life. Work projects, personal interests, my manga reading list, articles that caught my attention, research I want to reference later, anime I am tracking, technical notes from my homelab. All of it, organized in a way that something other than me can navigate.
Here is the thing that changed my thinking: I am probably not the primary consumer of this context. I rarely sit down and search through my own knowledge base manually. The agents do. An agent that needs to build a presentation pulls relevant articles from the Context Platform. An agent that manages my homelab checks what anime I flagged and starts the download. An agent drafting an email finds the meeting notes it needs without me pointing it there.
The Context Platform is not a search engine for me. It is structured terrain for AI agents to navigate. The fact that I can also search it is a nice side effect, but the design target is agentic consumption: making my personal context machine-readable, semantically rich, and reliably retrievable by autonomous processes.
This is the story of building that platform, a hybrid semantic and keyword search system powered by Voyage AI embeddings, PostgreSQL with pgvector, and a few ideas about intent-augmented retrieval that turned out to matter more than I expected.
The Architecture, or Why Everything Lives in PostgreSQL
Before getting into the implementation, it is worth understanding what "context" actually means here. The Context Platform stores everything across four domains: work (projects, meeting notes, client documents), personal (manga reading lists, anime watchlists, articles that caught my eye, personal goals), learning (research papers, technical deep dives, courses), and logs (agent outputs, system events, diary entries). It is not a note-taking app. It is closer to a structured mirror of the parts of my life that I want agents to be aware of.
The first decision was where to put the vectors. I looked at Pinecone, Weaviate, Qdrant, all the dedicated vector databases. They are good at what they do. But I already had Supabase PostgreSQL running for the rest of the app, and pgvector has gotten genuinely solid. Adding another infrastructure dependency for a single-user app felt like over-engineering.
I actually started with Pinecone. Got the free tier set up, wrote the integration, stored a few hundred vectors. It worked fine. But then I needed to join vector results with my file metadata in Postgres, and suddenly I was making two round trips for every search: one to Pinecone for nearest neighbors, one to Postgres for the actual file info. For a personal app this felt absurd. I ripped out the Pinecone integration after about three days and moved everything into pgvector. Losing the dedicated vector DB benchmarks was worth the simplicity of having everything in one place.
So everything lives in Postgres. Documents, chunks, embeddings, full-text search indexes, analysis metadata, all in the same database, queryable with the same Drizzle ORM I use everywhere else. And since agents hit the same database through the same API, there is no translation layer between "human search" and "agent search." It is the same interface.
The module itself lives across the monorepo:
packages/shared/src/context/: the canonical shared engine (chunking, classification, embeddings)
apps/web/src/lib/actions/context/: server actions for the Next.js app (store, search, CRUD)
apps/cli/commands/context.ts: a full CLI interface so I can pipe stuff in from the terminal
apps/web/src/app/api/context/: REST endpoints for agents, the browser extension, and other clients
That last bullet matters most. The REST API is how agents consume context. When an agent needs to find relevant documents (say, to build a presentation about a topic I have been reading about, or to check whether I have already saved a particular paper) it hits the same /api/context/search endpoint that the web UI uses. The shared package pattern was a decision I am glad I made early. The chunker, classifier, and embedding logic are identical whether content arrives through the web UI, the CLI, or a browser extension that saves articles I am reading. One source of truth, re-exported everywhere.
The Three-Route Classifier
Not all files are the same. A Markdown file is trivially readable. A PowerPoint deck needs extraction. A JPEG has no text to chunk at all.
My first attempt was a single pipeline for everything. That lasted about two days. I uploaded a .pptx file and the chunker tried to split the raw bytes of its zipped XML into paragraphs. The embeddings it produced were, predictably, nonsense. Then I tried an image, which produced an empty string, which produced a zero vector, which then matched everything in search because a zero vector has the same cosine distance to everything. That was a fun bug to track down.
The fix was a format classifier that routes files into one of three processing paths:
// packages/shared/src/context/classifier.ts
export type FormatRoute = 'native' | 'beta' | 'metadata_only';
Native files are anything I can read as text right now: Markdown, plain text, CSV, JSON, Python, TypeScript, you name it. These go straight through the pipeline. Chunk, embed, done.
Beta files are Office formats (pptx, xlsx, docx) that need an AI agent to extract structured text first. These get queued in a beta_analysis_queue table and processed asynchronously. The agent extracts the text, then feeds it back into the native pipeline.
Metadata only covers images, PDFs, archives. Files I want to track and store but cannot meaningfully chunk yet. They get a database record and storage path, but no embeddings. The "yet" is doing heavy lifting there; multimodal embeddings are on the roadmap.
This three-route separation cleaned up the code considerably. Each route has its own responsibilities and failure modes, and when I eventually add PDF text extraction, it just moves from metadata_only to native. No pipeline surgery required.
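As a sketch, the routing itself can be as simple as an extension lookup. This restates the FormatRoute type from classifier.ts; the extension lists here are illustrative examples, not the actual set the classifier covers:

```typescript
type FormatRoute = 'native' | 'beta' | 'metadata_only';

// Illustrative extension lists; the real classifier covers more formats.
const NATIVE_EXTENSIONS = new Set(['md', 'txt', 'csv', 'json', 'py', 'ts']);
const BETA_EXTENSIONS = new Set(['pptx', 'xlsx', 'docx']);

function classifyFormat(filename: string): FormatRoute {
  const ext = filename.split('.').pop()?.toLowerCase() ?? '';
  if (NATIVE_EXTENSIONS.has(ext)) return 'native'; // chunk and embed directly
  if (BETA_EXTENSIONS.has(ext)) return 'beta'; // queue for agent extraction
  return 'metadata_only'; // store and track, no embeddings
}
```

The payoff of the lookup-table shape is exactly the migration path described above: moving PDFs from metadata_only to native is one line in one set.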
Chunking: The Part That Sounds Simple Until You Try It
Chunking is one of those things that seems straightforward (just split the text into pieces, right?) until you realize that where you split determines whether your search results make sense or not.
I went through three iterations before landing on what I have now.
First attempt: fixed-size splits
Just cut every 1,600 characters. Fast, predictable, terrible results. Sentences got sliced in half. Paragraphs about one topic would end mid-thought and the next chunk would start with a dangling clause about something else. I had a chunk that started with "...which is why the attention mechanism fails" with no context about what "which" referred to. Completely useless for retrieval.
Second attempt: paragraph-only splits
Split on \n\n exclusively. Great for well-structured documents. Useless for code files, CSVs, or anything without double newlines. Some chunks were 50 characters, others were 8,000. I had one chunk that was literally an entire 6,000-word article because the author did not use blank lines between paragraphs.
What I landed on: recursive separator splitting
// packages/shared/src/context/chunker.ts
const DEFAULT_CHUNK_SIZE = 1600; // ~400 tokens
const DEFAULT_OVERLAP = 400; // ~100 tokens
const SEPARATORS = ['\n\n', '\n', '. ', ' '];
The chunker tries the most semantically meaningful separator first (paragraph breaks), then falls back through line breaks, sentence boundaries, and finally word boundaries. It respects the target chunk size but prefers to split at a natural boundary even if the chunk is slightly under or over.
The overlap was the key insight I was missing in earlier iterations. Adjacent chunks share 400 characters of content (roughly 100 tokens) at their boundaries. This means that if a concept spans two chunks, at least part of it appears in both. Without overlap, I was getting search results that started mid-sentence because the relevant context was in the previous chunk. An agent would retrieve a chunk saying "this approach reduces latency by 40%" with zero explanation of what "this approach" was. Not helpful.
The numbers (1,600 characters, 400 overlap) came from experimentation more than theory. I changed the chunk size parameters probably eight times over two weeks. 800 characters, 1200, 2000, 3200, back to 1600. Each time I would re-embed a subset of documents, run some test queries, convince myself the new size was better, then find a case where it was worse. Too small and you lose the surrounding context. Too large and the embedding becomes a blurry average of too many concepts. I eventually stopped optimizing and accepted 1,600 as "good enough." The overlap matters more than the exact chunk size anyway.
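A minimal version of that recursive strategy, using the constants above — this is a sketch of the idea, not the actual chunker.ts code:

```typescript
const CHUNK_SIZE = 1600;
const OVERLAP = 400;
const SEPARATORS = ['\n\n', '\n', '. ', ' '];

// Split with the coarsest separator first; recurse into finer separators
// only for pieces that are still oversized.
function splitBySeparators(text: string, seps: string[]): string[] {
  if (text.length <= CHUNK_SIZE) return text.length > 0 ? [text] : [];
  if (seps.length === 0) {
    // No natural boundary left: hard-cut at the size limit.
    const out: string[] = [];
    for (let i = 0; i < text.length; i += CHUNK_SIZE) {
      out.push(text.slice(i, i + CHUNK_SIZE));
    }
    return out;
  }
  const [sep, ...rest] = seps;
  const chunks: string[] = [];
  let current = '';
  for (const piece of text.split(sep)) {
    const joined = current ? current + sep + piece : piece;
    if (joined.length <= CHUNK_SIZE) {
      current = joined;
      continue;
    }
    if (current) {
      chunks.push(current);
      current = '';
    }
    if (piece.length > CHUNK_SIZE) {
      chunks.push(...splitBySeparators(piece, rest));
    } else {
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Prepend the tail of the previous chunk so a concept that spans a
// boundary appears in both chunks.
function chunkText(text: string): string[] {
  const base = splitBySeparators(text, SEPARATORS);
  return base.map((c, i) => (i === 0 ? c : base[i - 1].slice(-OVERLAP) + c));
}
```

The overlap lives in a separate pass at the end, which keeps the splitting logic itself clean: splitting decides where boundaries go, overlap decides how much context bleeds across them.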
Embeddings and the Intent Trick
For embeddings I went with Voyage AI's voyage-multimodal-3.5 model, 1,024-dimensional vectors with batch processing support:
// packages/shared/src/context/embeddings.ts
const VOYAGE_API_URL = 'https://api.voyageai.com/v1/multimodalembeddings';
const MODEL = 'voyage-multimodal-3.5';
const MAX_BATCH_SIZE = 128;
I actually started with OpenAI's text-embedding-3-small. It was the obvious default. But the retrieval quality for short queries against long documents was disappointing. I would search for "chunking strategies for RAG" and get back chunks about "retrieval pipelines" that were vaguely related but not what I wanted. Switching to Voyage improved things for two reasons. First, the multimodal angle: when I eventually add image context, I want embeddings that live in the same vector space as my text. Second, Voyage's retrieval-optimized models distinguish between input_type: 'document' (for indexing) and input_type: 'query' (for searching), which means the embeddings are asymmetric by design. Documents get embedded differently than queries, which produced noticeably better retrieval in my testing.
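The MAX_BATCH_SIZE constant implies a batching step before any API call. A trivial sketch of that split (toBatches is my illustrative name, not the embeddings.ts code):

```typescript
const MAX_BATCH_SIZE = 128;

// Split chunk texts into API-sized batches; each batch becomes one
// request to the embeddings endpoint.
function toBatches<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```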
But the most interesting thing I did with embeddings was not the model choice. It was intent augmentation.
The intent prefix idea
Here is the problem: when you save a document, you usually have a reason. You are saving a research paper because you want to understand a technique. You are adding a manga to your reading list because someone recommended it. You are saving meeting notes because there are action items to follow up on. That purpose, the "why I saved this," is useful retrieval signal, but it gets lost the moment the document hits the database.
This matters even more when agents are the ones doing the retrieval. A human can scan a result and think "oh right, I saved this because of the section about KV caches, not the attention stuff." An agent cannot. It gets whatever the embedding space returns and has to work with that. So the more I can bake my original intent into the embeddings, the better the agents perform downstream.
So I added an intent generation step:
export async function generateIntentFromContent(
content: string,
title: string
): Promise<string | null> {
// Uses first 1000 chars of content
// Returns: "Learn about X to accomplish Y" style statement
}
A lightweight LLM call (GPT-4o-mini via OpenRouter) reads the first 1,000 characters and the title, then generates a 1-2 sentence intent statement. Something like: "Understand recursive chunking strategies to improve retrieval quality in a personal knowledge base."
The first version of this generated terrible intents. Really generic stuff like "This document contains information about technology" for everything. The problem was I was not giving the LLM enough context, just the title and first 200 characters. Bumping to 1,000 characters and adding the title as a separate field fixed most of it, though it still occasionally produces something useless for very short documents.
Then the intent gets prepended to every chunk before embedding:
const textsForEmbedding = chunks.map((c) =>
effectiveIntent
? `[Intent: ${effectiveIntent}]\n\n${c.content}`
: c.content
);
const embeddings = await generateEmbeddings(textsForEmbedding);
This biases the embedding toward the retrieval purpose. When I later search for "how to chunk text for RAG," the intent-prefixed embeddings of that chunking paper surface higher because they were embedded with the context of "learning about chunking strategies" rather than just the raw text about separator hierarchies.
I was skeptical this would make a meaningful difference. It does. Especially for documents where the content is tangential to my actual reason for saving them, like a paper about attention mechanisms that I saved specifically for its brief section on KV cache compression.
The intent can also be provided manually at upload time, which is useful for the CLI: lcc context store --intent "Reference for the agentic architecture article" paper.md.
The Ingestion Pipeline
All of this comes together in the ingestion pipeline:
// apps/web/src/lib/actions/context/store.ts
export async function processTextContent(
fileId: string,
text: string,
title: string,
domain: string,
intent?: string | null
): Promise<void> {
// Auto-generate intent if not provided
let effectiveIntent = intent || null;
if (!effectiveIntent) {
effectiveIntent = await generateIntentFromContent(text, title)
.catch(() => null);
}
// Chunk the text
const chunks = chunkText(text);
// Embed with intent prefix
const textsForEmbedding = chunks.map((c) =>
effectiveIntent
? `[Intent: ${effectiveIntent}]\n\n${c.content}`
: c.content
);
const embeddings = await generateEmbeddings(textsForEmbedding);
// Store chunks + embeddings in pgvector
// Update file status to 'ready'
// ...
// Fire-and-forget LLM analysis
analyzeContextFile(fileId, text, title, domain, effectiveIntent)
.catch(console.error);
}
The status progression is: pending → chunking → embedding → ready. Each step updates the database, so the UI can show real-time progress. If anything fails, the status goes to failed with an error message, and the file can be retried.
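That progression can be written down as a tiny state machine. This is an illustration built from the statuses named above, not code from the repo; the exact transition table (especially failed back to pending on retry) is my assumption:

```typescript
type ProcessingStatus = 'pending' | 'chunking' | 'embedding' | 'ready' | 'failed';

// Assumed transition table: each status lists where it may go next.
const NEXT: Record<ProcessingStatus, ProcessingStatus[]> = {
  pending: ['chunking', 'failed'],
  chunking: ['embedding', 'failed'],
  embedding: ['ready', 'failed'],
  ready: [], // terminal: the file is searchable
  failed: ['pending'], // retry resets the file
};

function canTransition(from: ProcessingStatus, to: ProcessingStatus): boolean {
  return NEXT[from].includes(to);
}
```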
One design decision I am particularly happy with: the LLM analysis is fire-and-forget. After the file reaches ready status (chunks stored, embeddings indexed, file is searchable), a separate analysis call runs asynchronously:
// apps/web/src/lib/context/analysis.ts
const MODEL = 'openai/gpt-4o-mini';
// Generates: summary, key_topics (3-8), entities,
// content_classification, language, action_items,
// complexity_level, word_count, reading_time_minutes
This was not always fire-and-forget, though. My first implementation had an await on the analysis call inside the main pipeline, which meant that if the LLM was slow (or if OpenRouter was having a bad day), uploads would hang for 30+ seconds. I (the only user) would think the upload had failed, retry, and create duplicate entries that bypassed the dedup check because the first upload had not committed its hash yet. Two bugs, one incident, zero trust in the upload system for about a week. The fix was making analysis genuinely async (.catch(console.error) instead of await) and moving the content hash check to before the Supabase Storage upload rather than after.
Now if the analysis fails (LLM timeout, rate limit, whatever) the file is still fully functional. You can search it, retrieve it, read it. The analysis is enrichment, not a gate.
But when the analysis does succeed, it becomes a major asset for agents. An agent deciding whether a document is relevant does not need to read the whole thing. It can check the summary, scan the key topics, look at the complexity level. The structured analysis metadata is essentially a machine-readable card catalog that agents can filter and reason over before committing to a full retrieval. My presentation-building agent, for instance, uses key_topics and content_classification to narrow candidates before it even runs a vector search.
Content-addressable deduplication was another early addition. Every file gets a SHA-256 hash of its content before storage. Upload the same document twice? The second upload gets rejected before it hits Supabase Storage. It has saved me from my own carelessness more times than I would like to admit.
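The dedup check itself is a few lines with node:crypto. A sketch, where the in-memory Set stands in for the database lookup against stored hashes:

```typescript
import { createHash } from 'node:crypto';

function contentHash(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Stand-in for the real check, which queries the files table by hash
// before anything is uploaded to storage.
const seenHashes = new Set<string>();

function isDuplicate(text: string): boolean {
  const hash = contentHash(text);
  if (seenHashes.has(hash)) return true;
  seenHashes.add(hash);
  return false;
}
```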
Hybrid Search: Because Neither Approach is Good Enough Alone
This was probably the section of the build where I learned the most. I started with pure vector search: embed the query, find the nearest neighbors, done. It works well for semantic queries ("articles about organizational transformation") but fails embarrassingly for exact matches. Search for "RRF algorithm" and the vector search returns chunks about "search result merging" and "ranking fusion techniques," semantically related, sure, but missing the exact document where I have notes specifically about RRF.
So I added keyword search. PostgreSQL's built-in full-text search with tsvector and tsquery is surprisingly capable, and since my data is already in Postgres, it was essentially free.
The question then becomes: how do you merge two ranked lists that use completely different scoring systems?
My first attempt was embarrassingly naive. I tried normalizing both scores to a 0-1 range and averaging them. This does not work because the score distributions are completely different. Vector similarity scores tend to cluster between 0.7 and 0.9, while keyword rank scores (ts_rank_cd, in my setup) can be anything from 0.001 to 50. "Normalizing" them just meant the keyword scores dominated everything. I spent a whole afternoon debugging why my semantic search had stopped working before I realized the normalization was the problem.
Reciprocal Rank Fusion
Then I found RRF, a rank aggregation technique that sidesteps the whole normalization problem. Instead of trying to compare the scores (which are not comparable), you just use the rank position:
// apps/web/src/lib/actions/context/rrf.ts
export function mergeWithRRF<T extends { chunk_id: string }>(
vectorRows: T[],
keywordRows: T[],
vectorWeight: number,
keywordWeight: number,
limit: number,
K = 60
) {
const scores = new Map<string, { row: T; score: number }>();
for (let i = 0; i < vectorRows.length; i++) {
const rrfScore = vectorWeight / (K + i + 1);
const row = vectorRows[i];
const entry = scores.get(row.chunk_id);
if (entry) entry.score += rrfScore;
else scores.set(row.chunk_id, { row, score: rrfScore });
}
for (let i = 0; i < keywordRows.length; i++) {
const rrfScore = keywordWeight / (K + i + 1);
const row = keywordRows[i];
const entry = scores.get(row.chunk_id);
if (entry) entry.score += rrfScore;
else scores.set(row.chunk_id, { row, score: rrfScore });
}
return Array.from(scores.values())
.sort((a, b) => b.score - a.score)
.slice(0, limit);
}
The K parameter (default 60) controls how much the rank position matters. Higher K means less difference between rank 1 and rank 10. The weights (default 60% vector, 40% keyword) let me bias toward semantic or exact matching depending on the query.
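A quick numeric illustration of what K does: with the per-list contribution weight / (K + rank), a large K flattens the gap between top ranks.

```typescript
// RRF contribution for a result at a given 1-based rank (weight omitted).
const rrfScore = (rank: number, K: number): number => 1 / (K + rank);

// With K = 1, rank 1 scores ~5.5x rank 10; with K = 60 the ratio is only ~1.15x.
const gapSmallK = rrfScore(1, 1) - rrfScore(10, 1);
const gapLargeK = rrfScore(1, 60) - rrfScore(10, 60);
```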
The search itself runs both retrievals in parallel against PostgreSQL stored functions:
CREATE FUNCTION context_vector_search(
p_user_id UUID,
p_embedding vector(1024),
p_limit INT,
p_domain context_domain
)
RETURNS TABLE(
chunk_id UUID, file_id UUID,
content TEXT, similarity DOUBLE PRECISION
)
AS $$
SELECT cc.id, cc.file_id, cc.content,
(1 - (cc.embedding <=> p_embedding))::DOUBLE PRECISION
AS similarity
FROM context_chunks cc
JOIN context_files cf ON cf.id = cc.file_id
WHERE cf.user_id = p_user_id
AND cf.processing_status = 'ready'
AND cc.embedding IS NOT NULL
AND (p_domain IS NULL OR cf.domain = p_domain)
ORDER BY cc.embedding <=> p_embedding
LIMIT p_limit;
$$ LANGUAGE sql STABLE;
CREATE FUNCTION context_keyword_search(
p_user_id UUID,
p_query TEXT,
p_limit INT,
p_domain context_domain
)
RETURNS TABLE(
chunk_id UUID, file_id UUID,
content TEXT, rank DOUBLE PRECISION
)
AS $$
SELECT cc.id, cc.file_id, cc.content,
ts_rank_cd(
to_tsvector('english', cc.content),
plainto_tsquery('english', p_query)
)::DOUBLE PRECISION AS rank
FROM context_chunks cc
JOIN context_files cf ON cf.id = cc.file_id
WHERE cf.user_id = p_user_id
AND cf.processing_status = 'ready'
AND (p_domain IS NULL OR cf.domain = p_domain)
AND to_tsvector('english', cc.content)
@@ plainto_tsquery('english', p_query)
ORDER BY rank DESC
LIMIT p_limit;
$$ LANGUAGE sql STABLE;
Pushing the search into PostgreSQL functions was a performance decision. The alternative (pulling all chunks to the application layer and filtering there) would have been a disaster at scale. With the functions, the database handles the vector distance calculation and text matching in a single round trip. Each search fires both functions in parallel via Promise.all, then merges with RRF:
const [vectorResults, keywordResults] = await Promise.all([
db.execute(sql`SELECT * FROM context_vector_search(
${userId}, ${embedding}, ${limit * 2}, ${domain}
)`),
db.execute(sql`SELECT * FROM context_keyword_search(
${userId}, ${query}, ${limit * 2}, ${domain}
)`),
]);
const ranked = mergeWithRRF(
vectorRows, keywordRows,
vectorWeight, keywordWeight, limit
);
Notice the limit * 2. I over-fetch from each source so RRF has enough candidates to work with. If I asked for 10 results from each and there were 5 overlapping chunks, I would end up with only 15 unique candidates to rank. Over-fetching by 2x gives RRF breathing room.
The 60/40 split between vector and keyword was not scientific. I tried 50/50, 70/30, 80/20. For my content (mostly technical prose) 60% vector gave better results for conceptual queries while 40% keyword kept exact-match queries honest. Your mileage will vary depending on content type.
A Detour: Visualizing the Vector Space
Somewhere around the second week I got curious about what my embedding space actually looked like. Were similar documents actually near each other? Were there clusters? Was the intent augmentation moving things around in meaningful ways?
So I added a UMAP projection endpoint:
// apps/web/src/lib/actions/context/projection.ts
const UMAP = (await import('umap-js')).UMAP;
const umap = new UMAP({
nComponents: 2,
nNeighbors: Math.min(15, rows.length - 1),
minDist: 0.1,
});
projected = umap.fit(embeddings);
UMAP takes the 1,024-dimensional embeddings and projects them down to 2D while trying to preserve local neighborhood structure. The result is an interactive scatter plot in the UI, color-coded by bucket or domain, where each dot is a chunk and you can hover to see its content.
It is not exactly a production feature. More of a debugging tool that turned out to be genuinely useful for understanding my knowledge base. I can visually see clusters forming around topics (all my context management articles cluster together, deployment notes form their own island), and when something is in an unexpected location, it usually means the chunking split a document poorly or the intent was misleading.
The projection is cached in memory for 5 minutes because UMAP on a few thousand vectors takes a couple seconds. Not a disaster, but not something I want running on every page load.
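The cache is nothing fancy; something like this single-entry TTL holder does the job (names are illustrative, and the explicit now parameter exists only to make it testable):

```typescript
const TTL_MS = 5 * 60 * 1000; // 5 minutes

type CacheEntry<T> = { value: T; expiresAt: number };

// Single-slot cache: good enough when there is exactly one projection
// to cache per user.
function makeTTLCache<T>(ttlMs: number = TTL_MS) {
  let entry: CacheEntry<T> | null = null;
  return {
    get(now: number = Date.now()): T | null {
      if (entry && entry.expiresAt > now) return entry.value;
      entry = null; // expired: drop it
      return null;
    },
    set(value: T, now: number = Date.now()): void {
      entry = { value, expiresAt: now + ttlMs };
    },
  };
}
```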
Designing for Agents, Not Just for Me
So here is where things get philosophically interesting. Most RAG tutorials optimize for a human typing a query and reading the results. That is a fine starting point, but it is not really what I am building for.
My actual use pattern looks more like this: I save an article about, say, a new chunking technique. A week later, I ask an agent to draft a blog post about retrieval systems. That agent queries the Context Platform, gets back relevant chunks, and weaves them into a draft. I never manually searched for anything. Or I add a new anime to my personal watchlist context. A separate agent that manages my homelab media server picks it up on its next run, checks availability, and starts the download. I did not search. I did not browse. I just added context, and an agent acted on it.
This changes what "good retrieval" means. For a human, returning the top 5 most relevant chunks is usually enough. You read them and figure it out. For an agent, the context needs to be more self-describing. The agent needs to know not just that a chunk is relevant, but what kind of document it came from, what domain it belongs to, what the original intent was, and how it relates to other stored context.
That is why the LLM analysis metadata matters so much. The key_topics, entities, content_classification, and complexity_level fields are not there for me to browse. They are structured signals that agents can filter on programmatically. An agent building a technical presentation can filter for complexity_level: 'advanced' and domain: 'work' before running a semantic search, which cuts the result space down to something manageable.
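As a sketch of that pattern: the row shape below mirrors the analysis fields named above, but the exact value sets and the filter criteria are the hypothetical presentation agent's, not code from the repo:

```typescript
type AnalysisRow = {
  file_id: string;
  key_topics: string[];
  complexity_level: 'basic' | 'intermediate' | 'advanced'; // assumed value set
  domain: 'work' | 'personal' | 'learning' | 'logs';
};

// Narrow candidates on structured metadata before paying for a vector search.
function preFilter(rows: AnalysisRow[], topic: string): string[] {
  return rows
    .filter((r) => r.domain === 'work' && r.complexity_level === 'advanced')
    .filter((r) => r.key_topics.some((t) => t.toLowerCase().includes(topic.toLowerCase())))
    .map((r) => r.file_id);
}
```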
The domain system (work, personal, learning, logs) was originally just an organizational convenience. But once agents started consuming context, it became a permission boundary. My homelab agent only queries personal domain. My work agents only query work and learning. The domain is not just a label. It scopes what each agent can see.
I keep thinking about this as building a structured map of my life that agents can walk through. The richer the terrain (intents, topics, entities, domains, relationships), the more useful things agents can do without me having to hold their hand through every step.
The Database Schema Under the Hood
Five core tables power the context system, plus three supporting ones:
context_files is the master record. Each document gets one row with its title, domain (work, personal, learning, logs), file type, tags as JSONB, a SHA-256 content hash for dedup, the Supabase Storage path, processing status, chunk count, token count, and the optional intent statement.
context_chunks holds the actual pieces. Each chunk has its content, index within the file, token count, and a vector(1024) column for the embedding. This is where pgvector does its work.
context_file_analysis stores the LLM-generated enrichment: summary, key topics, entities, classification, complexity level, and computed stats like word count and reading time. One row per file, joined by file ID.
context_interactions is an audit log of every search query. What was searched, how many results, which domain, top file IDs returned, search type, and duration in milliseconds. I use this to understand agent search patterns and tune the retrieval.
beta_analysis_queue manages the async processing of Office files: status, priority, attempt count, max retries, and error messages.
Then there are context_buckets for grouping related files (like a folder, but semantic), bucket_notes for attaching commentary to buckets, and kapa_reports, which is a whole separate rabbit hole about generating McKinsey-style analysis reports from bucket contents using an AI agent. That one deserves its own diary entry.
The Embedding Dimension Incident
When I first started, Voyage AI's docs were not entirely clear about whether voyage-multimodal-3.5 returned 1024 or 1536 dimensions. I assumed 1536 because that is what OpenAI's models use, and I set up the pgvector column as vector(1536). Stored about 200 documents. Everything seemed to work.
Except the search results were... off. Not completely wrong, but weirdly imprecise. Queries that should have been easy misses were returning unrelated chunks. I spent two days checking my chunking logic, my query construction, my RRF weights. Everything looked fine.
Then I logged the actual embedding vectors coming back from Voyage and saw they were 1024 dimensions. Pgvector was zero-padding them to 1536. Every search "worked" technically, but the distance calculations were garbage because 33% of every vector was meaningless zeros. The cosine distance between any two vectors was artificially compressed because they all shared 512 dimensions of identical zeros.
I had to drop the column, recreate it as vector(1024), and re-embed everything. Two days of debugging for what was essentially a wrong number in a schema file. Lesson: always check the actual output dimensions, even if you think you know.
What I Would Do Differently
If I were starting from scratch, a few things I would change:
Start with hybrid search from day one. I wasted time building and testing a pure vector search system, then the naive score normalization, before discovering RRF. The hybrid approach with RRF is strictly better. Just build both retrieval paths from the start and merge them.
Define the chunk overlap before choosing the chunk size. I spent too long tuning the chunk size when the overlap was the more important parameter. Once I got the overlap right (25% of chunk size), the exact chunk size mattered less.
Add interaction logging earlier. The context_interactions table was a late addition, and I wish I had been logging agent queries from day one. Understanding how agents actually navigate the context (which queries return good results, which return nothing, how long searches take) is the best signal for tuning the system. It also tells me which agents are heavy context consumers and which barely touch the platform.
Design the domain model for agents first, not for human organization. I organized domains (work, personal, learning, logs) the way I think about my life. But agents do not think in those buckets. An agent building a presentation cares about "content related to this topic" across all domains. An agent managing my homelab cares about "media I want to consume" regardless of whether it is anime (personal) or a conference talk (learning). The domain boundaries that make sense for human browsing are not the same as what makes sense for agentic scoping. I am still sorting this one out.
What is Next
The Context Platform is functional. Agents are consuming it daily, and I occasionally search it myself. But there is one thing I am honestly really excited to build next, and it is not a small feature. It is the thing that might change how the whole platform works.
Self-organizing domains with auto-discovery. The four fixed domains (work, personal, learning, logs) were a starting point, but they are already too rigid. My anime watchlist and my books reading list both sit under "personal" even though they serve completely different agents and have nothing to do with each other. My homelab notes are "personal" but they are closer to "learning" in how agents use them. I keep having to think about where things go, and that is exactly the opposite of what I want.
The whole point of this platform is that it should organize itself. I should be able to throw context at it (an article, a manga title, a meeting note, a research paper) and the system should figure out how to categorize it, how to make it discoverable, which agents should care about it. If I am spending time manually thinking about domain boundaries, I have already failed at the design goal. I want domains (or whatever replaces them) to emerge from the content itself and from how agents actually use it. Maybe through clustering. Maybe through agent usage patterns. Maybe through some combination of both. I do not know the exact mechanism yet, but this is the feature I am most looking forward to building. It turns the Context Platform from something I have to maintain into something that maintains itself.
The rest of the roadmap is smaller improvements. Cross-file relationship graphs would help agents discover related documents without me connecting the dots manually. Temporal decay for relevance scoring would help distinguish between context that is still current (my manga reading list) and context that has gone stale (meeting notes from three months ago). Both worth doing, but incremental.
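Temporal decay, for instance, could be as small as an exponential discount on the fused score. This is speculative roadmap code, and the 90-day half-life is a made-up placeholder:

```typescript
// Halve a result's score for every HALF_LIFE_DAYS of document age.
const HALF_LIFE_DAYS = 90; // placeholder; likely different per domain

function decayedScore(score: number, ageDays: number): number {
  return score * Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}
```

A manga reading list would want a long half-life (or none), while meeting notes would want a short one, which suggests the half-life itself belongs on the domain, not the system.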
And the kapa reports, that whole concept of "select a bucket of related documents and generate a structured analysis report," is working in prototype but the prompt engineering is... a journey. More on that in a future entry.
The bigger picture is that I am building toward something where I mostly just save context (articles, lists, notes, whatever catches my attention) and the agents figure out what to do with it. The less I have to explicitly orchestrate, the closer I get to a system that actually runs alongside my life rather than requiring me to manage it. Self-organizing domains are the piece that would get me closest to that.