An AI Product Manager’s perspective on knowledge retrieval, agent systems, and why the best explanation of modern AI architecture is an Apple TV+ show about aliens.

I recently binged Pluribus, Vince Gilligan’s new sci-fi series on Apple TV+. For those who haven’t seen it: an alien virus transforms humanity into a peaceful, content hive mind. Everyone’s consciousness merges into one. No secrets, no lies, instant access to all human knowledge.
Halfway through episode 2, I paused and texted my friend: “Bro, this is just RAG with perfect retrieval and 8 billion embodied agents.”
He thought I was joking. I wasn’t.
As someone who’s spent years in product and technology, I’ve read countless technical explanations of Retrieval-Augmented Generation, vector databases, and agent architectures. None of them clicked like watching Carol Sturka navigate a world where everyone except her shares one mind.
Let me explain.
What is RAG, Really?
Retrieval-Augmented Generation sounds complicated, but the concept is simple: instead of expecting an AI to know everything, you give it access to a knowledge base it can search through when answering questions.
Think of it like this:
- Without RAG: you ask someone a question, and they can only use what’s in their head
- With RAG: you ask someone a question, and they can instantly search through every book, document, and conversation ever recorded
The “retrieval” part finds relevant information. The “generation” part synthesizes it into an answer.
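To make those two halves concrete, here is a minimal sketch in Python. It assumes a toy bag-of-words "embedding" in place of a real embedding model, and it stops at building the prompt that a language model would receive. The function names (`embed`, `retrieve`, `build_prompt`) and the sample documents are illustrative, not taken from any particular library.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. Real systems use a neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """The 'retrieval' half: rank documents by similarity and keep the top k."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """The 'generation' half starts here: the retrieved text is handed to the model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


documents = [
    "Carol Sturka is immune to the virus.",
    "Zosia is the liaison between Carol and the hive mind.",
    "LEANN recomputes embeddings on demand during search.",
]
question = "Who is the liaison for Carol?"
context = retrieve(question, documents, k=2)
prompt = build_prompt(question, context)
print(prompt)
```

The sentence about Zosia ranks first because it shares the most words with the question, and the unrelated LEANN sentence is left out of the prompt entirely. A real system would swap the word counts for learned embeddings and send `prompt` to an LLM.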
Now here’s where Pluribus becomes the perfect metaphor.
The Pluribus as a RAG System
In the show, the hive mind operates exactly like an idealized RAG architecture:
Knowledge Store
Every human memory, skill, and piece of knowledge is accessible to the collective. Someone in Tokyo knows how to perform surgery? That knowledge is instantly available to someone in Buenos Aires. It’s not stored in one place; it’s distributed across billions of nodes (humans), but retrievable as if it were one unified database.
Zero-Latency Retrieval
There’s no “searching.” No embeddings, no similarity scores, no top-k results. The Pluribus just… knows. When information is needed, it surfaces instantly through the collective consciousness.
Embodied Agents
This is where Pluribus goes beyond traditional RAG. Most RAG systems only retrieve information. The Pluribus can act on it. Need something built in Shenzhen? A body is already there. Need a message delivered across the world? No API calls; a human simply walks there. Eight billion potential agents, zero coordination overhead.
The Interface Layer (Zosia)
Carol is immune to the virus. She can’t access the hive mind directly. So she gets Zosia, a liaison from the Pluribus who can translate between Carol’s individual consciousness and the collective.
Zosia is essentially a chat interface to humanity’s RAG system. She retrieves information from the hive and presents it in a format Carol can understand. She’s ChatGPT, but the knowledge base is every living human’s memories.
The Architecture Problem Pluribus Solves
Here’s what struck me from a systems perspective: the Pluribus has solved the challenges that dominate AI knowledge systems today.
| Challenge | Traditional AI | The Pluribus |
|---|---|---|
| Knowledge cutoff | Training data has an end date | Real-time updates as humans experience things |
| Hallucination | Models confidently make things up | Direct access to ground truth memories |
| Context limits | Fixed token windows | No limits — entire collective accessible |
| Coordination | Complex multi-agent orchestration | Single unified consciousness |
| Execution | AI can suggest, but humans must act | Seamless thought-to-action |
It’s the system every AI company is trying to build: unlimited knowledge, perfect retrieval, seamless execution.
But there’s a cost.
The Carol Problem: Why You Need the Immune
Here’s the twist that makes Pluribus brilliant storytelling and a perfect product lesson: the hive mind needs Carol.
Why? Because unanimous consensus kills innovation.
The Pluribus has access to all human knowledge, but everyone thinks the same way. There’s no friction, no debate, no “what if we tried something completely different?” The immune individuals — the Carols of the world — are the only ones who can generate truly novel ideas.
In product terms: a single system can be optimized for perfect retrieval or for original thinking, but not both.
This maps directly to how we should think about AI systems:
- RAG gives you access to existing knowledge — what’s already been written, discovered, documented
- Human creativity generates new knowledge — connections that don’t exist yet, questions no one thought to ask
The most powerful system isn’t pure RAG. It’s a human with original thinking + RAG as an augmentation layer. You stay “immune” (keep your individual perspective) while having an interface (your Zosia) to retrieve collective knowledge when needed.
Enter LEANN: The Storage Problem
Now let’s get practical. Say you want to build your own personal Pluribus, a RAG system over all your emails, documents, chat history, and browser history. Your own private hive mind.
Traditional vector databases have a problem: they store an embedding for every chunk of text. For 60 million text chunks, that’s 201 GB of storage. Your laptop can’t handle that.
This is where LEANN caught my attention. It’s a new approach that shrinks that index to about 6 GB, a 97% storage saving, through a clever trick: instead of storing all embeddings, it stores a pruned graph structure and recomputes embeddings on demand during search.
| System | Index size (60M text chunks) | Storage savings |
|---|---|---|
| Traditional Vector DB | 201 GB | — |
| LEANN | 6 GB | 97% |
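The arithmetic behind that table is quick to check. The sketch below uses only the three figures quoted above and assumes decimal gigabytes (1 GB = 10^9 bytes), which the source does not specify.

```python
CHUNKS = 60_000_000      # text chunks in the index
TRADITIONAL_GB = 201     # traditional vector DB, from the table
LEANN_GB = 6             # LEANN, from the table
BYTES_PER_GB = 1e9       # assumption: decimal gigabytes

savings = 1 - LEANN_GB / TRADITIONAL_GB
traditional_bytes_per_chunk = TRADITIONAL_GB * BYTES_PER_GB / CHUNKS
leann_bytes_per_chunk = LEANN_GB * BYTES_PER_GB / CHUNKS

print(f"savings: {savings:.1%}")                                            # savings: 97.0%
print(f"traditional bytes per chunk: {traditional_bytes_per_chunk:.0f}")   # 3350
print(f"LEANN bytes per chunk: {leann_bytes_per_chunk:.0f}")               # 100
```

Roughly 3,350 bytes per chunk is in the same ballpark as one 768-dimensional float32 vector (3,072 bytes) plus overhead, though the embedding size is my guess, not something stated here. At about 100 bytes per chunk, there is clearly no room left for a stored embedding at all.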
The Pluribus metaphor holds here, too:
- Traditional RAG = storing everyone’s memories in a central vault
- LEANN = keeping a map of who knows what, and asking them directly when needed
LEANN uses “high-degree preserving pruning,” keeping important hub nodes (highly connected pieces of knowledge) while removing redundant connections. In Pluribus terms, it’s like maintaining the critical neural pathways that connect the most minds and dropping the redundant ones, then asking each mind to recall its memory only when a search actually reaches it.
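Here is a toy sketch of both ideas together: prune the graph but spare the hubs, then embed nodes only when a search touches them. This is my own simplified illustration, not LEANN’s actual implementation. The letter-frequency `embed`, the in-degree definition of a hub, the greedy walk, and every function name and parameter below are assumptions made for the example.

```python
import math


def embed(text):
    """Toy stand-in for an embedding model: a normalized letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_graph(chunks, m):
    """Link every chunk to its m most similar chunks, best neighbor first."""
    vecs = [embed(c) for c in chunks]  # needed only at build time, then discarded
    graph = {}
    for i, v in enumerate(vecs):
        others = sorted((j for j in range(len(chunks)) if j != i),
                        key=lambda j: similarity(v, vecs[j]), reverse=True)
        graph[i] = others[:m]
    return graph


def prune(graph, hub_count, keep):
    """Hubs (most incoming links) keep all edges; other nodes keep their best `keep`."""
    in_degree = {n: 0 for n in graph}
    for nbrs in graph.values():
        for j in nbrs:
            in_degree[j] += 1
    hubs = set(sorted(graph, key=lambda n: in_degree[n], reverse=True)[:hub_count])
    pruned = {n: (nbrs if n in hubs else nbrs[:keep]) for n, nbrs in graph.items()}
    return pruned, hubs


def search(query, chunks, graph, entry):
    """Greedy walk that embeds only the nodes it visits, at query time."""
    q = embed(query)
    cache = {}

    def score(n):
        if n not in cache:
            cache[n] = similarity(q, embed(chunks[n]))  # recomputed on demand
        return cache[n]

    current = entry
    while True:
        best = max(graph[current], key=score, default=current)
        if score(best) <= score(current):
            return current, len(cache)  # result plus number of embeddings computed
        current = best


chunks = [
    "surgery notes from Tokyo",
    "surgical training in Buenos Aires",
    "factory floor plans in Shenzhen",
    "shipping manifests from Shenzhen",
    "Carol's unpublished novel draft",
    "Zosia's travel itinerary",
]
graph, hubs = prune(build_graph(chunks, m=3), hub_count=2, keep=1)
best, embedded = search("surgery in Tokyo", chunks, graph, entry=min(hubs))
print(chunks[best], f"({embedded} of {len(chunks)} chunks embedded)")
```

Only the raw text and the pruned adjacency lists persist after `build_graph` returns; the vectors are thrown away. The trade-off is visible in `search`: storage drops, but every query pays to re-embed the nodes it visits, and a greedy walk like this one can stop at a local optimum that a production index would work harder to avoid.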
The Product Manager’s Take
If I were building the Pluribus as a product, here’s my PRD:
Vision: Universal knowledge access with seamless execution
Users:
- The Connected (8B) — have full access, part of the collective
- The Immune (13) — need interface layer, retain individual agency
Core Features:
- Zero-latency knowledge retrieval
- Distributed agent execution
- Real-time knowledge updates
- Liaison interface for non-connected users
Critical Risk: The immune users’ emotions can crash the system. Carol’s anger killed 11 million people. This is your biggest edge case — power users who can bring down production.
Mitigation: Assimilate them? Isolate them? Or… recognize that they’re your source of innovation and build containment around the risk while preserving their unique value.
What This Means for AI Products
Pluribus isn’t just good sci-fi. It’s a thought experiment about where AI systems are heading:
- RAG is necessary but not sufficient — retrieval without execution is just a better search engine
- Agent architectures matter — the real value unlock is when AI can act, not just retrieve
- Individual thinking remains valuable — even in a world of perfect knowledge access, original thought is the scarce resource
- Storage efficiency will be crucial — as we try to RAG over more personal data, solutions like LEANN’s graph-based approach become essential
- The interface layer is everything — Zosia makes the Pluribus usable for Carol. Your AI product’s UX determines whether users can actually leverage the underlying capabilities.
My Dream Setup
After watching Pluribus, I know exactly what I want:
- Stay immune — keep my curiosity, creativity, and ability to question assumptions
- Have my Zosia — an interface to retrieve any knowledge I need
- Access the agents — ability to execute through distributed systems when needed
That’s not science fiction anymore. That’s a local LLM, LEANN for retrieval, and MCP (Model Context Protocol) servers for tool execution. We’re building toward the Pluribus architecture, just without the alien virus and loss of individuality.
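The shape of that setup fits in a few lines. The sketch below is purely hypothetical: `retrieve` stands in for a LEANN-style index, `TOOLS` stands in for tools exposed by MCP servers, and `liaison` plays Zosia by routing each request to one or the other. None of it calls a real LLM, LEANN, or MCP API, and the notes and tool names are invented for the example.

```python
from typing import Callable

# Hypothetical personal notes; a real setup would query an index instead.
NOTES = [
    "Flight to Albuquerque departs 09:40.",
    "Zosia's number is in contacts.",
]


def retrieve(query: str) -> list[str]:
    """Stand-in for retrieval: return notes sharing at least one word with the query."""
    words = set(query.lower().split())
    return [n for n in NOTES if words & set(n.lower().rstrip(".").split())]


# Stand-in for tool execution: tool name -> callable.
TOOLS: dict[str, Callable[[str], str]] = {
    "send_message": lambda arg: f"sent: {arg}",
}


def liaison(request: str) -> str:
    """Route a request: 'tool_name: argument' runs a tool, anything else retrieves."""
    name, sep, arg = request.partition(":")
    if sep and name.strip() in TOOLS:
        return TOOLS[name.strip()](arg.strip())
    hits = retrieve(request)
    return hits[0] if hits else "no match"


print(liaison("when is my flight"))
print(liaison("send_message: running late"))
```

In a real system the LLM would decide when to retrieve and when to act, rather than a string prefix. The point is that the human stays outside the loop, asking the questions.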
The hive mind is just RAG. The question is: do you want to be in the hive, or do you want to be Carol with a really good interface?
I choose Carol. Every time.
Pluribus is streaming on Apple TV+. LEANN is open source at github.com/yichuan-w/LEANN. Both are worth your time.