Artificial Intelligence

AI Persona Clone — Voice + RAG over Your Own Documents

A voice-interactive AI that answers as you, grounded in your Google Drive and local files

Next.js · Gemini 1.5 · Qdrant · NextAuth · Web Speech API · Docker

Knowledge sources: 3 (Google Drive, SharePoint, local)
Vector dimensionality: 1024-d hybrid
LLM: Gemini → self-hosted Ollama
Voice: Full duplex (Web Speech API)
Retrieval: Dense + sparse + re-rank
Deployment: Desktop-first, local LLM

The ask was deceptively simple: an AI that responds in a specific person's voice, using only what that person actually knows. Off-the-shelf AI assistants couldn't do it. They hallucinate from their training data, they can't read your private files, and they don't listen.

I built a voice-interactive persona clone with dual knowledge ingestion (Google Drive + local files), grounded retrieval, and full-duplex voice. The project later evolved into a local-first enterprise RAG desktop app — same ideas, hardened for a company that couldn't send its documents to a third-party LLM.

The Problem

The client wanted an AI persona that could answer questions the way a specific expert would — accurate to what that expert had written and said, not to whatever a general LLM had read on the internet. Two things made off-the-shelf tools useless:

1. They couldn't combine a private document set (Google Drive) with local files into one knowledge base.
2. They couldn't talk back. Voice interaction was a hard requirement; typing wasn't an option.

Underneath both, the real problem was hallucination. A persona that invents answers in someone's voice is worse than no persona at all.

The Approach

Retrieval-Augmented Generation with the persona's actual documents as the only source of truth. The first version used Qdrant for vector search, Gemini 1.5 Pro for inference, and a dual ingestion pipeline — Google Drive via OAuth (read-only scope) and local files via the filesystem, both chunked and embedded into the same collection. Web Speech API handled voice in and high-quality TTS handled voice out.
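The chunking strategy isn't specified above, so here is a minimal sketch of the "both sources into one collection" idea, assuming fixed-size character windows with overlap. `chunk_text`, `to_chunks`, the 800/100 sizes, and the payload fields are all hypothetical. What matters is the shape: every chunk carries its `source` in the payload, and its ID is a UUIDv5 derived from source, document, and chunk index, so re-ingesting the same file overwrites its points instead of duplicating them (Qdrant accepts UUID strings as point IDs).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Chunk:
    id: str
    text: str
    payload: dict = field(default_factory=dict)


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap` (hypothetical defaults)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
        if start + size >= len(text):  # this window already reached the end of the text
            break
    return chunks


def to_chunks(source: str, doc_id: str, text: str) -> list[Chunk]:
    """Turn one document from any source ("drive", "local", ...) into collection-ready chunks."""
    out = []
    for i, piece in enumerate(chunk_text(text)):
        # Deterministic ID: the same source/doc/chunk always maps to the same point,
        # so re-ingestion is an overwrite, not a duplicate.
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}:{doc_id}:{i}"))
        out.append(Chunk(point_id, piece, {"source": source, "doc_id": doc_id, "chunk": i}))
    return out
```

Drive files and local files both go through `to_chunks`; embedding and the actual Qdrant upsert are left out. Keeping `source` in the payload is what lets a later query filter answers down to a single source.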

The prompt was assembled in layers (agent persona, task role, retrieved context, active rules), each kept as a separate block rather than inlined into one hard-coded string. An anti-hallucination rule layer made the persona refuse, rather than guess, when retrieval confidence was low.
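A minimal sketch of layered assembly plus a refusal gate, assuming each retrieved hit comes with a relevance score. `Hit`, `build_prompt`, `answer_or_refuse`, the `min_score` threshold of 0.5, and the refusal wording are hypothetical; a real threshold depends on which scorer (cosine similarity or re-ranker) produced the scores. `generate` stands in for the Gemini or Ollama call.

```python
from dataclasses import dataclass
from typing import Callable

REFUSAL = "I don't have that in my documents."  # hypothetical refusal wording


@dataclass
class Hit:
    text: str
    score: float  # retriever or re-ranker score; higher means more relevant


def build_prompt(persona: str, role: str, hits: list[Hit],
                 rules: list[str], question: str) -> str:
    """Assemble the prompt from separate layers instead of one hard-coded string."""
    context = "\n\n".join(f"[{i}] {h.text}" for i, h in enumerate(hits, start=1))
    rule_block = "\n".join(f"- {r}" for r in rules)
    layers = [
        ("Persona", persona),
        ("Task", role),
        ("Context", context),
        ("Rules", rule_block),
        ("Question", question),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in layers)


def answer_or_refuse(question: str, hits: list[Hit], persona: str, role: str,
                     rules: list[str], generate: Callable[[str], str],
                     min_score: float = 0.5) -> str:
    # Guardrail: drop weak hits; if nothing clears the bar, refuse without calling the model.
    confident = [h for h in hits if h.score >= min_score]
    if not confident:
        return REFUSAL
    return generate(build_prompt(persona, role, confident, rules, question))
```

The check runs before the model sees anything, so weak retrieval can't be papered over by a fluent answer in the persona's voice. Swapping a persona or a rule set means replacing one layer, not editing a monolithic prompt.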

The second version moved inference on-prem: Ollama (qwen2.5:7b) for chat, a BGE-M3 embedding server producing hybrid dense + sparse vectors, Reciprocal Rank Fusion (RRF) to merge the dense and sparse result lists, a cross-encoder re-ranker narrowing the top 50 candidates to the top 10, and Qdrant for storage. Authentication moved from Google OAuth to OS-native login (PAM on Linux, Win32 LogonUser on Windows) via a Tauri desktop shell, because the deployment target was a desktop app, not a website. SharePoint was added as a third source.
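The retrieval path in the second version can be sketched as three stages, under stated assumptions: `dense_search`, `sparse_search`, and `rerank_score` are hypothetical callables standing in for the BGE-M3 + Qdrant queries and the cross-encoder, and documents are referred to by ID (a real re-ranker would score the passage text behind each ID). The fusion constant `k = 60` is the commonly used RRF default, not a value stated above.

```python
from collections import defaultdict
from typing import Callable


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank) over every list containing d."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def retrieve(query: str,
             dense_search: Callable[[str, int], list[str]],
             sparse_search: Callable[[str, int], list[str]],
             rerank_score: Callable[[str, str], float],
             fuse_top: int = 50, final_top: int = 10) -> list[str]:
    # Stage 1: two independent candidate lists (dense and sparse), each up to fuse_top IDs.
    dense = dense_search(query, fuse_top)
    sparse = sparse_search(query, fuse_top)
    # Stage 2: merge by rank with RRF and keep the fused top fuse_top.
    candidates = rrf_fuse([dense, sparse])[:fuse_top]
    # Stage 3: the cross-encoder scores each (query, passage) pair jointly; it is slower
    # than vector search, which is why it only runs on the fused shortlist.
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:final_top]
```

RRF uses only ranks, so dense cosine scores and sparse lexical weights never have to be put on a common scale before merging.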

The Outcome

A voice-interactive persona that answers only from documents the owner has chosen, refuses when it isn't sure, and remembers across a conversation. The local-first rewrite runs entirely on the user's machine — no document leaves the laptop unless the user opts into a cloud source — which made it usable inside a company with a real data-residency constraint.

The interesting outcome wasn't the demo. It was watching the anti-hallucination rules carry most of the value: the persona that says "I don't have that" is the one people trust.

Key Takeaway

RAG with multi-source ingestion plus explicit refusal rules beats a fancier model every time. The model isn't the product — the guardrails are.

Build Something Like This

Ready to Build Your Platform?

One 30-minute call to see if we're a fit. No pitch. No pressure. Just a conversation about what you need to build.


$20K–$25K for an MVP. $30K–$80K for a full platform. Fixed price, milestone-gated. 50% upfront.