Comparison

Local vs cloud AI assistants: what leaves the laptop and what doesn't

Where captured screens, audio, and transcripts live, which models run on your laptop, and what actually leaves the device.

Data boundary · On-device models · MCP and exports · Security review

What is different in practice

Three concrete differences between local and cloud AI assistants

Where raw data lives

With Overshow, OCR text, audio transcripts, and metadata sit in a local SQLCipher-encrypted SQLite database on the laptop. No screen images or video files are persisted. Cloud assistants, by default, upload comparable content to their provider's servers.

Which models do the work

Overshow runs FluidAudio Parakeet TDT v3 for transcription, FluidAudio Sortformer fastV2_1 for speaker diarisation, EmbeddingGemma 300M for embeddings, and Gemma 4 E2B via MLX Swift for chat. All four run on-device. Cloud assistants call an external API for each of those steps.

What leaves the device

By default, no captured content does. The optional MCP server exposes approved tools to AI clients you choose, and only while the app is running. Most tools retrieve local context; a few update local metadata. Exports are manual.

What runs on-device in Overshow today

FluidAudio Parakeet TDT v3
Transcription

Fixed 20-30 second capture windows with a 1-2 second overlap tail.
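
To make the windowing concrete, here is a minimal Python sketch of how overlapping capture windows could be laid out. It is an illustration, not Overshow's implementation: the function name `capture_windows` and its parameters are hypothetical, and it assumes "overlap tail" means each window re-covers the last second or two of the previous one, so a word cut at a boundary appears whole in at least one window.

```python
def capture_windows(total_seconds, window_seconds=30.0, overlap_seconds=1.0):
    """Return (start, end) pairs, in seconds, that cover a recording.

    Assumption: every window after the first starts `overlap_seconds`
    before the previous window ended. Names and defaults are illustrative.
    """
    if not 0 <= overlap_seconds < window_seconds:
        raise ValueError("overlap must be non-negative and shorter than the window")
    total = float(total_seconds)
    windows = []
    start = 0.0
    while start < total:
        end = min(start + window_seconds, total)
        windows.append((start, end))
        if end >= total:
            break  # the recording is fully covered
        start = end - overlap_seconds  # step back to create the overlap tail
    return windows


# A 70-second recording with 30-second windows and a 1-second overlap.
print(capture_windows(70))
```

With these defaults a 70-second recording is split into three windows, the last one shorter than the rest.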

EmbeddingGemma 300M
Semantic search

768-dimensional vectors indexed locally alongside FTS5.
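
As a rough illustration of how keyword and vector search can sit side by side in one local index, the Python sketch below queries an FTS5 table and a set of embedding vectors, then merges the two rankings. Everything specific here is an assumption: the table name `transcripts`, the toy 3-dimensional vectors standing in for 768-dimensional ones, and the use of reciprocal rank fusion to combine results (the page does not say how, or whether, Overshow merges them). It also assumes the Python `sqlite3` build includes FTS5.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(conn, vectors, query_text, query_vector, k=60):
    """Merge FTS5 keyword hits and vector similarity with reciprocal rank fusion."""
    # Keyword ranking: FTS5's built-in `rank` column orders best matches first.
    keyword_ids = [row[0] for row in conn.execute(
        "SELECT rowid FROM transcripts WHERE transcripts MATCH ? ORDER BY rank",
        (query_text,),
    )]
    # Semantic ranking: brute-force cosine similarity over every stored vector.
    semantic_ids = sorted(
        vectors, key=lambda i: cosine(vectors[i], query_vector), reverse=True
    )
    # Reciprocal rank fusion (an assumed choice): each list adds 1 / (k + position).
    scores = {}
    for ranking in (keyword_ids, semantic_ids):
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical local index: three transcript snippets with toy embeddings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(body)")
conn.executemany(
    "INSERT INTO transcripts(rowid, body) VALUES (?, ?)",
    [
        (1, "quarterly budget review with finance"),
        (2, "design sync about onboarding flow"),
        (3, "spending forecast for next quarter"),
    ],
)
vectors = {1: [0.9, 0.1, 0.0], 2: [0.0, 0.1, 0.9], 3: [0.8, 0.3, 0.1]}

results = hybrid_search(conn, vectors, "budget", [1.0, 0.2, 0.0])
print(results)
```

In this toy data only snippet 1 contains the word "budget", but snippet 3 still ranks second because its vector is close to the query vector. That is the practical reason to keep both indexes: the keyword index finds exact terms, the vectors find related wording.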

Gemma 4 E2B
Chat and questions

Runs via MLX Swift on Apple Silicon. No external LLM calls.

Compare it against your current assistant

Install Overshow on one laptop, use it for a week on your real work, and see what you miss from a cloud assistant and what you gain from keeping data local.

Comparison guide

The concrete differences that change the security review

Local vs cloud AI assistants

This is a practical comparison, not an ideological one. Both approaches work. They differ in where captured data sits, which AI models run where, and what you need to explain to your security team.

What runs where in Overshow

Capture, transcription, search, and chat all run on the laptop.

  • Screen capture: hybrid event-driven on macOS, with a 0.5 FPS floor. Apple Vision does the OCR.
  • Audio: microphone and system audio, transcribed on-device with FluidAudio Parakeet TDT v3 in fixed 20-30 second windows.
  • Speaker identification: FluidAudio Sortformer fastV2_1 produces 4-speaker streaming diarisation per meeting; profiles are linked locally.
  • Search: SQLite FTS5 for full text, plus EmbeddingGemma 300M for semantic search.
  • Chat and Ask: Gemma 4 E2B via MLX Swift on Apple Silicon.

Nothing in that list requires a network call to process your captured content.
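
One way to read "hybrid event-driven with a 0.5 FPS floor" is that the app captures a frame whenever something on screen changes, and otherwise never lets more than two seconds pass between frames. The Python sketch below models that reading only. The function name `capture_times`, the event list, and the assumption that a frame is taken at session start are all hypothetical and not taken from Overshow's code.

```python
def capture_times(event_times, duration, floor_fps=0.5):
    """Return the timestamps (seconds) at which frames would be captured.

    Assumptions: a frame is taken at t=0, every UI event triggers a frame,
    and filler frames are added so no gap exceeds 1 / floor_fps seconds.
    """
    max_gap = 1.0 / floor_fps  # 0.5 FPS means at most 2 seconds between frames
    captures = [0.0]
    events = sorted(t for t in event_times if 0.0 < t <= duration)
    for t in events + [None]:  # None marks the end of the session
        target = duration if t is None else t
        # Fill long quiet stretches with floor-rate frames.
        while target - captures[-1] > max_gap:
            captures.append(captures[-1] + max_gap)
        # Event-driven frame, skipped if a frame already exists at that instant.
        if t is not None and t != captures[-1]:
            captures.append(t)
    return captures


# Two UI events (at 0.5 s and 5.0 s) during an 8-second session.
print(capture_times([0.5, 5.0], 8.0))
```

Under these assumptions, busy periods produce frames as events arrive, while idle periods settle to one frame every two seconds.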

What leaves the device, and when

The desktop app is local. Your account is held in the cloud, because billing, device registration, and SSO require a server. The captured content itself stays on the machine.

Optional paths through which data can leave the device:

  • MCP server: exposes approved tools to AI clients you explicitly configure (Claude Desktop, Cursor, Jan, LM Studio, Ollama). Most tools are retrieval-only; commitment and meeting-classification tools can update local metadata. Requires the app to be running, and you approve what connects.
  • Exports: you run them manually when you want to.
  • Cloud assistant integrations (if you add them): require explicit consent and default to local-first.

Where cloud-first is a better fit

Cloud-first assistants are usually the right choice when:

  • You want the most capable frontier models available without buying Apple Silicon.
  • You do not capture screens or audio, only chat transcripts.
  • Your security posture already permits sending work content to the provider.

Where local-first is a better fit

Local-first is usually the right choice when:

  • You record screens or meeting audio and would rather not upload that material to a third party.
  • You work across clients, vendors, or regulated environments and need the boundary to be the laptop.
  • You want semantic search over your own history without an external index service.

Questions to ask before deciding

  • Does the assistant capture your screen or audio? If yes, where is it stored?
  • Can the assistant run transcription and embeddings without a cloud call?
  • What does exporting or deleting everything look like?
  • What does a security reviewer have to approve: an API or a database file?