Search
Keyword, semantic, and hybrid search across screen text, transcripts, UI snapshots, and document chunks, with filters, local ranking, and no external query leakage.
Last updated: 2 April 2026
What search covers
Overshow indexes four kinds of material so you can find moments across your captured history:
| Content type | What is indexed | Typical use |
|---|---|---|
| OCR text | Words read from the screen during capture | UI copy, documents on screen, error messages |
| Audio transcriptions | On-device speech-to-text segments | Meetings, calls, dictation |
| UI accessibility snapshots | Structured text from accessibility trees | Controls, headings, and labels where AX is available |
| Document chunks | Passages from files in watched folders | PDFs and other indexed documents alongside captures |
Search treats these as first-class sources in the same local index. Filters such as content type let you narrow to combinations that match how you work (for example audio-only review after a call, or screen-plus-UI when you remember a dialogue box).
Every search runs entirely on your device. Query text is not sent to external search or embedding APIs. Ranking, filtering, and pagination all use your local database and models already bundled with Overshow.
Search modes in depth
You choose how the engine matches your query: keyword (full-text), semantic (meaning), or hybrid (auto), which combines both.
Keyword mode
Keyword search uses full-text indexing with relevance ranking over indexed text. It excels when you remember distinctive tokens: error codes, names, ticket IDs, or exact phrases. Stemming and other full-text features behave like a traditional inverted index: fast, explainable, and strong for literal overlap.
Use keyword mode when:
- You can quote or approximate the wording on screen or in speech.
- You want predictable “find this string” behaviour.
- Semantic paraphrase would add noise (for example searching for a UUID).
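To make the "literal overlap" idea concrete, here is a minimal inverted-index sketch in Python. This is an illustration of how full-text keyword matching ranks by term overlap, not Overshow's actual implementation (which uses a real full-text engine with stemming and relevance scoring); the tokenisation and scoring here are deliberately naive.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Rank documents by how many query tokens they contain (literal overlap)."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: -scores[d])
```

A document containing both `timeout` and the error code `E1042` outranks one containing only one of the two, which is exactly why keyword mode shines for distinctive tokens.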
Semantic mode (natural / meaning)
Semantic search embeds your query using an on-device embedding model, producing vectors that represent meaning. Candidates are compared by similarity to surface passages that mean similar things even when wording differs. In the product, natural behaviour corresponds to this semantic-only path.
The implementation applies similarity thresholds to filter candidates efficiently. Time bounds are applied early in the query plan, which keeps large histories responsive when you constrain dates.
Use semantic mode when:
- You remember the idea (“deployment rollback discussion”) but not the exact phrase.
- Vocabulary varied between speakers or apps.
- Keyword queries return too few or too literal hits.
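The core of the semantic path can be sketched as cosine similarity against candidate embeddings, with a quality threshold dropping weak matches. The threshold value and vector shapes below are illustrative assumptions; in the product, vectors come from the bundled on-device embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query_vec, candidates, threshold=0.3):
    """Keep candidates whose similarity clears the threshold, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in candidates.items()]
    return [doc_id for sim, doc_id in sorted(scored, reverse=True) if sim >= threshold]
```

Because time bounds are applied before similarity scoring in the real query plan, constraining dates shrinks the candidate set this comparison runs over, which is why dated queries stay responsive on large histories.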
Hybrid mode (auto)
Hybrid (often labelled auto in the desktop UI) runs keyword and semantic retrieval concurrently, then merges lists using a rank fusion algorithm. When keyword results are sparse, the engine may run a broader text pass to improve recall before fusion.
After fusion, Overshow applies a recency tiebreak so equally fused ranks favour newer material when appropriate. This gives day-to-day queries a balanced feel: literal hits stay visible, paraphrase matches can appear, and the list does not feel arbitrarily frozen in old history.
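A common rank fusion algorithm for merging concurrent retrievals is reciprocal rank fusion (RRF). The sketch below assumes RRF specifically (the document says only "a rank fusion algorithm") and adds the recency tiebreak described above; the constant `k=60` is the conventional RRF default, not a confirmed product value.

```python
def rrf_merge(keyword_ids, semantic_ids, timestamps, k=60):
    """Fuse two ranked lists: score = sum of 1/(k + rank) across lists.
    Ties in fused score are broken by recency (newer captures first)."""
    scores = {}
    for ranked in (keyword_ids, semantic_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], -timestamps[d]))
```

A document ranked well by both signals accumulates score from both lists and rises above documents found by only one, which is what keeps literal hits visible while letting paraphrase matches surface.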
When hybrid is the sensible default
Choose hybrid when you are unsure whether keywords or meaning will win, or when queries mix proper nouns with conceptual language. Switch to a single-signal mode when you are debugging recall (semantic) or precision (keyword).
Search modes comparison
| Aspect | Keyword | Semantic (natural) | Hybrid (auto) |
|---|---|---|---|
| Core signal | Full-text relevance ranking | Embedding similarity | Keyword + semantic in parallel; optional broader text pass if keyword is sparse |
| Fusion | N/A | N/A | Rank fusion + recency tiebreak |
| Best for | Codes, names, exact phrases | Ideas, paraphrase, vague recall | General exploration, mixed queries |
| Predictability | High (term overlap) | Moderate (embedding geometry) | Balanced blend |
How results are ranked
| Stage | Behaviour |
|---|---|
| Keyword lists | Relevance scores from full-text search |
| Semantic list | Similarity versus query embedding, thresholded for quality |
| Hybrid merge | Rank fusion combines results from concurrent retrievals |
| Tiebreak | Recency refines ordering when fusion scores are tied |
If results look “too old,” tighten the time range or lookback filter before changing mode: recency helps, but explicit windows often matter more.
Semantic query caching
Repeated queries do not always recompute embeddings. Overshow caches recent query embeddings, which reduces latency when you refine filters or paginate without changing the query string.
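One simple way to realise this behaviour is to memoise the embedding function on the query string, as in this sketch. The toy "model" below is a stand-in (it just builds a vector of token lengths and counts its own invocations); only the caching pattern is the point.

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts real "model" invocations, to show the cache working

@lru_cache(maxsize=128)
def embed_query(query: str) -> tuple:
    """Stand-in for the on-device embedding model. Repeating the same
    query string (e.g. while refining filters or paginating) returns the
    cached vector instead of recomputing it."""
    CALLS["n"] += 1
    return tuple(float(len(tok)) for tok in query.split())  # toy vector
```

With this in place, changing a filter and re-running the same query costs a dictionary lookup, not a model invocation.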
Filters reference
| Filter | Description | Example mental model |
|---|---|---|
| Start / end time | Absolute window on capture time | “Between stand-up and lunch” |
| Lookback days | Relative lookback from now | “Last seven days only” |
| App name | Limit to a source application | “Only Slack” or “Only IDE” |
| Window name | Narrow by window title where available | A specific document or browser tab title |
| Speaker | Filter transcript segments by diarised speaker | Requires speaker identification to be meaningful |
| Content type | Restrict to all, screen text, audio, UI, or combinations | Mix and match capture channels |
| Meeting | Scope to a linked meeting session | Post-call review for one calendar event |
| Browser URL | Filter by page URL when captured | Finding a specific site or path |
| Length range | Drop very short or very long snippets | Reduce noise or focus on substantive passages |
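Conceptually, each filter in the table is a predicate over candidate hits, and a result must pass all of the predicates you set. The field names and this pass-through structure are illustrative, not the actual schema; a subset of the filters is shown.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    app_name: str
    captured_at: float   # epoch seconds
    content_type: str    # e.g. "ocr", "audio", "ui", "document"
    text: str

def apply_filters(hits, start=None, end=None, app_name=None,
                  content_types=None, min_len=None, max_len=None):
    """Drop hits outside the time window, app, content types, or length range."""
    out = []
    for h in hits:
        if start is not None and h.captured_at < start:
            continue
        if end is not None and h.captured_at > end:
            continue
        if app_name is not None and h.app_name != app_name:
            continue
        if content_types is not None and h.content_type not in content_types:
            continue
        if min_len is not None and len(h.text) < min_len:
            continue
        if max_len is not None and len(h.text) > max_len:
            continue
        out.append(h)
    return out
```

Unset filters (left as `None`) are simply skipped, which is why adding filters only ever narrows the result set.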
Speaker filter prerequisites
The speaker filter applies to transcribed audio that has been through speaker identification. If speakers are not labelled in your data, this filter will not magically infer people: enable and process speaker identification first, then filter.
Pagination and stats
Results are paged so the UI stays responsive on large histories.
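Paging can be sketched as slicing a ranked result list, with simple stats (total hits, page count) alongside each page. The return shape here is an assumption for illustration, not the actual API contract.

```python
def paginate(results, page, page_size):
    """Return one zero-indexed page of results plus summary stats."""
    total = len(results)
    start = page * page_size
    pages = (total + page_size - 1) // page_size if page_size else 0
    return {"items": results[start:start + page_size],
            "total": total,
            "pages": pages}
```

Only the requested slice is rendered, so the UI stays responsive even when a query matches thousands of captures.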
Result anatomy
Each result card (or API row) is designed so you can triage quickly:
| Field | Purpose |
|---|---|
| Source app | Which application produced the capture |
| Timestamp | When the moment occurred (captured_at) |
| Matched text | The snippet or segment that best aligns with the query |
| Content type | Whether the hit came from OCR, audio, UI snapshot, document chunk, or a combined view |
Opening a card typically reveals fuller context and paths into the data inspector for deeper review.
Desktop UI
In the desktop Search view you will find:
- A query field for terms or natural language, depending on mode.
- An app name filter (often a dropdown) to restrict sources.
- A mode selector mapping to keyword, semantic, and auto/hybrid behaviour.
- A meeting filter when you work meeting-scoped.
- Result cards summarising app, time, matched text, and content type.
See also the desktop doc on the Search view for control placement in your build.
Tips for effective queries
- Start short, then add distinctive tokens if recall is high.
- Set time first on busy machines; it is the cheapest filter mentally and computationally.
- Match mode to memory: keywords for codes and names, semantic for concepts, hybrid when unsure.
- Use content_type when you know the channel (for example “this was definitely said, not shown”).
- Combine app + window when many tabs share one browser.
Combining filters for precision
| Goal | Suggested combination |
|---|---|
| “That Slack thread last Tuesday” | App name + time range + keyword or hybrid |
| “What Alice said in the QBR” | Meeting or time range + speaker (if available) + semantic |
| “The legal PDF, not the email” | Content type toward documents + path or time if known |
| “Chrome page about billing” | Browser URL fragment + hybrid query |
Example searches
| Scenario | Mode | Query / filters (illustrative) |
|---|---|---|
| Incident post-mortem | Keyword | Error string + history_days: 1 + relevant app_name |
| Product decision recall | Hybrid | Short conceptual query + meeting or date window |
| Accessibility audit trail | Keyword or hybrid | Control label + content_type including ui |
| Long-form doc recall | Semantic | Paraphrase + document-friendly content_type |
Search finds what was captured. If capture was paused, excluded by policy, or outside retention, it will not appear; absence of results is not proof something never happened elsewhere.