Search
Keyword, semantic, and hybrid search across screen text, transcripts, UI snapshots, and document chunks, with filters, local ranking, and no external query leakage.
Last updated: 2 April 2026
What search covers
Overshow indexes four kinds of material so you can find moments across your captured history:
| Content type | What is indexed | Typical use |
|---|---|---|
| OCR text | Words read from the screen during capture | UI copy, documents on screen, error messages |
| Audio transcriptions | On-device speech-to-text segments | Meetings, calls, dictation |
| UI accessibility snapshots | Structured text from accessibility trees | Controls, headings, and labels where AX is available |
| Document chunks | Passages from files in watched folders | PDFs and other indexed documents alongside captures |
Search treats these as first-class sources in the same local index. Filters such as content type let you narrow to combinations that match how you work (for example audio-only review after a call, or screen-plus-UI when you remember a dialogue box).
Every search runs entirely on your device. Query text is not sent to external search or embedding APIs. Ranking, filtering, and pagination all use your local database and models already bundled with Overshow.
Search modes in depth
You choose how the engine matches your query: keyword (full-text), semantic (meaning), or hybrid (auto), which combines both.
Keyword mode
Keyword search uses full-text indexing with relevance ranking over indexed text. It excels when you remember distinctive tokens: error codes, names, ticket IDs, or exact phrases. Stemming and other full-text features behave like a traditional inverted index: fast, explainable, and strong for literal overlap.
Use keyword mode when:
- You can quote or approximate the wording on screen or in speech.
- You want predictable “find this string” behaviour.
- Semantic paraphrase would add noise (for example searching for a UUID).
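To make the "literal overlap" idea concrete, here is a minimal inverted-index sketch in Python. This is an illustration of how full-text keyword matching ranks by term overlap, not Overshow's actual implementation (which uses a real full-text engine with stemming and relevance scoring); the tokenisation and scoring here are deliberately naive.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Rank documents by how many query tokens they contain (literal overlap)."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: -scores[d])
```

A document containing both `timeout` and the error code `E1042` outranks one containing only one of the two, which is exactly why keyword mode shines for distinctive tokens.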
Semantic mode (natural / meaning)
Semantic search embeds your query using an on-device embedding model, producing vectors that represent meaning. Candidates are compared by similarity to surface passages that mean similar things even when wording differs. In the product, natural behaviour corresponds to this semantic-only path.
The implementation applies similarity thresholds to filter candidates efficiently. Time bounds are applied early in the query plan, which keeps large histories responsive when you constrain dates.
Use semantic mode when:
- You remember the idea (“deployment rollback discussion”) but not the exact phrase.
- Vocabulary varied between speakers or apps.
- Keyword queries return too few or too literal hits.
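The core of the semantic path can be sketched as cosine similarity against candidate embeddings, with a quality threshold dropping weak matches. The threshold value and vector shapes below are illustrative assumptions; in the product, vectors come from the bundled on-device embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query_vec, candidates, threshold=0.3):
    """Keep candidates whose similarity clears the threshold, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in candidates.items()]
    return [doc_id for sim, doc_id in sorted(scored, reverse=True) if sim >= threshold]
```

Because time bounds are applied before similarity scoring in the real query plan, constraining dates shrinks the candidate set this comparison runs over, which is why dated queries stay responsive on large histories.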
Hybrid mode (auto)
Hybrid (often labelled auto in the desktop UI) runs keyword and semantic retrieval concurrently, then merges lists using a rank fusion algorithm. When keyword results are sparse, the engine may run a broader text pass to improve recall before fusion.
After fusion, Overshow applies a recency tiebreak so equally fused ranks favour newer material when appropriate. This gives day-to-day queries a balanced feel: literal hits stay visible, paraphrase matches can appear, and the list does not feel arbitrarily frozen in old history.
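A common rank fusion algorithm for merging concurrent retrievals is reciprocal rank fusion (RRF). The sketch below assumes RRF specifically (the document says only "a rank fusion algorithm") and adds the recency tiebreak described above; the constant `k=60` is the conventional RRF default, not a confirmed product value.

```python
def rrf_merge(keyword_ids, semantic_ids, timestamps, k=60):
    """Fuse two ranked lists: score = sum of 1/(k + rank) across lists.
    Ties in fused score are broken by recency (newer captures first)."""
    scores = {}
    for ranked in (keyword_ids, semantic_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], -timestamps[d]))
```

A document ranked well by both signals accumulates score from both lists and rises above documents found by only one, which is what keeps literal hits visible while letting paraphrase matches surface.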
When hybrid is the sensible default
Choose hybrid when you are unsure whether keywords or meaning will win, or when queries mix proper nouns with conceptual language. Switch to a single-signal mode when you are debugging recall (semantic) or precision (keyword).
Search modes comparison
| Aspect | Keyword | Semantic (natural) | Hybrid (auto) |
|---|---|---|---|
| Core signal | Full-text relevance ranking | Embedding similarity | Keyword + semantic in parallel; optional broader text pass if keyword is sparse |
| Fusion | N/A | N/A | Rank fusion + recency tiebreak |
| Best for | Codes, names, exact phrases | Ideas, paraphrase, vague recall | General exploration, mixed queries |
| Predictability | High (term overlap) | Moderate (embedding geometry) | Balanced blend |
How results are ranked
| Stage | Behaviour |
|---|---|
| Keyword lists | Relevance scores from full-text search |
| Semantic list | Similarity versus query embedding, thresholded for quality |
| Hybrid merge | Rank fusion combines results from concurrent retrievals |
| Tiebreak | Recency refines ordering when fusion scores are tied |
If results look “too old,” tighten the time range or lookback filter before changing mode: recency helps, but explicit windows often matter more.
Semantic query caching
Repeated queries do not always recompute embeddings. Overshow caches recent query embeddings, which reduces latency when you refine filters or paginate without changing the query string.
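One simple way to realise this behaviour is to memoise the embedding function on the query string, as in this sketch. The toy "model" below is a stand-in (it just builds a vector of token lengths and counts its own invocations); only the caching pattern is the point.

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts real "model" invocations, to show the cache working

@lru_cache(maxsize=128)
def embed_query(query: str) -> tuple:
    """Stand-in for the on-device embedding model. Repeating the same
    query string (e.g. while refining filters or paginating) returns the
    cached vector instead of recomputing it."""
    CALLS["n"] += 1
    return tuple(float(len(tok)) for tok in query.split())  # toy vector
```

With this in place, changing a filter and re-running the same query costs a dictionary lookup, not a model invocation.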
Filters reference
| Filter | Description | Example mental model |
|---|---|---|
| Start / end time | Absolute window on capture time | “Between stand-up and lunch” |
| Lookback days | Relative lookback from now | “Last seven days only” |
| App name | Limit to a source application | “Only Slack” or “Only IDE” |
| Window name | Narrow by window title where available | A specific document or browser tab title |
| Speaker | Filter transcript segments by diarised speaker | Requires speaker identification to be meaningful |
| Content type | Restrict to all, screen text, audio, UI, or combinations | Mix and match capture channels |
| Meeting | Scope to a linked meeting session | Post-call review for one calendar event |
| Browser URL | Filter by page URL when captured | Finding a specific site or path |
| Length range | Drop very short or very long snippets | Reduce noise or focus on substantive passages |
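Conceptually, each filter in the table is a predicate over candidate hits, and a result must pass all of the predicates you set. The field names and this pass-through structure are illustrative, not the actual schema; a subset of the filters is shown.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    app_name: str
    captured_at: float   # epoch seconds
    content_type: str    # e.g. "ocr", "audio", "ui", "document"
    text: str

def apply_filters(hits, start=None, end=None, app_name=None,
                  content_types=None, min_len=None, max_len=None):
    """Drop hits outside the time window, app, content types, or length range."""
    out = []
    for h in hits:
        if start is not None and h.captured_at < start:
            continue
        if end is not None and h.captured_at > end:
            continue
        if app_name is not None and h.app_name != app_name:
            continue
        if content_types is not None and h.content_type not in content_types:
            continue
        if min_len is not None and len(h.text) < min_len:
            continue
        if max_len is not None and len(h.text) > max_len:
            continue
        out.append(h)
    return out
```

Unset filters (left as `None`) are simply skipped, which is why adding filters only ever narrows the result set.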
Speaker filter prerequisites
The speaker filter applies to transcribed audio that has been through speaker identification. If speakers are not labelled in your data, this filter will not magically infer people: enable and process speaker identification first, then filter.
Pagination and stats
Results are paged so the UI stays responsive on large histories.
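Paging can be sketched as slicing a ranked result list, with simple stats (total hits, page count) alongside each page. The return shape here is an assumption for illustration, not the actual API contract.

```python
def paginate(results, page, page_size):
    """Return one zero-indexed page of results plus summary stats."""
    total = len(results)
    start = page * page_size
    pages = (total + page_size - 1) // page_size if page_size else 0
    return {"items": results[start:start + page_size],
            "total": total,
            "pages": pages}
```

Only the requested slice is rendered, so the UI stays responsive even when a query matches thousands of captures.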
Result anatomy
Each result card (or API row) is designed so you can triage quickly:
| Field | Purpose |
|---|---|
| Source app | Which application produced the capture |
| Timestamp | When the moment occurred (captured_at) |
| Matched text | The snippet or segment that best aligns with the query |
| Content type | Whether the hit came from OCR, audio, UI snapshot, document chunk, or a combined view |
Opening a card typically reveals fuller context and paths into the data inspector for deeper review.
Desktop UI
In the desktop Search view you will find:
- A query field for terms or natural language, depending on mode.
- An app name filter (often a dropdown) to restrict sources.
- A mode selector mapping to keyword, semantic, and auto/hybrid behaviour.
- A meeting filter when you work meeting-scoped.
- Result cards summarising app, time, matched text, and content type.
See also the desktop doc on the Search view for control placement in your build.
Tips for effective queries
- Start short, then add distinctive tokens if recall is high.
- Set time first on busy machines; it is the cheapest filter mentally and computationally.
- Match mode to memory: keywords for codes and names, semantic for concepts, hybrid when unsure.
- Use content_type when you know the channel (for example “this was definitely said, not shown”).
- Combine app + window when many tabs share one browser.
Combining filters for precision
| Goal | Suggested combination |
|---|---|
| “That Slack thread last Tuesday” | App name + time range + keyword or hybrid |
| “What Alice said in the QBR” | Meeting or time range + speaker (if available) + semantic |
| “The legal PDF, not the email” | Content type toward documents + path or time if known |
| “Chrome page about billing” | Browser URL fragment + hybrid query |
Example searches
| Scenario | Mode | Query / filters (illustrative) |
|---|---|---|
| Incident post-mortem | Keyword | Error string + history_days: 1 + relevant app_name |
| Product decision recall | Hybrid | Short conceptual query + meeting or date window |
| Accessibility audit trail | Keyword or hybrid | Control label + content_type including ui |
| Long-form doc recall | Semantic | Paraphrase + document-friendly content_type |
Search finds what was captured. If capture was paused, excluded by policy, or outside retention, it will not appear; absence of results is not proof something never happened elsewhere.