Retrieval tuning

The Retrieval tuning section at /admin/settings has two controls: a similarity threshold slider and a cross-encoder reranker toggle. They affect how the chat decides what counts as an answer. The defaults were tuned on real corpora and are right for the vast majority of small-business knowledge bases. Most admins will never touch them, and that is the correct outcome.

This article exists for the cases where you want to understand what the controls do or you have a specific reason to change something.

Similarity threshold

The slider sets a floor for how confident retrieval needs to be before Muninnbase attempts to answer.

Behind the scenes, every chunk of every document gets a relevance score against the question your employee just asked. That score is called cosine similarity. Mathematically it can go as low as -1, but for the purposes of this setting treat it as running from 0 (completely unrelated) to 1 (a near-identical match). Think of it as a confidence number. Most genuine answers land somewhere between 0.4 and 0.8. Below 0.4 is usually noise; above 0.8 is unusually high.
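For readers who want to see the arithmetic, here is a minimal sketch of cosine similarity between two vectors. The three-number vectors are made-up stand-ins for embeddings, and the function name is illustrative; nothing here is Muninnbase's actual code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up embeddings: one for a question, two for document chunks.
question = [0.2, 0.7, 0.1]
chunk_close = [0.25, 0.65, 0.05]  # points in nearly the same direction
chunk_far = [0.9, 0.0, 0.4]       # points in a different direction

print(round(cosine_similarity(question, chunk_close), 2))
print(round(cosine_similarity(question, chunk_far), 2))
```

The chunk pointing in nearly the same direction as the question scores much higher than the one pointing elsewhere, which is the property the threshold relies on.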

The threshold is the cutoff. When the best chunk Muninnbase finds scores below it, the system refuses to answer rather than guess. Your employee sees "I don't have an answer to that in the available documents" instead of a confident-sounding fabrication.
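The cutoff rule can be sketched in a few lines. This is an illustration of the behavior described above, not the product's implementation; the function and variable names are hypothetical, and treating a score exactly equal to the threshold as a pass is an assumption drawn from the "scores below it" wording.

```python
REFUSAL_MESSAGE = "I don't have an answer to that in the available documents"


def should_answer(chunk_scores: list[float], threshold: float = 0.4) -> bool:
    """Return True when the best-scoring chunk reaches the threshold."""
    if not chunk_scores:
        # No candidate chunks at all: nothing to answer from.
        return False
    return max(chunk_scores) >= threshold


def respond(chunk_scores: list[float], threshold: float = 0.4) -> str:
    """Either hand off to answer generation or refuse outright."""
    if should_answer(chunk_scores, threshold):
        return "(answer generated from the retrieved chunks)"
    return REFUSAL_MESSAGE


print(respond([0.22, 0.35, 0.61]))  # best chunk is 0.61, so an answer is attempted
print(respond([0.18, 0.27, 0.33]))  # best chunk is 0.33, so the chat refuses
```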

Default is 0.4. Two reasons to move it:

Raise it (toward 0.5 or 0.6) if the chat is returning weak answers from loosely related chunks. A higher floor refuses more aggressively, which is the right move when you'd rather have an honest no-answer than a stretched-thin yes.
Lower it (toward 0.3) if the chat is refusing on questions you can verify are in the documents. A lower floor accepts noisier candidates and lets the reranker and the chat model sort them out.

Adjust in 0.05 steps. Make one change, watch the Q&A log for a few days, then adjust again if needed. The score column in the Q&A log detail panel shows the cosine values your real questions actually score, which makes this an evidence-based decision rather than a guess.
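If you copy the best-chunk scores out of the Q&A log by hand, a short script can show how many of those questions each candidate threshold would have refused. This is a sketch under assumptions: the scores below are invented, and the list is something you would build yourself, since this article does not describe a log export.

```python
def refusal_rate(best_scores: list[float], threshold: float) -> float:
    """Fraction of logged questions whose best chunk scored below the threshold."""
    if not best_scores:
        raise ValueError("need at least one logged score")
    refused = sum(1 for score in best_scores if score < threshold)
    return refused / len(best_scores)


# Invented best-chunk scores, one per logged question.
logged_scores = [0.31, 0.38, 0.42, 0.47, 0.55, 0.58, 0.63, 0.71]

# Walk the candidate thresholds from 0.30 to 0.60 in 0.05 steps.
for percent in range(30, 65, 5):
    threshold = percent / 100
    rate = refusal_rate(logged_scores, threshold)
    print(f"threshold {threshold:.2f}: {rate:.0%} of logged questions refused")
```

A refusal rate alone does not say whether the refusals were correct. Pair it with a check of which refused questions really are covered by your documents.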

Cross-encoder reranker

After Muninnbase pulls candidate chunks for a question, a second-stage scorer (the "reranker") reorders them so the chunks most likely to answer rise to the top of the prompt. The reranker is the difference between "the chat returned the right paragraph" and "the chat returned a related paragraph that happened to come first."

Default is on. Leave it on. The cost is small (a few milliseconds per query); the gain is meaningfully better answer ordering, especially on questions where multiple sections of a document touch the same topic.

When to turn it off: rarely, and only if you are actively profiling something specific (for example, measuring how the chat behaves without it, on a controlled set of questions). For day-to-day use, on is the right setting.

If the reranker fails for any reason on a specific query, Muninnbase falls back to the pre-rerank ordering automatically. There is no scenario where toggling the reranker on can cause the chat to break.
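The rerank-then-fall-back behavior can be sketched as follows. The scorer here is a toy word-overlap function standing in for a real cross-encoder, and every name is hypothetical; the point is only the shape of the logic: score each (question, chunk) pair, sort by score, and return the original order if scoring fails.

```python
from typing import Callable, Sequence


def rerank_with_fallback(
    question: str,
    chunks: Sequence[str],
    score_pair: Callable[[str, str], float],
) -> list[str]:
    """Reorder chunks by score_pair, keeping the input order if scoring fails."""
    try:
        scored = [(score_pair(question, chunk), index, chunk)
                  for index, chunk in enumerate(chunks)]
    except Exception:
        # Fallback: the pre-rerank ordering is returned unchanged.
        return list(chunks)
    # Highest score first; ties keep their original relative order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for _, _, chunk in scored]


def toy_overlap_score(question: str, chunk: str) -> float:
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return float(len(set(question.lower().split()) & set(chunk.lower().split())))


def broken_scorer(question: str, chunk: str) -> float:
    raise RuntimeError("simulated reranker failure")


candidates = [
    "Office hours are nine to five.",
    "Refunds are issued within 14 days.",
    "Our refund policy covers returns.",
]

print(rerank_with_fallback("when are refunds issued", candidates, toy_overlap_score))
print(rerank_with_fallback("when are refunds issued", candidates, broken_scorer))
```

With the toy scorer the refunds sentence moves to the front; with the failing scorer the list comes back in its original order.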

What you can't tune from here, and why

Three things are deliberately not exposed in the Retrieval tuning section:

The chat model (the LLM that generates the answer text).
The embedding model (the model that indexes your documents).
The BM25 weight (the relative weight of keyword matching versus semantic matching).

These are managed by Muninnbase per tenant via feature flags. The reason is operational, not philosophical. Changing the embedding model means re-indexing every document in your library; changing the chat model needs latency and quality validation against your specific corpus; changing the BM25 weight needs the rest of the pipeline rebalanced around it. The admin settings page is not the right surface for that kind of coordinated change.
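To see why a keyword weight cannot be changed in isolation, consider one common way to blend the two signals: a weighted average. This is an assumption made for illustration only. This article does not say how Muninnbase combines keyword and semantic scores, and the sketch also assumes the keyword score has already been normalized to the 0 to 1 range.

```python
def hybrid_score(keyword_score: float, semantic_score: float,
                 keyword_weight: float) -> float:
    """Weighted average of a normalized keyword score and a semantic score."""
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0 and 1")
    return keyword_weight * keyword_score + (1.0 - keyword_weight) * semantic_score


def best_chunk(candidates: dict[str, tuple[float, float]],
               keyword_weight: float) -> str:
    """Name of the chunk with the highest blended score."""
    return max(
        candidates,
        key=lambda name: hybrid_score(*candidates[name], keyword_weight),
    )


# Invented (keyword_score, semantic_score) pairs for two chunks.
candidates = {
    "chunk_a": (0.90, 0.35),  # strong keyword match, weak semantic match
    "chunk_b": (0.20, 0.70),  # weak keyword match, strong semantic match
}

print(best_chunk(candidates, keyword_weight=0.2))
print(best_chunk(candidates, keyword_weight=0.6))
```

In this sketch the winning chunk flips as the weight moves, so any downstream setting calibrated against the old ordering would need to be revisited.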

If you have a specific need for a different model, write to [email protected] with the use case. The capability exists; it just doesn't ship as a self-service control.

A note on reading the Q&A log

The score shown in the Q&A log detail panel is the same cosine number the threshold above operates on. Reading the log for a week tells you what range your real questions actually land in, which is the single best input to deciding whether your threshold is too strict, too loose, or already correct.
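A week of scores can be summarized with a few standard-library calls. As before, the numbers are invented and the list is assumed to be transcribed by hand from the log detail panel; the 0.5 reference point is only a convenient marker for "comfortably above the default threshold".

```python
import statistics


def summarize_scores(best_scores: list[float], reference: float = 0.5) -> dict[str, float]:
    """Lowest, median, and highest score, plus the share at or above a reference."""
    if not best_scores:
        raise ValueError("need at least one logged score")
    at_or_above = sum(1 for score in best_scores if score >= reference)
    return {
        "min": min(best_scores),
        "median": statistics.median(best_scores),
        "max": max(best_scores),
        "share_at_or_above_reference": at_or_above / len(best_scores),
    }


# Invented best-chunk scores from one week of questions.
week_of_scores = [0.31, 0.38, 0.42, 0.47, 0.55, 0.58, 0.63, 0.71]

for name, value in summarize_scores(week_of_scores).items():
    print(f"{name}: {value:.2f}")
```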

What to do next

If you came here to figure out whether to adjust something, the best next step is a week of reading the Q&A log. If the defaults are working (most queries scoring 0.5 or higher, very few refusals on questions you know are in the documents), leave the controls alone.