The hallucination problem in HR
Large language models are trained on broad internet data. When you ask an LLM about your organization's parental leave policy, it does not know your policy. It knows the statistical average of every parental leave policy it has ever seen in training data. It will generate a plausible-sounding answer that may be completely wrong for your organization.
In HR, wrong answers are not just unhelpful. They are dangerous. An employee who gets incorrect information about their benefits, their eligibility for a role, or their compensation structure may make decisions based on that misinformation. The liability exposure is real.
RAG solves this by grounding generation in your actual organizational data. Instead of asking the LLM to recall information from training, you retrieve the relevant documents first, then ask the LLM to generate a response based only on those documents.
The RAG pipeline, step by step
A production RAG pipeline in an HR context has five stages.
| Stage | What Happens | HR-Specific Consideration |
|---|---|---|
| 1. Ingestion | Documents are loaded, cleaned, and chunked | Policy documents, job descriptions, org data, benefits guides |
| 2. Embedding | Each chunk is converted to a vector representation | Domain-tuned embeddings outperform generic models on HR vocabulary |
| 3. Indexing | Vectors are stored in a vector database for fast retrieval | Metadata filters (region, business unit, effective date) are critical |
| 4. Retrieval | User query is embedded and matched against stored vectors | Must handle ambiguity (“What is my leave policy?” depends on location, role, tenure) |
| 5. Generation | Retrieved chunks + query are sent to the LLM for response | System prompt enforces factual grounding, cites sources, flags uncertainty |
Each stage introduces potential failure modes. Poor chunking fragments a policy across multiple chunks so no single chunk contains the complete answer. Generic embeddings miss HR-specific semantics. Missing metadata filters return policies for the wrong country. The generation step can still hallucinate if the prompt does not constrain it properly.
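To make these stages concrete, here is a minimal sketch of the whole pipeline in Python. Nothing in it is tied to a particular stack: `embed()` and `generate()` are placeholders for whatever embedding model and LLM you use, the vector store is a plain in-memory list with a metadata filter rather than a real vector database, and the field names are illustrative.

```python
# Minimal sketch of the five-stage pipeline. embed() and generate() are
# placeholders; everything else is plain Python and numpy.
from dataclasses import dataclass
import numpy as np

@dataclass
class Chunk:
    text: str
    metadata: dict                    # e.g. {"country": "DE", "effective_date": "2026-01-01"}
    vector: np.ndarray | None = None

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model here."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: call your LLM here."""
    raise NotImplementedError

# Stages 1-3: ingestion has produced chunks; embed and index them.
def build_index(chunks: list[Chunk]) -> list[Chunk]:
    for chunk in chunks:
        chunk.vector = embed(chunk.text)
    return chunks

# Stage 4: retrieval by cosine similarity, restricted by metadata filters.
def retrieve(index: list[Chunk], query: str, filters: dict, k: int = 3) -> list[Chunk]:
    q = embed(query)
    candidates = [c for c in index
                  if all(c.metadata.get(key) == val for key, val in filters.items())]
    scored = sorted(candidates,
                    key=lambda c: float(np.dot(q, c.vector) /
                                        (np.linalg.norm(q) * np.linalg.norm(c.vector))),
                    reverse=True)
    return scored[:k]

# Stage 5: generation constrained to the retrieved chunks, with citations.
def answer(index: list[Chunk], query: str, filters: dict) -> str:
    context = "\n\n".join(f"[{c.metadata.get('source', 'unknown')}]\n{c.text}"
                          for c in retrieve(index, query, filters))
    prompt = (
        "Answer using ONLY the policy excerpts below. Cite the source in brackets. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
    return generate(prompt)

# Example wiring (with real embed/generate implementations plugged in):
# index = build_index(load_policy_chunks())   # load_policy_chunks is hypothetical
# print(answer(index, "How many weeks of parental leave do I get?", {"country": "DE"}))
```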
Why retrieval is the ceiling
Here is the most important insight about RAG systems: the quality ceiling is set by retrieval, not generation. If the retrieval step returns the wrong documents, the LLM will confidently generate an answer based on irrelevant information. If the retrieval step returns the right documents, even a modest LLM will produce a useful answer.
This has practical implications for where you invest engineering effort.
| Investment Area | Impact on Quality | Typical Effort |
|---|---|---|
| Upgrading LLM (e.g., GPT-3.5 to GPT-4) | Moderate: better reasoning, fewer formatting errors | Low (API swap) |
| Improving chunking strategy | High: right information in each chunk | Medium (domain expertise needed) |
| Adding metadata filters | High: correct policy for correct context | Medium (data enrichment) |
| Domain-tuned embeddings | High: better semantic understanding of HR language | High (training data + compute) |
| HyDE implementation | High: bridges question-document vocabulary gap | Medium (prompt engineering + extra LLM call) |
Most teams over-invest in the generation step and under-invest in retrieval. Swapping to a more powerful LLM is easy and visible. Fixing chunking boundaries is tedious and invisible. But the chunking fix produces a larger quality improvement nearly every time.
HyDE: bridging the vocabulary gap
Standard semantic search embeds the user's question and finds documents with similar embeddings. This works well when the question and the answer use similar vocabulary. But in HR, they often do not.
An employee asks: “Can I work from another country for a few weeks?” The relevant policy document is titled “International Remote Work Authorization” and uses terms like “cross-border employment,” “tax nexus,” and “permanent establishment risk.” The semantic distance between the question and the document is large.
HyDE (Hypothetical Document Embeddings) addresses this by adding an intermediate step. Instead of embedding the question directly, the system first asks the LLM to generate a hypothetical answer to the question. This hypothetical answer uses the kind of language that policy documents use. Then the system embeds the hypothetical answer and uses that as the search query.
The process looks like this:
- Employee asks: “Can I work from another country for a few weeks?”
- LLM generates hypothetical answer: “International remote work is subject to cross-border employment regulations. Employees must obtain authorization through the international remote work policy, which addresses tax nexus implications, permanent establishment risk, and local employment law compliance…”
- The hypothetical answer is embedded and used to search the vector database
- Retrieval now finds the “International Remote Work Authorization” policy because the vocabulary aligns
- The actual policy content is sent to the LLM for the final, grounded response
HyDE adds latency (one extra LLM call) but significantly improves retrieval precision for queries where question vocabulary diverges from document vocabulary. In HR contexts, this gap is common because employees use casual language while policies use legal and regulatory terminology.
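A minimal sketch of that extra step, reusing the `generate()` and `retrieve()` placeholders from the pipeline sketch above; the prompt wording is only an example.

```python
# HyDE: embed a hypothetical answer instead of the raw question.
HYDE_PROMPT = (
    "Write a short excerpt from an HR policy document that would answer the "
    "following employee question. Use formal policy language.\n\n"
    "Question: {question}"
)

def hyde_retrieve(index, question: str, filters: dict, k: int = 3):
    # 1. One extra LLM call: draft a hypothetical, policy-style answer.
    hypothetical = generate(HYDE_PROMPT.format(question=question))
    # 2. Search with the hypothetical text, whose vocabulary is closer to
    #    the policy documents than the employee's casual phrasing.
    return retrieve(index, hypothetical, filters, k=k)
```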
Chunking strategies that matter for HR
How you split documents into chunks determines what the retrieval step can find. Generic chunking (split every 500 tokens) fragments HR documents in destructive ways. A benefits policy that explains eligibility criteria in one paragraph and coverage details in the next gets split across two chunks, so neither chunk contains the complete answer.
Effective HR chunking strategies include the following (a sketch of section-aware chunking with metadata enrichment follows the list):
- Section-aware chunking: Split on document headings and section boundaries rather than token counts. Each policy section stays intact.
- Hierarchical chunking: Store both the section-level chunk and a summary of the parent document. Retrieval can match at either level.
- Metadata enrichment: Every chunk carries metadata (country, business unit, effective date, policy category) that enables filtered retrieval.
- Overlap windows: When token-based splitting is necessary, include overlap between adjacent chunks so boundary content appears in both.
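Here is one way the first and third strategies can be combined, assuming markdown-formatted policy documents and reusing the `Chunk` dataclass from the pipeline sketch; the heading pattern and metadata fields are illustrative.

```python
# Section-aware chunking with metadata enrichment: split a markdown policy
# on headings so each section stays intact, and attach filterable metadata
# to every chunk.
import re

def chunk_by_section(markdown_text: str, doc_metadata: dict) -> list[Chunk]:
    chunks: list[Chunk] = []
    current_heading = "Introduction"
    current_lines: list[str] = []

    def flush():
        body = "\n".join(current_lines).strip()
        if body:
            chunks.append(Chunk(
                text=f"{current_heading}\n{body}",
                metadata={**doc_metadata, "section": current_heading},
            ))

    for line in markdown_text.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*)", line)
        if heading:
            flush()                                   # close the previous section
            current_heading = heading.group(1).strip()
            current_lines = []
        else:
            current_lines.append(line)
    flush()                                           # close the final section
    return chunks
```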
The choice of chunking strategy should be validated empirically. Build a test set of 100 real employee questions, run retrieval with different chunking approaches, and measure how often the correct chunk appears in the top 3 results. This retrieval recall metric is the single best predictor of end-to-end system quality.
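That measurement is a few lines of code. This sketch assumes the `retrieve()` placeholder from the pipeline sketch and a test set in which each question is labeled with the id of the chunk that answers it, stored here in a hypothetical `chunk_id` metadata field.

```python
# Top-3 retrieval recall over a labeled test set of real employee questions.
def recall_at_k(index, test_set: list[dict], k: int = 3) -> float:
    hits = 0
    for item in test_set:
        results = retrieve(index, item["question"], item.get("filters", {}), k=k)
        if any(c.metadata.get("chunk_id") == item["expected_chunk_id"] for c in results):
            hits += 1
    return hits / len(test_set)

# Compare chunking strategies by rebuilding the index and re-measuring:
# print(recall_at_k(section_index, test_set), recall_at_k(token_index, test_set))
```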
HR-specific retrieval challenges
HR data has characteristics that make retrieval harder than generic document search.
Context-dependent answers. “What is my PTO balance?” requires knowing who is asking: their location, their tenure, and their employment type. The retrieval system must resolve this context before searching.
Temporal validity. Policies change. The parental leave policy effective January 2026 is different from the one effective January 2025. Retrieval must respect effective dates and return the current version.
Multi-document answers. “Am I eligible for the internal mobility program?” might require combining information from the mobility policy, the employee's performance data, their tenure, and their manager's approval status. No single document contains the complete answer.
Confidentiality boundaries. The system must never retrieve documents that the requesting user is not authorized to see. A manager asking about compensation ranges should see their team's data but not another team's. Access control must be enforced at the retrieval layer, not the generation layer.
Each of these challenges requires architectural solutions beyond basic RAG. Context resolution requires integration with identity and org data. Temporal validity requires metadata filtering. Multi-document answers require retrieval orchestration. Confidentiality requires row-level access control in the vector database.
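The following sketch shows how temporal validity, context resolution, and confidentiality can all be expressed as retrieval-layer filters. `resolve_user_context()` is a hypothetical stand-in for your identity and org-data integration, the metadata field names are illustrative, and a production system would push these filters into the vector database itself rather than applying them in application code.

```python
# Temporal validity, context resolution, and confidentiality enforced at
# the retrieval layer, extending the retrieve() sketch above.
from datetime import date

def resolve_user_context(user_id: str) -> dict:
    """Placeholder: look up the requester's country, business unit, and
    entitlements in your identity and org systems.
    Returns e.g. {"country": "DE", "allowed_groups": {"all_employees", "managers_team_42"}}."""
    raise NotImplementedError

def retrieve_for_user(index, query: str, user_id: str, k: int = 3):
    ctx = resolve_user_context(user_id)
    today = date.today().isoformat()
    candidates = [
        c for c in index
        # Temporal validity: only the policy version in effect today.
        if c.metadata.get("effective_date", "") <= today
        and today < c.metadata.get("expiry_date", "9999-12-31")
        # Context resolution: the requester's country, or a global policy.
        and c.metadata.get("country") in (ctx["country"], "GLOBAL")
        # Confidentiality: the requester must be cleared for this chunk.
        and c.metadata.get("access_group", "all_employees") in ctx["allowed_groups"]
    ]
    return retrieve(candidates, query, filters={}, k=k)
```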
If your retrieval returns the wrong documents, no amount of LLM sophistication will save you. The quality ceiling for any RAG system is set by retrieval precision, not generation capability.
RAG is the mechanism that keeps HR agents grounded in organizational reality. HyDE and domain-specific chunking push retrieval quality higher. But the fundamental lesson is simple: invest in retrieval first. The LLM can only reason over what it receives.