RFP template: questions to ask every vendor

Thirty-six structured questions across six evaluation dimensions, with scoring guidance for each

How to use this template

This template is structured around six evaluation dimensions. Each dimension contains six questions. For every question, the template provides three elements:

  • Why it matters: The business or technical rationale behind the question.
  • Good answer: What a mature, capable vendor response looks like.
  • Red flag: What signals a gap, deflection, or fundamental limitation.

Send these questions as part of your formal RFP, or use them as a structured interview guide during vendor demos. Score each response on a 1-5 scale and aggregate by dimension to produce a comparative scorecard.
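
To make the scorecard concrete, here is a minimal sketch in Python of the 1-5 scoring and per-dimension aggregation described above. Only the dimension names and the scale come from this template; the vendor names and ratings are placeholders.

```python
from statistics import mean

DIMENSIONS = [
    "Skills and data architecture",
    "Agent architecture and orchestration",
    "Governance and compliance",
    "Integration and deployment",
    "Outcomes and measurement",
    "Roadmap and vision",
]

# scores[vendor][dimension] is a list of six 1-5 ratings, one per question.
# These numbers are illustrative, not real vendor data.
scores = {
    "Vendor A": {d: [4, 4, 3, 5, 4, 3] for d in DIMENSIONS},
    "Vendor B": {d: [3, 2, 4, 3, 3, 2] for d in DIMENSIONS},
}

for vendor, by_dim in scores.items():
    print(vendor)
    for dim in DIMENSIONS:
        ratings = by_dim[dim]
        assert all(1 <= r <= 5 for r in ratings), "ratings use the 1-5 scale"
        print(f"  {dim}: {mean(ratings):.1f}")
    overall = mean(r for dim in DIMENSIONS for r in by_dim[dim])
    print(f"  Overall: {overall:.1f}")
```

If some dimensions matter more to your organization than others, apply per-dimension weights before averaging rather than re-scaling individual ratings.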

Dimension 1: Skills and data architecture

1. Describe your skills ontology. How many skills does it contain, how are relationships structured, and how frequently is it updated?
  • Why it matters: The ontology is the reasoning backbone. A shallow or static taxonomy limits every downstream agent.
  • Good answer: Millions of skills with hierarchical and lateral relationships. Updated continuously from labor-market data, job postings, and platform usage signals.
  • Red flag: A flat list of skills with no relationship graph. Updated annually or only when the customer requests it.

2. How do you infer skills that employees have not explicitly listed on their profiles?
  • Why it matters: Declared skills represent a fraction of actual capability. Inference is what makes matching and planning accurate.
  • Good answer: Combines role history, project data, peer benchmarks, and learning completions to infer latent skills with confidence scoring.
  • Red flag: Relies entirely on employee self-declaration or manager endorsement. No inference engine.

3. Can your platform ingest and normalize skills data from our existing HRIS, ATS, and LMS without requiring a manual mapping exercise?
  • Why it matters: Integration friction is the top cause of failed pilots. Manual mapping does not scale.
  • Good answer: Automated ingestion with ML-based normalization that maps customer taxonomies to the platform ontology. Human review for edge cases only.
  • Red flag: Requires the customer to complete a multi-month mapping exercise before the platform can function.

4. How do you handle skills data for contingent workers, contractors, and external candidates?
  • Why it matters: Workforce planning increasingly includes non-employee talent. A platform that only sees full-time employees has a blind spot.
  • Good answer: Unified data model that accommodates all worker types with appropriate access controls and data retention policies.
  • Red flag: Only supports full-time employee profiles. Contingent data must be managed in a separate system.

5. What is the minimum data set required to deploy your first agent, and what happens to agent quality as more data becomes available?
  • Why it matters: Determines time-to-value and whether you can start small.
  • Good answer: Clear minimum viable data set (e.g., job titles, org structure, basic skills). Documented quality improvement curve as data richness increases.
  • Red flag: Requires comprehensive, clean data across all systems before any agent can be activated.

6. How do you maintain data freshness? What happens when an employee changes roles, completes a course, or leaves the organization?
  • Why it matters: Stale data produces stale recommendations. Real-time or near-real-time sync is essential.
  • Good answer: Event-driven data sync via webhooks or streaming integrations. Profile updates reflected within minutes, not days. (See the sketch after this list.)
  • Red flag: Batch sync on a weekly or monthly schedule. No event-driven updates.
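
To ground question 6: "event-driven sync" usually means the HRIS emits a webhook when a profile changes and the platform reacts within minutes. A minimal sketch of the receiving side, assuming a hypothetical payload shape and event names (not any real vendor's schema):

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical event names; a real HRIS defines its own schema.
PROFILE_EVENTS = {"role.changed", "course.completed", "employee.terminated"}

def refresh_profile(employee_id: str) -> None:
    # Placeholder: enqueue the employee for skills re-inference.
    print(f"re-scoring profile {employee_id}")

def start_offboarding(employee_id: str) -> None:
    # Placeholder: trigger retention/deletion policy, not a profile update.
    print(f"offboarding {employee_id}")

@app.route("/webhooks/hris", methods=["POST"])
def hris_webhook():
    event = request.get_json(force=True)
    kind = event.get("event")
    if kind not in PROFILE_EVENTS:
        return {"status": "ignored"}, 200
    if kind == "employee.terminated":
        start_offboarding(event["employee_id"])
    else:
        refresh_profile(event["employee_id"])
    return {"status": "accepted"}, 202

if __name__ == "__main__":
    app.run(port=8080)
```

A vendor whose sync runs on a weekly batch cannot expose an endpoint like this; asking to see their webhook documentation is a quick way to verify the claim.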

Dimension 2: Agent architecture and orchestration

7. How many distinct agents does your platform ship, and what is the scope of each?
  • Why it matters: Reveals whether the vendor has purpose-built agents or a single chatbot rebranded as multiple agents.
  • Good answer: Named agents with distinct scopes (matching, planning, gap analysis, etc.). Each has defined inputs, outputs, and success metrics.
  • Red flag: One general-purpose assistant that handles all queries. No distinct agent boundaries.

8. How do multiple agents coordinate when a workflow spans more than one domain (e.g., internal mobility requires matching + gap analysis + development planning)?
  • Why it matters: Multi-agent orchestration is the hardest architectural problem. Weak coordination leads to inconsistent outputs.
  • Good answer: A defined orchestration layer that routes context between agents, manages state, and enforces sequencing rules.
  • Red flag: Agents operate in isolation. The user must manually transfer outputs from one agent to another.

9. Can customers configure agent behavior (e.g., matching weights, approval thresholds, escalation rules) without writing code?
  • Why it matters: Business users need to tune agents to organizational policy. If every change requires professional services, iteration speed drops.
  • Good answer: Admin-accessible configuration UI with guardrails. Changes auditable and reversible.
  • Red flag: All configuration changes require vendor professional services or custom code deployments.

10. What LLMs power your agents, and can customers bring their own model or choose between providers?
  • Why it matters: Model lock-in and data residency concerns are common in regulated industries.
  • Good answer: Model-agnostic architecture. Supports multiple providers. Clear documentation on which models are used for which tasks.
  • Red flag: Single-model dependency with no flexibility. Unclear about model versioning or change management.

11. How do you handle agent failures, hallucinations, or low-confidence outputs?
  • Why it matters: Every agent will produce errors. The question is whether the system detects and mitigates them.
  • Good answer: Confidence scoring, fallback logic, human escalation triggers, and post-hoc quality sampling. (See the sketch after this list.)
  • Red flag: No mention of failure modes. Claims the system does not hallucinate.

12. What is the latency profile for your agents? How long does a typical matching or planning operation take?
  • Why it matters: Agents that take minutes to return results break the user experience and reduce adoption.
  • Good answer: Sub-second for simple queries. Seconds for complex matching. Async processing with status updates for batch operations.
  • Red flag: No published latency benchmarks. Demo environment does not reflect production performance.
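
To make question 11 concrete: "confidence scoring with fallback logic and human escalation" reduces to routing each agent output by its reported confidence. A minimal sketch; the thresholds and routing labels are illustrative assumptions, and real systems tune them per agent and per decision type.

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    recommendation: str
    confidence: float  # 0.0-1.0, as reported by the agent

# Illustrative thresholds; tune per agent and per decision type.
AUTO_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.60

def route(result: AgentResult) -> str:
    if result.confidence >= AUTO_THRESHOLD:
        return "present"          # Show the recommendation directly.
    if result.confidence >= REVIEW_THRESHOLD:
        return "present_flagged"  # Show it, marked as lower confidence.
    return "escalate"             # Send to a human reviewer instead.

print(route(AgentResult("Shortlist candidate E-442", 0.92)))  # present
print(route(AgentResult("Shortlist candidate E-442", 0.41)))  # escalate
```

A vendor with a real answer to question 11 should be able to tell you where these thresholds live, who can change them, and what happens on the "escalate" path.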

Dimension 3: Governance and compliance

13. Can administrators define which decisions agents execute autonomously versus which require human approval?
  • Why it matters: The human-in-the-loop boundary is the most consequential governance decision. It must be configurable, not hard-coded.
  • Good answer: Granular policy engine. Configurable per agent, per decision type, per business unit.
  • Red flag: Binary on/off. Either fully autonomous or fully manual. No middle ground.

14. How do you audit agent decisions? Can we retrieve the full decision lineage for any recommendation?
  • Why it matters: Regulatory compliance and internal trust both require explainability.
  • Good answer: Every decision logged with inputs, reasoning steps, model version, confidence score, and timestamp. Exportable for compliance review. (See the sketch after this list.)
  • Red flag: No audit trail. Recommendations appear as outputs with no visibility into how they were generated.

15. What bias detection and mitigation mechanisms are built into your agents?
  • Why it matters: Agents trained on historical data can perpetuate existing biases in hiring, promotion, and mobility.
  • Good answer: Pre-deployment bias testing, ongoing monitoring dashboards, configurable fairness constraints, and regular third-party audits.
  • Red flag: Claims the system is unbiased by default. No monitoring or testing framework.

16. How does your platform handle data residency requirements for multinational organizations?
  • Why it matters: Employee data for the EU, APAC, and other regions may be subject to strict localization requirements.
  • Good answer: Multi-region deployment options. Data residency controls at the tenant or business-unit level. Documented compliance with GDPR, CCPA, and other frameworks.
  • Red flag: Single-region deployment only. Data residency managed through legal agreements, not technical controls.

17. What certifications does your platform hold (SOC 2, ISO 27001, etc.)?
  • Why it matters: Table stakes for enterprise procurement. Absence signals an immature security posture.
  • Good answer: SOC 2 Type II, ISO 27001, and relevant industry certifications. Audit reports available under NDA.
  • Red flag: Certifications in progress but not yet completed. No third-party audit reports available.

18. How do you handle right-to-erasure (RTBF) requests across all agent data stores?
  • Why it matters: GDPR Article 17, the "right to be forgotten," requires deletion from every data store, including training data, logs, and embeddings.
  • Good answer: Automated deletion workflow that purges data from all stores, including vector databases and model fine-tuning sets. Deletion confirmation with audit record.
  • Red flag: Manual deletion process that does not cover all data stores. No mechanism for deleting data from embeddings or model training sets.
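
A concrete test for question 14 is to ask what one exported lineage record actually contains. Here is a minimal sketch of such a record, built from the fields named in the "good answer" above; the field names and sample values are hypothetical.

```python
import json
from datetime import datetime, timezone

def lineage_record(agent, decision, inputs, reasoning_steps,
                   model_version, confidence):
    """Assemble one exportable audit record for a single agent decision."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "decision": decision,
        "inputs": inputs,                    # Every data input used.
        "reasoning_steps": reasoning_steps,  # Intermediate steps, in order.
        "model_version": model_version,
        "confidence": confidence,
    }

record = lineage_record(
    agent="matching",
    decision="recommend_internal_candidate",
    inputs={"requisition_id": "R-101", "candidate_id": "E-442"},
    reasoning_steps=["skills overlap 8/10", "location compatible"],
    model_version="matcher-2025-06",  # Placeholder version string.
    confidence=0.87,
)
print(json.dumps(record, indent=2))
```

If the vendor cannot produce a record at roughly this granularity for an arbitrary past recommendation, the audit-trail claim is weaker than it sounds.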

Dimension 4: Integration and deployment

19. Which HCM, ATS, and LMS platforms do you integrate with natively?
  • Why it matters: Native integrations reduce deployment time and maintenance overhead.
  • Good answer: Pre-built connectors for major platforms (Workday, SAP SuccessFactors, Oracle HCM, Greenhouse, iCIMS, Cornerstone). Published integration catalog with data flow documentation.
  • Red flag: Custom integrations required for every customer. No pre-built connectors.

20. What is the typical deployment timeline from contract signature to first agent in production?
  • Why it matters: Time-to-value is a critical success factor. Long deployments erode sponsor confidence.
  • Good answer: 4-8 weeks for the first agent. Phased rollout plan with defined milestones.
  • Red flag: 6-12 month deployment timeline with no interim milestones or early value delivery.

21. Can your platform surface agent outputs inside our existing collaboration tools (Teams, Slack, email)?
  • Why it matters: Adoption depends on meeting users where they work.
  • Good answer: Native integrations with major collaboration platforms. Configurable notification and interaction surfaces.
  • Red flag: Users must log into a separate portal to access agent outputs. No embedded experience.

22. How do you handle SSO, SCIM provisioning, and role-based access control?
  • Why it matters: Enterprise identity management is non-negotiable.
  • Good answer: SAML/OIDC SSO, SCIM 2.0 provisioning, and granular RBAC with HR-specific role definitions.
  • Red flag: Basic username/password authentication. Manual user provisioning.

23. What is your API strategy? Can we build custom workflows or dashboards on top of your platform?
  • Why it matters: Extensibility determines long-term platform value.
  • Good answer: Comprehensive REST or GraphQL API. Published documentation, versioning policy, and rate limits. Webhook support for event-driven integrations. (See the sketch after this list.)
  • Red flag: No public API. All customization requires vendor professional services.

24. How do you manage upgrades and new feature releases? Is there downtime?
  • Why it matters: SaaS release cadence affects both reliability and feature velocity.
  • Good answer: Zero-downtime deployments. Regular release cadence. Feature flags for gradual rollout. Customer notification and release notes.
  • Red flag: Scheduled maintenance windows with downtime. Infrequent, large releases with limited customer communication.
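
One way to pressure-test question 23 is to ask the vendor to walk through a small integration live. A hypothetical sketch of what that might look like against a versioned REST API; the base URL, endpoints, fields, and auth scheme are invented for illustration and are not any real vendor's API.

```python
import requests

BASE = "https://api.example-vendor.com/v1"  # Hypothetical endpoint.
HEADERS = {"Authorization": "Bearer <token>"}

def strong_internal_matches(min_score: float = 0.8):
    """Page through open requisitions, yield high-scoring internal matches."""
    url = f"{BASE}/requisitions?status=open"
    while url:
        resp = requests.get(url, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        page = resp.json()
        for req in page["items"]:
            matches = requests.get(
                f"{BASE}/requisitions/{req['id']}/matches",
                headers=HEADERS,
                timeout=10,
            ).json()["items"]
            yield req["id"], [m for m in matches if m["score"] >= min_score]
        url = page.get("next")  # Cursor-style pagination, if offered.
```

Whatever the real schema looks like, the things to check are the same: a versioned base path, documented pagination, published rate limits, and webhooks so you are not forced to poll.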

Dimension 5: Outcomes and measurement

25. What KPIs does your platform track natively, and can we define custom metrics?
  • Why it matters: If you cannot measure it, you cannot prove value to the business.
  • Good answer: Pre-built dashboards for internal fill rate, time-to-fill, skills gap closure, agent adoption, and recommendation acceptance rate (see the sketch after this list). Custom metric builder for organization-specific KPIs.
  • Red flag: No native analytics. Requires export to a third-party BI tool for any reporting.

26. Can you share anonymized benchmark data from comparable deployments?
  • Why it matters: Benchmarks validate vendor claims and set realistic expectations.
  • Good answer: Published benchmark reports segmented by industry, company size, and use case. Willing to share anonymized case studies.
  • Red flag: No benchmarks available. Only anecdotal references.

27. How do you measure agent quality over time? What feedback loops exist?
  • Why it matters: Agent performance must improve continuously. Static accuracy is a liability.
  • Good answer: Automated quality scoring, user feedback capture, and model retraining pipelines. Quality metrics visible to administrators.
  • Red flag: No ongoing quality measurement. Model is trained once and deployed without continuous improvement.

28. What is your pricing model, and how does cost scale as usage increases?
  • Why it matters: Predictable economics prevent budget surprises.
  • Good answer: Transparent per-employee or per-agent pricing. Published rate cards. No hidden compute or API usage fees.
  • Red flag: Opaque pricing. Usage-based fees that are difficult to predict. Charges per agent interaction or per API call.

29. What does your customer success model look like post-deployment?
  • Why it matters: Long-term success depends on ongoing enablement, not just implementation.
  • Good answer: Dedicated customer success manager. Quarterly business reviews. Access to the product roadmap. Customer advisory board.
  • Red flag: Support is limited to a ticketing system. No proactive engagement or strategic guidance.

30. Can you provide three reference customers in our industry and of similar size?
  • Why it matters: References validate production maturity and industry relevance.
  • Good answer: Multiple references willing to speak openly about deployment experience, challenges, and outcomes.
  • Red flag: No references available. Claims all customers are under NDA.
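
The first two KPIs in question 25 have simple definitions worth agreeing on up front, because vendors sometimes compute them differently. A minimal sketch under common definitions; the sample data is invented.

```python
def internal_fill_rate(filled_roles):
    """Share of filled roles that went to internal candidates."""
    internal = sum(1 for r in filled_roles if r["source"] == "internal")
    return internal / len(filled_roles) if filled_roles else 0.0

def recommendation_acceptance_rate(recommendations):
    """Share of agent recommendations that users acted on."""
    accepted = sum(1 for r in recommendations if r["accepted"])
    return accepted / len(recommendations) if recommendations else 0.0

roles = [{"source": "internal"}, {"source": "external"}, {"source": "internal"}]
recs = [{"accepted": True}, {"accepted": False}, {"accepted": True}]

print(f"Internal fill rate: {internal_fill_rate(roles):.0%}")        # 67%
print(f"Acceptance rate: {recommendation_acceptance_rate(recs):.0%}")  # 67%
```

Ask each vendor to state the numerator and denominator behind every dashboard metric; otherwise cross-vendor comparisons are not apples to apples.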

Dimension 6: Roadmap and vision

31. What is your 12-month product roadmap for agent capabilities?
  • Why it matters: You are buying a trajectory, not just a current state.
  • Good answer: Published roadmap with themes, timelines, and customer input mechanisms. Willingness to share under NDA.
  • Red flag: No roadmap visibility. Vague statements about future innovation.

32. How do you incorporate customer feedback into product development?
  • Why it matters: Vendors that build in isolation drift from market needs.
  • Good answer: Structured feedback channels: advisory boards, feature voting, beta programs, and direct PM access.
  • Red flag: Feature requests go into a backlog with no visibility or prioritization framework.

33. What is your strategy for keeping pace with LLM advancements (new models, capabilities, cost reductions)?
  • Why it matters: The LLM landscape shifts rapidly. Vendor agility is a competitive advantage.
  • Good answer: Model-agnostic architecture. Regular evaluation of new models. Documented model migration process.
  • Red flag: Tightly coupled to a single model provider with no migration path.

34. How do you plan to expand from HR-specific agents to broader workforce and business agents?
  • Why it matters: Platform ambition signals long-term investment and expandability.
  • Good answer: Clear vision for adjacent domains (finance workforce planning, operations staffing) with defined integration points.
  • Red flag: No vision beyond current HR use cases. Platform is designed as a point solution.

35. What is your approach to open standards and interoperability in the agentic AI ecosystem?
  • Why it matters: Proprietary lock-in is a risk as the market matures.
  • Good answer: Participation in industry standards bodies. Open APIs. Support for emerging agent interoperability protocols.
  • Red flag: Fully proprietary. No engagement with open standards. Data export is difficult or restricted.

36. If we decide to leave your platform, what does the data export and transition process look like?
  • Why it matters: Exit planning is a sign of vendor confidence and customer respect.
  • Good answer: Full data export in standard formats. Documented transition process. Contractual data portability guarantees.
  • Red flag: No data export capability. Contractual penalties for early termination. Data is held hostage.

Key insight

An RFP is only as useful as its specificity. Generic questions get generic answers. These questions are designed to force vendors to reveal architectural decisions, data dependencies, and governance capabilities that matter for agentic HR.

Key terms

Evaluation dimension
A category of vendor capability that groups related RFP questions for structured comparison.
Red flag
A vendor response pattern that signals a fundamental limitation, misalignment, or risk that may not be recoverable post-contract.
Skills inference
The ability to deduce skills an employee likely possesses based on role history, project assignments, and peer data, even when those skills are not explicitly listed.
Explainability
The degree to which an agent can articulate the reasoning, data inputs, and weighting behind a specific recommendation or action.
Human-in-the-loop
A governance model where certain agent decisions require explicit human approval before execution.
Decision lineage
A complete audit trail showing every data input, rule applied, and intermediate step that led to an agent's output.

The bottom line

Use these 36 questions as-is or adapt them to your organization's priorities. The goal is not to generate a scorecard that picks a winner automatically. It is to surface the architectural and operational differences that determine long-term success.