How Loomra Ingests Your Workforce Data: Ingestion, Harmonization, Enrichment
Deep Dive | ~10 min read
Track 02: Architecture and Technology | Module 02.1: The Two-Layer Platform Architecture
Every AI vendor will tell you their product is intelligent. But intelligence without context is just guessing. That single observation explains why Gloat spent eight years and processed over one billion career events to build Loomra, a purpose-built workforce AI context engine. Loomra is not a wrapper around a foundation model. It is an opinionated data platform that transforms the messy, fragmented reality of enterprise HR systems into a single, continuously updated map of your workforce.
This article goes deep on the first stage of that transformation: the data pipeline that ingests records from your HCM, ATS, LMS, and project tools, harmonizes them into canonical entities, and enriches them with inferred skills, proficiency estimates, and temporal context. Everything that Loomra’s five components depend on starts here.
Why a Dedicated Data Pipeline Matters
A typical enterprise with 50,000 employees runs Workday for core HR, SAP SuccessFactors for talent management, an ATS for recruiting, an LMS for learning, and half a dozen project and collaboration tools. Each system has its own data model, its own notion of what a “skill” or a “role” means, and its own update cadence. An employee who completes a certification in the LMS, changes roles in Workday, and finishes a cross-functional project in Jira has three partial stories in three systems. None of them, on their own, tells you what that person can actually do.
Loomra’s data pipeline exists to solve exactly this problem. It pulls data from every connected system, normalizes it against a unified ontology of 50,000+ skills across 19 languages, and produces canonical entities with full provenance chains. The result feeds all five Loomra components:
- Knowledge Graph — 2.4 million entities and 18.7 million edges, queryable in under 50 milliseconds
- Intelligent Tools — 14 specialized tools that have powered over 200 million matches at sub-100ms response times
- Personalization Engine — living profiles built from 5 million+ employee behavioral patterns
- Retrieval and Embedding — 6 purpose-built models for entity matching, semantic retrieval, skill harmonization, task harmonization, skill matching, and role proximity
- Business Logic Engine — 24 rule categories enforced at under 15 milliseconds per evaluation
Every one of these components is only as good as the data it receives. The pipeline described below is the reason Loomra can confidently power 29 specialized agents across delivery surfaces like Microsoft Teams, Slack, Google Chat, and Copilot.
Stage 1: Ingestion
Ingestion is the act of pulling raw records from source systems into Loomra’s processing layer. The challenge is not just connectivity — it is pulling that data reliably, incrementally, and without disrupting the source systems that HR teams depend on every day.
Connector Architecture
Loomra maintains native connectors for the three dominant HCM platforms, each tuned to the vendor’s preferred integration patterns:
Workday — Loomra connects via REST APIs for real-time reads, SOAP APIs for transactional operations, and Report-as-a-Service (RaaS) for bulk data extraction. Integration respects Workday’s security group model, meaning Loomra only sees what the configured integration system user is authorized to access. Write-back support allows Loomra to push enriched skill profiles and recommendations back into Workday, closing the loop between intelligence and action.
SAP SuccessFactors — Loomra uses OData v2 for legacy modules and OData v4 for newer APIs, including the Compound Employee API, which returns a nested, multi-entity view of an employee in a single call. Role-Based Permissions (RBP) are honored natively. The Compound Employee API is particularly important because it lets Loomra pull an employee’s core record, compensation, job history, and competencies in one round trip rather than fanning out across half a dozen endpoints.
Oracle HCM Cloud — Loomra integrates via REST APIs for standard CRUD operations and SOAP APIs for specialized modules. It connects to Oracle’s Dynamic Skills framework and Talent Management suite to pull skill assessments, development goals, and succession data.
Beyond HCM platforms, Loomra ingests data from ATS systems (candidate pipelines, interview feedback, offer data), LMS platforms (course completions, certifications, learning paths), and project/collaboration tools (task assignments, deliverables, peer interactions).
Incremental Sync and Idempotent Processing
Full data loads are expensive and disruptive. Loomra performs an initial full sync when a connector is first activated, then switches to incremental synchronization. Each source record carries a change token — a timestamp, a sequence number, or a system-specific delta marker — that Loomra tracks to request only what has changed since the last sync cycle.
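The change-token mechanism can be sketched in a few lines. This is a simplified illustration, not Loomra’s connector code: the `fetch_changes` callable stands in for whatever delta API the vendor exposes, and a single integer cursor stands in for the system-specific change token.

```python
from dataclasses import dataclass, field

@dataclass
class SourceRecord:
    record_id: str
    change_token: int   # monotonic sequence number assigned by the source system
    payload: dict

@dataclass
class IncrementalSyncer:
    """Hypothetical connector wrapper: pulls only records changed since the last cursor."""
    cursor: int = 0
    seen: list = field(default_factory=list)

    def sync(self, fetch_changes):
        # fetch_changes(since) stands in for a vendor delta API call
        batch = fetch_changes(self.cursor)
        for rec in batch:
            self.seen.append(rec)
            # advance the cursor past each record as it is processed
            self.cursor = max(self.cursor, rec.change_token)
        return len(batch)

# Simulated source system: three records, two of them changed after token 5
source = [
    SourceRecord("EMP-1", 3, {"name": "A"}),
    SourceRecord("EMP-2", 7, {"name": "B"}),
    SourceRecord("EMP-3", 9, {"name": "C"}),
]

def fetch_changes(since):
    return [r for r in source if r.change_token > since]

syncer = IncrementalSyncer(cursor=5)
assert syncer.sync(fetch_changes) == 2   # only EMP-2 and EMP-3 are pulled
assert syncer.cursor == 9                # the next cycle starts from the newest token
```

The key design point is that the cursor only ever moves forward, so a repeated sync over the same window re-fetches at most the records near the boundary — which is exactly why idempotent processing, described next, matters.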
Incremental sync introduces a classic distributed systems problem: what happens when a sync fails midway? Loomra handles this through idempotent processing. Every incoming record is assigned a deterministic composite key derived from the source system identifier, entity type, and source record ID. If the same record arrives twice — because a retry was triggered, because the source system sent a duplicate, or because the sync window overlapped — Loomra detects the duplicate and applies the later version without creating phantom entities.
This is not a trivial guarantee. In production at over 100 enterprises spanning 112 countries and 5 million+ employees, edge cases like timezone-dependent batch jobs, retroactive effective-dated changes in Workday, and SuccessFactors’ eventually consistent OData responses all have to be handled gracefully. The pipeline treats every record as an event in an append-only log, with the most recent event for a given key representing the current state.
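The latest-wins, append-only pattern can be sketched as follows. The in-memory structures and version numbers here are illustrative assumptions; only the composite-key format mirrors the example later in this article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    key: str        # deterministic composite key
    version: int    # change token / sequence number from the source
    payload: dict

def composite_key(source: str, entity_type: str, record_id: str) -> str:
    return f"{source}::{entity_type}::{record_id}"

class AppendOnlyLog:
    """Latest-wins view over an append-only event log, keyed by composite key."""
    def __init__(self):
        self.events = []    # immutable history, never rewritten
        self.current = {}   # key -> most recent event (the "current state")

    def ingest(self, event: Event):
        self.events.append(event)   # every arrival is recorded, duplicates included
        existing = self.current.get(event.key)
        # A retry or duplicate with an older or equal version never creates a
        # phantom entity and never overwrites newer state.
        if existing is None or event.version > existing.version:
            self.current[event.key] = event

log = AppendOnlyLog()
key = composite_key("sf", "employee", "EMP-2024-08-1547")
log.ingest(Event(key, 1, {"title": "Analyst"}))
log.ingest(Event(key, 2, {"title": "Senior Analyst"}))
log.ingest(Event(key, 1, {"title": "Analyst"}))   # duplicate retry arrives late

assert len(log.events) == 3   # history keeps everything
assert log.current[key].payload["title"] == "Senior Analyst"   # state is latest-wins
```

Because ingestion only appends and the current-state view is derived, replaying a failed sync from its last cursor is always safe: the worst case is a few redundant events in the log, never a corrupted entity.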
A Concrete Example
Consider what happens when an enterprise running SAP SuccessFactors onboards a new hire. The Compound Employee API returns a nested payload: personal information, organizational assignment, job classification, compensation, and an initial set of competencies rated by the hiring manager. Loomra’s SuccessFactors connector receives this payload during the next incremental sync, assigns a composite key (sf::employee::EMP-2024-08-1547), validates the schema against a connector-specific contract, and writes the raw event to the ingestion log. At this point, the data is in Loomra’s domain but has not yet been interpreted. That is the job of harmonization.
Stage 2: Harmonization
Raw records from different systems use different vocabularies. Workday might call a skill “Project Management.” SuccessFactors might call it “Project Mgmt.” The LMS might record a completed course titled “PMP Certification Prep.” A collaboration tool might tag someone as a “PM lead” on three consecutive projects. These are all evidence of the same underlying capability, but without harmonization, they remain disconnected fragments.
Semantic Normalization via Ontology Alignment
Loomra maintains a proprietary skills ontology containing over 50,000 skills, available in 19 languages. This is not a simple lookup table. The ontology is a graph structure where skills have parent-child relationships (e.g., “Agile Project Management” is a child of “Project Management”), lateral relationships (e.g., “Scrum” is related to “Kanban”), and contextual modifiers (e.g., “Project Management” in a software engineering context differs from “Project Management” in construction).
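The graph shape described above — parent-child edges plus lateral relations — can be sketched with a toy fragment. The structure is illustrative, not Gloat’s actual ontology schema.

```python
# Toy fragment of a skills ontology: parent-child plus lateral "related" edges.
skills = {
    "Project Management": {"parent": None, "related": set()},
    "Agile Project Management": {"parent": "Project Management", "related": set()},
    "Scrum": {"parent": "Agile Project Management", "related": {"Kanban"}},
    "Kanban": {"parent": "Agile Project Management", "related": {"Scrum"}},
}

def ancestors(skill):
    """Walk parent-child edges up to the root of the hierarchy."""
    chain = []
    parent = skills[skill]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = skills[parent]["parent"]
    return chain

assert ancestors("Scrum") == ["Agile Project Management", "Project Management"]
assert "Kanban" in skills["Scrum"]["related"]   # lateral relationship
```

Representing the ontology as a graph rather than a flat list is what lets downstream matching reason hierarchically: evidence for “Scrum” is also weak evidence for “Agile Project Management” and, more weakly still, for “Project Management.”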
When a raw skill string arrives from a source system, Loomra’s harmonization pipeline runs it through a sequence of matching strategies:
- Exact match — The string matches a canonical skill label or a known alias. Fast and deterministic.
- Fuzzy match — The string is within edit distance of a known label, catching abbreviations (“Proj Mgmt”), minor misspellings, and locale-specific variations.
- Semantic match — The string is embedded using gloat-harmonize-skill, one of Loomra’s six purpose-built embedding models, and compared against the ontology’s embedding space. This catches cases where the surface form is different but the meaning is equivalent (“People Management” vs. “Team Leadership”).
- Contextual inference — When the string is ambiguous (e.g., “Python” could be a programming language or a reference to Monty Python in a media company), the pipeline uses surrounding entity context — job family, department, industry — to disambiguate.
Each match produces a confidence score between 0 and 1. High-confidence matches (above a configurable threshold, typically 0.85) are accepted automatically. Lower-confidence matches enter a human review queue where HR data stewards can confirm, reject, or remap the proposed alignment. Over time, these human decisions feed back into the matching models, improving accuracy for that specific customer’s vocabulary.
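The cascade-with-threshold logic can be sketched as follows. This is a simplified stand-in: `difflib.SequenceMatcher` substitutes for a real edit-distance matcher, the semantic step is stubbed with a lookup table rather than an embedding model, and the canonical labels and aliases are invented for illustration.

```python
from difflib import SequenceMatcher

CANONICAL = {"project management": "Project Management",
             "stakeholder management": "Stakeholder Management"}
ALIASES = {"proj mgmt": "Project Management"}
THRESHOLD = 0.85
review_queue = []

def exact_match(raw):
    label = CANONICAL.get(raw.lower()) or ALIASES.get(raw.lower())
    return (label, 1.0) if label else None

def fuzzy_match(raw):
    # difflib similarity ratio stands in for a real edit-distance matcher
    best = max(CANONICAL, key=lambda c: SequenceMatcher(None, raw.lower(), c).ratio())
    score = SequenceMatcher(None, raw.lower(), best).ratio()
    return (CANONICAL[best], score) if score >= THRESHOLD else None

def semantic_match(raw):
    # stand-in for embedding-space comparison (gloat-harmonize-skill in the article)
    synonyms = {"people management": ("Team Leadership", 0.88)}
    return synonyms.get(raw.lower())

def harmonize(raw):
    """Try each strategy in order; accept the first match above the threshold."""
    for strategy in (exact_match, fuzzy_match, semantic_match):
        hit = strategy(raw)
        if hit and hit[1] >= THRESHOLD:
            return hit
    review_queue.append(raw)   # low confidence: route to a human data steward
    return None

assert harmonize("Proj Mgmt") == ("Project Management", 1.0)      # known alias
assert harmonize("Projct Management")[0] == "Project Management"  # fuzzy catch
assert harmonize("Underwater Basket Weaving") is None
assert review_queue == ["Underwater Basket Weaving"]
```

Ordering the strategies from cheapest to most expensive means most strings resolve deterministically before any model inference runs, which keeps the common case fast.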
Entity Resolution
Harmonization is not limited to skills. The same logic applies to job titles, organizational units, locations, certifications, and competency frameworks. Loomra resolves each entity against its canonical taxonomy, producing a unified representation that is system-agnostic. An employee who exists in Workday, the ATS, and the LMS becomes a single canonical entity in Loomra, with each source system’s contribution tracked as a provenance record.
Provenance chains are critical for auditability. When a manager asks “why does Loomra say this person knows Python at an advanced level,” the system can trace the answer back to specific source records: a SuccessFactors competency rating, two LMS course completions, and 14 months of commits to a Python codebase captured from the project management tool. Every assertion in Loomra is backed by evidence, and every piece of evidence is traceable to a source system and a specific sync event.
Back to Our Example
The new hire’s SuccessFactors record contained three competencies: “Project Mgmt,” “Stakeholder Communication,” and “SAP S/4HANA.” The harmonization pipeline maps “Project Mgmt” to the canonical skill “Project Management” via fuzzy match (confidence: 0.94). “Stakeholder Communication” maps to “Stakeholder Management” via semantic match using gloat-harmonize-skill (confidence: 0.88). “SAP S/4HANA” matches exactly to a canonical skill in the ontology (confidence: 1.0). All three mappings exceed the 0.85 threshold and are accepted automatically. The employee’s canonical entity is created with these three validated skills, each carrying its provenance chain back to the SuccessFactors Compound Employee API payload.
Stage 3: Enrichment
Harmonized data tells you what the source systems explicitly recorded. Enrichment tells you what the data implies. This is where Loomra moves from data integration to workforce intelligence.
Skill Inference
Explicit skill data is sparse. Most employees have between 5 and 15 skills recorded in their HCM profile, but research consistently shows that knowledge workers use 30 to 50 skills in their daily work. The gap is not because people lack skills — it is because nobody updates their HR profile after learning a new tool, completing a stretch assignment, or picking up domain expertise through osmosis on a cross-functional team.
Loomra bridges this gap through skill inference. Using the gloat-match-skills-v2 model and drawing on over one billion career events in its training corpus, Loomra infers additional skills from multiple signals:
- Job title and job family — A “Senior Data Engineer” is highly likely to possess SQL, ETL pipeline design, and data modeling skills, even if none are explicitly listed.
- Career trajectory — An employee who moved from software engineering to product management likely retains technical skills while acquiring product strategy and roadmapping capabilities.
- Learning activity — Completing an advanced machine learning course implies not just ML knowledge but prerequisite skills in statistics and programming.
- Project participation — Being assigned to a cloud migration project for six months implies hands-on experience with cloud infrastructure, even if the employee’s profile still says “on-premise systems.”
- Peer and organizational context — Skills common among an employee’s team members and role peers provide Bayesian priors for inference.
Each inferred skill receives a confidence score and is flagged as inferred (as opposed to explicit) in the canonical entity. This distinction matters for downstream consumers: an agent recommending someone for a critical role can weight explicit, manager-validated skills more heavily than inferred ones.
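The signal-combination idea can be sketched in miniature. The priors, the blending formula, and the threshold below are all illustrative assumptions; the real inference runs through gloat-match-skills-v2 over the billion-event corpus.

```python
# Illustrative job-title priors (probability a holder of the title has the skill).
TITLE_PRIORS = {
    "Senior Data Engineer": {"SQL": 0.95, "ETL Pipeline Design": 0.90, "Data Modeling": 0.85},
}

def infer_skills(explicit_skills, job_title, team_skill_freq, threshold=0.75):
    """Combine job-title priors with peer-frequency evidence into inferred skills."""
    inferred = {}
    priors = TITLE_PRIORS.get(job_title, {})
    for skill, prior in priors.items():
        if skill in explicit_skills:
            continue   # already explicit: nothing to infer
        peer = team_skill_freq.get(skill, 0.0)
        # naive evidence combination: boost the prior when peers share the skill
        confidence = min(1.0, prior * (0.8 + 0.4 * peer))
        if confidence >= threshold:
            inferred[skill] = {"confidence": round(confidence, 2), "source": "inferred"}
    return inferred

profile = infer_skills(
    explicit_skills={"SQL"},
    job_title="Senior Data Engineer",
    team_skill_freq={"ETL Pipeline Design": 0.9, "Data Modeling": 0.3},
)
assert "SQL" not in profile   # explicit skills are never re-inferred
assert profile["ETL Pipeline Design"]["source"] == "inferred"
assert profile["ETL Pipeline Design"]["confidence"] > profile["Data Modeling"]["confidence"]
```

Note how every inferred skill carries both a confidence score and a `source` flag, so downstream consumers can apply exactly the weighting policy the paragraph above describes.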
Proficiency Estimation
Knowing that someone has a skill is necessary but insufficient. Loomra estimates proficiency on a multi-level scale by combining:
- Source system ratings — If SuccessFactors records a competency at “Expert” level, that is a strong signal.
- Duration of application — An employee who has used Python for 8 years is likely more proficient than one who completed a bootcamp last month.
- Depth of engagement — Leading a team of data scientists implies deeper ML proficiency than being an individual contributor on one ML project.
- Certification and assessment data — Industry certifications (PMP, AWS Solutions Architect, CFA) provide calibrated proficiency signals.
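The four signal families above can be blended into a single estimate. The weights and bucketing below are illustrative assumptions, not Loomra’s calibration; the point is the structure, not the numbers.

```python
LEVELS = ["Beginner", "Intermediate", "Advanced", "Expert"]

def estimate_proficiency(rating=None, years=0.0, led_team=False, certified=False):
    """Blend the four signal families into a score, then bucket into a level."""
    score = 0.0
    if rating is not None:
        score += 0.4 * (LEVELS.index(rating) / (len(LEVELS) - 1))  # source-system rating
    score += 0.3 * min(years / 10.0, 1.0)                          # duration of application
    score += 0.2 * (1.0 if led_team else 0.3)                      # depth of engagement
    score += 0.1 * (1.0 if certified else 0.0)                     # certification signal
    return LEVELS[min(int(score * len(LEVELS)), len(LEVELS) - 1)]

# A long-tenured, team-leading, certified practitioner scores at the top of the scale
assert estimate_proficiency(rating="Expert", years=8, led_team=True, certified=True) == "Expert"
# A recent bootcamp graduate with no corroborating evidence lands near the bottom
assert LEVELS.index(estimate_proficiency(years=0.2)) <= LEVELS.index("Intermediate")
```

Treating the source-system rating as just one weighted input, rather than the answer, is what lets the estimate survive a missing or stale competency record.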
Temporal Decay and Recency Weighting
Skills are not static. A developer who wrote Java daily five years ago but has since moved entirely to Python has a decaying Java proficiency. Loomra applies temporal decay functions to skill evidence, weighting recent signals more heavily than older ones. The decay rate is not uniform — it varies by skill category. Technology skills (specific frameworks, tools) decay faster than domain skills (industry knowledge, regulatory expertise), which decay faster than foundational skills (communication, analytical thinking).
Recency weighting also applies to career events. A role held in the last two years carries more weight for proficiency estimation than one held a decade ago. This prevents the system from treating a 20-year veteran’s first job out of college as equally relevant to their current capabilities.
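Category-specific decay is naturally modeled as an exponential with a per-category half-life. The half-life values below are assumptions for illustration, not Gloat’s calibration.

```python
# Illustrative half-lives in years: technology skills decay fastest,
# foundational skills slowest (values are assumptions, not Gloat's calibration).
HALF_LIFE = {"technology": 2.0, "domain": 5.0, "foundational": 12.0}

def decayed_weight(years_since_evidenced: float, category: str) -> float:
    """Exponential decay: the weight halves every HALF_LIFE[category] years."""
    return 0.5 ** (years_since_evidenced / HALF_LIFE[category])

java = decayed_weight(5.0, "technology")            # framework skill, 5 years stale
regulatory = decayed_weight(5.0, "domain")          # domain skill, same staleness
communication = decayed_weight(5.0, "foundational") # foundational skill, same staleness

assert java < regulatory < communication   # slower decay for more durable skills
assert abs(decayed_weight(2.0, "technology") - 0.5) < 1e-9  # one half-life halves the weight
```

This captures exactly the ordering the paragraph describes: five stale years nearly erase evidence for a specific framework, halve it for domain expertise, and barely dent a foundational skill.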
The Enriched Entity
After enrichment, our example employee’s canonical entity has grown significantly. The three explicit skills from SuccessFactors are still there, with their provenance chains intact. But now the entity also includes:
- Inferred skills: “Budgeting and Forecasting” (inferred from job family: Finance Operations, confidence: 0.81), “Cross-functional Collaboration” (inferred from project assignments, confidence: 0.77), and 8 additional inferred skills.
- Proficiency estimates: “Project Management” at Advanced (based on 6 years in PM-adjacent roles plus the explicit competency rating), “SAP S/4HANA” at Intermediate (single explicit mention, no corroborating evidence yet).
- Temporal context: All skills carry a “last evidenced” timestamp and a decay-adjusted weight.
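The shape of such an enriched entity can be sketched with a couple of dataclasses. The field names and provenance identifiers are hypothetical; only the composite key and the explicit/inferred distinction come from the article.

```python
from dataclasses import dataclass, field

@dataclass
class SkillAssertion:
    name: str
    source: str            # "explicit" or "inferred"
    confidence: float
    provenance: list       # trail back to source-system sync events (IDs hypothetical)
    last_evidenced: str    # ISO date of the most recent supporting evidence
    decay_weight: float    # recency-adjusted weight

@dataclass
class CanonicalEmployee:
    key: str
    skills: list = field(default_factory=list)

employee = CanonicalEmployee(key="sf::employee::EMP-2024-08-1547", skills=[
    SkillAssertion("Project Management", "explicit", 0.94,
                   ["sf:compound_employee:sync-4411"], "2024-08-12", 0.98),
    SkillAssertion("Cross-functional Collaboration", "inferred", 0.77,
                   ["project:assignments:sync-4415"], "2024-07-30", 0.95),
])

assert all(s.provenance for s in employee.skills)   # every assertion is evidence-backed
explicit = [s for s in employee.skills if s.source == "explicit"]
assert len(explicit) == 1
```

Keeping provenance and temporal fields on every individual assertion, rather than on the entity as a whole, is what makes the per-skill audit trail described earlier possible.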
This enriched entity is what flows into Loomra’s five components. The Knowledge Graph indexes it as a node with edges to skills, roles, teams, and projects. The Intelligent Tools use it to power matching and recommendation. The Personalization Engine incorporates it into behavioral models. The Retrieval and Embedding layer indexes it for semantic search using gloat-embed-entity-v3. The Business Logic Engine evaluates it against compliance and policy rules.
Design Philosophy: Context Is Computed, Not Collected
The conventional approach to workforce data is to collect it: ask employees to fill out profiles, ask managers to rate competencies, ask HR to maintain taxonomies. This approach produces data that is perpetually stale, inconsistently formatted, and incomplete.
Loomra inverts this model. Context is computed from the continuous stream of events that already exist in enterprise systems. Employees do not need to update their profiles because Loomra observes their career events, learning activities, project contributions, and role changes as they happen. Managers do not need to manually rate every competency because Loomra infers proficiency from observable evidence. HR does not need to maintain a bespoke skill taxonomy because Loomra’s 50,000+ skill ontology provides a universal reference frame.
This is not a philosophical preference. It is an engineering requirement. At the scale Loomra operates — 100+ enterprises, 5 million+ employees, 112 countries, 19 languages — manual data collection simply cannot keep pace. The only viable approach is continuous, automated computation from source systems, which is exactly what the ingestion-harmonization-enrichment pipeline delivers.
The result is a living data substrate that powers 29 specialized agents across every major delivery surface: Microsoft Teams, Slack, Google Chat, and Copilot. When an employee asks an agent “what roles am I a good fit for?” or a manager asks “who on my team could lead this initiative?”, the answer draws on a canonical, enriched, provenance-tracked representation of the workforce that is updated with every sync cycle. Not a stale profile. Not a best guess. Computed context.
What to Read Next
- The Two-Layer Platform Architecture — Understand how the context layer (Loomra) and the agent layer (29 specialized agents) work together.
- The Knowledge Graph Deep Dive — Explore how 2.4 million entities and 18.7 million edges power sub-50ms workforce queries.
- HCM Integration Patterns — A technical guide to Workday, SuccessFactors, and Oracle HCM connector configuration, security models, and write-back.
- How Gloat’s Embedding Models Work — A closer look at the six purpose-built models that power harmonization, matching, and retrieval.
Most platforms ask customers to clean their data before onboarding. The Context Engine assumes the data will be messy, inconsistent, and incomplete, and treats that as the starting condition, not an obstacle.
In brief
The Context Engine transforms raw HR system data into canonical, semantically harmonized, inference-enriched entity records. It connects via OData, SOAP, and REST APIs, harmonizes by ontology alignment rather than string matching, and enriches through skill inference, proficiency estimation, and recency weighting. The output is a unified workforce context layer that downstream systems can trust.