Lucentive

Open role · AI Engineer

Build the runtime that makes agent output trustworthy.

You build the runtime, the extraction pipeline, and the evals behind AI agents held to regulated-bank production standards. The stack is Next.js and Convex. The standard is evals that have actually failed a build that should not have shipped — not evals that pass because they checked the wrong thing.

Mission

What the role does and why it matters here.

The AI Engineer builds the runtime that makes agent output worth a reviewer's trust. The harness stays consistent across engagements; the extraction pipeline handles imperfect source material; the eval suite is the one the team trusts because it has actually failed a build that should not have shipped. The role turns patterns earned in regulated-bank production into runtime, eval, extraction, and observability work the team can reuse.

Responsibilities

What you would own.

  • Build and maintain the agent runtime

    The harness holds an agent's tools, memory, and retrieval. You build it so the same runtime works across engagements with different validator sets. It lives in the IAS codebase and in engagement-specific repos.

  • Own the extraction pipeline

    Every value the agent extracts carries a confidence score and a link back to the source page. A reviewer can move from the extracted value to the source in the same workflow. You build and maintain the pipeline behind that output, including its behavior on inconsistent exports, scans, missing fields, and source documents that disagree.

  • Run the eval suite

    Truth sets the team agrees on. Regression coverage on changes. Drift alerts when the harness behaves differently from a prior run. The bar is an eval that has actually stopped a build that should not have shipped; you build toward that bar and away from evals that pass vacuously.

  • Ship validators as code

    Working from the Context Architect's spec, you implement the validator set as code that runs alongside every agent step. Failures are visible. Advisory checks are marked as advisory. Validator output is part of the audit trail.

  • Maintain the provider abstraction

    Mock providers during development, real providers in production, no rewrite in between. The interface is yours to design; the migration path is part of the design.

  • Build production observability for agent runs

    The team has to be able to read what an agent did before they trust the next run. You build the observability that supports that, without surfacing harness internals in the user-facing UI.
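The extraction contract described above — a value that carries a confidence score and a link back to its source page — can be sketched as a type plus a review gate. The names and the threshold below are hypothetical, not the actual pipeline:

```typescript
// Hypothetical shape of one extracted value: the datum, a confidence
// score, and the source location a reviewer can open in the same workflow.
type ExtractedValue = {
  field: string;
  value: string | null; // null when the field is missing in the source
  confidence: number; // 0..1, produced by the extraction step
  source: { documentId: string; page: number };
};

// Low-confidence or missing values are routed to a reviewer instead of
// flowing straight through. The threshold is illustrative.
function needsReview(v: ExtractedValue, threshold = 0.85): boolean {
  return v.value === null || v.confidence < threshold;
}
```

The point of the shape is the link: a reviewer moves from the extracted value to the source page without leaving the workflow.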
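A minimal sketch of the validators-as-code shape named above: checks run against an agent step's output, each one explicitly blocking or advisory, with every result kept as an audit trail. The interface and names are illustrative, not the Context Architect's actual spec:

```typescript
// Each validator is explicitly blocking or advisory; advisory failures
// stay visible but do not stop the run.
type Severity = "blocking" | "advisory";

type Validator<T> = {
  name: string;
  severity: Severity;
  check: (output: T) => boolean;
};

type AuditEntry = { validator: string; severity: Severity; passed: boolean };

// Run every validator against one step's output; keep every result,
// pass or fail, so the audit trail is complete.
function runValidators<T>(output: T, validators: Validator<T>[]) {
  const trail: AuditEntry[] = validators.map((v) => ({
    validator: v.name,
    severity: v.severity,
    passed: v.check(output),
  }));
  const blocked = trail.some((e) => e.severity === "blocking" && !e.passed);
  return { blocked, trail };
}
```

The design choice the sketch encodes: failures are never silently dropped — an advisory miss lands in the trail with the same weight of record as a blocking one.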
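The provider abstraction bullet — mock providers in development, real providers in production, no rewrite in between — comes down to one interface with the implementation chosen by configuration. This is one possible shape under assumed names, a sketch rather than the actual design:

```typescript
// Callers depend only on this interface; which implementation sits
// behind it is a configuration decision, not a code change.
interface CompletionProvider {
  complete(prompt: string): Promise<string>;
}

// Deterministic mock for development and eval runs.
class MockProvider implements CompletionProvider {
  async complete(prompt: string): Promise<string> {
    return `mock: ${prompt}`;
  }
}

// A real provider would wrap an API client behind the same interface.
// The placeholder throw below stands in for production wiring.
function makeProvider(env: "dev" | "prod"): CompletionProvider {
  if (env === "dev") return new MockProvider();
  throw new Error("real provider is wired in production builds only");
}
```

Because the seam is the interface, the migration path from mock to real is part of the design rather than a later rewrite.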

How you think and work

Six traits the work demands.

This role doesn't filter on industry pedigree. It filters on disposition. The traits below name what the work actually demands; the patterns underneath them are what we'd read in an application as evidence those traits are real.

  • Agentic intuition

    You read agents the way a manager reads a direct report — when to trust the output, when to interrupt the run, when to take the wheel back.

    The evals you've shipped are the ones that have actually caught a run that should not have shipped — not evals that pass because they checked the wrong thing.

  • Critical thinking

    Confident-sounding output gets the same scrutiny as anything else — your own work included.

    You've found an eval that was passing for the wrong reason — and rewrote it.

  • Curiosity

    You pull on threads. You read outside the lane. You follow a question past the first plausible answer.

    You'd rather read the implementation than the README.

  • Agency

    You move without being told. You decide, ship, own the call. No one has to write the playbook for you.

    You've shipped the change because the meeting would have taken three weeks.

  • Systems thinking, long view

    You see how the parts connect — and where this goes in three years.

    You build the interface knowing the second use case will teach you more than the first.

  • Leadership instinct

    You orchestrate work across humans, agents, and stakeholders. You switch register between a Linear ticket, an architect call, and a senior bank room in the same day without losing what you came in to say.

    You've explained a tradeoff to a non-engineer and walked out with the right call.

Shape of work that maps

  • You've built a regression discipline — eval suite, golden tests, drift monitoring — that has actually caught a release that would have shipped wrong.

  • You've handled messy real-world inputs: inconsistent source documents, scans, dirty cross-system data, or records that disagree. You know where it broke and what you did about it.

  • You've shipped a typed end-to-end stack to real users. The Next.js + Convex stack we use is one shape of this; we read the discipline, not the framework.

  • Your prompts — or the inputs to your reasoning systems — live in version control with diffs and review. Because the alternative bothers you.

Regulated-industry experience is not required. Curiosity about it is. The work is shaped by patterns the founder earned in regulated-bank production.

Logistics

How the role is set up.

Engagement type
Contract or full-time. Contractors run on a defined engagement scope and can convert to full-time. Full-time runs on a yearly review with a quarterly written check-in.
Location
Remote. Lucentive is EU-based; we expect at least four hours of overlap with CET on a working day. Travel for engagement kickoffs is occasional, not weekly.
Compensation
Discussed in the written exchange. We pay at market rate for senior engineers shipping AI into production; the band depends on contract or full-time and on location.

Apply

We look forward to hearing from you.