Discuss a project or an idea
Designing a multi‑step AI workflow that turns unstructured funding scheme pages into structured, app‑ready data, replacing a manual data entry process and preparing the Funding Tool for scale.
Manual funding scheme data entry was slow, inconsistent and dependent on specialist knowledge, limiting the Funding Tool's ability to stay up to date and scale with new schemes.
Designed a multi‑step AI agent workflow that scrapes, cleans and structures scheme content into a standardised schema aligned with backend constraints.
Data translation logic now automatically converts unstructured scheme pages into normalised, app‑optimised data, significantly reducing manual effort and leaving the pipeline ready to process hundreds of schemes with consistent output.
A single funding scheme can have extensive, complex documentation.
Historically, Soil Association staff and contractors manually identified relevant funding sources, read long description pages and entered key details into an internal admin tool. The process took around 10–15 minutes per scheme, often required specialist agricultural knowledge, and produced non‑standardised results.
I unpacked the manual process end‑to‑end and, with the product lead and engineers, clarified database requirements, standard value lists and ingestion constraints so the AI output would drop into the existing Funding Tool without backend changes. We defined a lean, app‑optimised field set (scheme metadata, payments, land types, practices, application details) and agreed which values needed strict matches to enums versus more flexible text, keeping a richer "advanced" model as a future extension rather than blocking v1.
I started from a broad list of over forty potential fields, then narrowed it to what the current product actually uses while preserving a clear mapping to a richer future model. For each field group, I defined which parts of source content to draw from, how to treat “up to £X” qualifiers or optional contributions, and what to do when information was ambiguous or missing, so outputs stayed consistent and ingestible.
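As a minimal sketch of what such a lean, app‑optimised field set could look like, the example below models one field group with strict enum values for land types, free text for metadata, and an explicit flag for "up to £X" payment qualifiers. All field and enum names here are hypothetical illustrations, not the actual schema agreed with the engineering team.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical standard value list; the real enum list was agreed with engineers.
class LandType(Enum):
    ARABLE = "arable"
    GRASSLAND = "grassland"
    WOODLAND = "woodland"

@dataclass
class SchemeRecord:
    # Scheme metadata: flexible free text.
    name: str
    summary: str
    # Payments: an "up to £X" qualifier is captured explicitly rather than lost.
    payment_per_ha_gbp: Optional[float] = None
    payment_is_upper_bound: bool = False
    # Land types: strict enum matches so the backend can ingest without changes.
    land_types: list[LandType] = field(default_factory=list)
    # Application details: flexible text.
    how_to_apply: str = ""
```

Separating strict-enum fields from free-text fields like this is what lets ambiguous source content degrade gracefully: a missing value stays empty rather than forcing an invented enum match.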
Using Relay.app, I designed a multi‑step workflow that scrapes and cleans HTML, applies focused prompts to extract specific field groups into strict JSON, and runs QA prompts using different models to flag missing or inconsistent items before data leaves the pipeline.
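The shape of that pipeline can be sketched in a few lines. This is an illustrative outline only, not the Relay.app implementation: the model-calling functions are stand-in parameters, and the HTML cleaning is a simplified placeholder.

```python
import json
import re

def clean_html(html: str) -> str:
    # Simplified stand-in for the real cleaning step:
    # strip tags and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def extract_field_group(text: str, prompt: str, call_model) -> dict:
    # Each field group gets its own focused prompt; the model is
    # instructed to return strict JSON, which we parse here.
    raw = call_model(prompt + "\n\n" + text)
    return json.loads(raw)

def run_pipeline(html: str, prompts: dict, call_model, call_qa_model) -> dict:
    text = clean_html(html)
    record = {}
    for group, prompt in prompts.items():
        record[group] = extract_field_group(text, prompt, call_model)
    # QA pass with a different model flags missing or inconsistent items
    # before the data leaves the pipeline.
    record["qa_flags"] = call_qa_model(text, record)
    return record
```

One focused prompt per field group, rather than one giant prompt per scheme, keeps each extraction small enough to QA independently.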
Prompt engineering was a major part of the work: iterating on real SFI and Capital Grant schemes, encoding enum lists into prompts, tightening instructions to reduce hallucinations, and tuning model choices and QA depth to balance quality and cost per scheme.
The result is a repeatable pattern for turning unstructured content into structured output the backend can ingest directly.
The QA step included an accuracy score that sent a Slack alert if the score for a given field was too low.
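The alerting logic amounts to a simple threshold check. In this sketch the threshold value and the alert-sending function are assumptions for illustration, not the actual configuration.

```python
QA_THRESHOLD = 0.8  # hypothetical cut-off; the real value was tuned in practice

def check_accuracy(scores: dict, send_slack_alert) -> list:
    # Flag every field group whose QA accuracy score falls below the
    # threshold, and raise a Slack alert for each one.
    low = [name for name, score in scores.items() if score < QA_THRESHOLD]
    for name in low:
        send_slack_alert(f"QA score for '{name}' is below {QA_THRESHOLD}")
    return low
```

Only flagged extractions go to manual review, which is what makes the "hundreds of schemes" throughput realistic.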
To support the main extraction engine, I also designed a lightweight "Funding Scout" workflow that surfaces new agricultural schemes by researching reliable sources on a monthly basis, and a "Suitability analysis" flow that checks whether a new scheme page's content is appropriate for automated extraction before it enters the pipeline. These sit around, not inside, the core workflow, helping the team expand coverage and avoid wasted runs while keeping the main focus on reliable, schema‑aligned extraction.
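A suitability gate of this kind can be as simple as a few heuristics run before any model call. The thresholds and keywords below are invented for illustration; the actual checks in the "Suitability analysis" flow may differ.

```python
def is_suitable_for_extraction(page_text: str) -> bool:
    # Hypothetical heuristics: require enough extractable text and at
    # least one funding-related term before spending a pipeline run.
    if len(page_text.split()) < 200:
        return False
    keywords = ("funding", "grant", "payment", "scheme")
    return any(k in page_text.lower() for k in keywords)
```

Rejecting thin or off-topic pages up front is cheaper than discovering the problem through low QA scores after a full extraction run.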
The AI pipeline is designed to replace approximately 10–15 minutes of manual work per scheme with automated extraction plus light QA, making it realistic to process hundreds of schemes automatically, flagging only extractions with low QA scores for manual review. Standardised prompts and a fixed schema improve consistency for eligibility, payments, land types and practices, while backend‑compatible JSON and thorough documentation give the engineering team a clear path to production integration and a foundation they can extend with more granular data as the Funding Tool evolves.