Discuss a project or an idea
Designing a multi‑step AI workflow that turns unstructured funding scheme pages into structured, app‑ready data, replacing a manual data entry process and preparing the Funding Tool for scale.
Manual funding scheme data entry was slow, inconsistent and dependent on specialist knowledge, limiting the Funding Tool's ability to stay up to date and scale with new schemes.
Designed a multi‑step AI agent workflow that scrapes, cleans and structures scheme content into a standardised schema aligned with backend constraints.
Data translation logic now automatically converts unstructured scheme pages into normalised, app‑optimised data, significantly reducing manual effort and leaving the pipeline ready to process hundreds of schemes with consistent output.
A single funding scheme can have extensive, complex documentation.
Historically, Soil Association staff and contractors manually identified relevant funding sources, read long description pages and entered key details into an internal admin tool. The process took around 10–15 minutes per scheme, often required specialist agricultural knowledge, and produced non‑standardised results.
I unpacked the manual process end‑to‑end and, with the product lead and engineers, clarified database requirements, standard value lists and ingestion constraints so the AI output would drop into the existing Funding Tool without backend changes. We defined a lean, app‑optimised field set (scheme metadata, payments, land types, practices, application details) and agreed which values needed strict matches to enums versus more flexible text, keeping a richer "advanced" model as a future extension rather than blocking v1.
I started from a broad list of over forty potential fields, then narrowed it to what the current product actually uses while preserving a clear mapping to a richer future model. For each field group, I defined which parts of source content to draw from, how to treat “up to £X” qualifiers or optional contributions, and what to do when information was ambiguous or missing, so outputs stayed consistent and ingestible.
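As a minimal sketch of what such a lean, app‑optimised field set could look like, the example below models one field group with strict enum values for land types, free text for metadata, and an explicit flag for "up to £X" payment qualifiers. All field and enum names here are hypothetical illustrations, not the actual schema agreed with the engineering team.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical standard value list; the real enum list was agreed with engineers.
class LandType(Enum):
    ARABLE = "arable"
    GRASSLAND = "grassland"
    WOODLAND = "woodland"

@dataclass
class SchemeRecord:
    # Scheme metadata: flexible free text.
    name: str
    summary: str
    # Payments: an "up to £X" qualifier is captured explicitly rather than lost.
    payment_per_ha_gbp: Optional[float] = None
    payment_is_upper_bound: bool = False
    # Land types: strict enum matches so the backend can ingest without changes.
    land_types: list[LandType] = field(default_factory=list)
    # Application details: flexible text.
    how_to_apply: str = ""
```

Separating strict-enum fields from free-text fields like this is what lets ambiguous source content degrade gracefully: a missing value stays empty rather than forcing an invented enum match.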
Using Relay.app, I designed a multi‑step workflow that scrapes and cleans HTML, applies focused prompts to extract specific field groups into strict JSON, and runs QA prompts using different models to flag missing or inconsistent items before data leaves the pipeline.
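The shape of that pipeline can be sketched in a few lines. This is an illustrative outline only, not the Relay.app implementation: the model-calling functions are stand-in parameters, and the HTML cleaning is a simplified placeholder.

```python
import json
import re

def clean_html(html: str) -> str:
    # Simplified stand-in for the real cleaning step:
    # strip tags and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def extract_field_group(text: str, prompt: str, call_model) -> dict:
    # Each field group gets its own focused prompt; the model is
    # instructed to return strict JSON, which we parse here.
    raw = call_model(prompt + "\n\n" + text)
    return json.loads(raw)

def run_pipeline(html: str, prompts: dict, call_model, call_qa_model) -> dict:
    text = clean_html(html)
    record = {}
    for group, prompt in prompts.items():
        record[group] = extract_field_group(text, prompt, call_model)
    # QA pass with a different model flags missing or inconsistent items
    # before the data leaves the pipeline.
    record["qa_flags"] = call_qa_model(text, record)
    return record
```

One focused prompt per field group, rather than one giant prompt per scheme, keeps each extraction small enough to QA independently.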
Prompt engineering was a major part of the work: iterating on real SFI and Capital Grant schemes, encoding enum lists into prompts, tightening instructions to reduce hallucinations, and tuning model choices and QA depth to balance quality and cost per scheme.
The result is a repeatable pattern for turning unstructured content into structured output the backend can ingest directly.
The QA step included an accuracy score that sent a Slack alert if the score for a given field was too low.
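The alerting logic amounts to a simple threshold check. In this sketch the threshold value and the alert-sending function are assumptions for illustration, not the actual configuration.

```python
QA_THRESHOLD = 0.8  # hypothetical cut-off; the real value was tuned in practice

def check_accuracy(scores: dict, send_slack_alert) -> list:
    # Flag every field group whose QA accuracy score falls below the
    # threshold, and raise a Slack alert for each one.
    low = [name for name, score in scores.items() if score < QA_THRESHOLD]
    for name in low:
        send_slack_alert(f"QA score for '{name}' is below {QA_THRESHOLD}")
    return low
```

Only flagged extractions go to manual review, which is what makes the "hundreds of schemes" throughput realistic.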
To support the main extraction engine, I also designed a lightweight "Funding Scout" workflow that surfaces new agricultural schemes by researching reliable sources on a monthly basis, and a "Suitability analysis" flow that checks whether a new scheme page's content is appropriate for automated extraction before it enters the pipeline. These sit around, not inside, the core workflow, helping the team expand coverage and avoid wasted runs while keeping the main focus on reliable, schema‑aligned extraction.
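A suitability gate of this kind can be as simple as a few heuristics run before any model call. The thresholds and keywords below are invented for illustration; the actual checks in the "Suitability analysis" flow may differ.

```python
def is_suitable_for_extraction(page_text: str) -> bool:
    # Hypothetical heuristics: require enough extractable text and at
    # least one funding-related term before spending a pipeline run.
    if len(page_text.split()) < 200:
        return False
    keywords = ("funding", "grant", "payment", "scheme")
    return any(k in page_text.lower() for k in keywords)
```

Rejecting thin or off-topic pages up front is cheaper than discovering the problem through low QA scores after a full extraction run.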
The AI pipeline is designed to replace approximately 10–15 minutes of manual work per scheme with automated extraction plus light QA, making it realistic to process hundreds of schemes automatically, flagging only extractions with low QA scores for manual review. Standardised prompts and a fixed schema improve consistency for eligibility, payments, land types and practices, while backend‑compatible JSON and thorough documentation give the engineering team a clear path to production integration and a foundation they can extend with more granular data as the Funding Tool evolves.