A GPU in my closet reads job postings now
I was checking the same three career pages every morning, so I automated myself out of the loop. Every day at 9:00 a pipeline scrapes Microsoft, Nvidia, and Apple, asks a small local LLM one question about each new posting — how many years of experience does this actually require? — and pings my phone only when the answer is one I'd want to hear.
The first 24 hours of scraping pulled 2,688 postings: 1,593 from Apple, 925 from Nvidia, and 170 from Microsoft. The Microsoft count is low not because Microsoft posts less, but because my extractor only asks for IC2/IC3 software roles there. Scoping the query is itself a filter. Nobody is reading 2,688 job postings. And the one thing I actually need to know (is this role for someone at my level?) is buried somewhere around paragraph four, phrased a little differently every time.
Keyword filters can't read paragraph four. A language model can. What I didn't know was whether a model small enough to run on a spare GPU could do it reliably, a thousand times a day, without me babysitting it.
The shape of the thing
Each stage only sees what the previous one lets through. The persist stage writes every scraped
posting into SQLite with INSERT OR IGNORE — the job ID and company form the primary
key — and emits only the rows that were actually new. On a normal morning that collapses
2,000+ scraped postings down to a few dozen. Without that step, every stage downstream would
re-read the entire job board every single day, and the LLM stage would never keep up.
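A minimal sketch of what a persist stage like this could look like, using Python's built-in sqlite3 module. The table name, the column names, and the dict shape of a posting are assumptions for illustration; only the INSERT OR IGNORE and the (company, job ID) primary key come from the description above.

```python
import sqlite3


def persist(conn, postings):
    """Store postings and return only the ones that were not already known."""
    # Hypothetical schema: the composite primary key is what makes
    # INSERT OR IGNORE skip postings seen on an earlier run.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               company TEXT NOT NULL,
               job_id  TEXT NOT NULL,
               title   TEXT,
               url     TEXT,
               PRIMARY KEY (company, job_id)
           )"""
    )
    new_rows = []
    with conn:  # commit the whole batch as one transaction
        for p in postings:
            cur = conn.execute(
                "INSERT OR IGNORE INTO jobs (company, job_id, title, url) "
                "VALUES (?, ?, ?, ?)",
                (p["company"], p["job_id"], p["title"], p["url"]),
            )
            # rowcount is 1 for a real insert and 0 when the row was ignored.
            if cur.rowcount == 1:
                new_rows.append(p)
    return new_rows
```

Running the same batch through twice returns the full batch the first time and an empty list the second, which is the behavior the downstream stages rely on.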
New jobs get their full descriptions fetched (parallel Playwright browsers — the listing pages give you titles and URLs, never the actual requirements), and then comes the interesting stage.
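The parallel part of that fetch can be sketched without committing to any browser API. This is only an illustration of bounded concurrency: `fetch_one` is a hypothetical coroutine standing in for whatever loads a page and returns its description text (a Playwright page load in this pipeline), and `max_parallel` is an assumed knob, not a value from the real code.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_parallel=4):
    """Fetch every URL with at most max_parallel fetches in flight at once."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(url):
        async with gate:
            try:
                return url, await fetch_one(url)
            except Exception:
                # One broken page should not sink the batch; record a miss.
                return url, None

    pairs = await asyncio.gather(*(guarded(u) for u in urls))
    return dict(pairs)
```

A posting whose fetch fails comes back as None, so a later stage can decide whether to skip it or try again on the next run.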
Asking 1,095 times
Evaluation runs against vLLM serving Qwen2.5-3B on a home GPU box. A 3-billion-parameter model is laughably small by 2026 standards. It is also free, private, 30 seconds-per-request fast on local hardware, and entirely sufficient to find a number in a paragraph.
The extraction is a LangGraph graph with structured output and a retry loop: the model must
return a typed record — experience_years, visa_sponsorship,
clearance_required, minimum and preferred qualifications — or it gets asked again.
Left to answer free-form, a model this size wanders. Pinned to a schema, with a validator
kicking malformed answers back for another try, it gets things right often enough that I
stopped spot-checking its work after the first few days.
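The schema-plus-retry idea can be shown in plain Python. This is a stand-in for illustration, not the LangGraph graph itself: `ask_model` is a hypothetical callable that takes a prompt and returns the model's raw text, the two qualification field names and all field types are assumptions, and the retry wording is invented.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class JobFacts:
    experience_years: Optional[int]  # None when the posting states no number
    visa_sponsorship: str            # "yes", "no", or "unknown"
    clearance_required: bool
    minimum_qualifications: List[str]
    preferred_qualifications: List[str]


def parse_facts(raw: str) -> JobFacts:
    """Validate one model reply; raise ValueError if it does not fit the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    years = data.get("experience_years")
    if years is not None and (
        isinstance(years, bool) or not isinstance(years, int) or years < 0
    ):
        raise ValueError("experience_years must be a non-negative integer or null")
    visa = data.get("visa_sponsorship")
    if visa not in ("yes", "no", "unknown"):
        raise ValueError("visa_sponsorship must be yes, no, or unknown")
    clearance = data.get("clearance_required")
    if not isinstance(clearance, bool):
        raise ValueError("clearance_required must be true or false")
    quals = []
    for key in ("minimum_qualifications", "preferred_qualifications"):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
        quals.append(value)
    return JobFacts(years, visa, clearance, quals[0], quals[1])


def extract(
    posting_text: str,
    ask_model: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[JobFacts]:
    """Ask until the reply validates, or give up after max_attempts."""
    prompt = posting_text
    for _ in range(max_attempts):
        try:
            return parse_facts(ask_model(prompt))
        except ValueError as err:
            # Tell the model what was wrong and ask again.
            prompt = (
                f"{posting_text}\n\nYour previous answer was rejected ({err}). "
                "Reply with valid JSON only."
            )
    return None
```

Returning None after the last attempt, rather than raising, lets the pipeline treat "could not extract" as a value in its own right instead of a crash.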
So far it has read 1,095 postings. Here's what paragraph four actually says:
The distribution is bimodal in a way I didn't expect: big spikes at 5 years (249 postings) and
8 years (205), a long tail out to 20. Early-career roles are 14% of what these three
companies post. 73 postings require a security clearance. And the bleakest number in the
database: the count of postings that explicitly promise visa sponsorship is
zero. Postings only ever volunteer the bad news: visa_sponsorship comes back "no" or "unknown", never "yes". The filter drops explicit "no"s and gives "unknown" the benefit of the doubt.
Only good news reaches my phone
The categorize stage applies the policy: pass if the extracted requirement is ≤ 2 years — or if the model couldn't find one, because a posting that doesn't state a requirement shouldn't be dropped by my robot's pessimism — and drop anything that explicitly refuses visa sponsorship. Survivors go to ntfy.sh, which pushes them to my phone. There's a catch-up pass too: a job whose evaluation wasn't ready on the morning it appeared gets alerted the next day instead of falling through a crack between stages.
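Written out as code, the policy is only a few lines. The sketch below is an illustration under assumptions: the function names, the topic name, and the message layout are invented, and `build_alert` only constructs the HTTP request (ntfy.sh accepts a POST to a topic URL with the message as the body and an optional Click header).

```python
import urllib.request
from typing import Optional

MAX_YEARS = 2  # pass threshold from the policy described above


def should_alert(experience_years: Optional[int], visa_sponsorship: str) -> bool:
    """Decide whether a posting is worth a push notification."""
    if visa_sponsorship == "no":
        return False  # explicit refusal is always dropped
    if experience_years is None:
        return True   # no stated requirement gets the benefit of the doubt
    return experience_years <= MAX_YEARS


def build_alert(topic: str, title: str, url: str) -> urllib.request.Request:
    """Build (but do not send) an ntfy.sh publish request for one posting."""
    return urllib.request.Request(
        f"https://ntfy.sh/{topic}",
        data=f"{title}\n{url}".encode("utf-8"),
        headers={"Click": url},  # tapping the notification opens the posting
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Keeping the decision in a pure function like `should_alert` also makes the threshold easy to test without touching the network.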
What I actually learned
Strip everything away and the useful output of this system is one extracted integer per posting. The stage architecture, the dedup, the parallel browsers, the retry graph — all of it exists so that one integer shows up reliably, every day, without me touching anything. I went in expecting the LLM part to be the hard part. It wasn't. A small model with a rigid schema and retries has been more dependable than any clever prompt I've written against a much bigger model — the actual work was the plumbing around it.
Layout:
run.py (stage wiring) · pipeline.py (executor) ·
extractors/ (one per company) · stages/ (persist → details → eval →
categorize → notify) · eval_graph.py (structured extraction + retry) ·
launchd/ (the 9:00 schedule).