A GPU in my closet reads job postings now

Jun 8, 2026 · 5 min read

I was checking the same three career pages every morning, so I automated myself out of the loop. Every day at 9:00 a pipeline scrapes Microsoft, Nvidia, and Apple, asks a small local LLM one question about each new posting (how many years of experience does this actually require?), and pings my phone only when the answer is one I'd want to hear.

The first 24 hours of scraping pulled 2,688 postings: 1,593 from Apple, 925 from Nvidia, and 170 from Microsoft. That last number is small not because Microsoft posts less, but because my extractor only asks for IC2/IC3 software roles there. Scoping the query is itself a filter. Nobody is reading 2,688 job postings. And there's one thing I actually need to know: is this role for someone at my level? It's buried somewhere around paragraph four, phrased a little differently every time.

Keyword filters can't read paragraph four. A language model can. What I didn't know was whether a model small enough to run on a spare GPU could do it reliably, a thousand times a day, without me babysitting it.

The shape of the thing

scrape → persist → details → LLM eval → categorize → notify

Each stage only sees what the previous one lets through. The persist stage writes every scraped posting into SQLite with INSERT OR IGNORE (the job ID and company form the primary key) and emits only the rows that were actually new. On a normal morning that collapses 2,000+ scraped postings down to a few dozen. Without that step, every stage downstream would re-read the entire job board every single day, and the LLM stage would never keep up.
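Here is a minimal sketch of that persist step, not the code from the repo. The table name jobs, its title and url columns, and the function name persist_new are assumptions for illustration; the only parts taken from the description above are INSERT OR IGNORE and the (company, job ID) primary key.

```python
import sqlite3


def persist_new(conn, postings):
    """Store every scraped posting; return only the ones not seen before."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               company TEXT NOT NULL,
               job_id  TEXT NOT NULL,
               title   TEXT,
               url     TEXT,
               PRIMARY KEY (company, job_id)
           )"""
    )
    new_rows = []
    for p in postings:
        cur = conn.execute(
            "INSERT OR IGNORE INTO jobs (company, job_id, title, url) "
            "VALUES (?, ?, ?, ?)",
            (p["company"], p["job_id"], p["title"], p["url"]),
        )
        # rowcount is 1 when the row was inserted, 0 when the key already existed.
        if cur.rowcount == 1:
            new_rows.append(p)
    conn.commit()
    return new_rows
```

Running it twice on the same scrape returns an empty list the second time, which is exactly what keeps the downstream stages cheap.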

New jobs get their full descriptions fetched (parallel Playwright browsers; the listing pages give you titles and URLs, never the actual requirements), and then comes the interesting stage.
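The "parallel" part is mostly a question of capping how many pages are in flight at once. The sketch below shows that pattern with plain asyncio. It is an illustration under assumptions: fetch_details, fetch_one, and max_parallel are hypothetical names, and fetch_one stands in for whatever coroutine drives a Playwright page and returns the description text.

```python
import asyncio


async def fetch_details(urls, fetch_one, max_parallel=4):
    """Run fetch_one(url) for every URL, at most max_parallel at a time."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(url):
        async with gate:
            try:
                return url, await fetch_one(url)
            except Exception as exc:
                # One broken page should not sink the whole batch;
                # the caller gets the exception object in place of the text.
                return url, exc

    pairs = await asyncio.gather(*(guarded(u) for u in urls))
    return dict(pairs)
```

Returning the exception instead of raising it lets a later stage decide whether to retry that posting tomorrow.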

Asking 1,095 times

Evaluation runs against vLLM serving Qwen2.5-3B on a home GPU box. A 3-billion-parameter model is laughably small by 2026 standards. It is also free, private, fast enough on local hardware (30 seconds per request), and entirely sufficient to find a number in a paragraph. The extraction is a LangGraph graph with structured output and a retry loop: the model must return a typed record (experience_years, visa_sponsorship, clearance_required, minimum and preferred qualifications), or it gets asked again. Left to answer free-form, a model this size wanders. Pinned to a schema, with a validator kicking malformed answers back for another try, it gets things right often enough that I stopped spot-checking its work after the first few days.
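To make the validate-and-retry idea concrete, here is a stdlib-only sketch. It is not the LangGraph graph itself: the field types, the names minimum_qualifications and preferred_qualifications, the three-attempt limit, and the ask callable (a stand-in for the request to the model) are all assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Extraction:
    experience_years: Optional[int]  # None when the posting states no number
    visa_sponsorship: str            # "yes", "no", or "unknown"
    clearance_required: bool
    minimum_qualifications: List[str]
    preferred_qualifications: List[str]


def parse_extraction(raw: str) -> Extraction:
    """Validate one model reply; raise ValueError if it does not fit the record."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    years = data.get("experience_years")
    if years is not None and (
        isinstance(years, bool) or not isinstance(years, int) or years < 0
    ):
        raise ValueError("experience_years must be a non-negative integer or null")
    visa = data.get("visa_sponsorship")
    if visa not in ("yes", "no", "unknown"):
        raise ValueError("visa_sponsorship must be yes, no, or unknown")
    clearance = data.get("clearance_required")
    if not isinstance(clearance, bool):
        raise ValueError("clearance_required must be true or false")
    quals = {}
    for key in ("minimum_qualifications", "preferred_qualifications"):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
        quals[key] = value
    return Extraction(years, visa, clearance, **quals)


def extract_with_retry(
    ask: Callable[[str], str], posting: str, max_attempts: int = 3
) -> Optional[Extraction]:
    """Ask until the reply validates; give up with None after max_attempts."""
    prompt = posting
    for _ in range(max_attempts):
        try:
            return parse_extraction(ask(prompt))
        except ValueError as err:
            # Feed the validator's complaint back so the next try can correct it.
            prompt = (
                f"{posting}\n\nYour previous answer was rejected: {err}. "
                "Reply with valid JSON only."
            )
    return None
```

Returning None rather than raising keeps one stubborn posting from stopping the run; it simply counts as "no requirement found".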

So far it has read 1,095 postings. Here's what paragraph four actually says:

2,688 postings scraped in the first 24h · 1,095 evaluated by the LLM · 150 explicitly ask for ≤2 years

The distribution is bimodal in a way I didn't expect: big spikes at 5 years (249 postings) and 8 years (205), a long tail out to 20. Early-career roles (the 150 postings asking for ≤2 years) are 14% of what the model has evaluated across these three companies. 73 postings require a security clearance. And the bleakest number in the database: the count of postings that explicitly promise visa sponsorship is zero. Postings only ever volunteer the bad news: visa_sponsorship comes back "no" or "unknown", never "yes". The filter drops explicit "no"s and gives "unknown" the benefit of the doubt.

Only good news reaches my phone

The categorize stage applies the policy: pass if the extracted requirement is ≤ 2 years, or if the model couldn't find one (because a posting that doesn't state a requirement shouldn't be dropped by my robot's pessimism), and drop anything that explicitly refuses visa sponsorship. Survivors go to ntfy.sh, which pushes them to my phone. There's a catch-up pass too: a job whose evaluation wasn't ready on the morning it appeared gets alerted the next day instead of falling through a crack between stages.
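The policy is small enough to fit in one function. The sketch below pairs it with a helper that builds the ntfy.sh request; the function names, the topic, and the notification title are hypothetical, and experience_years being None is my assumed encoding of "the model couldn't find one". ntfy.sh accepts a plain POST to the topic URL with the message as the body, plus optional Title and Click headers.

```python
import urllib.request
from typing import Optional

MAX_YEARS = 2  # the threshold from the policy above


def should_notify(experience_years: Optional[int], visa_sponsorship: str) -> bool:
    """Pass early-career or unstated requirements; drop explicit visa refusals."""
    if visa_sponsorship == "no":
        return False
    return experience_years is None or experience_years <= MAX_YEARS


def build_ntfy_request(topic: str, job_title: str, job_url: str) -> urllib.request.Request:
    """Build (but do not send) the push notification for one surviving posting."""
    return urllib.request.Request(
        f"https://ntfy.sh/{topic}",
        data=job_title.encode("utf-8"),  # the message body
        headers={"Title": "New posting", "Click": job_url},
        method="POST",
    )
```

Sending is then a single urllib.request.urlopen(req, timeout=10) call per survivor.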

Decision: no dashboard. The whole thing runs from a launchd plist at 09:00 daily and the only interface is the notification. I considered building a little web UI and decided against it: if I have to remember to go look at a tool, I've just rebuilt the thing I was trying to stop doing.
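For anyone who hasn't written a launchd job before, the schedule boils down to three keys. This sketch generates such a plist with Python's plistlib; the label and both paths are placeholders, not the real ones from my machine.

```python
import plistlib


def build_launchd_plist(label: str, python_path: str, script_path: str) -> bytes:
    """Return an XML plist that runs the script every day at 09:00."""
    job = {
        "Label": label,
        "ProgramArguments": [python_path, script_path],
        # launchd fires when the wall clock matches these fields.
        "StartCalendarInterval": {"Hour": 9, "Minute": 0},
    }
    return plistlib.dumps(job)


# Placeholder label and paths, for illustration only.
plist_bytes = build_launchd_plist(
    "com.example.jobwatch", "/usr/bin/python3", "/path/to/run.py"
)
```

The resulting bytes get written to a .plist file under ~/Library/LaunchAgents and loaded with launchctl.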

What I actually learned

Strip everything away and the useful output of this system is one extracted integer per posting. The stage architecture, the dedup, the parallel browsers, the retry graph, all of it exists so that one integer shows up reliably, every day, without me touching anything. I went in expecting the LLM part to be the hard part. It wasn't. A small model with a rigid schema and retries has been more dependable than any clever prompt I've written against a much bigger model. The actual work was the plumbing around it.

Stack: Python · Playwright · SQLite · LangGraph · vLLM (Qwen2.5-3B-Instruct) · ntfy.sh · launchd.
Layout: run.py (stage wiring) · pipeline.py (executor) · extractors/ (one per company) · stages/ (persist → details → eval → categorize → notify) · eval_graph.py (structured extraction + retry) · launchd/ (the 9:00 schedule).

Tags: Python, LLM, vLLM, Pipelines, Automation