Google Scholar Scraper 2026: 10 Agent Skills That Replace Your $39/year Tool Stack
Author: Daniel · Primary KW: google scholar scraper (KD 4 · SV 250 · GSV 1,200 · CPC $2.00)
Target persona: Grad students / postdocs / research analysts pulling literature at 2 AM
Funnel stage: Decision
Draft v1 · skillsmp + ClawHub edition · 2026-04-28
Sources: 1 BA recipe + 9 skillsmp.com skills (verified via REST API 2026-04-28)
Slug: google-scholar-scraper-2026
Primary Keyword: google scholar scraper
Secondary Keywords: scholar scraper, arxiv scraper, citation graph, academic paper scraper,
You're paying $39/year for a Google Scholar tool to dodge captchas.
You're paying $99/year for citation management.
You're stuck with Mendeley's broken sync.
The whole stack costs $200+/year and still won't pull 500 papers without you babysitting it.
Stop. Your university library probably already pays for half of this. The other half is now agent skills: SKILL.md packages your AI calls like a function. The Google Scholar layer is solved.
Here are the 10 best Google Scholar / arXiv scraper skills: everything you need for a literature review pipeline in 2026.
1. BrowserAct Scholar Recipe (BA stealth-extract)
https://browseract.com/
What it is: A 30-second recipe, not a packaged skill. BrowserAct's stealth-extract CLI handles Google Scholar's bot detection, including the "Please show you're not a robot" captcha that breaks every other Scholar scraper.
Why it's #1: Scholar's bot detection is the actual hard problem. Once stealth is solved, the rest (parse, dedupe, export) is trivial. Most of the OSS scrapers in this list need you to bring your own proxies + captcha solver. The BA recipe does both transparently.
The recipe:
browser-act stealth-extract \
"https://scholar.google.com/scholar?q=transformer+attention+mechanism" \
--fields "title,authors,year,citations,journal,pdf_url" \
--output papers.json
Same recipe works on scholar.google.com/scholar?cluster=... for citation pulls and ?cites= for forward citations.
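The "parse, dedupe, export" step mentioned above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: it assumes papers.json holds a JSON list of objects keyed by the field names passed to --fields, which the recipe does not confirm. The helper names (normalize_title, dedupe_papers, export_csv) and the papers.csv output file are hypothetical.

```python
import csv
import json
import re
from pathlib import Path

# Assumed to mirror the --fields list in the recipe above.
FIELDS = ["title", "authors", "year", "citations", "journal", "pdf_url"]


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def _citations(paper: dict) -> int:
    """Read the citation count defensively; missing or malformed values count as 0."""
    try:
        return int(paper.get("citations") or 0)
    except (TypeError, ValueError):
        return 0


def dedupe_papers(papers: list[dict]) -> list[dict]:
    """Keep one record per normalized title, preferring the higher citation count."""
    best: dict[str, dict] = {}
    for paper in papers:
        key = normalize_title(paper.get("title", ""))
        if not key:
            continue  # skip records with no usable title
        current = best.get(key)
        if current is None or _citations(paper) > _citations(current):
            best[key] = paper
    return list(best.values())


def export_csv(papers: list[dict], path: Path) -> None:
    """Write the known fields to CSV; unknown keys are ignored, missing ones left blank."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(papers)


if __name__ == "__main__":
    source = Path("papers.json")
    if source.exists():
        records = json.loads(source.read_text(encoding="utf-8"))
        export_csv(dedupe_papers(records), Path("papers.csv"))
```

Matching on a normalized title is a deliberately crude choice; if your export carries DOIs or Scholar cluster IDs, those make a more reliable key.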
2. arxiv (skillsmp · wanshuiyin · ★ 9,693)
https://skillsmp.com/skills/wanshuiyin-auto-claude-code-research-in-sleep-skills-arxiv-skill-md
What it does: arXiv search, paper download, abstract pull, and citation graph traversal, all callable as agent functions.
Why it's on the list: 9,693 stars puts it in the top 0.001% of skillsmp. Author wanshuiyin built it as part of an "auto-research-in-sleep" workflow, so the skill is battle-tested by users running overnight literature pulls.
Install:
npx skills add wanshuiyin/auto-claude-code/arxiv
For most STEM fields, arXiv covers 80% of what you'd otherwise pull from Scholar, and it's API-native, so no scraping is required.
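To make "API-native" concrete, here is a rough sketch of querying arXiv's public Atom API directly from Python's standard library. It is not how the skill above is implemented (its internals are not described here). The function names are hypothetical, and the query parameters reflect the arXiv API as I understand it, so check the official manual before relying on them.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace used by the Atom feed


def build_query_url(terms: str, max_results: int = 20) -> str:
    """Build a search URL that asks for the newest submissions first."""
    params = {
        "search_query": f"all:{terms}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml: str) -> list[dict]:
    """Extract id, title, authors, and publication date from an Atom response."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM).strip(),
            "title": " ".join(title.split()),  # titles are often wrapped across lines
            "authors": [
                author.findtext("atom:name", default="", namespaces=ATOM)
                for author in entry.findall("atom:author", ATOM)
            ],
            "published": entry.findtext("atom:published", default="", namespaces=ATOM),
        })
    return papers


def search(terms: str, max_results: int = 20) -> list[dict]:
    """Fetch and parse one page of results (performs a network request)."""
    with urllib.request.urlopen(build_query_url(terms, max_results), timeout=30) as response:
        return parse_feed(response.read().decode("utf-8"))
```

Calling search("transformer attention mechanism") would return a list of dictionaries you can feed into the same dedupe and export steps you use for Scholar results.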
3. scholar-kit (skillsmp · lottshin · ★ 12)
https://skillsmp.com/skills/lottshin-scholar-kit-skill-md
What it does: All-in-one Scholar workflow: search, parse, extract metadata, format citations.
Why it's on the list: It's the only skillsmp skill explicitly named "scholar-kit"; the author treated it as the canonical Scholar entry point. It has fewer stars than #2 but maps more directly to the keyword.
Install:
npx skills add lottshin/scholar-kit
4. xs-arxiv (skillsmp · karaage0703 · ★ 30)
https://skillsmp.com/skills/karaage0703-ai-assistant-workspace-skills-arxiv-skill-md
What it does: Lightweight arXiv lookup. Less feature-heavy than #2, faster to invoke for one-off queries.
Why it's on the list: When you don't need a citation graph, just "give me the latest 20 papers on X," xs-arxiv is the right tool. Author karaage0703's broader workspace has 30+ research skills, so the quality bar is high.
Install:
npx skills add karaage0703/ai-assistant-workspace/arxiv
5. scholar-vault-gap-scout (skillsmp · MaxSpur · ★ 1)
https://skillsmp.com/skills/maxspur-scholar-vault-tools-vault-agent-skills-scholar-vault-gap-scout-skill-md
What it does: Reads your existing literature corpus and identifies research gaps: topics underexplored in your field given recent citation patterns.
Why it's on the list: This is the "why am I doing this PhD" skill. You feed it your Zotero library, and it tells you which sub-questions have <10 citations and high opportunity. MaxSpur shipped a 6-skill vault series covering compile-paper / orient / labs-prompts / read-pdf; install the whole vault if it clicks.
Install:
npx skills add maxspur/scholar-vault-tools/scholar-vault-gap-scout
6. arxiv-search (skillsmp · fmschulz · ★ 2)
https://skillsmp.com/skills/fmschulz-omics-skills-skills-arxiv-search-skill-md
What it does: arXiv targeted search with bio/omics-aware filters built in.
Why it's on the list: If your field is computational biology / bioinformatics, fmschulz's omics-skills repo has 15+ sister skills wired for the domain, and arxiv-search is the gateway.
Install:
npx skills add fmschulz/omics-skills/arxiv-search
7. arxiv-to-html (skillsmp · NTT123 · ★ 2)
https://skillsmp.com/skills/ntt123-auto-arxiv-to-html-claude-skills-arxiv-to-html-skill-md
What it does: Converts arXiv PDFs into clean reading HTML. Math equations preserved (MathJax), figures inlined, references hyperlinked.
Why it's on the list: "Read 20 papers this week" is impossible if every paper is a fight with a PDF reader. This skill turns the PDF-reading problem into an HTML-reading problem.
Install:
npx skills add ntt123/auto-arxiv-to-html/arxiv-to-html
8. scholar-evaluation (skillsmp · MarieLynneBlock · ★ 2)
https://skillsmp.com/skills/marielynneblock-arcanum-artifex-skills-scientific-scholar-evaluation-skill-md
What it does: Structured evaluation of a paper: methodology critique, evidence quality scoring, and identification of claims-vs.-evidence mismatches.
Why it's on the list: For lit reviews + meta-analyses, you need consistent rubrics across 50+ papers. Doing it by hand kills a week. This skill turns it into a 30-min review.
Install:
npx skills add marielynneblock/arcanum-artifex/scholar-evaluation
9. scholar-vault-compile-paper (skillsmp · MaxSpur · ★ 1)
https://skillsmp.com/skills/maxspur-scholar-vault-tools-vault-agent-skills-scholar-vault-compile-paper-skill-md
What it does: Drafts a paper outline + literature integration plan from your collected sources. Sister skill to #5.
Why it's on the list: The bridge between "I have 50 papers" and "I have a draft." If you're at the synthesis stage, this is your skill.
Install:
npx skills add maxspur/scholar-vault-tools/scholar-vault-compile-paper
10. arxiv-monitor (skillsmp · julio211916)
https://skillsmp.com/skills/julio211916-tlanticad-studio-v0-1-alpha-skills-ju-skills-arxiv-monitor-skill-md
What it does: Daily arXiv watchlist. Define your topics + authors, the skill emails you new uploads each morning.
Why it's on the list: You need this for a thesis-level project. Manually checking arXiv = lost time. Free Google Scholar Alerts work but bury you in noise; this skill filters by your specific subfield.
Install:
npx skills add julio211916/tlanticad-studio/arxiv-monitor
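The watchlist idea (topics plus authors, applied to each day's new uploads) boils down to a filter like the one below. This is an illustrative sketch only, not the skill's actual code. The Watchlist class, the matches and daily_digest functions, and the paper dictionary keys (title, summary, authors) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Watchlist:
    """Topics are matched as substrings; authors are matched on the full name."""
    topics: list[str] = field(default_factory=list)
    authors: list[str] = field(default_factory=list)


def matches(paper: dict, watchlist: Watchlist) -> bool:
    """Return True if the paper mentions a watched topic or has a watched author."""
    text = f"{paper.get('title', '')} {paper.get('summary', '')}".lower()
    if any(topic.lower() in text for topic in watchlist.topics):
        return True
    paper_authors = {name.lower() for name in paper.get("authors", [])}
    return any(author.lower() in paper_authors for author in watchlist.authors)


def daily_digest(papers: list[dict], watchlist: Watchlist) -> list[dict]:
    """Filter one day's uploads down to the papers worth reading."""
    return [paper for paper in papers if matches(paper, watchlist)]
```

Substring matching on topics is what keeps the digest narrower than a broad alert, at the cost of missing papers that phrase the topic differently.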
⚠️ Reality check
You don't need:
❌ A $39/year scholar-helper SaaS: these 10 skills cost $0 to install
❌ A $99/year reference manager subscription: Zotero is free, and skill #9 + a Zotero CLI is enough
❌ A "Pro" plan on any of the AI literature tools: your university library already pays for the underlying databases
You need:
✅ One stealth recipe for Google Scholar (skill #1, the only path that survives Scholar's captcha)
✅ One arXiv skill (skill #2, arxiv from wanshuiyin, battle-tested at 9.7K stars)
✅ One synthesis skill (skill #5 or #9 from MaxSpur's vault, for the "what am I writing?" stage)
✅ One monitor skill (skill #10, so you don't fall behind during writing weeks)
✅ A Claude / Codex agent to glue them together
Annual cost: ~$0 in install fees. ~$10-30/year in BA pay-per-call for Scholar runs.
Replaces: $200+/year scholar-helper / reference-manager / paper-reader stack.
Final thought
The grad students who finish their thesis on schedule in 2026 aren't the ones with the most expensive reference manager.
They're the ones who:
1. Picked 3 skills covering "find / read / synthesize"
2. Wired them into one Claude agent
3. Spent the saved time on actually writing, not fighting their tools
Most students won't do this. They'll keep paying for a Mendeley sub.
That's exactly why this works for the ones who do.
Search 1.4M open-source skills on skillsmp: https://skillsmp.com/
Try BrowserAct stealth-extract: https://browseract.com/
Frequently Asked Questions
Why can't I just use the "free" Google Scholar API?
Because Google doesn't publish one. Every "Google Scholar API" you see is either a third-party scraper (rate-limited, breaks weekly) or paid (SerpAPI's Scholar engine, ~$50/mo). The BA recipe (skill #1) gives you stealth-grade access at pay-per-call rates.
Does my university already have access to Web of Science / Scopus?
Probably yes. Check with your library: institutional access often covers Web of Science, Scopus, ACM Digital Library, IEEE Xplore. If yes, use those for citation graphs and use the skills in this list (skill #1 + #2) for Scholar-only queries (open-access papers + grey literature).
Are these skills compatible with Claude Code, Codex, Cursor?
Yes. Every skillsmp entry follows the open SKILL.md format. Install via npx skills add followed by the skill path; the file lands in ~/.claude/skills/, and your agent picks it up on next launch.
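To confirm what your agent will actually see, you can list the SKILL.md files under that directory. A small sketch, assuming each skill sits in its own folder containing a SKILL.md file (the folder layout is an assumption; only the ~/.claude/skills/ path comes from the answer above). The installed_skills function name is hypothetical.

```python
from pathlib import Path

# Default location named in the answer above; other agents may use a different path.
DEFAULT_ROOT = Path.home() / ".claude" / "skills"


def installed_skills(root: Path = DEFAULT_ROOT) -> list[str]:
    """Return the folder of every SKILL.md found under root, relative to root."""
    if not root.is_dir():
        return []
    return sorted(
        path.parent.relative_to(root).as_posix()
        for path in root.rglob("SKILL.md")
    )


if __name__ == "__main__":
    for name in installed_skills():
        print(name)
```

An empty result after an install usually means the file landed somewhere else, which is worth checking before debugging the agent itself.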
How do I avoid Scholar's captcha?
Use BA stealth-extract (skill #1). Custom proxies + custom user-agent rotation handle ~80% of cases; BA's stealth handles the remaining 20% (the residential-IP cases). Most OSS Scholar scrapers (Scholarly, scholarly-py, etc.) only handle the first 80%.
What about Semantic Scholar / OpenAlex / CrossRef as alternatives?
Use them where you can. Semantic Scholar's API is free and well-documented; OpenAlex covers ~250M papers; CrossRef has 130M+ DOIs. Use Scholar only when these miss your target, most often for grey literature, theses, and preprint versions that older sources don't index.
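Trying the open indexes first can be as simple as building one search URL per service. The sketch below only constructs the URLs and makes no requests. The endpoints and parameter names reflect each service's public API as I understand it and should be checked against their documentation; the open_index_urls helper is hypothetical.

```python
from urllib.parse import urlencode


def open_index_urls(query: str, limit: int = 5) -> dict[str, str]:
    """Build keyword-search URLs for three open scholarly indexes."""
    return {
        # Semantic Scholar Graph API: paper search with a selected field list.
        "semantic_scholar": "https://api.semanticscholar.org/graph/v1/paper/search?"
        + urlencode({"query": query, "limit": limit, "fields": "title,year,citationCount"}),
        # OpenAlex: full-text search over works.
        "openalex": "https://api.openalex.org/works?"
        + urlencode({"search": query, "per-page": limit}),
        # Crossref: bibliographic query over registered DOIs.
        "crossref": "https://api.crossref.org/works?"
        + urlencode({"query": query, "rows": limit}),
    }


if __name__ == "__main__":
    for name, url in open_index_urls("grey literature systematic review").items():
        print(f"{name}: {url}")
```

If all three come back empty for a title you know exists, that is the signal to fall back to a Scholar query.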
Which 3 skills should I start with?
#1 (BA Scholar recipe), #2 (arxiv by wanshuiyin), #5 (scholar-vault-gap-scout). Find papers, read them efficiently, identify what's missing.