| Title (source) | Tags | # |
|---|---|---|
| Web Scraping With Playwright (wan) | playwright, typescript, youtube, functionality | 35 |
| Clean Up HTML Content for Retrieval-Augmented Generation With Readability.js (phi/dat) | html, tooling, nodejs | 34 |
| How to Scrape Web Content for RAG With Readability.js (phi/dat) | videos, how-tos, content, ai | 33 |
| Why I Don’t Block AI Scrapers (j9t) | ai, robotstxt | 32 |
| Websites Are Blocking the Wrong AI Scrapers (Because AI Companies Keep Making New Ones) (jas/404) | ai, robotstxt | 31 |
| The Backlash Against AI Scraping Is Real and Measurable (jas/404) | ai, robotstxt | 30 |
| AI Unplugged: Rise (and Fall) of the Robots(.txt) | ai, robotstxt | 29 |
| Investigating Reddit’s robots.txt Cloaking Strategy (rya/mer) | robotstxt, web | 28 |
| Consent, LLM Scrapers, and Poisoning the Well (eri) | ai, legal | 27 |
| AI Companies Ignoring robots.txt (mjt) | ai, robotstxt | 26 |
| Let’s Build a Web Scraper in PHP and Python (the) | php, python | 25 |
| Who Should Block AI Bots? (thc/moz) | ai, seo | 24 |
| Blockin’ Bots (bee) | ai, apache, configuration | 23 |
| ai.robots.txt (cor) | ai, crawling, robotstxt, tooling | 22 |
| Go Ahead and Block AI Web Crawlers (cor) | robotstxt, crawling, ai | 21 |
| The Text File That Runs the Internet (dav/ver) | robotstxt, crawling, ai, web | 20 |
| Personal-Scale Web Scraping for Fun and Profit | javascript, functionality, optimization | 19 |
| Dark Visitors (ghk) | websites, ai, robotstxt | 18 |
| Block the Bots That Feed “AI” Models by Scraping Your Website (cla) | robotstxt, ai | 17 |
| OpenAI Launches Web Crawling GPTBot, Sparking Blocking Effort by Website Owners and Creators (ven) | ai, openai, crawling, robotstxt | 16 |
| Puppeteer in Node.js: More Antipatterns to Avoid (app) | nodejs, testing, anti-patterns, puppeteer | 15 |
| Scraping Single-Page Applications With Playwright (api) | single-page-apps, playwright | 14 |
| Web Scraping—A Complete Guide (ser) | guides | 13 |
| Sophisticated Web Scraping With Bright Data (cra/sit) | structured-data, apis | 12 |
| Web Scraping via JavaScript Runtime Heap Snapshots (adr) | javascript, memory | 11 |
| Web Scraping Is Legal, U.S. Appeals Court Reaffirms (zac/tec) | legal | 10 |
| Web Scraping With JavaScript and Node.js (zen) | javascript, nodejs | 9 |
| Web Crawling vs. Web Scraping (zyt) | crawling, comparisons, terminology | 8 |
| Web Crawler vs. Web Scraper: The Differences (gab) | crawling, comparisons, terminology | 7 |
| No Need to Protect Your Website From Scraping: 8 Reasons (fin) | web, seo, legal | 6 |
| The Ultimate Guide to Building Scalable Web Scrapers With Scrapy (sma) | guides, tooling, python | 5 |
| Web Scraping With Node.js (sma) | nodejs, javascript | 4 |
| Using .htaccess to Prevent Web Scraping (ds/sit) | servers, apache | 3 |
| The Rise of Web Bots and Fall in Human Traffic (cra/sit) | web, spam, traffic, metrics | 2 |
| Web Scraping in Node.js (cji/sit) | nodejs | 1 |