The Open-Source Software Saving the Internet From AI Bot Scrapers (ema/404) | Jul 7, 2025 | 41 | ai, tooling
5 Best JavaScript Web Scraping Libraries in 2025 (api) | Jun 11, 2025 | 40 | javascript, libraries, link-lists
A Thought on JavaScript “Proof of Work” Anti-Scraper Systems (cks) | May 25, 2025 | 39 | javascript, ai
Meet llms.txt, a Proposed Standard for AI Website Content Crawling (sea) | Mar 28, 2025 | 38 | ai, crawling, robotstxt
Web Scraping With Cheerio in 2025 (api) | Mar 16, 2025 | 37 | guides, tooling
Web Scraping With Playwright | Feb 24, 2025 | 36 | playwright, typescript, youtube, functionality
Clean Up HTML Content for Retrieval-Augmented Generation With Readability.js (phi/dat) | Jan 9, 2025 | 35 | html, tooling, nodejs
How to Scrape Web Content for RAG With Readability.js (phi/dat) | Jan 3, 2025 | 34 | videos, how-tos, content, ai
llms-txt | Sep 2, 2024 | 33 | websites, ai, crawling
Why I Don’t Block AI Scrapers (j9t) | Aug 29, 2024 | 32 | ai, robotstxt
Websites Are Blocking the Wrong AI Scrapers (Because AI Companies Keep Making New Ones) (404) | Jul 29, 2024 | 31 | ai, robotstxt
The Backlash Against AI Scraping Is Real and Measurable (404) | Jul 23, 2024 | 30 | ai, robotstxt
AI Unplugged: Rise (and Fall) of the Robots(.txt) | Jul 8, 2024 | 29 | ai, robotstxt
Investigating Reddit’s robots.txt Cloaking Strategy | Jul 4, 2024 | 28 | robotstxt, web
Consent, LLM Scrapers, and Poisoning the Well (eri) | Jun 26, 2024 | 27 | ai, legal
AI Companies Ignoring robots.txt (mjt) | Jun 24, 2024 | 26 | ai, robotstxt
Let’s Build a Web Scraper in PHP and Python | May 8, 2024 | 25 | php, python
Who Should Block AI Bots? (moz) | Apr 17, 2024 | 24 | ai, seo
Blockin’ Bots (bee) | Apr 12, 2024 | 23 | ai, apache, configuration
ai.robots.txt (cor) | Mar 27, 2024 | 22 | ai, crawling, robotstxt, tooling
Go Ahead and Block AI Web Crawlers (cor) | Mar 2, 2024 | 21 | robotstxt, crawling, ai
The Text File That Runs the Internet (dav/ver) | Feb 14, 2024 | 20 | robotstxt, crawling, ai, web
Dark Visitors | Nov 1, 2023 | 19 | websites, ai, robotstxt
Personal-Scale Web Scraping for Fun and Profit | Nov 1, 2023 | 18 | javascript, functionality, optimization
Block the Bots That Feed “AI” Models by Scraping Your Website (cla) | Aug 23, 2023 | 17 | robotstxt, ai
OpenAI Launches Web Crawling GPTBot, Sparking Blocking Effort by Website Owners and Creators (ven) | Aug 8, 2023 | 16 | ai, openai, crawling, robotstxt
Puppeteer in Node.js: More Antipatterns to Avoid (app) | Jun 14, 2023 | 15 | nodejs, testing, anti-patterns, puppeteer
Scraping Single-Page Applications With Playwright (api) | Mar 16, 2023 | 14 | single-page-apps, playwright
Web Scraping—A Complete Guide | Jan 21, 2023 | 13 | guides
Sophisticated Web Scraping With Bright Data (cra) | Dec 14, 2022 | 12 | structured-data, apis
Web Scraping via JavaScript Runtime Heap Snapshots | Apr 27, 2022 | 11 | javascript, memory
Web Scraping Is Legal, U.S. Appeals Court Reaffirms (tec) | Apr 18, 2022 | 10 | legal
Web Scraping With JavaScript and Node.js | Sep 1, 2021 | 9 | javascript, nodejs
Web Crawling vs. Web Scraping | Jan 1, 2021 | 8 | crawling, comparisons, terminology
Web Crawler vs. Web Scraper: The Differences | Jun 2, 2020 | 7 | crawling, comparisons, terminology
No Need to Protect Your Website From Scraping: 8 Reasons | Apr 7, 2020 | 6 | web, seo, legal
The Ultimate Guide to Building Scalable Web Scrapers With Scrapy (sma) | Jul 16, 2019 | 5 | guides, tooling, python
Web Scraping With Node.js (sma) | Apr 8, 2015 | 4 | nodejs, javascript
Using .htaccess to Prevent Web Scraping | Jun 24, 2014 | 3 | servers, apache
The Rise of Web Bots and Fall in Human Traffic (cra) | Dec 18, 2013 | 2 | web, spam, traffic, metrics
Web Scraping in Node.js | Nov 24, 2012 | 1 | nodejs