# ABOUT THIS FILE
# -------------------------------------------------------------------
# Version: 0.2.2
# Updated: 27/06/2019
# Source: https://gitlab.com/beepmode/robotctl
#
# This robots.txt file lists robots that have been observed to hit
# servers hard _and_ which serve little to no purpose. Most of these
# bots appear to respect robots.txt files. Bots that appear to ignore
# robots.txt files are listed separately.
#
# Note that this file doesn't aim to be an exhaustive list of nasty
# bots. Rather, it is based on bots I am seeing on my own servers.
# If you don't mind a robots.txt file that is over 3,500 lines long
# then you could use this file instead:
# https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/blob/master/robots.txt/robots.txt

# NAUGHTY BOTS THAT MAY IGNORE THIS FILE
# -------------------------------------------------------------------
# The following are bots that appear to ignore robots.txt files. You
# may want to block these via a ModSec rule instead.

# Bot: Barkrowler (http://www.exensa.com/crawl)
# Description: Vague bot from a vague company that doesn't provide
# robots.txt info.
User-agent: Barkrowler
Disallow: /

# Bot: DomainCrawler (http://www.domaincrawler.com)
# Description: Useless, outdated bot that appears to disregard
# robots.txt files.
User-agent: DomainCrawler/3.0
Disallow: /

# Bot: Spbot (http://openLinkprofiler.org/bot)
# Description: Yet another useless SEO bot. No longer provides
# robots.txt information.
User-agent: spbot
Disallow: /

# BOTS THAT WILL RESPECT THIS FILE
# -------------------------------------------------------------------

# Bot: AhrefsBot (https://ahrefs.com/robot)
# Description: Yet another useless SEO bot.
User-agent: AhrefsBot
Disallow: /

# Bot: AlphaBot/3.2 (http://alphaseobot.com/bot.html)
# Description: Yet another useless SEO bot.
User-agent: AlphaSeoBot
Disallow: /

# Bot: BUbiNG (http://law.di.unimi.it/BUbiNG.html)
# Description: Vague, open source bot.
User-agent: BUbiNG
Disallow: /

# Bot: Cliqzbot (https://cliqz.com/en/cliqzbot)
# Description: Proprietary bot by a small company that lets you
# perform searches in your web browser (as opposed to using the
# Yellow Pages?).
User-agent: Cliqzbot
Disallow: /

# Bot: Dotbot (http://www.opensiteexplorer.org/dotbot)
# Description: Useless SEO bot that can hit websites hard.
User-agent: dotbot
Disallow: /

# Bot: Linguee
# Description: Bot used for a proprietary translation app. It's
# not clear why the bot exists and it has been caught hitting
# websites hard.
User-agent: Linguee
Disallow: /

# Bot: linkdexbot (http://www.linkdex.com/bots/)
# Description: Yet another useless SEO bot.
User-agent: linkdexbot/2.0
Disallow: /

# Bot: MJ12bot (http://mj12bot.com)
# Description: Bot that wants to understand and paint a map of the
# internet.
User-agent: MJ12bot
Disallow: /

# Bot: SemrushBot (https://www.semrush.com/bot/)
# Description: Yet another useless SEO bot.
User-agent: SemrushBot
Disallow: /

# Bot: SeznamBot (http://napoveda.seznam.cz/en/seznambot-intro/)
# Description: Czech search engine. Nothing wrong with that but the
# bot can trigger a huge number of hits.
User-agent: SeznamBot
Disallow: /

# Bot: TurnitinBot (https://turnitin.com/robot/crawlerinfo.html)
# Description: Proprietary, commercial anti-plagiarism bot.
User-agent: TurnitinBot
Disallow: /

# Bot: Screaming Frog SEO Spider (https://www.screamingfrog.co.uk/seo-spider/faq/)
# Description: Yet another SEO bot.
User-agent: Screaming Frog SEO Spider
Disallow: /
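
# EXAMPLE: MODSEC RULE FOR BOTS THAT IGNORE THIS FILE
# -------------------------------------------------------------------
# The "naughty bots" section suggests blocking bots that ignore
# robots.txt via a ModSec rule. The commented lines below are a
# minimal sketch of what such a rule might look like. It is not part
# of robots.txt syntax and is not taken from the robotctl project.
# Assumptions: ModSecurity is installed with the rule engine enabled;
# the rule id (1000001), the 403 status and the log message are
# hypothetical placeholders, so pick values that fit your own rule
# set. The @pm operator does a case-insensitive match of any of the
# listed phrases against the User-Agent request header.
#
#   SecRule REQUEST_HEADERS:User-Agent "@pm Barkrowler DomainCrawler spbot" \
#       "id:1000001,phase:1,deny,status:403,log,msg:'Bot that ignores robots.txt'"
#
# To use it, copy the rule (without the leading "# ") into your
# ModSecurity configuration and test it against your own traffic
# before relying on it.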