AFD Leaking, Part III

Well, the fix is in.

  • Robots are prohibited from viewing the bulk pages: names, host taxa, bibliography.
  • Robots get a truncated checklist.
  • Robots get a publication page without lists of references or sub-publications.

All of these pages are aggregations of information that is already in the main profile pages, and that’s what we want robots to be indexing.
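As a rough sketch of the rules above (not the actual AFD code), the decision can be reduced to a small lookup. The page-kind names and the three outcome labels here are made up for illustration; the real routing is not shown in this post.

```python
# Hypothetical page kinds; the site's real names and routing may differ.
BULK_PAGES = {"names", "host_taxa", "bibliography"}
TRIMMED_PAGES = {"checklist", "publication"}


def policy_for(page_kind: str, is_robot: bool) -> str:
    """Return 'full', 'trimmed', or 'forbidden' for a request."""
    if not is_robot:
        return "full"
    if page_kind in BULK_PAGES:
        return "forbidden"  # e.g. answer with an HTTP 403
    if page_kind in TRIMMED_PAGES:
        return "trimmed"    # leave out the long aggregated lists
    return "full"           # profile pages stay fully indexable
```

Humans always get the full page; only robots are steered away from the expensive aggregations.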

Robots are detected via their User-Agent headers. I snarfed a list of known robots from

My logs show the same number of hits from spiders, but the URIs that take the longest to respond are now mostly requested by Firefox clients.

Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6

Will this fix things? By God, it better. Still a few worrying bits. A nine-second request for

With a UA of

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +

Which my filter is supposed to be catching.
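For reference, a User-Agent check of the kind described above might look like the sketch below. This is not the filter running on the site: the token list is a small illustrative stand-in for the list of known robots, and the function name is made up. The Applebot string is copied as it appears above, truncation included.

```python
from typing import Optional

# Illustrative tokens only; a real deployment would use a fuller list.
ROBOT_TOKENS = ("bot", "crawler", "spider", "slurp")


def is_robot(user_agent: Optional[str]) -> bool:
    """Guess whether a request comes from a robot, based on its User-Agent."""
    if not user_agent:
        return False  # assumption: a missing header is treated as a human
    ua = user_agent.lower()
    # Case-insensitive substring match, so "Applebot/0.1" matches "bot".
    return any(token in ua for token in ROBOT_TOKENS)


applebot_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 "
    "(KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +"
)
firefox_ua = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) "
    "Gecko/20100115 Firefox/3.6"
)
```

With substring matching, `is_robot(applebot_ua)` is true and `is_robot(firefox_ua)` is false. A filter that compares whole strings, or matches case-sensitively, could miss a robot that wraps its name in a browser-like string like this one; whether that is what is happening here is a guess.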

Time will tell. If I revisit this, I will look at implementing handling of If-Modified-Since, which is an outstanding feature request anyway.
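A minimal sketch of what If-Modified-Since handling involves, assuming each page can report a timezone-aware last-modified time (the function name and that assumption are mine, not part of the current code). If the client's copy is at least as new as the page, the server can answer 304 Not Modified and skip building the page.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def not_modified(if_modified_since: Optional[str], last_modified: datetime) -> bool:
    """True if the client's cached copy is still current (answer with 304)."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: fall back to a normal response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat "-0000" as UTC
    # HTTP dates have one-second resolution, so drop microseconds first.
    return last_modified.replace(microsecond=0) <= since


last = datetime(2015, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
fresh = not_modified("Sat, 10 Jan 2015 12:00:00 GMT", last)
stale = not_modified("Fri, 09 Jan 2015 12:00:00 GMT", last)
```

Here `fresh` is true (send a 304 with no body) and `stale` is false (send the full page). The saving only materialises if the last-modified time is cheaper to compute than the page itself.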


