AFD Leaking, Part III

Well, the fix is in.

  • Robots are prohibited from viewing the bulk pages: names, host taxa, bibliography.
  • Robots get a truncated checklist.
  • Robots get a publication page without lists of references or sub-publications.

All of these pages are aggregations of information that is already in the main profile pages, and that’s what we want robots to be indexing.
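The gating itself amounts to a branch on a robot flag when building the response. A minimal sketch, assuming a boolean `is_robot` and an illustrative `CHECKLIST_LIMIT` (neither is the actual AFD code):

```python
# Hypothetical cap on how many checklist entries a robot gets.
CHECKLIST_LIMIT = 50

def checklist_for(entries, is_robot):
    """Return the full checklist for browsers, a truncated one for robots."""
    if is_robot:
        return entries[:CHECKLIST_LIMIT]
    return entries
```

The point of the truncation is that robots still see a valid, indexable page; they just don't get the expensive bulk aggregation.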

Robots are detected via their User-Agent headers. I snarfed a list of known robots from
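A detection check along these lines can be sketched as a case-insensitive substring match against a list of known robot tokens (the tokens below are illustrative, not the snarfed list):

```python
# Illustrative tokens; the real list came from a published robots database.
ROBOT_TOKENS = ("googlebot", "bingbot", "applebot", "yandex", "crawler", "spider")

def is_robot(user_agent):
    """True if the User-Agent contains any known robot token, case-insensitively."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in ROBOT_TOKENS)
```

Normalising the header to lower case matters: a case-sensitive match against lowercase tokens is one plausible way a UA like Applebot's slips past a filter.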

My logs show the same number of hits from spiders, but the URIs taking the longest to respond are now mostly requested by Firefox clients.

Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6

Will this fix things? By God, it had better. There are still a few worrying bits. A nine-second request for

With a UA of

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +

Which my filter is supposed to be catching.

Time will tell. If I revisit this, I will look at honouring If-Modified-Since, which is an outstanding feature request anyway.

