Well, the fix is in.
- Robots are prohibited from viewing the bulk pages: names, host taxa, bibliography.
- Robots get a truncated checklist.
- Robots get a publication page without lists of references or sub-publications.
All of these pages aggregate information that already appears in the main profile pages, and those are the pages we want robots to be indexing.
Robots are detected via their User-Agent headers. I snarfed a list of known robots from useragentstring.com.
My logs show the same number of hits from spiders, but the requests taking the longest to serve now come mostly from Firefox clients:
Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6
Will this fix things? By God, it better. Still a few worrying bits. A nine-second request for
with a UA of
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)
which my filter is supposed to be catching.
Time will tell. If I revisit this, I will look at handling If-Modified-Since requests, which is an outstanding feature request anyway.