Putting out fires

For anyone that cares, this is what our crashes look like.

TOP 20 url times are:
39432223* Safari/601.3.9, search/names?&search=true&keyword=S%25%25%25
39371188* Safari/601.3.9, search/names?&search=true&keyword=S%25%25%25
39310161* Safari/601.3.9, search/names?&search=true&keyword=S%25%25%25
39300736* Safari/601.3.9, search/names?&search=true&keyword=S%25
39041255* Safari/537.36, taxa/LYCAENIDAE/complete
38945840* Safari/537.36, taxa/LYCAENIDAE/complete
38930099* Applebot, taxa/Gymnothorax_pictus/checklist
6006648* Safari/537.36, taxa/LYCAENIDAE/complete
5975658* Safari/537.36, taxa/LYCAENIDAE/complete
5914978* Safari/537.36, taxa/LYCAENIDAE/complete
5914961* Baiduspider/2.0, taxa/Talaurinus_prypnoides/checklist
5863731* Baiduspider/2.0, taxa/Platyzosteria_jungi/checklist
5838122* Baiduspider/2.0, taxa/Pterohelaeus_litigiosus/checklist
5643696 Applebot/0.1, taxa/Eulecanium/checklist
5582189 Applebot, taxa/0c1c84ef-4ad1-403f-a7f7-f45447a0372a
5527142 Yahoo! Slurp, taxa/908440b7-da2f-4840-a81f-b5b3b5a2c14e
5487277 Baiduspider, taxa/Ropalidia_plebeiana/statistics
5430266 Yahoo! Slurp, taxa/Eumelea%20duponchelii
5025511* Baiduspider/2.0, taxa/Pseudostrongyluris_polychrus/statistics
4967732 Firefox/24.0, taxa/Siganus/checklist

java.lang.OutOfMemoryError: GC overhead limit exceeded 

The numbers on the left are request durations in milliseconds, and an asterisk marks a request that is still running. As you can see, the oldest of these requests have been running for almost 11 hours (39,432,223 ms is about 10.95 hours). Obviously, I need some sort of watchdog to interrupt threads.
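One way such a watchdog could work is sketched below. It is a hypothetical sketch, not AFD code: the class `RequestWatchdog` and its `begin()`/`end()` methods are invented for illustration. Request threads register when they start and deregister in a `finally` block, and a background sweep interrupts any thread that has been registered for longer than a limit. Note that `Thread.interrupt()` only stops code that checks for it (`Thread.sleep`, `wait`, interruptible I/O, or explicit `Thread.currentThread().isInterrupted()` checks in long loops), so a query that builds an enormous result set would also need such checks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical request watchdog (a sketch, not AFD code). Request threads call
 * begin() when they start and end() in a finally block; a periodic sweep
 * interrupts any thread that has been registered for longer than the limit.
 */
public final class RequestWatchdog {
    private final Object lock = new Object();
    private final Map<Thread, Long> startTimes = new HashMap<>(); // guarded by lock
    private final long limitMillis;
    private final ScheduledExecutorService sweeper;

    public RequestWatchdog(long limitMillis, long sweepIntervalMillis) {
        this.limitMillis = limitMillis;
        this.sweeper = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "request-watchdog");
            t.setDaemon(true); // never keeps the JVM alive on its own
            return t;
        });
        sweeper.scheduleAtFixedRate(this::sweep, sweepIntervalMillis,
                sweepIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Register the current thread as starting a request. */
    public void begin() {
        synchronized (lock) {
            startTimes.put(Thread.currentThread(), System.nanoTime());
        }
    }

    /** Deregister the current thread and clear any pending interrupt, so a pooled thread is reused clean. */
    public void end() {
        synchronized (lock) {
            startTimes.remove(Thread.currentThread());
            Thread.interrupted();
        }
    }

    /** Interrupt every registered thread that is over the limit; it is re-interrupted on each sweep until it calls end(). */
    private void sweep() {
        long now = System.nanoTime();
        long limitNanos = TimeUnit.MILLISECONDS.toNanos(limitMillis);
        synchronized (lock) {
            for (Map.Entry<Thread, Long> e : startTimes.entrySet()) {
                if (now - e.getValue() > limitNanos) {
                    e.getKey().interrupt();
                }
            }
        }
    }

    /** Stop the sweeper thread. */
    public void shutdown() {
        sweeper.shutdownNow();
    }
}
```

Because `end()` and the sweep share one lock, an interrupt either lands before `end()` (and is cleared there) or never lands at all, so a stale interrupt cannot leak into the thread's next request. In a servlet container, `begin()`/`end()` would plausibly wrap `chain.doFilter(...)` in a `Filter`, with the watchdog created and shut down from a `ServletContextListener` so its thread does not outlive the webapp. Tomcat's own `StuckThreadDetectionValve`, which in recent versions can also interrupt stuck threads, may be worth checking before writing anything custom.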

The entries are sorted by duration, so the oldest requests are at the top, and the top four tell the story. Someone, whose IP address I am not repeating here, searched for ‘S%%%’ (%25 is the URL encoding of %). They went “hmm, it’s not coming back”, hit the search button twice more, then got rid of the extra percentage signs and searched for just ‘S%’. And then, I suppose, they concluded that AFD “doesn’t work” and went away.

To fix this, there’s now a validation rule on the basic name search path through the app: if you use a wildcard, you must also supply at least three non-wildcard characters. It’s a stronger version of the previous validation rule, which only rejected searches consisting entirely of wildcards.
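The rule is simple enough to sketch. This is a minimal illustration, not the actual AFD validator: the class name `NameSearchValidator`, the wildcard set (`%` and `_`, the SQL `LIKE` wildcards) and the decision to ignore whitespace are all assumptions.

```java
/**
 * Hypothetical sketch of the name-search rule: a keyword containing a
 * wildcard must also contain at least three non-wildcard characters.
 * The wildcard set ('%' and '_', as in SQL LIKE) and ignoring whitespace
 * are assumptions.
 */
public final class NameSearchValidator {
    private static final String WILDCARDS = "%_";
    private static final int MIN_LITERAL_CHARS = 3;

    private NameSearchValidator() {}

    public static boolean isAcceptable(String keyword) {
        if (keyword == null) {
            return false;
        }
        boolean hasWildcard = false;
        int literalChars = 0;
        for (char c : keyword.toCharArray()) {
            if (WILDCARDS.indexOf(c) >= 0) {
                hasWildcard = true;
            } else if (!Character.isWhitespace(c)) {
                literalChars++;
            }
        }
        // Searches without wildcards pass; wildcard searches need enough literal text to narrow the match.
        return !hasWildcard || literalChars >= MIN_LITERAL_CHARS;
    }
}
```

Under this sketch both ‘S%%%’ and ‘S%’ from the log are rejected, as is an all-wildcard search like ‘%%%’ (so the old rule is subsumed), while ‘Lyc%’ and plain names pass.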

The obvious question is, “why didn’t you think about this in the first place?” The answer is that we chose to make it permissive because we have legit users who really do want all the names in AVES and should be able to get them. Now I’m putting out fires, adding limits of various kinds on a case-by-case basis. It’s piecemeal. Far from ideal.

What to do, what to do.

  • Google “adding a watchdog timer to a webapp”. The difficulty is that web applications are not supposed to start their own threads. Tomcat will probably let me do it, but it shouldn’t. Need to find the proper way to go about this.
  • Why are those requests for checklists taking so long? Oh, that’s right. I preload the JavaScript objects with all sibling items all the way up to the root. Perhaps I should root the checklist at whichever taxon the user requests a checklist for, and provide breadcrumbs allowing them to navigate up. It would be faster, and also cleaner and better-looking.
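Rooting the checklist at the requested taxon means the page only needs one ancestor per level for the breadcrumbs, rather than every sibling at every level up to the root. A minimal sketch of building that trail is below; the `Taxon` class with just a name and a parent link is a hypothetical stand-in, since AFD’s real data model isn’t described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: breadcrumb trail for a checklist rooted at the requested taxon. */
public final class Breadcrumbs {
    private Breadcrumbs() {}

    /** Hypothetical minimal taxon node: a name and a link to its parent (null at the root). */
    public static final class Taxon {
        final String name;
        final Taxon parent;

        public Taxon(String name, Taxon parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Names of the requested taxon's ancestors, root first, excluding the taxon itself. */
    public static List<String> trail(Taxon requested) {
        List<String> names = new ArrayList<>();
        // Walk parent links: one entry per level, no siblings.
        for (Taxon t = requested.parent; t != null; t = t.parent) {
            names.add(t.name);
        }
        Collections.reverse(names);
        return names;
    }
}
```

The payload for the breadcrumbs then grows with the depth of the tree rather than with the total number of siblings along the path, which is where a deep family like LYCAENIDAE would otherwise get expensive.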
