Thingy Ma Jig is the blog of Nicholas Thompson and contains any useful tips, sites and general blog-stuff which are considered interesting or handy!
Posted on 04 March 2010 in programming, linux, How to, geek, Drupal, apache
Following Kevin Hankens' post on why you shouldn't ignore Drupal 404 errors, I decided to go through yesterday's error_log on our live Apache server (the one which hosts www.pponline.co.uk, www.sportbusiness.com and www.mychild.co.uk alongside around 40 other Drupal sites).
It turns out there were almost 5,000 404 (page not found) errors. How to find the most "popular" ones though? This called for a Bash script…
gawk '{ print $13 }' error_log.1 | grep ^/var | sort | uniq -c | sort -n
This uses gawk to parse yesterday's error log (hence the .1) and return column 13, splitting each line on whitespace (gawk's default delimiter). Note: it turns out single quotes and double quotes mean different things here! Inside double quotes the shell expands $13 itself before gawk ever sees it, so the gawk program needs single quotes. Next I want to keep only the lines beginning with "/var", because the gawk step also returns values from memory & PHP errors. Next, sort them and do a unique lines count. Finally, sort this result so the most common entries are at the end.
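To see why column 13 is the path, here is a minimal sketch that runs the same pipeline against a few made-up log lines. It assumes the Apache 2.2-style error log layout (`[date] [error] [client IP] File does not exist: /path`); the sample paths and IP addresses are invented for illustration, and the script falls back to plain `awk` if `gawk` isn't installed.

```shell
#!/bin/sh
# Hypothetical sample data in the Apache 2.2-style error_log layout.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[Thu Mar 04 10:15:32 2010] [error] [client 10.0.0.1] File does not exist: /var/www/example/favicon.ico
[Thu Mar 04 10:15:40 2010] [error] [client 10.0.0.2] File does not exist: /var/www/example/favicon.ico
[Thu Mar 04 10:16:02 2010] [error] [client 10.0.0.3] File does not exist: /var/www/example/robots.txt
[Thu Mar 04 10:16:10 2010] [error] [client 10.0.0.1] PHP Warning:  something went wrong in /var/www/example/index.php on line 3
EOF

# Prefer gawk (as in the post), otherwise use whatever awk is available.
AWK=$(command -v gawk || command -v awk)

# Same pipeline as above: field 13, keep /var paths, count duplicates,
# then sort so the most frequent path ends up last.
top_404s=$("$AWK" '{ print $13 }' "$sample_log" | grep '^/var' | sort | uniq -c | sort -n)
printf '%s\n' "$top_404s"

# Quoting: inside double quotes the shell reads $13 as $1 followed by "3".
set --                          # clear positional parameters so $1 is empty
seen_by_awk="{ print $13 }"     # what awk would receive with double quotes
printf '%s\n' "$seen_by_awk"

rm -f "$sample_log"
```

The PHP warning line shows why the grep step is needed: its thirteenth field is just the word "wrong", so it gets dropped rather than counted.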
The result? SportBusiness REALLY needs a favicon in the default place - that alone accounted for 20% of the 404s!
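A share like that 20% can also be computed directly instead of by eye. The following is only a sketch of one way to do it, not the command used for the figure above: it counts each path inside awk and prints its percentage of all matching 404s. The log lines, paths and IP addresses are made up, and the same Apache 2.2-style layout is assumed.

```shell
#!/bin/sh
# Hypothetical sample data in the Apache 2.2-style error_log layout.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[Thu Mar 04 10:15:32 2010] [error] [client 10.0.0.1] File does not exist: /var/www/example/favicon.ico
[Thu Mar 04 10:15:40 2010] [error] [client 10.0.0.2] File does not exist: /var/www/example/favicon.ico
[Thu Mar 04 10:15:51 2010] [error] [client 10.0.0.4] File does not exist: /var/www/example/favicon.ico
[Thu Mar 04 10:16:02 2010] [error] [client 10.0.0.3] File does not exist: /var/www/example/robots.txt
[Thu Mar 04 10:16:30 2010] [error] [client 10.0.0.5] File does not exist: /var/www/example/old-page
EOF

# Prefer gawk (as in the post), otherwise use whatever awk is available.
AWK=$(command -v gawk || command -v awk)

# Count every /var path in field 13, then print each path's share of the total.
shares=$("$AWK" '
  $13 ~ /^\/var/ { count[$13]++; total++ }
  END {
    for (path in count)
      printf "%5.1f%% %s\n", 100 * count[path] / total, path
  }' "$sample_log" | sort -n)
printf '%s\n' "$shares"

rm -f "$sample_log"
```

As with the original pipeline, the final `sort -n` puts the biggest offender on the last line.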
A slight alternative, if you use compressed log files, is the following (it saves decompressing the file first):
gunzip -c /var/log/httpd/error_log.3.gz | gawk '{ print $13 }' | grep ^/var | sort | uniq -c | sort -n
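The compressed variant can be tried out safely in the same way. This sketch builds a small gzipped log in a temporary directory and streams it through the pipeline with `gunzip -c`, so the decompressed text never touches the disk. The file name mirrors the one above, but the directory, log lines and paths are invented for the example.

```shell
#!/bin/sh
# Hypothetical sample data in the Apache 2.2-style error_log layout.
workdir=$(mktemp -d)
cat > "$workdir/error_log.3" <<'EOF'
[Mon Mar 01 09:00:01 2010] [error] [client 10.0.0.1] File does not exist: /var/www/example/favicon.ico
[Mon Mar 01 09:00:07 2010] [error] [client 10.0.0.2] File does not exist: /var/www/example/robots.txt
[Mon Mar 01 09:00:12 2010] [error] [client 10.0.0.3] File does not exist: /var/www/example/favicon.ico
EOF
gzip "$workdir/error_log.3"     # leaves only error_log.3.gz behind

# Prefer gawk (as in the post), otherwise use whatever awk is available.
AWK=$(command -v gawk || command -v awk)

# gunzip -c writes the decompressed log to stdout and keeps the .gz file as it is.
compressed_404s=$(gunzip -c "$workdir/error_log.3.gz" | "$AWK" '{ print $13 }' | grep '^/var' | sort | uniq -c | sort -n)
printf '%s\n' "$compressed_404s"

rm -rf "$workdir"
```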