Finding “popular” IP addresses in access_log files

Every now and then I need to quickly analyse a set of access_log files to see who the most frequent visitors are, so that I can decide whether there are any abusers I should block or any poorly configured services somewhere that I can try to get fixed. I can never remember the quickest way to do this, so this time I'm writing down the “one-liner” I cobbled together, in the hope that I can find it next time instead of reinventing the wheel.

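Here is the one-liner I used to find the top IPs this time. It relies on GNU <code>sed</code> and GNU <code>grep</code>, because <code>\+</code> in a basic regular expression and <code>\s</code> are GNU extensions: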

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c | egrep -v "\s([0-9]|[0-9][0-9]|[0-9][0-9][0-9]) "
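A quick way to check that the pipeline behaves as described is to run it over a tiny synthetic log. This is a minimal sketch, assuming GNU sed and GNU grep. It uses made-up client addresses from the reserved documentation range 192.0.2.0/24 and swaps <code>egrep</code> for the equivalent <code>grep -E</code>. <code>add_hits</code> is a hypothetical helper written only for this test.

```shell
#!/bin/sh
# Sanity check for the one-liner on a made-up log.
# Assumes GNU sed and GNU grep; all addresses are illustrative (TEST-NET-1).
log=$(mktemp)

# add_hits IP N: append N identical request lines for IP (hypothetical helper).
add_hits() {
  awk -v ip="$1" -v n="$2" 'BEGIN {
    for (i = 0; i < n + 0; i++)
      printf "%s - - [01/Jan/2024:00:00:00 +0000] \"GET / HTTP/1.1\" 200 512\n", ip
  }' >> "$log"
}

add_hits 192.0.2.10 1200   # above the 1000-hit cut-off
add_hits 192.0.2.20 5      # below it
echo 'a line with no IP address' >> "$log"   # removed by "-e t -e d"

# The same pipeline, with grep -E standing in for the obsolescent egrep.
result=$(sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d "$log" \
  | sort | uniq -c \
  | grep -Ev "\s([0-9]|[0-9][0-9]|[0-9][0-9][0-9]) ")

rm -f "$log"
echo "$result"
```

Only the 1200-hit address survives. The 5-hit address is removed by the final filter, and the line with no IP never gets past <code>sed</code>.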

Broken down into its stages, the pipeline is:

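  1. A call to <code>sed</code> to extract the first IPv4 address from each line of the access_log file. The <code>t</code> command jumps to the end of the script (so the line is printed) when the substitution succeeded, and <code>d</code> deletes every other line
  2. A call to <code>sort</code> to sort the list of IPs so that repeated addresses end up next to each other, which <code>uniq</code> requires
  3. A call to <code>uniq -c</code> to collapse the sorted list into unique IPs, each prefixed with its count
  4. A call to <code>egrep</code> (equivalent to <code>grep -E</code>) whose <code>-v</code> pattern discards every count of one to three digits, leaving only IPs with at least 1000 appearances. This threshold will need tuning depending on the volume of requests and the time period the file covers.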
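Because the cut-off is encoded in the <code>egrep</code> pattern as "one to three digits", changing it means rewriting the regex. Below is a minimal sketch of a tunable variant, assuming GNU sed as before. It keeps the same <code>sed</code>, <code>sort</code> and <code>uniq -c</code> stages, replaces the final filter with a numeric <code>awk</code> comparison, and adds <code>sort -rn</code> so the busiest IPs come first. The function name <code>popular_ips</code>, its default of 1000 and the sample addresses are illustrative choices, not part of the original one-liner.

```shell
#!/bin/sh
# popular_ips FILE [MIN_HITS]: hypothetical helper, not part of any package.
# Prints "COUNT IP" for every IP seen at least MIN_HITS times (default 1000),
# busiest first. Assumes GNU sed, as the original one-liner does.
popular_ips() {
  sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d "$1" \
    | sort | uniq -c \
    | awk -v min="${2:-1000}" '$1 >= min + 0' \
    | sort -rn   # numeric, descending; sort -n skips uniq's leading blanks
}

# Usage on a made-up log (TEST-NET-1 addresses) with 50, 30 and 3 hits.
log=$(mktemp)
awk 'BEGIN {
  hits["192.0.2.20"] = 50; hits["192.0.2.10"] = 30; hits["192.0.2.30"] = 3
  for (ip in hits)
    for (i = 0; i < hits[ip]; i++)
      printf "%s - - [01/Jan/2024:00:00:00 +0000] \"GET / HTTP/1.1\" 200 512\n", ip
}' > "$log"

ranked=$(popular_ips "$log" 20)   # keep IPs with at least 20 hits
rm -f "$log"
echo "$ranked"
```

The numeric comparison turns any threshold into a single argument. The regex approach only expresses cut-offs such as 10, 100 or 1000 cleanly.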