Every now and then I find myself needing to quickly analyse a set of access_log files to see who the most common visitors are so that I can decide if there are any abusers I should be blocking or poorly configured services running somewhere that I can try to get fixed. I can never remember the quickest was to do this so I decided to write down the “one liner” that I cobbled together this time so I can hopefully find it next time and not have to reinvent the wheel again.
Here is the one-liner I used to find the top IPs this time:
sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c | egrep -v "\s([0-9]|[0-9][0-9]|[0-9][0-9][0-9]) "
Splitting this out we have:
- A call to <code>sed</code> to extract all the IP Addresses from the access_log file
- A call to <code>sort</code> to sort the list of IPs
- A call to <code>uniq</code> to create a list of unique IPs with counts
- A call to
egrep
to filter the unique list down to IPs we at least 1000 appearances – this will need tuning depending on the volume of requests / time period the file covers.