Finding “popular” IP addresses in access_log files

Every now and then I need to quickly analyse a set of access_log files to see who the most common visitors are, so that I can decide whether there are abusers I should be blocking or poorly configured services running somewhere that I can try to get fixed.  I can never remember the quickest way to do this, so I decided to write down the “one-liner” I cobbled together this time so that I can find it next time and not have to reinvent the wheel again.

Here is the one-liner I used to find the top IPs this time:

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c | egrep -v "\s([0-9]|[0-9][0-9]|[0-9][0-9][0-9]) "

Splitting this out we have:

  1. A call to <code>sed</code> to extract all the IP Addresses from the access_log file
  2. A call to <code>sort</code> to sort the list of IPs
  3. A call to <code>uniq</code> to create a list of unique IPs with counts
  4. A call to <code>egrep</code> to filter the unique list down to IPs with at least 1000 appearances – this will need tuning depending on the volume of requests and the time period the file covers.
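
If the threshold needs tuning often, the digit-counting regex in the last step can be swapped for a numeric comparison. The following is only a sketch of that idea, not the one-liner above: the function name ips_over is made up for illustration, and it assumes the client address is the first whitespace-separated field on each line (as in the common and combined log formats) rather than extracting it with sed.

```shell
# ips_over FILE [MIN]
# Hypothetical helper: list addresses that appear at least MIN times in FILE
# (MIN defaults to 1000), busiest first.
# Assumption: the client IP is the first field of every log line.
ips_over() {
    awk '{ print $1 }' "$1" |                   # keep only the first field
        sort |                                  # group identical addresses together
        uniq -c |                               # prefix each address with its count
        awk -v min="${2:-1000}" '$1 >= min' |   # drop anything below the threshold
        sort -rn                                # highest counts first
}
```

Calling ips_over access.log 500 would then list every address with 500 or more requests, so the threshold becomes an argument instead of an edit to the regex.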

svn spelunking

As I sit here blaming code to find the original source of a line, I’m beginning to think that svn needs a new, improved version of blame called spelunk. It would work something like this:

$ svn help spelunk

spelunk (curse, showup, show): Output the content of specified files or URLs with the original revision and author information in-line ignoring white space changes and following movement of code between files.

Yes – I know I am dreaming 😉

Understanding complex RegEx

I’ve been wondering for a while whether there is a good way of reverse engineering the meaning and function of a complex regular expression pattern, such as the one used in the make_clickable function in WordPress.  This morning, while debugging an issue with this function causing occasional segfaults in PHP, I started searching around for a suitable tool and found YAPE::Regex::Explain to be the only reasonable solution.


Always show admin bar

I like the admin bar that we are adding in the soon-to-be-released WordPress 3.1 so much that I wanted it to always show on my site.  This way you get an easy-to-use search box on every page, even when logged out.

I wrote a quick plugin file which I dropped into the wp-content/mu-plugins folder on this site. Here is the code I used in case you want to do the same:

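function pjw_login_adminbar( $wp_admin_bar ) {
	// Only add the link for visitors who are not logged in.
	if ( ! is_user_logged_in() ) {
		$wp_admin_bar->add_menu( array( 'title' => __( 'Log In' ), 'href' => wp_login_url() ) );
	}
}
add_action( 'admin_bar_menu', 'pjw_login_adminbar' );
// Force the admin bar to show for everyone, including logged-out visitors.
add_filter( 'show_admin_bar', '__return_true', 1000 );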

As you can see, to make it even more useful I’ve added a Log In link so I can use it to log in to the site.

As with anything else to do with the admin bar, this code requires WordPress 3.1 to work.