Archive for April, 2008

April 14, 2008

Hadoop Summit


Last month Yahoo! held the first Apache Hadoop Summit in Santa Clara, CA. I really wanted to go, but months earlier I had scheduled our family vacation to Austin, TX for that same week. Daniel was able to go in my place for Lookery, and my friend Chris Gillett, who was on my team at Compete, also attended.

“Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.”

Hadoop implements Google’s MapReduce programming model to create a framework that breaks up large data into small chunks that are then processed in parallel across a cluster of commodity servers.
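To make the split, map, and reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and uses none of Hadoop's APIs: the function names are made up for illustration, and a thread pool stands in for the cluster of commodity servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Break the input into small chunks (stand-in for Hadoop's input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2, workers=4):
    """Run the map step over all chunks in parallel, then shuffle and reduce."""
    chunks = split_into_chunks(lines, chunk_size)
    # Hypothetical stand-in: worker threads play the role of cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data data", "cluster"]))
```

The point of the model is that each chunk is mapped independently, so adding more workers (or servers) lets you process more chunks at once without changing the map or reduce functions.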

The framework is still at a very early stage but is already being used by Facebook, Google, Visible Measures, Yahoo, The New York Times and by us at Lookery.

Back when we started Compete in 2000 there were no open source options like Hadoop. Forget about finding developers who had experience dealing with terabyte-scale data.

We ended up evaluating most of the supercomputing software that was in use at the time, mostly within government and academic settings. Software like PBS, MPI, PVM, Torque and Condor was the state of the art. None of it fit, so the only option was to create our own solution for dealing with our massive clickstream “database”.

Here are the slides from the presentations Chris gave at the PyCon 2005 conference that describe some of the data processing apps we came up with at Compete.

Cool to see that the paradigms we used are being carried on by the Hadoop, Pig and HDFS projects, just at a much larger scale.

Chris posted some great summaries of the Hadoop Summit on his blog. I hope he gets around to posting the summaries for the rest of the talks from that day.

Interested in Hadoop? Python? We’re looking for engineers at Lookery to work on our data processing cluster.

Learn more about working with Hadoop and BIG Data at Lookery.



April 10, 2008

Monitoring Nginx with Hyperic


At Lookery we’ve been working on adding my favorite web server, Nginx, to our production stack.

If you haven’t caught the Nginx bug yet, you’re missing out. Nginx is a super lightweight web server that many use in front of their overweight Apache web servers to offload the serving of static assets (images, CSS, JS, etc.). It’s also a great front-end proxy for your Django or Rails stack.

A while back we settled on using Hyperic as our server monitoring and alerting platform. I spent many years using Ganglia, Munin, Cacti, Nagios and a couple of home-brewed monitoring solutions before I came across Hyperic while reading the comments on Barry’s Blog (WordPress). Hyperic, despite being written in Java ;-) , has worked out well for us.

The only problem: Hyperic doesn’t ship an Nginx plugin and Google wasn’t able to find me one. So I asked Ashwin Phatak to create one for us. I finally got around to uploading it to Google Code tonight.

The plugin is very simple but if releasing it can save anyone some dev cycles we’ll be quite happy.
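For a rough idea of the kind of data a monitoring plugin can pull out of Nginx, here is a small Python sketch that parses the text report produced by Nginx's stub_status module. This is not the plugin's code (Hyperic plugins are not written in Python), and I'm only assuming stub_status as the data source for the sake of illustration. The module has to be compiled in and enabled, the sample text below is illustrative, and the function and key names are made up.

```python
import re

# Illustrative sample of a stub_status report; the numbers are made up.
SAMPLE_STATUS = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Turn a stub_status report into a dict of integer metrics."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The totals line holds three counters: accepts, handled, requests.
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")
    return {
        "active": int(active.group(1)),
        "accepts": int(totals.group(1)),
        "handled": int(totals.group(2)),
        "requests": int(totals.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }


if __name__ == "__main__":
    for name, value in parse_stub_status(SAMPLE_STATUS).items():
        print(name, value)
```

A monitoring system would fetch this report over HTTP on a schedule and chart or alert on the values; the counters (accepts, handled, requests) only ever grow, so they are usually turned into per-second rates.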

It’s the first of many code projects at Lookery we plan to open source.

Download the Hyperic Nginx Plugin Now


Content © 2007-2008 David Cancel. All Rights Reserved.