hadoop-general mailing list archives

From Nico Coetzee <nicc...@gmail.com>
Subject Re: Just to say thanks
Date Thu, 09 Jul 2009 21:07:02 GMT
Hi

I basically followed the samples on:


   - http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
   - http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

The only difference was that I work on CentOS 5.3 64-bit, but that did not
change much. I also used JDK 6 update 14
(http://java.sun.com/javase/downloads/?intcmp=1281).

My scripts, however, are not yet Java based. Instead I used the streaming
solution and wrote my scripts in Perl. I used
http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python
as a guide (some modification was required; for example,
"contrib/streaming/hadoop-0.19.1-streaming.jar" changed to
"contrib/streaming/hadoop-0.20.0-streaming.jar").

The rest of the config help I got from the documentation.
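
For illustration only, a streaming job of the kind described above might look
like the sketch below. It is written in Python (the language of the guide
linked above), not Perl, and it is not the actual script used here. The log
layout (Apache common log format), the choice to count HTTP status codes, and
the names map_lines and reduce_lines are all assumptions made for the example.

```python
import re
import sys
from itertools import groupby

# Assumed input: Apache common log format. The thread does not say which
# log format was actually parsed.
LOG_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+')


def map_lines(lines):
    """Mapper step: emit 'status<TAB>1' for every line that parses."""
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            yield f"{match.group(1)}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts per key.

    Relies on the input being sorted by key, which is how streaming
    delivers mapper output to the reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Streaming talks to the script over stdin/stdout.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The same file can serve as mapper ("script.py map") and reducer
("script.py reduce"); locally, piping a log file through the map step, sort,
and the reduce step gives a quick check before submitting to the cluster.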

Lessons learned so far:


   - To make life simple, use identical setup on all nodes (data directory
   location, file locations etc.)
   - Scripts need to be present on all nodes (obvious, but I missed that)
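
On the second point: besides copying scripts to every node by hand, Hadoop
streaming's -file option ships a local file along with the job. The sketch
below only assembles such a command line as a list of arguments; the script
names and HDFS paths are hypothetical, and the helper streaming_command is
invented for this example.

```python
def streaming_command(jar, scripts, mapper, reducer, input_path, output_path):
    """Build the argument list for a Hadoop streaming job.

    Each entry in `scripts` is passed with -file so that it is shipped to
    the nodes running the tasks.
    """
    cmd = ["bin/hadoop", "jar", jar]
    for script in scripts:
        cmd += ["-file", script]
    cmd += [
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
    ]
    return cmd


# Hypothetical file names and paths, for illustration only.
cmd = streaming_command(
    jar="contrib/streaming/hadoop-0.20.0-streaming.jar",
    scripts=["mapper.pl", "reducer.pl"],
    mapper="mapper.pl",
    reducer="reducer.pl",
    input_path="logs/input",
    output_path="logs/output",
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run from the Hadoop install
directory, or simply printed and pasted into a shell.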

Current benchmarks put my 3-node cluster at 3.2 times faster than our
traditional method, and this also takes into account the additional time it
took to upload the data into Hadoop and download the results again.
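
To make explicit how such an end-to-end figure is computed, here is a small
sketch. The timings in it are made up purely to show the arithmetic; the
actual measurements behind the 3.2 figure are not given in this message, and
the function name end_to_end_speedup is hypothetical.

```python
def end_to_end_speedup(baseline_s, upload_s, job_s, download_s):
    """Speedup of the cluster run over the baseline run.

    The cluster time includes uploading the input and downloading the
    results, not only the job itself.
    """
    total_s = upload_s + job_s + download_s
    if total_s <= 0:
        raise ValueError("total cluster time must be positive")
    return baseline_s / total_s


# Made-up timings in seconds, chosen only to illustrate the calculation.
speedup = end_to_end_speedup(baseline_s=3200, upload_s=200, job_s=700, download_s=100)
print(f"{speedup:.1f}x")
```

Counting the transfer time in the denominator gives a more conservative
number than comparing job time alone.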

Hope that helps.

I am hoping to write a more general Apache log parser and share my
experiences. Just busy with the script fine tuning.

Cheers

Nico

PS: a little off topic, but I just need to ask quickly: do we reply top or
bottom on this list?



On Thu, Jul 9, 2009 at 7:25 PM, Phil Whelan <phil123@gmail.com> wrote:

> Hi Nico,
>
> It sounds like you are just ahead of me. We are looking at doing
> exactly the same thing right now. Any tips?
>
> Thanks,
> Phil
>
> On Thu, Jul 9, 2009 at 3:31 AM, Nico Coetzee<nicc777@gmail.com> wrote:
> > Hi,
> >
> > I found the Hadoop project only over the week-end and I have just moved one
> > of my logfile parsing jobs onto a 3 node cluster to play around and compare
> > results from our traditional methods.
> >
> > Wow !
> >
> > All I can say is thanks to all the people who contributed toward this
> > project. This must be the coolest tech I found this year so far.
> >
> > I have only scratched the surface so far but I am sure I will just like it
> > more and more as time goes by. My next objective is to really get into Hive.
> >
> > Cheers all and have fun!
> >
> > Nico
> >
>
>
>
> --
> Mobile: +1  778-233-4935
> Twitter: philwhln
> Email : phil123@gmail.com
>
