hadoop-common-user mailing list archives

From "yair gotdanker" <yair...@gmail.com>
Subject Hadoop - is it good for me and performance question
Date Sun, 29 Jun 2008 11:45:49 GMT
Hello all,

I am new to Hadoop. The technology seems very interesting, but I am not
sure it suits my needs. I would really appreciate your feedback.

The problem:

I have multiple log servers, each receiving 10-100 MB/minute. The received
data is processed to produce aggregated data.
Processing the data should take a few minutes at most (10 min).
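
To put rough numbers on this, here is a back-of-the-envelope sketch. It
reads the stated rate as megabytes per minute, and the server count is a
hypothetical parameter since the actual number is not given. The function
names are illustrative only.

```python
def window_volume_mb(rate_mb_per_min, window_min=10, servers=1):
    """Data (in MB) that accumulates across all servers in one window."""
    return rate_mb_per_min * window_min * servers


def required_throughput_mb_per_s(rate_mb_per_min, servers=1):
    """Sustained processing rate (MB/s) needed just to keep up with ingest."""
    return rate_mb_per_min * servers / 60.0


if __name__ == "__main__":
    # Low and high ends of the stated per-server rate; servers=4 is an assumption.
    for rate in (10, 100):
        print(rate, "MB/min ->",
              window_volume_mb(rate, servers=4), "MB per 10-minute window,",
              round(required_throughput_mb_per_s(rate, servers=4), 2), "MB/s sustained")
```

Per server, a 10-minute window therefore holds somewhere between 100 MB and
1000 MB of input, and the totals scale linearly with the number of servers.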

In addition, I ran a performance benchmark on the wordcount example
provided by the quickstart tutorial on my PC (pseudo-distributed, using
the quickstart configuration files), and it took about 40 seconds!
I must be missing something or doing something wrong here, since
40 seconds is way too long.
The map/reduce functions should be very fast since there is almost no
processing done, so I guess most of the time is spent in the Hadoop framework.
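
One way to check that guess is to time the bare word-count logic outside
the framework and compare it with the job's wall-clock time. The sketch
below is plain Python, not the Hadoop API; `map_words`, `reduce_counts`
and `word_count` are hypothetical names that only mimic the map and
reduce steps in a single process, and the sample input is made up.

```python
import time
from collections import Counter
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per word."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_words(line) for line in lines))


if __name__ == "__main__":
    sample = ["hello hadoop hello world"] * 100_000  # made-up input
    start = time.perf_counter()
    counts = word_count(sample)
    elapsed = time.perf_counter() - start
    print(len(counts), "distinct words in", round(elapsed, 3), "seconds")
```

If the in-process time is a small fraction of the job's total, the
remainder is job setup, scheduling and I/O rather than the map and reduce
functions themselves.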

I would appreciate any help in understanding this and how I can improve
the performance.
Does anyone know of a good behind-the-scenes tutorial that explains more
about how the jobtracker and tasktracker communicate, and so on?
