hadoop-common-user mailing list archives

From Pedro Vivancos <pedro.vivan...@vocali.net>
Subject How to improve my map & reduce application
Date Fri, 27 Feb 2009 18:51:49 GMT
Dear friends,

I am new to Hadoop, and I must say I just want to use it as a map & reduce
framework.

I've developed an application to run on a server with 8 CPUs, and everything
seems to work properly except the performance: it doesn't use all the CPU
power.
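(For context, when a job runs under a real JobTracker/TaskTracker setup, the number of map and reduce tasks a node runs in parallel is capped by configuration. On an 8-CPU box, something like the following in conf/mapred-site.xml lets one node run several mappers at once; the values here are illustrative, not a recommendation:)

```xml
<!-- conf/mapred-site.xml: illustrative values for an 8-CPU node -->
<configuration>
  <property>
    <!-- maximum map tasks one TaskTracker runs concurrently -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
  </property>
  <property>
    <!-- maximum reduce tasks one TaskTracker runs concurrently -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```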

I'm trying to process 200,000 documents, extract some annotations from each
document (first and last names, in the mapper), and merge them in the reduce
task (if I find a first name and a last name together => a full name).

I've developed my own record reader because I want to get the URI of each
document I process. So in that record reader I have the URI as key and the
content as value. Here is the most important method (in my opinion):
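(The original snippet did not come through in the archive. As a rough, Hadoop-free sketch of what such a next() method does: one record per document, key filled with the document's URI, value filled with the whole content. Class and method names below are illustrative, not the real Hadoop API:)

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative stand-in for a Hadoop RecordReader<Text, Text>:
// each record is one whole document, keyed by its URI.
public class UriDocumentReader {
    private final Path file;
    private boolean consumed = false;

    public UriDocumentReader(Path file) {
        this.file = file;
    }

    // Mirrors RecordReader.next(key, value): returns false when exhausted.
    public boolean next(StringBuilder key, StringBuilder value) throws IOException {
        if (consumed) {
            return false; // one record per document
        }
        key.setLength(0);
        key.append(file.toUri().toString()); // key = document URI
        value.setLength(0);
        value.append(new String(Files.readAllBytes(file), StandardCharsets.UTF_8)); // value = full content
        consumed = true;
        return true;
    }
}
```

In the real API the same idea is expressed by implementing RecordReader and returning false from next() once the single (URI, content) pair has been emitted.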

I also must say that I'm not running the application with the bin/hadoop
script but with the java command directly, because I wasn't able to get it
working that way.
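(For reference, the usual way to launch a packaged job is through the wrapper script, which sets up the classpath and configuration before starting the JVM; the jar and class names here are illustrative:)

```shell
# Launch through the wrapper script instead of plain `java`;
# bin/hadoop sets up the Hadoop classpath and conf/ for you.
bin/hadoop jar myapp.jar com.example.MyJob input/ output/
```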

So, could you help me make use of all my CPU power?

Thanks in advance.
Pedro
