hadoop-common-user mailing list archives

From cdwillie76 <chris.d.willi...@gmail.com>
Subject Is hadoop right for my problem
Date Tue, 03 Feb 2009 14:34:56 GMT

I have an application I would like to apply Hadoop to, but I'm not sure if the
tasking is too small.  I have a file that contains between 70,000 and 400,000
records.  All the records can be processed in parallel, and I can currently
process them at about 400 records a second single-threaded (give or take).  I
thought I read somewhere (in one of the tutorials) that mapper tasks should
run for at least a minute to offset the overhead of creating them.  Is this
really the case?  I am pretty sure that a one-to-one record-to-mapper mapping is
overkill, but I am wondering whether batching records up for the mapper is still
the way to go, or if I should look at some other framework to help split up the
work.
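To put rough numbers on the one-minute rule of thumb mentioned above: at ~400 records/second, a map task would need about 24,000 records to run for a minute, so even the largest file here (400,000 records) would justify only around 16 map tasks. A back-of-the-envelope sketch (the throughput and the 60-second target are taken from this post; they are assumptions, not measurements):

```python
# Back-of-the-envelope sizing for map-task batches.
# Assumptions (from the post): ~400 records/sec single-threaded,
# and a rule of thumb that each map task should run >= 60 seconds.

RECORDS_PER_SEC = 400
MIN_TASK_SECONDS = 60

def batch_size(records_per_sec=RECORDS_PER_SEC,
               min_seconds=MIN_TASK_SECONDS):
    """Records one mapper should process to run for at least min_seconds."""
    return records_per_sec * min_seconds

def num_map_tasks(total_records,
                  records_per_sec=RECORDS_PER_SEC,
                  min_seconds=MIN_TASK_SECONDS):
    """How many map tasks a file of total_records would justify (at least 1)."""
    return max(1, total_records // batch_size(records_per_sec, min_seconds))

if __name__ == "__main__":
    for total in (70_000, 400_000):
        print(f"{total} records -> batch of {batch_size()}, "
              f"~{num_map_tasks(total)} map task(s)")
```

By this arithmetic, the 70,000-record case supports only a couple of map tasks and the 400,000-record case about sixteen, which suggests batching records (rather than one record per mapper) is the sensible direction if Hadoop is used at all.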

Any insight would be appreciated.

View this message in context: http://www.nabble.com/Is-hadoop-right-for-my-problem-tp21811122p21811122.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
