hadoop-common-user mailing list archives

From Cubic <cubicdes...@gmail.com>
Subject Processing 10MB files in Hadoop
Date Thu, 26 Nov 2009 12:02:20 GMT
Hi list.

I have small files containing data that has to be processed. A file
can be as small as 10MB (but it can also be 100-600MB) and contains
at least 30000 records to be processed. Processing one record can
take 30 seconds to 2 minutes. My cluster has about 10 nodes, and each
node has 16 cores.
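
To put those numbers in perspective, here is a rough back-of-envelope
sketch in Python. It is only an estimate under assumptions that are not
stated above: records are independent, a file holds exactly 30000
records, each core processes one record at a time, all 160 cores stay
busy, and scheduling and I/O overhead are ignored. The function name is
made up for illustration.

```python
def estimate_wall_clock_hours(records, seconds_per_record, nodes, cores_per_node):
    """Idealized wall-clock time if the work spreads evenly over all cores."""
    total_cpu_seconds = records * seconds_per_record
    parallel_slots = nodes * cores_per_node  # assumption: one record per core at a time
    return total_cpu_seconds / parallel_slots / 3600


# Figures from the message: 30000 records, 30 s to 2 min each, 10 nodes x 16 cores.
best_case = estimate_wall_clock_hours(30000, 30, 10, 16)
worst_case = estimate_wall_clock_hours(30000, 120, 10, 16)
print(f"best case:  {best_case:.2f} hours")
print(f"worst case: {worst_case:.2f} hours")
```

So even a single 10MB file represents hours of CPU work for the whole
cluster, which is why the file size alone says little about how the
work should be divided.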

Can anybody give me an idea of how to deal with these small files? I
know this is not quite a common Hadoop task. For example, how many map
tasks should I set in this case?
