hive-user mailing list archives

From Junxian Yan <junxian....@gmail.com>
Subject question about number of map tasks for small file
Date Tue, 31 May 2011 09:55:33 GMT
Hi guys,

I use Flume to store log files and Hive to query them.

Flume always stores small files with the suffix .seq, and I now have over 35
thousand of them. Every time I launch a query script, 35 thousand map tasks
are created, and the query takes a very long time to complete.

I also tried setting CombineHiveInputFormat, but with this option the job
seems to run slowly, because the total size of the data folder is over 700 MB
and there is always only one map task. In my testing environment I only have
3 data nodes. I also tried adding mapred.map.tasks=5 after the
CombineHiveInputFormat setting, but it doesn't seem to work.
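
A rough way to reason about the map task count under a combining input
format: mapred.map.tasks is only a hint to Hadoop, and the number of map
tasks follows the number of input splits. Assuming the combined splits are
capped by a maximum split size (the Hadoop property mapred.max.split.size is
the usual knob for this, but whether it takes effect in this exact
Hadoop/Hive combination is an assumption), the count can be estimated as in
the sketch below. The function name and the 140 MB cap are hypothetical, and
the real count also depends on how blocks are grouped by node and rack.

```python
def estimate_combined_splits(total_bytes, max_split_bytes):
    """Estimate how many combined splits (and so map tasks) a job gets.

    This is only an approximation: it ignores node/rack locality grouping.
    A non-positive cap is treated as "no cap", which is modeled here as a
    single split holding all of the input.
    """
    if total_bytes <= 0:
        return 0
    if max_split_bytes <= 0:
        return 1
    # Integer ceiling division, avoids floating-point rounding.
    return (total_bytes + max_split_bytes - 1) // max_split_bytes


MB = 1024 * 1024
total = 700 * MB  # approximate size of the data folder from this message

print(estimate_combined_splits(total, 0))         # no cap
print(estimate_combined_splits(total, 140 * MB))  # hypothetical 140 MB cap
```

Under these assumptions, no cap gives a single split, which matches the one
map task seen above, while a cap of about 140 MB on 700 MB of input gives
roughly 5 splits.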

Could you please show me a way to set the number of map tasks freely?

BTW: the Hadoop version is 0.20 and the Hive version is 0.5.

Richard
