hadoop-mapreduce-user mailing list archives

From 胡斐 <hufe...@gmail.com>
Subject Custom FileInputFormat.class
Date Mon, 01 Dec 2014 16:38:01 GMT

I want to write a custom FileInputFormat. To determine which host a
specific part of a file belongs to, I need to open the file in HDFS and
read some information from it. Opening one file and getting the
information I need takes nearly 500 ms. I have thousands of files to
process, so handling all of them one by one this way would take a long time.
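
To make the cost concrete: at roughly 500 ms per file, 2,000 files probed one after another would take about 1,000 seconds. One possible direction, sketched here under assumptions and not tested against a real cluster, is to issue the per-file probes concurrently from a thread pool, since each probe is mostly waiting on I/O. The class name ParallelProbe, the method probeAll, and the stand-in probe function are hypothetical names used only for illustration. In a real input format the probe would be the HDFS open-and-read step, and the call would presumably sit inside getSplits().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelProbe {

    /**
     * Runs the probe once per path on a fixed-size thread pool and
     * returns a map of path to probe result, in the input order.
     * The probe is an assumption: it stands in for "open the file
     * and read the information needed to pick a host".
     */
    public static <R> Map<String, R> probeAll(
            List<String> paths, int threads, Function<String, R> probe)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every probe first so they can overlap in time.
            List<Future<R>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<R> task = () -> probe.apply(path);
                futures.add(pool.submit(task));
            }
            // Collect results in the same order the paths were given.
            Map<String, R> results = new LinkedHashMap<>();
            for (int i = 0; i < paths.size(); i++) {
                results.put(paths.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = List.of("/data/a", "/data/b", "/data/c");
        // Hypothetical probe: a real one would read from HDFS here.
        Map<String, String> hosts =
                probeAll(paths, 3, path -> "host-for-" + path);
        System.out.println(hosts);
    }
}
```

With N threads the wall-clock time drops toward (number of files x 500 ms) / N, as long as the NameNode and DataNodes can absorb the extra concurrent opens. The pool size is a tuning assumption, not a recommended value.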

Is there any better solution to reduce the time when the number of files is

Thanks in advance!
