hadoop-hdfs-user mailing list archives

From: Roger Maillist <darkchanterl...@gmail.com>
Subject: HDFS - many files, small size
Date: Thu, 02 Oct 2014 08:12:11 GMT
Hi there,
I have millions of rather small PDF files that I want to load into HDFS for
later analysis. I also need to re-encode them as a base64 stream so that the
MapReduce job that parses them works.
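
One common reason for the base64 step is that a line-oriented input format splits its input at newlines, which raw PDF bytes can contain anywhere. The sketch below is an assumption about the intended layout, not the poster's code: each PDF becomes exactly one text line of the form "<name>\t<base64>". The function names `encode_record` and `decode_record` are hypothetical. Note that base64 output is 4 bytes for every 3 input bytes, so the data grows by roughly a third.

```python
import base64
from typing import Tuple


def encode_record(name: str, data: bytes) -> str:
    """Return one tab-separated line: file name, then the base64 payload.

    base64.b64encode (unlike base64.encodebytes) inserts no line breaks,
    so each PDF occupies exactly one line, i.e. one record for a
    line-oriented reader. Assumes names contain no tab or newline.
    """
    payload = base64.b64encode(data).decode("ascii")
    return f"{name}\t{payload}\n"


def decode_record(line: str) -> Tuple[str, bytes]:
    """Inverse of encode_record: split off the name and decode the payload."""
    name, payload = line.rstrip("\n").split("\t", 1)
    return name, base64.b64decode(payload)


if __name__ == "__main__":
    pdf_bytes = b"%PDF-1.4\n...binary content...\n%%EOF"
    line = encode_record("report-0001.pdf", pdf_bytes)
    print(line.count("\n"))  # the record occupies a single line
    print(decode_record(line) == ("report-0001.pdf", pdf_bytes))
```

In a parsing job, the mapper would call the equivalent of `decode_record` on each input line to recover the original PDF bytes.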

Is there a better or faster method than just calling 'put' in a huge (bash)
loop? Maybe I could implement the encoding and loading as a MapReduce job?
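
Much of the cost of a per-file loop is that every `hdfs dfs -put` call starts a new JVM. The HDFS shell's `put` accepts several local sources followed by one destination directory, so passing many files per call spreads that start-up cost. The sketch below assumes the destination directory already exists. The `batch_size` of 1000 is an arbitrary, hypothetical default that must stay under the operating system's command-line length limit. This only speeds up loading. It still creates one HDFS file per PDF.

```python
import subprocess
from typing import Iterable, List


def batched_put_commands(local_files: Iterable[str], hdfs_dir: str,
                         batch_size: int = 1000) -> List[List[str]]:
    """Group local paths into `hdfs dfs -put src1 src2 ... dst` invocations.

    One JVM start per batch instead of one per file.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    files = list(local_files)
    commands: List[List[str]] = []
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        commands.append(["hdfs", "dfs", "-put", *batch, hdfs_dir])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run the commands in order; check=True raises on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage, with hypothetical paths: `run_all(batched_put_commands(glob.glob("/data/pdfs/*.pdf"), "/user/roger/pdfs"))`. Several batches could also run in parallel, at the cost of more load on the NameNode.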

Second, according to a Cloudera blog post I read, it's a bad idea to store
small files on HDFS, especially in large numbers. They recommend HBase
instead. However, I want to take further action via
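
The small-files concern comes from the NameNode keeping metadata for every file and block in memory, so millions of tiny files cost far more than the same bytes stored in a few large files. Hadoop's own container formats, such as SequenceFile or HAR archives, are the usual ways to pack them. The sketch below swaps in a simpler plain-text variant: it packs every PDF under a local directory into a few large "<relative path>\t<base64>" text files before upload. The function `pack_directory` and its 128 MiB `target_bytes` (chosen to roughly match a common HDFS block size) are hypothetical choices, not settings taken from the post.

```python
import base64
import os
from typing import List


def pack_directory(src_dir: str, out_prefix: str,
                   target_bytes: int = 128 * 1024 * 1024) -> List[str]:
    """Pack every *.pdf under src_dir into a few large text files.

    Each output line is "<relative path>\t<base64 of the file>". A new part
    file is started once the current one has reached target_bytes, so the
    last record in a part may push it somewhat past that size.
    """
    outputs: List[str] = []
    out = None
    written = 0
    try:
        for root, _dirs, names in os.walk(src_dir):
            for name in sorted(names):
                if not name.lower().endswith(".pdf"):
                    continue  # skip anything that is not a PDF
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    payload = base64.b64encode(f.read()).decode("ascii")
                line = f"{os.path.relpath(path, src_dir)}\t{payload}\n"
                if out is None or written >= target_bytes:
                    if out is not None:
                        out.close()
                    part = f"{out_prefix}-{len(outputs):05d}.txt"
                    out = open(part, "w", encoding="utf-8")
                    outputs.append(part)
                    written = 0
                out.write(line)
                written += len(line.encode("utf-8"))
    finally:
        if out is not None:
            out.close()
    return outputs
```

The resulting part files can then be uploaded with a single `hdfs dfs -put`, and each line is one complete PDF for a MapReduce job that reads its input line by line.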

Thanks for your suggestions
