hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Memory problems with BytesWritable and huge binary files
Date Fri, 24 Jan 2014 18:24:59 GMT
Is your data in any given file a bunch of key-value pairs? If that isn't
the case, I'm wondering how writing a single large key-value into a
sequence file helps. It won't. May be you can give an example of your input
data?

If indeed they are a bunch of smaller sized key-value pairs, you can write
your own custom InputFormat that reads the data from your input files one
k-v pair after another, and feed it to your MR job. There isn't any need
for converting them to sequence-files at that point.

Thanks
+Vinod
Hortonworks Inc.
http://hortonworks.com/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Mime
View raw message