hadoop-common-user mailing list archives

From "Stuart Sierra" <m...@stuartsierra.com>
Subject Serialization format for structured data
Date Thu, 22 May 2008 20:54:33 GMT
Hello, I'm still getting my head around how Hadoop works.  A survey
question: what kind of serialization do you use to output structured
data from your map/reduce jobs?

When both key and value are primitive types, either TextOutputFormat
or SequenceFileOutputFormat is easy.  But what if you want to store a
more complex data structure as the value?
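For the SequenceFile case, one common route is a custom value class that follows Hadoop's Writable contract (`write(DataOutput)` and `readFields(DataInput)`). Below is a minimal sketch that assumes a fixed record shape. `WritableLike` is a local stand-in for `org.apache.hadoop.io.Writable`, so the example compiles without the Hadoop jar. `PageRecord` and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// so this sketch compiles without the Hadoop jar.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical fixed-shape record; the field names are illustrative only.
class PageRecord implements WritableLike {
    String url = "";
    int hits;
    double score;
    boolean active;
    List<String> tags = new ArrayList<>();

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(hits);
        out.writeDouble(score);
        out.writeBoolean(active);
        // Length-prefix the list so readFields() knows how many items follow.
        out.writeInt(tags.size());
        for (String tag : tags) {
            out.writeUTF(tag);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Read fields back in exactly the order write() emitted them.
        url = in.readUTF();
        hits = in.readInt();
        score = in.readDouble();
        active = in.readBoolean();
        int n = in.readInt();
        tags = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tags.add(in.readUTF());
        }
    }

    // Round-trip helpers for local testing; in a job, Hadoop supplies the streams.
    static byte[] toBytes(WritableLike w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static PageRecord fromBytes(byte[] bytes) {
        PageRecord r = new PageRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            r.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return r;
    }
}
```

In a real job, the class would implement `org.apache.hadoop.io.Writable` directly. It would also keep a no-argument constructor, because Hadoop instantiates value classes by reflection before calling `readFields`. The trade-off is that the schema lives in code: adding a field changes the byte layout of every record written afterwards.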

By "complex data structure," I mean some combination of lists/arrays,
dictionaries/hashes, and primitives (int, float, string, boolean,

I've tried using JSON to store structured data in TextOutputFormat,
which works but is not very efficient.  Any better suggestions?
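When the shape varies from record to record, as it does with JSON, one alternative is a self-describing binary encoding. Each value gets a one-byte type tag, and each list or map gets an element count. The sketch below is hypothetical: the `TaggedCodec` name and tag values are made up for illustration. It assumes map keys are strings, as in JSON, and it uses `DataOutput.writeUTF`, which limits each string to 65,535 encoded bytes.

```java
import java.io.*;
import java.util.*;

// Hypothetical tagged binary codec for nested Lists, Maps and primitives.
final class TaggedCodec {
    // One-byte tags identifying the type of the value that follows.
    static final byte TAG_NULL = 0, TAG_BOOL = 1, TAG_INT = 2, TAG_DOUBLE = 3,
            TAG_STRING = 4, TAG_LIST = 5, TAG_MAP = 6;

    private TaggedCodec() {}

    static void write(DataOutput out, Object v) throws IOException {
        if (v == null) {
            out.writeByte(TAG_NULL);
        } else if (v instanceof Boolean) {
            out.writeByte(TAG_BOOL);
            out.writeBoolean((Boolean) v);
        } else if (v instanceof Integer) {
            out.writeByte(TAG_INT);
            out.writeInt((Integer) v);
        } else if (v instanceof Double) {
            out.writeByte(TAG_DOUBLE);
            out.writeDouble((Double) v);
        } else if (v instanceof String) {
            out.writeByte(TAG_STRING);
            out.writeUTF((String) v);
        } else if (v instanceof List) {
            List<?> list = (List<?>) v;
            out.writeByte(TAG_LIST);
            out.writeInt(list.size()); // element count, then each element
            for (Object item : list) write(out, item);
        } else if (v instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) v;
            out.writeByte(TAG_MAP);
            out.writeInt(map.size()); // entry count, then key/value pairs
            for (Map.Entry<?, ?> e : map.entrySet()) {
                out.writeUTF((String) e.getKey()); // keys assumed to be strings
                write(out, e.getValue());
            }
        } else {
            throw new IllegalArgumentException("unsupported type: " + v.getClass());
        }
    }

    static Object read(DataInput in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TAG_NULL: return null;
            case TAG_BOOL: return in.readBoolean();
            case TAG_INT: return in.readInt();
            case TAG_DOUBLE: return in.readDouble();
            case TAG_STRING: return in.readUTF();
            case TAG_LIST: {
                int n = in.readInt();
                List<Object> list = new ArrayList<>(n);
                for (int i = 0; i < n; i++) list.add(read(in));
                return list;
            }
            case TAG_MAP: {
                int n = in.readInt();
                Map<String, Object> map = new LinkedHashMap<>(); // keeps key order
                for (int i = 0; i < n; i++) {
                    String key = in.readUTF();
                    map.put(key, read(in));
                }
                return map;
            }
            default: throw new IOException("unknown tag: " + tag);
        }
    }

    static byte[] encode(Object v) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            write(out, v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static Object decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with JSON text, numbers are stored in fixed binary widths, and reading needs no text parsing. For example, an int always takes 5 bytes: the tag plus 4. The encoded bytes could then be wrapped in a `BytesWritable` value in a SequenceFile. The cost is that the data is no longer human-readable. Also, key names are still repeated in every record, which a fixed-schema format avoids.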

Thanks, all,
