hadoop-common-user mailing list archives

From "Stuart Sierra" <m...@stuartsierra.com>
Subject Serialization format for structured data
Date Thu, 22 May 2008 20:54:33 GMT

Hello, I'm still getting my head around how Hadoop works.  A survey
question: what kind of serialization do you use to output structured
data from your map/reduce jobs?

When both key and value are primitive types, either TextOutputFormat
or SequenceFileOutputFormat is easy.  But what if you want to store a
more complex data structure as the value?

By "complex data structure," I mean some combination of lists/arrays,
dictionaries/hashes, and primitives (int, float, string, boolean,
null).

I've tried using JSON to store structured data in TextOutputFormat,
which works but is not very efficient.  Any better suggestions?
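
To make the JSON-in-text approach concrete, here is a minimal sketch of what one output line might look like. It assumes one record per line with a tab between key and value, and a key that contains no tab itself. The helper names (to_text_line, from_text_line) and the sample record are hypothetical, invented for illustration; they are not part of Hadoop.

```python
import json

def to_text_line(key, value):
    """Render one key/value pair as a tab-separated line with a JSON value."""
    # Compact separators drop optional whitespace. json.dumps escapes tabs
    # and newlines inside strings, so the value always stays on one line.
    return key + "\t" + json.dumps(value, separators=(",", ":"), sort_keys=True)

def from_text_line(line):
    """Split a line at the first tab and decode the JSON value."""
    key, _, payload = line.rstrip("\n").partition("\t")
    return key, json.loads(payload)

# Hypothetical record mixing lists, dictionaries, and primitives.
record = {
    "tags": ["a", "b"],
    "stats": {"count": 3, "score": 0.5},
    "note": "tab\tinside",
    "active": True,
    "parent": None,
}

line = to_text_line("doc-1", record)
key, decoded = from_text_line(line)
```

Every field name is repeated as text in every record and numbers are written as decimal strings, which is one reason this form tends to take more space than a binary encoding.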

Thanks, all,
-Stuart
