hadoop-common-user mailing list archives

From Majid Azimi <majid.merk...@gmail.com>
Subject Splitting SequenceFile in controlled manner
Date Tue, 06 Dec 2011 19:55:28 GMT
Hadoop writes a SequenceFile in key-value pair (record) format.
Consider a large, unbounded log file. Hadoop will split the file
based on block size and store the blocks on multiple data nodes. Is it
guaranteed that each key-value pair will reside in a single block? Or can
a key end up in one block on node 1 and its value (or part of it) in a
second block on node 2? If such meaningless splits are possible, what is
the solution? Sync markers?
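
To make the scenario concrete, here is a toy sketch in plain Python. It is
not Hadoop code and does not reproduce the real SequenceFile on-disk layout:
the length-prefixed encoding, the 32-byte block size, and the names
serialize and straddling_records are all assumptions made for illustration.
It only shows that cutting a byte stream at fixed offsets pays no attention
to where records begin and end.

```python
import struct

def serialize(records):
    """Encode (key, value) byte pairs as length-prefixed records.

    Returns the byte stream and each record's (start, end) offsets.
    This is a made-up encoding, not the SequenceFile format.
    """
    stream = bytearray()
    spans = []
    for key, value in records:
        start = len(stream)
        # 4-byte key length, 4-byte value length, then the raw bytes.
        stream += struct.pack(">II", len(key), len(value)) + key + value
        spans.append((start, len(stream)))
    return bytes(stream), spans

def straddling_records(spans, block_size):
    """Return indices of records whose bytes fall in more than one block."""
    return [
        i
        for i, (start, end) in enumerate(spans)
        if start // block_size != (end - 1) // block_size
    ]

# Five 20-byte records cut into hypothetical 32-byte blocks.
records = [(b"k%d" % i, b"v" * 10) for i in range(5)]
stream, spans = serialize(records)
print(straddling_records(spans, block_size=32))  # prints [1, 3, 4]
```

In this toy model, records 1, 3 and 4 each have bytes in two different
blocks, which is exactly the case I am asking about.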

Another question: does Hadoop write sync markers automatically, or
should we write them manually?
