hadoop-common-dev mailing list archives

From Steve Gao <steve....@yahoo.com>
Subject What if an XML file cross boundary of HDFS chunks?
Date Thu, 29 Oct 2009 20:32:21 GMT

Has anybody run into a similar issue? If you store XML files in HDFS, how can you make sure
that a chunk read by a mapper does not contain only part of an XML segment?

For example:

<title>
<book>book1</book>
<author>me</author>
..............what if a chunk boundary falls here?...................
<year>2009</year>
<book>book2</book>
<author>me</author>
<year>2009</year>
<book>book3</book>
<author>me</author>
<year>2009</year>
</title>
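One common way to handle this is sketched below. It is a minimal illustration under stated assumptions, not Hadoop's API. Hadoop streaming ships a `StreamXmlRecordReader` built on a similar begin/end-tag idea. The rule is that a split owns only the records whose start tag *begins* inside it, and the reader may run past the end of its split to finish the last record it owns. Every record is then processed exactly once, even when a chunk boundary cuts through it. The class and method names, the choice of `<book>` as the start tag and `</year>` as the end tag for the example above, the in-memory `String` standing in for an HDFS stream, and the tiny 40-byte split size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class XmlSplitSketch {

    /**
     * Returns the records owned by the split [start, end).
     * A record belongs to the split in which its start tag begins; the
     * reader is allowed to read past `end` to complete that record.
     * (Hypothetical helper: operates on an in-memory String, not HDFS.)
     */
    static List<String> readSplit(String data, int start, int end,
                                  String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = start;
        while (true) {
            // Find the next start tag at or after the current position.
            int recStart = data.indexOf(startTag, pos);
            if (recStart < 0 || recStart >= end) {
                break; // next record begins in a later split, or none left
            }
            // Scan forward (possibly beyond `end`) for the matching end tag.
            int tagPos = data.indexOf(endTag, recStart + startTag.length());
            if (tagPos < 0) {
                break; // truncated record at end of input
            }
            int recEnd = tagPos + endTag.length();
            records.add(data.substring(recStart, recEnd));
            pos = recEnd;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<title>\n"
            + "<book>book1</book>\n<author>me</author>\n<year>2009</year>\n"
            + "<book>book2</book>\n<author>me</author>\n<year>2009</year>\n"
            + "<book>book3</book>\n<author>me</author>\n<year>2009</year>\n"
            + "</title>\n";
        int splitSize = 40; // deliberately tiny so splits cut through records
        List<String> all = new ArrayList<>();
        for (int s = 0; s < xml.length(); s += splitSize) {
            int e = Math.min(s + splitSize, xml.length());
            List<String> recs = readSplit(xml, s, e, "<book>", "</year>");
            System.out.println("split [" + s + ", " + e + "): "
                + recs.size() + " record(s)");
            all.addAll(recs);
        }
        System.out.println("total records: " + all.size()); // prints 3
    }
}
```

Some splits return no records because every record starting near them began in an earlier split. A split whose end falls mid-record still returns that record whole. A real reader would apply the same rule to byte offsets in the HDFS stream rather than to a `String`.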