hadoop-common-user mailing list archives

From Alejandro Montenegro <aamon...@gmail.com>
Subject processing xml with headers
Date Thu, 26 Aug 2010 16:43:29 GMT
I have some XML files with a structure like this:


   <header>some text</header>

   <record>record 1</record>
   <record>record 2</record>
   <record>record N</record>


The information in the header is needed to process the records. Using
Mahout's XmlInputFormat I'm able to extract every <record>, but not the
header information. One option is to not split the file and process it as a
whole in the mapper, but some files are over 200 MB, and I believe that
would not be very efficient.
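One possible approach, sketched under stated assumptions rather than as a tested Hadoop implementation: keep splitting the file, but have each split's reader fetch the header separately with a small read from the start of the file, and pair it with every <record> whose opening tag falls inside that split's byte range. The plain-Python code only simulates this logic on an in-memory byte string. The function names and the 64 KB header limit are hypothetical, regexes stand in for a real XML parser, and in an actual job the same idea would live in a custom RecordReader (not shown).

```python
import re

# Sketch only: regexes are not a real XML parser, and the tag names come from
# the sample above. Assumption (hypothetical): the <header> element lies within
# the first HEADER_SCAN_BYTES bytes of every file.
HEADER_SCAN_BYTES = 64 * 1024
HEADER_RE = re.compile(rb"<header>(.*?)</header>", re.DOTALL)
RECORD_RE = re.compile(rb"<record>(.*?)</record>", re.DOTALL)


def read_header(data: bytes) -> bytes:
    """Return the header text, searching only the beginning of the file."""
    match = HEADER_RE.search(data, 0, HEADER_SCAN_BYTES)
    if match is None:
        raise ValueError("no <header> near the start of the file")
    return match.group(1)


def records_in_split(data: bytes, split_start: int, split_end: int):
    """Yield (header, record) for every record whose <record> tag starts
    inside [split_start, split_end).

    A record that begins in this split but ends after split_end is still read
    to completion, and a record that began in an earlier split is skipped, so
    each record is handled by exactly one split.
    """
    header = read_header(data)  # small read at offset 0, once per split
    for match in RECORD_RE.finditer(data, split_start):
        if match.start() >= split_end:
            break
        yield header, match.group(1)


doc = b"""<header>some text</header>
<record>record 1</record>
<record>record 2</record>
<record>record N</record>
"""
mid = len(doc) // 2  # pretend the file was cut into two splits here
print(list(records_in_split(doc, 0, mid)))
# [(b'some text', b'record 1')]
print(list(records_in_split(doc, mid, len(doc))))
# [(b'some text', b'record 2'), (b'some text', b'record N')]
```

Here the split boundary at `mid` falls between record 1 and record 2, so the first split gets one record and the second gets two, each tagged with the header, and no record is lost or read twice. The cost per split is one extra small read of the file's first bytes instead of processing the whole 200 MB file in a single mapper.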
So if anyone has a suggestion about how to process this kind of file, I
would appreciate it!

Alejandro Montenegro del Pino.
Viña del Mar - Chile
phone: (+56) 9-68358690
