hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Map/Reduce with XML files ..
Date Tue, 29 Apr 2008 16:07:32 GMT

Just adapt TextInputFormat so that it reads to the next file boundary
instead of the next newline.
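
As a rough illustration of the idea (not the actual TextInputFormat change),
here is a minimal plain-Java sketch of reading records up to a boundary
rather than a newline. It assumes the container file separates documents
with a marker line; the marker "--XML-BOUNDARY--" and the names
BoundaryRecords, nextRecord, skipPartialRecord and readAll are all
hypothetical. A real version would put the same loop inside a Hadoop
RecordReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class BoundaryRecords {
    // Hypothetical marker line separating documents inside a container file.
    static final String BOUNDARY = "--XML-BOUNDARY--";

    /** Returns all lines up to the next boundary, or null at end of input. */
    static String nextRecord(BufferedReader in) throws IOException {
        StringBuilder record = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(BOUNDARY)) {
                if (record.length() > 0) {
                    return record.toString();
                }
                continue; // nothing collected yet: skip empty records
            }
            record.append(line).append('\n');
        }
        // Last record may not be followed by a boundary.
        return record.length() > 0 ? record.toString() : null;
    }

    /**
     * For a reader that starts in the middle of a file: discard lines up to
     * and including the next boundary, so the first record returned is whole.
     */
    static void skipPartialRecord(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null && !line.equals(BOUNDARY)) {
            // discard the tail of a record that belongs to the previous split
        }
    }

    /** Convenience wrapper: splits an in-memory container into records. */
    static List<String> readAll(String container, boolean startsMidRecord) {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(container))) {
            if (startsMidRecord) {
                skipPartialRecord(in);
            }
            String record;
            while ((record = nextRecord(in)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String container = "<a>1</a>\n" + BOUNDARY + "\n<b>\n2\n</b>\n" + BOUNDARY + "\n";
        for (String record : readAll(container, false)) {
            System.out.print("record: " + record);
        }
    }
}
```

Each record comes back as one string holding a whole small XML document,
which is what the map would then receive as its value. skipPartialRecord
mirrors what a line-based reader does when a split starts mid-line: the
partial piece is left to whoever read the previous split.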

There is also a JIRA open for file archiving that would do all of this (and
more) for you.  If you don't want to wait, then the modification to
TextInputFormat is pretty easy.
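
On the XPath part of the question quoted below: once each map call gets one
whole small XML document as a string, the standard javax.xml.xpath API
should be enough. The following is only a sketch under that assumption. The
class XPathTally and the methods mapStep and tally are hypothetical names,
and the in-memory tally merely stands in for what the reduce side would do
with the emitted keys.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathTally {
    /** "Map" side: returns the text of every node in one document matching expr. */
    static List<String> mapStep(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> keys = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                keys.add(nodes.item(i).getTextContent().trim());
            }
            return keys;
        } catch (XPathExpressionException e) {
            // Raised for a bad expression and also when the XML cannot be parsed.
            throw new IllegalArgumentException("bad XPath expression or XML", e);
        }
    }

    /** Stand-in for the "reduce" side: sums a count of 1 per emitted key. */
    static Map<String, Integer> tally(List<String> docs, String expr) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            for (String key : mapStep(doc, expr)) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "<doc><tag>a</tag><tag>b</tag></doc>",
                "<doc><tag>a</tag></doc>");
        System.out.println(tally(docs, "//tag"));
    }
}
```

In a real job, mapStep would emit (key, 1) pairs from the map and the
summing would happen in the reduce. Creating the XPath object once per task
rather than once per record would probably be worth it at scale.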


On 4/28/08 5:14 PM, "Kayla Jay" <kaylais30@yahoo.com> wrote:

> Yes, I'm talking about a collection of small XML files stored in "container"
> files.  I.e., there are lots and lots of small XML files collected into big
> files.  Not one gargantuan XML file. How would you go about using Hadoop with
> splits, and processing and handling these sorts of XML files?
> 
> 
> ----- Original Message ----
> From: Ted Dunning <tdunning@veoh.com>
> To: core-user@hadoop.apache.org
> Sent: Monday, April 28, 2008 4:16:20 PM
> Subject: Re: Map/Reduce with XML files ..
> 
> 
> The only real problem with xml and map-reduce is if you are talking about
> one gargantuan XML file.  That makes correct splitting difficult.
> 
> If you are talking about millions or billions of small xml files (stored in
> some sort of container file), then hadoop should be pretty easy to use.
> 
> 
> On 4/28/08 9:39 AM, "Kayla Jay" <kaylais30@yahoo.com> wrote:
> 
>> Hello
>> 
>> Has anyone had any experience processing XML files in Hadoop within
>> their maps/reduces?
>> In particular, has anyone used any sort of XQuery/XPath processing within
>> their maps/reduces?
>> Say I have an XML string passed to the map and now I want to find something in
>> particular via XQuery/XPath or something of that sort, to count occurrences or
>> parse out a particular section within the XML.
>> 
>> Has anyone done any XML processing, looking for things within the XML and
>> then aggregating common pieces together in the reduces?
>> 
>> 
>> On another note,
>> Has anyone figured out splits for XML files?
>> Has anyone written a custom XML reader other than the StreamXmlRecordReader?
>> The only one I've read about and can find anything on is:
>> http://www.nabble.com/map-reduce-function-on-xml-string-td15816818.html
>> 
>> 
>> Thanks.
>> 
>> 
>> 
> 
> 

