hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Map/Reduce with XML files ..
Date Tue, 29 Apr 2008 16:30:10 GMT


https://issues.apache.org/jira/browse/HADOOP-3307


On 4/29/08 9:25 AM, "Kayla Jay" <kaylais30@yahoo.com> wrote:

> Thanks.  Do you have the JIRA issue number for that so that I can keep an eye
> on it?
> 
> Thanks.
> 
> 
> ----- Original Message ----
> From: Ted Dunning <tdunning@veoh.com>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, April 29, 2008 12:07:32 PM
> Subject: Re: Map/Reduce with XML files ..
> 
> 
> Just adapt TextInputFormat so that it reads to the next file boundary
> instead of the next newline.
> 
> There is also a JIRA issue open for file archiving that would do all of this
> (and more) for you.  If you don't want to wait, then the modification to
> TextInputFormat is pretty easy.
> 
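A minimal sketch of the "read to the next file boundary" idea, in plain Java with no Hadoop classes. It assumes the small XML documents in a container file are separated by a marker line; the marker `--FILE-BOUNDARY--` and the names `BoundaryRecords`, `nextRecord` and `readAll` are hypothetical and do not come from this thread or from Hadoop. A real adaptation of TextInputFormat would put this loop inside a record reader and would also have to deal with records that straddle a split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: BOUNDARY is an assumed marker line between embedded XML documents.
class BoundaryRecords {
    static final String BOUNDARY = "--FILE-BOUNDARY--";

    // Returns all lines up to the next boundary line as one record,
    // or null once the input is exhausted.
    static String nextRecord(BufferedReader in) throws IOException {
        StringBuilder record = new StringBuilder();
        boolean sawLine = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(BOUNDARY)) {
                return record.toString();
            }
            sawLine = true;
            record.append(line).append('\n');
        }
        // End of input: return the trailing record if there was one.
        return sawLine ? record.toString() : null;
    }

    // Splits a whole container (held in memory here for illustration) into records.
    static List<String> readAll(String container) {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(container))) {
            String record;
            while ((record = nextRecord(in)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Each returned record is then one complete small XML document, which is what the map would receive instead of a single line.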
> 
> On 4/28/08 5:14 PM, "Kayla Jay" <kaylais30@yahoo.com> wrote:
> 
>> Yes, I'm talking about a collection of small XML files stored in "container"
>> files, i.e. lots and lots of small XML files collected into big files, not
>> one gargantuan XML file. How would you go about using Hadoop to split,
>> process, and handle these sorts of XML files?
>> 
>> 
>> ----- Original Message ----
>> From: Ted Dunning <tdunning@veoh.com>
>> To: core-user@hadoop.apache.org
>> Sent: Monday, April 28, 2008 4:16:20 PM
>> Subject: Re: Map/Reduce with XML files ..
>> 
>> 
>> The only real problem with xml and map-reduce is if you are talking about
>> one gargantuan XML file.  That makes correct splitting difficult.
>> 
>> If you are talking about millions or billions of small xml files (stored in
>> some sort of container file), then hadoop should be pretty easy to use.
>> 
>> 
>> On 4/28/08 9:39 AM, "Kayla Jay" <kaylais30@yahoo.com> wrote:
>> 
>>> Hello
>>> 
>>> Has anyone had any experience with processing xml files within Hadoop within
>>> their maps/reduces?
>>> In particular, has anyone used any sort of XQuery/XPath processing within
>>> their maps/reduces?
>>> Say I have an XML string passed to the map and now I want to find something
>>> in particular via XQuery/XPath or something similar, to count occurrences
>>> or parse out a particular section within the XML.
>>> 
>>> Has anyone done any XML processing that looks for things within the XML and
>>> then aggregates common pieces together in the reduces?
>>> 
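For the XPath part of the question, a minimal sketch using only the standard `javax.xml.xpath` API from the JDK, with no Hadoop classes. It counts the nodes that match an expression in one XML string, which is the kind of per-record work a map could do. The class name `XPathCount` and the sample expression are hypothetical; how the count is emitted to the reduce is left out.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Sketch only: counts nodes matching an XPath expression in a single XML string.
class XPathCount {
    static int count(String xml, String nodeExpr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // XPath's count() function returns a number, delivered as a Double.
            Double n = (Double) xpath.evaluate(
                    "count(" + nodeExpr + ")",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XML or XPath expression", e);
        }
    }
}
```

A map could call something like `XPathCount.count(value, "//item")` on each XML record and emit the result under a chosen key, leaving the reduce to sum the per-record counts.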
>>> 
>>> On another note,
>>> Has anyone figured out splits for XML files?
>>> Has anyone written a custom XML reader other than the StreamXmlRecordReader?
>>> The only one I've read about and can find anything is:
>>> http://www.nabble.com/map-reduce-function-on-xml-string-td15816818.html
>>> 
>>> 
>>> Thanks.

