Date: Tue, 17 Dec 2013 14:54:35 +0530
Subject: Re: Hadoop-MapReduce
From: Ranjini Rathinam <ranjinibecse@gmail.com>
To: user@hadoop.apache.org

Hi,

I want to know when I should use a Mapper, a Reducer, and a Combiner, and what methods each of them has.

Please suggest material for studying them in detail, as I am a fresher.

Thanks in advance,
Ranjini

On Tue, Dec 17, 2013 at 2:34 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:
> Ranjini, can you please check this? It is not perfect; I wrote it simply to
> check my XML data.
>
> https://github.com/studhadoop/xmlparsing-hadoop/blob/master/XmlParser11.java
>
> On Tue, Dec 17, 2013 at 2:26 PM, Ranjini Rathinam <ranjinibecse@gmail.com> wrote:
>
>> Hi,
>>
>> In the driver class and my Mapper class I have used
>> org.apache.hadoop.mapreduce.lib,
>>
>> and in the XmlInputFormat.java class I have also used
>> org.apache.hadoop.mapreduce.lib,
>>
>> but I am still getting this error.
>>
>> Please suggest.
>>
>> Thanks in advance,
>>
>> Ranjini
>>
>> On Tue, Dec 17, 2013 at 2:07 PM, Shekhar Sharma <shekhar2581@gmail.com> wrote:
>>
>>> Hello Ranjini,
>>> This error comes when you mix and match the newer and older APIs.
>>>
>>> You might have written your program using the newer API while the XML
>>> input format is using the older API.
>>> The older API has the package structure org.apache.hadoop.mapred.
>>>
>>> The newer API has the package structure
>>> org.apache.hadoop.mapreduce.lib.
>>>
>>> Check XmlInputFormat.java to see which package's FileInputFormat
>>> it uses...
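The opening question of this thread (when to use a Mapper, a Reducer, and a Combiner) can be illustrated without a cluster. Below is a minimal sketch in plain Java, not Hadoop API: it simulates a WordCount job where the Mapper turns records into (word, 1) pairs, the Combiner pre-aggregates each mapper's local output to shrink what is shuffled, and the Reducer computes the final sums. All class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class MapCombineReduce {

    // Mapper role: one input record (a line) -> a list of (word, 1) pairs.
    // In Hadoop this corresponds to the map() method, called once per record.
    static List<Object[]> map(String line) {
        List<Object[]> pairs = new ArrayList<Object[]>();
        for (String w : line.trim().split("\\s+")) {
            if (w.length() > 0) pairs.add(new Object[] { w, 1 });
        }
        return pairs;
    }

    // Combiner and Reducer share the same logic here: sum values per key.
    // The combiner runs on each mapper's local output; the reducer runs on
    // the merged, shuffled output of all mappers. A combiner is only safe
    // when the operation is associative and commutative, as summing is.
    static TreeMap<String, Integer> sum(List<Object[]> pairs) {
        TreeMap<String, Integer> totals = new TreeMap<String, Integer>();
        for (Object[] p : pairs) {
            String key = (String) p[0];
            int value = (Integer) p[1];
            Integer seen = totals.get(key);
            totals.put(key, seen == null ? value : seen + value);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two "input splits", one per simulated mapper.
        List<Object[]> m1 = map("the cat sat");
        List<Object[]> m2 = map("the cat ran");

        // Combine each mapper's output locally, then merge and reduce.
        List<Object[]> shuffled = new ArrayList<Object[]>();
        for (java.util.Map.Entry<String, Integer> e : sum(m1).entrySet())
            shuffled.add(new Object[] { e.getKey(), e.getValue() });
        for (java.util.Map.Entry<String, Integer> e : sum(m2).entrySet())
            shuffled.add(new Object[] { e.getKey(), e.getValue() });

        System.out.println(sum(shuffled)); // {cat=2, ran=1, sat=1, the=2}
    }
}
```

Note that the combiner changed nothing about the final result, only how many pairs crossed the (simulated) network: six combined pairs instead of six raw ones here, but far fewer on real data with repeated words.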
>>>
>>> Regards,
>>> Som Shekhar Sharma
>>> +91-8197243810
>>>
>>> On Tue, Dec 17, 2013 at 12:55 PM, Ranjini Rathinam
>>> wrote:
>>> > Hi,
>>> >
>>> > I am using Hadoop version 0.20.
>>> >
>>> > While executing the XmlInputFormat class,
>>> > I am getting the error
>>> >
>>> > "Error: Found class org.apache.hadoop.mapreduce.TaskAttemptContext,
>>> > but interface was expected."
>>> >
>>> > Please suggest how to fix the error.
>>> >
>>> > Thanks in advance.
>>> >
>>> > Ranjini
>>> >
>>> > On Wed, Dec 11, 2013 at 12:30 PM, Ranjini Rathinam <ranjinibecse@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I have fixed the error and the code is running fine, but this code
>>> >> just splits out part of the tag.
>>> >>
>>> >> I want to convert it into text format so that I can load it into
>>> >> HBase and Hive tables.
>>> >>
>>> >> I have used the DOM parser, but that parser takes a File object,
>>> >> while HDFS paths are accessed through FileSystem.
>>> >>
>>> >> E.g.,
>>> >>
>>> >> File fXmlFile = new File("D:/elango/test.xml");
>>> >> DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
>>> >> DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
>>> >> Document doc = dBuilder.parse(fXmlFile);
>>> >>
>>> >> This cannot be used with HDFS, because an HDFS path is accessed
>>> >> through FileSystem.
>>> >>
>>> >> Please suggest how to fix the above issue.
>>> >>
>>> >> Thanks in advance,
>>> >>
>>> >> Ranjini R
>>> >>
>>> >> On Tue, Dec 10, 2013 at 11:07 AM, Ranjini Rathinam
>>> >> wrote:
>>> >>>
>>> >>> ---------- Forwarded message ----------
>>> >>> From: Shekhar Sharma <shekhar2581@gmail.com>
>>> >>> Date: Mon, Dec 9, 2013 at 10:23 PM
>>> >>> Subject: Re: Hadoop-MapReduce
>>> >>> To: user@hadoop.apache.org
>>> >>> Cc: ssanyal@datameer.com
>>> >>>
>>> >>> It does work; I have used it long back.
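On the DOM-parser question above: DocumentBuilder.parse() is not limited to a File argument; it also accepts any java.io.InputStream. Since the stream returned by HDFS's FileSystem.open() is an InputStream, the same parse call works for HDFS files. A minimal sketch, using an in-memory stream in place of HDFS so it runs standalone; the HDFS calls are shown only as comments, and the path in them is illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomFromStream {
    public static void main(String[] args) throws Exception {
        // On HDFS you would obtain the stream from the FileSystem API, e.g.:
        //   FileSystem fs = FileSystem.get(conf);
        //   InputStream in = fs.open(new Path("/user/ranjini/test.xml"));
        // Here we stand in for HDFS with an in-memory stream.
        String xml = "<person><fname>x</fname><lname>y</lname></person>";
        InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));

        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);  // parse(InputStream), not parse(File)

        String fname = doc.getElementsByTagName("fname").item(0).getTextContent();
        System.out.println(fname); // x
    }
}
```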
>>> >>>
>>> >>> BTW, if it is not working, write a custom input format and implement
>>> >>> your own record reader. That is far easier than breaking your
>>> >>> head over someone else's code.
>>> >>>
>>> >>> Break your problem into steps:
>>> >>>
>>> >>> (1) The XML data is multiline, meaning multiple lines make up a
>>> >>> single record for you. A record for you might be
>>> >>>
>>> >>> <person>
>>> >>>   <fname>x</fname>
>>> >>>   <lname>y</lname>
>>> >>> </person>
>>> >>>
>>> >>> (2) Implement a record reader that looks for the starting and
>>> >>> ending person tags (check out how RecordReader.java is written).
>>> >>>
>>> >>> (3) Once you have the contents between the starting and ending tags,
>>> >>> you can use an XML parser to parse the contents into a Java object
>>> >>> and form your own key/value pairs (custom key and custom value).
>>> >>>
>>> >>> Hope you have enough pointers to write the code.
>>> >>>
>>> >>> Regards,
>>> >>> Som Shekhar Sharma
>>> >>> +91-8197243810
>>> >>>
>>> >>> On Mon, Dec 9, 2013 at 6:30 PM, Ranjini Rathinam <ranjinibecse@gmail.com>
>>> >>> wrote:
>>> >>> > Hi Subroto Sanyal,
>>> >>> >
>>> >>> > The link provided about XML does not work. The class written,
>>> >>> > XmlContent, is not allowed in the XmlInputFormat.
>>> >>> >
>>> >>> > I request your help: has someone coded this scenario? I need
>>> >>> > working code.
>>> >>> >
>>> >>> > I have written it using a SAX parser too, but even though the jars
>>> >>> > are added to the classpath, a NoClassFound exception is coming.
>>> >>> >
>>> >>> > Please provide sample code for the same.
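The three steps above can be sketched in plain Java. The snippet below shows only the tag-scanning logic that a custom record reader would run over its input; the Hadoop wiring (the InputFormat and RecordReader subclasses) is deliberately omitted, and all names are illustrative, not part of any Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

public class TagScanner {
    // Step (2): collect every substring between startTag and endTag,
    // inclusive, so each <person>...</person> block becomes one record.
    static List<String> records(String input, String startTag, String endTag) {
        List<String> out = new ArrayList<String>();
        int from = 0;
        while (true) {
            int start = input.indexOf(startTag, from);
            if (start < 0) break;
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) break;               // unterminated record: stop
            out.add(input.substring(start, end + endTag.length()));
            from = end + endTag.length();
        }
        return out;
    }

    public static void main(String[] args) {
        // Step (1): the input is multiline; one record spans several lines.
        String split = "<person>\n <fname>x</fname>\n <lname>y</lname>\n</person>\n"
                     + "<person>\n <fname>a</fname>\n <lname>b</lname>\n</person>\n";
        for (String rec : records(split, "<person>", "</person>")) {
            // Step (3) would hand each record to an XML parser to build the
            // key/value pair; here we just print the record boundaries.
            System.out.println(rec.replace("\n", " "));
        }
    }
}
```

A real record reader must additionally cope with records that straddle split boundaries, which is exactly the part worth studying in an existing RecordReader implementation.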
>>> >>> >
>>> >>> > Thanks in advance,
>>> >>> > Ranjini R
>>> >>> >
>>> >>> > On Mon, Dec 9, 2013 at 12:34 PM, Ranjini Rathinam
>>> >>> > wrote:
>>> >>> >>
>>> >>> >>>> Hi,
>>> >>> >>>>
>>> >>> >>>> As suggested by the link below, I have used it for my program,
>>> >>> >>>> but I am facing the issues below; please help me fix these errors:
>>> >>> >>>>
>>> >>> >>>> XmlReader.java:8: XmlReader.Map is not abstract and does not
>>> >>> >>>> override abstract method
>>> >>> >>>> map(org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>,org.apache.hadoop.mapred.Reporter)
>>> >>> >>>> in org.apache.hadoop.mapred.Mapper
>>> >>> >>>> public static class Map extends MapReduceBase implements Mapper
>>> >>> >>>> <LongWritable, Text, Text, Text> {
>>> >>> >>>>               ^
>>> >>> >>>> ./XmlInputFormat.java:16: XmlInputFormat.XmlRecordReader is not
>>> >>> >>>> abstract and does not override abstract method
>>> >>> >>>> next(java.lang.Object,java.lang.Object) in
>>> >>> >>>> org.apache.hadoop.mapred.RecordReader
>>> >>> >>>> public class XmlRecordReader implements RecordReader {
>>> >>> >>>>        ^
>>> >>> >>>> Note: XmlReader.java uses unchecked or unsafe operations.
>>> >>> >>>> Note: Recompile with -Xlint:unchecked for details.
>>> >>> >>>> 2 errors
>>> >>> >>>>
>>> >>> >>>> I am using Hadoop version 0.20 and Java 1.6.
>>> >>> >>>>
>>> >>> >>>> Please suggest.
>>> >>> >>>>
>>> >>> >>>> Thanks in advance.
>>> >>> >>>>
>>> >>> >>>> Regards,
>>> >>> >>>> Ranjini R
>>> >>> >>>>
>>> >>> >>>> On Mon, Dec 9, 2013 at 11:08 AM, Ranjini Rathinam
>>> >>> >>>> wrote:
>>> >>> >>>>>
>>> >>> >>>>> ---------- Forwarded message ----------
>>> >>> >>>>> From: Subroto <ssanyal@datameer.com>
>>> >>> >>>>> Date: Fri, Dec 6, 2013 at 4:42 PM
>>> >>> >>>>> Subject: Re: Hadoop-MapReduce
>>> >>> >>>>> To: user@hadoop.apache.org
>>> >>> >>>>>
>>> >>> >>>>> Hi Ranjini,
>>> >>> >>>>>
>>> >>> >>>>> A good example to look into:
>>> >>> >>>>> http://www.undercloud.org/?p=408
>>> >>> >>>>>
>>> >>> >>>>> Cheers,
>>> >>> >>>>> Subroto Sanyal
>>> >>> >>>>>
>>> >>> >>>>> On Dec 6, 2013, at 12:02 PM, Ranjini Rathinam wrote:
>>> >>> >>>>>
>>> >>> >>>>> Hi,
>>> >>> >>>>>
>>> >>> >>>>> How do I read an XML file via MapReduce and load it into HBase
>>> >>> >>>>> and Hive using Java?
>>> >>> >>>>>
>>> >>> >>>>> Please provide sample code.
>>> >>> >>>>>
>>> >>> >>>>> I am using Hadoop version 0.20 and Java 1.6. Which parser
>>> >>> >>>>> version should be used?
>>> >>> >>>>>
>>> >>> >>>>> Thanks in advance.
>>> >>> >>>>>
>>> >>> >>>>> Ranjini

--
Thanks & Regards

Unmesha Sreeveni U.B

Junior Developer