Subject: Re: Reading json format input
From: Rishi Yadav
To: user@hadoop.apache.org
Date: Wed, 29 May 2013 16:43:24 -0700

Hi Jamal,

I took your input, ran it through the sample WordCount program, and it works fine, giving this output:

author 3
foo234 1
text 3
foo 1
foo123 1
hello 3
this 1
world 2

When we split using

String[] words = input.split("\\W+");

the split takes care of all the non-alphanumeric characters, which is why the JSON punctuation disappears and the keys like "author" and "text" show up as plain tokens in the counts above.
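If you want to count only the words inside the "text" field rather than the whole line, the mapper just needs to parse each line as JSON before splitting. Here is a minimal sketch, assuming the org.json library (JSONObject) is on the job's classpath; the class name JsonTextMapper is illustrative, not from any existing example:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.JSONException;
import org.json.JSONObject;

// Mapper for lines like {"author":"foo", "text": "hello world"}:
// parses each line as JSON, extracts the "text" field, and emits
// (word, 1) pairs for the stock WordCount sum reducer.
public class JsonTextMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String text;
        try {
            // Each input line is one JSON object; pull out "text".
            text = new JSONObject(value.toString()).getString("text");
        } catch (JSONException e) {
            return; // skip malformed lines rather than failing the task
        }
        for (String w : text.split("\\W+")) {
            if (!w.isEmpty()) {
                word.set(w);
                context.write(word, ONE);
            }
        }
    }
}

The reducer and driver stay exactly the same as in the standard WordCount example; only the mapper changes.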
Thanks and Regards,

Rishi Yadav


On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalshasha@gmail.com> wrote:

> Hi,
> I am stuck again. :(
> My input data is in HDFS. I am again trying to do wordcount, but there is a
> slight difference: the data is in JSON format, so each line of data is:
>
> {"author":"foo", "text": "hello"}
> {"author":"foo123", "text": "hello world"}
> {"author":"foo234", "text": "hello this world"}
>
> I want to do the wordcount for the "text" part. I understand that in the
> mapper I just have to parse this data as JSON and extract "text", and the
> rest of the code is just the same, but I am trying to switch from Python
> to Java Hadoop.
> How do I do this?
> Thanks