hadoop-user mailing list archives

From Rishi Yadav <ri...@infoobjects.com>
Subject Re: Reading json format input
Date Thu, 30 May 2013 00:15:22 GMT
For that, you only have to write intermediate data when the word equals "text":

String[] words = line.split("\\W+");

for (String word : words) {
    if (word.equals("text")) {
        context.write(new Text(word), new IntWritable(1));
    }
}


I am assuming you have a huge volume of data; otherwise MapReduce
is overkill and a simple regex will do.
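Outside of a Hadoop job, the same logic, pull the "text" field out of each JSON line, split on \W+, and count, can be sketched in plain Java. This is only an illustration: the JsonWordCount class name and the regex-based field extraction are not from the thread, and for real input a proper JSON parser (e.g. Jackson or org.json) should replace the regex.

```java
import java.util.*;
import java.util.regex.*;

public class JsonWordCount {
    // Naive extraction of the "text" field; adequate for flat one-line
    // JSON like the sample, but a real job should use a JSON parser.
    private static final Pattern TEXT_FIELD =
        Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"");

    public static Map<String, Integer> countTextWords(List<String> jsonLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : jsonLines) {
            Matcher m = TEXT_FIELD.matcher(line);
            if (!m.find()) continue;                  // skip lines without a "text" field
            for (String word : m.group(1).split("\\W+")) {
                if (word.isEmpty()) continue;         // split can yield an empty first token
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}");
        System.out.println(countTextWords(input));    // prints {hello=3, this=1, world=2}
    }
}
```

In a Mapper, the same extraction would run per input line, with context.write(new Text(word), new IntWritable(1)) replacing the in-memory map.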



On Wed, May 29, 2013 at 4:45 PM, jamal sasha <jamalshasha@gmail.com> wrote:

> Hi Rishi,
>    But I don't want the word count of all the words.
> In the JSON, there is a field "text", and those are the words I wish to count.
>
>
> On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav <rishi@infoobjects.com> wrote:
>
>> Hi Jamal,
>>
>> I took your input, put it in a sample wordcount program, and it's working
>> just fine, giving this output:
>>
>> author 3
>> foo234 1
>> text 3
>> foo 1
>> foo123 1
>> hello 3
>> this 1
>> world 2
>>
>>
>> When we split using
>>
>> String[] words = input.split("\\W+");
>>
>> it takes care of all non-alphanumeric characters.
>>
>> Thanks and Regards,
>>
>> Rishi Yadav
>>
>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalshasha@gmail.com> wrote:
>>
>>> Hi,
>>>    I am stuck again. :(
>>> My input data is in HDFS. I am again trying to do wordcount, but there is
>>> a slight difference.
>>> The data is in json format.
>>> So each line of data is:
>>>
>>> {"author":"foo", "text": "hello"}
>>> {"author":"foo123", "text": "hello world"}
>>> {"author":"foo234", "text": "hello this world"}
>>>
>>> So I want to do a word count for the "text" part.
>>> I understand that in the mapper I just have to parse this data as JSON and
>>> extract "text", and the rest of the code is just the same, but I am trying to
>>> switch from Python to Java Hadoop.
>>> How do I do this?
>>> Thanks
>>> Thanks
>>>
>>
>>
>
