hadoop-common-user mailing list archives

From jamal sasha <jamalsha...@gmail.com>
Subject Re: Reading json format input
Date Thu, 30 May 2013 18:43:48 GMT
Hi, thanks guys.
I figured out the issue, but now I have another question.
I am using a third-party library, and I thought that once I had created the
jar file I would not need to specify the dependencies, but apparently that is
not the case (error below).
A very naive question, probably stupid: how do I specify third-party
libraries (jars) in Hadoop?

Error:
Error: java.lang.ClassNotFoundException: org.json.JSONException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
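A ClassNotFoundException like the one above usually means the third-party jar never made it onto the task classpath. As a sketch of two common fixes (the jar names and paths below are placeholders, not from this thread):

```shell
# Option 1: pass the extra jar at submit time with -libjars.
# This only takes effect if the job's main class goes through
# GenericOptionsParser, e.g. by implementing the Tool interface.
hadoop jar wordcount.jar WordCount -libjars /path/to/json.jar /input /output

# Option 2: bundle the dependency inside the job jar itself.
# Hadoop adds jars found under the job jar's lib/ directory to the
# task classpath, so a jar laid out like this also works:
#   wordcount.jar
#   |-- WordCount.class
#   `-- lib/
#       `-- json.jar
```

Either way the dependency travels with the job, rather than being assumed to exist on every task node.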



On Thu, May 30, 2013 at 2:02 AM, Pramod N <npramod05@gmail.com> wrote:

> Whatever you are trying to do should work.
> Here is the modified WordCount map method:
>
>     public void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException {
>         String line = value.toString();
>         JSONObject line_as_json = new JSONObject(line);
>         String text = line_as_json.getString("text");
>         StringTokenizer tokenizer = new StringTokenizer(text);
>         while (tokenizer.hasMoreTokens()) {
>             word.set(tokenizer.nextToken());
>             context.write(word, one);
>         }
>     }
>
>
>
>
>
> Pramod N <http://atmachinelearner.blogspot.in>
> Bruce Wayne of web
> @machinelearner <https://twitter.com/machinelearner>
>
> --
>
>
> On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> Whatever you have mentioned, Jamal, should work. You can debug this.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Thu, May 30, 2013 at 5:14 AM, jamal sasha <jamalshasha@gmail.com>wrote:
>>
>>> Hi,
>>>   For some reason, this has to be in Java :(
>>> I am trying to use the org.json library, something like this (in the mapper):
>>> JSONObject jsn = new JSONObject(value.toString());
>>>
>>> String text = (String) jsn.get("text");
>>> StringTokenizer itr = new StringTokenizer(text);
>>>
>>> But it's not working :(
>>> It would be better to get this thing working properly, but I wouldn't mind
>>> using a hack as well :)
>>>
>>>
>>> On Wed, May 29, 2013 at 4:30 PM, Michael Segel <
>>> michael_segel@hotmail.com> wrote:
>>>
>>>> Yeah,
>>>> I have to agree with Russell. Pig is definitely the way to go on this.
>>>>
>>>> If you want to do it as a Java program, you will have to do some work on
>>>> the input string, but that too should be trivial.
>>>> How formal do you want to go?
>>>> Do you want to strip the line down, or just find the quote after the
>>>> "text" part?
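[Editor's note: the "find the quote after the text part" idea can be sketched without any JSON library at all. The class and method names below are hypothetical, not from the thread, and the code assumes the simple one-object-per-line records shown later in the thread, with no escaped quotes inside values.]

```java
// Minimal sketch of the string-scanning approach: locate the "text" key,
// then take whatever sits between the next pair of double quotes.
// Assumes flat, one-line objects like {"author":"foo", "text": "hello"}.
public class TextFieldExtractor {
    static String extractText(String line) {
        int key = line.indexOf("\"text\"");
        if (key < 0) return null;
        int open = line.indexOf('"', key + 6);   // opening quote of the value
        if (open < 0) return null;
        int close = line.indexOf('"', open + 1); // closing quote of the value
        if (close < 0) return null;
        return line.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(
            extractText("{\"author\":\"foo\", \"text\": \"hello world\"}"));
        // prints "hello world"
    }
}
```

For anything less regular than this, the org.json route (`new JSONObject(line).getString("text")`) is the safer choice once the jar is actually on the task classpath.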
>>>>
>>>>
>>>> On May 29, 2013, at 5:13 PM, Russell Jurney <russell.jurney@gmail.com>
>>>> wrote:
>>>>
>>>> Seriously consider Pig (free answer, 4 LOC):
>>>>
>>>> my_data = LOAD 'my_data.json' USING
>>>> com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
>>>> words = FOREACH my_data GENERATE $0#'author' as author,
>>>> FLATTEN(TOKENIZE($0#'text')) as word;
>>>> word_counts = FOREACH (GROUP words BY word) GENERATE group AS word,
>>>> COUNT_STAR(words) AS word_count;
>>>> STORE word_counts INTO '/tmp/word_counts.txt';
>>>>
>>>> It will be faster than the Java you'll likely write.
>>>>
>>>>
>>>> On Wed, May 29, 2013 at 2:54 PM, jamal sasha <jamalshasha@gmail.com>wrote:
>>>>
>>>>> Hi,
>>>>>    I am stuck again. :(
>>>>> My input data is in HDFS. I am again trying to do wordcount, but there
>>>>> is a slight difference.
>>>>> The data is in JSON format.
>>>>> So each line of data is:
>>>>>
>>>>> {"author":"foo", "text": "hello"}
>>>>> {"author":"foo123", "text": "hello world"}
>>>>> {"author":"foo234", "text": "hello this world"}
>>>>>
>>>>> So I want to do wordcount for the text part.
>>>>> I understand that in the mapper I just have to parse this data as JSON,
>>>>> extract "text", and the rest of the code is just the same, but I am
>>>>> trying to switch from Python to Java Hadoop.
>>>>> How do I do this?
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>> datasyndrome.com
>>>>
>>>>
>>>>
>>>
>>
>
