From: Lance Norskog <goksron@gmail.com>
Date: Fri, 07 Jun 2013 10:16:41 -0700
To: user@hadoop.apache.org
Subject: Re: Mapreduce using JSONObjects

A side point for Hadoop experts: a comparator is used for sorting in the shuffle. If a comparator always returns -1 for unequal objects, then sorting will take longer than it should because a certain number of items will be compared more than once.

Is this true?
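One way to see the effect outside Hadoop: a comparator that never returns a positive value does not define a total order, so no sort can rely on it. Below is a minimal, hypothetical demo (plain java.util sorting, not Hadoop's shuffle; the class name is made up):

import java.util.Arrays;
import java.util.Comparator;

public class BadComparatorDemo {
    public static void main(String[] args) {
        String[] keys = {"b", "a", "b", "c", "a"};
        // Like the compareTo discussed below: 0 for equal keys,
        // -1 otherwise, so unequal keys are never reported as "greater".
        Arrays.sort(keys, new Comparator<String>() {
            public int compare(String x, String y) {
                return x.equals(y) ? 0 : -1;
            }
        });
        // Equal keys are not guaranteed to end up adjacent, so a pass
        // that drops consecutive duplicates can miss some. (On larger
        // inputs Java 7's TimSort may instead throw "Comparison method
        // violates its general contract!".)
        System.out.println(Arrays.toString(keys));
    }
}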

On 06/05/2013 04:10 PM, Max Lebedev wrote:

I've taken your advice and made a wrapper class which implements WritableComparable. Thank you very much for your help. I believe everything is working fine on that front. I used Google's Gson for the comparison.


public int compareTo(Object o) {
    JsonElement o1 = PARSER.parse(this.json.toString());
    JsonElement o2 = PARSER.parse(o.toString());
    if (o2.equals(o1))
        return 0;
    else
        return -1;
}
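For comparison: a compareTo that only ever returns 0 or -1 never reports "greater than", so it does not define a total order, and the shuffle's sort cannot reliably bring equal keys together. A hedged sketch of a consistent alternative, comparing a canonical rendering of the JSON; canonicalize() here is a hypothetical helper, not part of Gson:

public int compareTo(Object o) {
    // Canonical forms are identical exactly when the JSON objects are
    // equal, and otherwise order consistently in both directions.
    String mine = canonicalize(PARSER.parse(this.json.toString()));
    String theirs = canonicalize(PARSER.parse(o.toString()));
    return mine.compareTo(theirs);
}

// Hypothetical helper: renders a JSON object with its keys in sorted
// order (recursively), so logically equal objects produce equal strings.
// Sketch only: keys are not escaped.
private static String canonicalize(JsonElement e) {
    if (!e.isJsonObject()) {
        return e.toString();
    }
    java.util.TreeMap<String, JsonElement> sorted =
            new java.util.TreeMap<String, JsonElement>();
    for (java.util.Map.Entry<String, JsonElement> en
            : e.getAsJsonObject().entrySet()) {
        sorted.put(en.getKey(), en.getValue());
    }
    StringBuilder sb = new StringBuilder("{");
    for (java.util.Map.Entry<String, JsonElement> en : sorted.entrySet()) {
        if (sb.length() > 1) sb.append(',');
        sb.append('"').append(en.getKey()).append("\":")
          .append(canonicalize(en.getValue()));
    }
    return sb.append('}').toString();
}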


The problem I have now is that only consecutive duplicates are detected. Given 6 lines:

{"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false}

{"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":false}

{"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":true}

{"ts":1368758947.291035,"isSecure":false,"version":2,"source":"sdk","debug":false}

{"ts":1368758947.291035, "source":"sdk","isSecure":false,"version":2,"debug":false}

{"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false}


I get back 1, 3, 4, and 6. I should be getting 1, 3 and 4, as 6 is exactly equal to 1. If I switch 5 and 6, the original line 5 is no longer filtered (I get 1,3,4,5,6). I've noticed that the compareTo method is called a total of 13 times. I assume that in order for all 6 of the keys to be compared, 15 comparisons need to be made. Am I missing something here? I've tested compareTo manually, and lines 1 and 6 are interpreted as equal. My MapReduce code currently looks like this:


class DupFilter {

    private static final Gson GSON = new Gson();

    private static final JsonParser PARSER = new JsonParser();

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, JSONWrapper, IntWritable> {
        public void map(LongWritable key, Text value, OutputCollector<JSONWrapper, IntWritable> output, Reporter reporter) throws IOException {
            JsonElement je = PARSER.parse(value.toString());
            JSONWrapper jow = new JSONWrapper(value.toString());
            IntWritable one = new IntWritable(1);
            output.collect(jow, one);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<JSONWrapper, IntWritable, JSONWrapper, IntWritable> {
        public void reduce(JSONWrapper jow, Iterator<IntWritable> values, OutputCollector<JSONWrapper, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext())
                sum += values.next().get();
            output.collect(jow, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DupFilter.class);
        conf.setJobName("dupfilter");
        conf.setOutputKeyClass(JSONWrapper.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

Thanks,

Max Lebedev



On Tue, Jun 4, 2013 at 10:58 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:
I agree with Shahab: you have to ensure that the keys are WritableComparable and the values are Writable in order to be used in MR.

You can have a WritableComparable implementation wrapping the actual JSON object.
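The JSONWrapper class itself is never shown in this thread, so for concreteness here is a minimal sketch of such a wrapper. Names and structure are assumptions, not Max's actual code; PARSER and canonicalize() are as in the sketches earlier in the thread:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class JSONWrapper implements WritableComparable<JSONWrapper> {

    private Text json = new Text();

    public JSONWrapper() {}                        // Hadoop requires a no-arg constructor
    public JSONWrapper(String raw) { json.set(raw); }

    public void write(DataOutput out) throws IOException { json.write(out); }
    public void readFields(DataInput in) throws IOException { json.readFields(in); }

    public int compareTo(JSONWrapper other) {
        // Compare canonical renderings so field order in the raw text
        // does not matter.
        return canonicalize(PARSER.parse(json.toString())).compareTo(
               canonicalize(PARSER.parse(other.json.toString())));
    }

    public boolean equals(Object o) {
        return o instanceof JSONWrapper && compareTo((JSONWrapper) o) == 0;
    }

    public int hashCode() {
        // Must agree with equals: the default HashPartitioner routes keys by
        // hashCode(), so equal JSON must hash identically or duplicates can
        // land on different reducers.
        return canonicalize(PARSER.parse(json.toString())).hashCode();
    }

    public String toString() { return json.toString(); }
}

In practice you would probably cache the canonical form in a field rather than re-parse on every call, since the shuffle invokes compareTo many times per key.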

Thanks,
Rahul


On Wed, Jun 5, 2013 at 5:09 AM, Mischa Tuffield <mischa@mmt.me.uk> wrote:
Hello,

On 4 Jun 2013, at 23:49, Max Lebedev <max.l@actionx.com> wrote:

Hi. I've been trying to use JSONObjects to identify duplicates in JSON strings.
The duplicate strings contain the same data, but not necessarily in the same order. For example, the following two lines should be identified as duplicates (and filtered).

{"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false
{"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":false}�

Can you not use the timestamp as a URI and emit them as URIs? Then you have your mapper emit the following kv:

output.collect(ts, value);

And you would have a straightforward reducer that can dedup based on the timestamps.
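A rough sketch of that shape, using the old API to match the rest of the thread. This assumes the "ts" field alone identifies a duplicate, which is only safe if distinct events never share a timestamp; imports as in the DupFilter code above, plus NullWritable and Gson's JsonParser:

public static class TsMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private static final JsonParser PARSER = new JsonParser();
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Key on the timestamp so candidate duplicates meet in one reduce call.
        String ts = PARSER.parse(value.toString())
                .getAsJsonObject().get("ts").getAsString();
        output.collect(new Text(ts), value);
    }
}

public static class TsReduce extends MapReduceBase
        implements Reducer<Text, Text, NullWritable, Text> {
    public void reduce(Text ts, Iterator<Text> lines,
            OutputCollector<NullWritable, Text> output, Reporter reporter)
            throws IOException {
        output.collect(NullWritable.get(), lines.next()); // keep the first line only
    }
}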

If the above doesn't work for you, I would look at the Jackson library for mangling JSON in Java. Its method of using Java beans for JSON is clean from a code POV and comes with lots of nice features: http://stackoverflow.com/a/2255893
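For example, with Jackson 2.x, readTree() parses to a JsonNode tree whose object nodes compare equal regardless of field order (a small standalone sketch):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonEquals {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode a = mapper.readTree(
                "{\"ts\":1368758947.291035,\"isSecure\":true,\"version\":2}");
        JsonNode b = mapper.readTree(
                "{\"version\":2,\"ts\":1368758947.291035,\"isSecure\":true}");
        System.out.println(a.equals(b)); // prints: true
    }
}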

P.S. In your code you are using the older MapReduce API; I would look at using the newer APIs in the package org.apache.hadoop.mapreduce.

Mischa

This is the code:

class DupFilter {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, JSONObject, Text> {
        public void map(LongWritable key, Text value, OutputCollector<JSONObject, Text> output, Reporter reporter) throws IOException {
            JSONObject jo = null;
            try {
                jo = new JSONObject(value.toString());
            } catch (JSONException e) {
                e.printStackTrace();
            }
            output.collect(jo, value);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<JSONObject, Text, NullWritable, Text> {
        public void reduce(JSONObject jo, Iterator<Text> lines, OutputCollector<NullWritable, Text> output, Reporter reporter) throws IOException {
            output.collect(null, lines.next());
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DupFilter.class);
        conf.setOutputKeyClass(JSONObject.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

I get the following error:

java.lang.ClassCastException: class org.json.JSONObject
        at java.lang.Class.asSubclass(Class.java:3027)
        at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:817)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:383)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)


It looks like it has something to do with conf.setOutputKeyClass(). Am I doing something wrong here?


Thanks,

Max Lebedev


_______________________________
Mischa Tuffield PhD
http://mmt.me.uk/
@mischat







