crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Exception from Set.difference
Date Thu, 02 Apr 2015 21:42:46 GMT
Yeah, it looks like Avro doesn't support comparison on map fields:

https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java

Assuming the values of the map fields matter for comparison purposes, it
seems like your best bet is to serialize the data as a List of pairs or as
two Lists with corresponding entries, ensuring that the lists are sorted by
the key of the map. Not a pretty solution, but it should work.
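
A rough sketch of the "two Lists with corresponding entries" variant, in case
it helps. The class and method names here (SortedMapEncoding, sortedKeys,
valuesByKey) are made up for illustration and are not part of Crunch or Avro;
the idea is only that the Map<String, Double> field gets replaced by two List
fields built this way before the objects are written out. Copying through a
TreeMap gives a deterministic key order, so two equal maps always produce
identical lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Crunch or Avro API): turns a map into two
// parallel lists ordered by key, so equal maps yield equal lists.
final class SortedMapEncoding {

    private SortedMapEncoding() {}

    // Keys in ascending (natural String) order.
    static List<String> sortedKeys(Map<String, Double> m) {
        TreeMap<String, Double> sorted = new TreeMap<String, Double>(m);
        return new ArrayList<String>(sorted.keySet());
    }

    // Values in the order of their keys, so index i here
    // corresponds to index i of sortedKeys(m).
    static List<Double> valuesByKey(Map<String, Double> m) {
        TreeMap<String, Double> sorted = new TreeMap<String, Double>(m);
        return new ArrayList<Double>(sorted.values());
    }
}
```

Lists are serialized as Avro arrays, which have a defined sort order (unlike
maps), so the comparator used in the shuffle should be able to handle them.
Note that TreeMap rejects null keys, so this assumes the map has none.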

J

On Thu, Apr 2, 2015 at 2:00 PM, Lucy Chen <lucychen2014fall@gmail.com>
wrote:

> Hi,
>
>      I am trying to do Set difference as follows:
>
> PCollection<MyClass> C = Set.difference(A, B);
>
>
> Here both A and B are of type PCollection<MyClass>.
>
>
> MyClass is defined as follows:
>
>
> public class MyClass implements java.io.Serializable, Cloneable {
>
>     private String a;
>     private String b;
>     private int c;
>     private Map<String, Double> d;
>     private int e;
>
>     public MyClass() {
>         this(null, null, 0, new HashMap<String, Double>());
>     }
>
>     public MyClass(String labelID, String sampleID, Integer pos_neg_ind,
>             HashMap<String, Double> feat_val_pair) {
>         ......
>     }
>
>     public MyClass(String input) {
>         .....
>     }
>
>     .....
>
> }
>
>
>       Running the set difference produces the following error. Is it
> because MyClass includes a Map member, d? If so, is there another way to
> compute the set difference for these inputs?
>
>
>       Thanks!
>
>
> Lucy
>
>
> java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:160)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.avro.AvroRuntimeException: Can't compare maps!
>     at org.apache.avro.io.BinaryData.compare(BinaryData.java:134)
>     at org.apache.avro.io.BinaryData.compare(BinaryData.java:139)
>     at org.apache.avro.io.BinaryData.compare(BinaryData.java:92)
>     at org.apache.avro.io.BinaryData.compare(BinaryData.java:72)
>     at org.apache.avro.mapred.AvroKeyComparator.compare(AvroKeyComparator.java:43)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:578)
>     at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
>     at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:108)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:524)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:539)
>     at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:209)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.finalMerge(MergeManagerImpl.java:731)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.close(MergeManagerImpl.java:370)
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:158)
>     ... 7 more
>
>
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
