hadoop-user mailing list archives

From Björn-Elmar Macek <ma...@cs.uni-kassel.de>
Subject OutputValueGroupingComparator gets strange inputs (topic changed from "Logs cannot be created")
Date Thu, 09 Aug 2012 14:47:59 GMT
Hi again,

this is a direct response to my previous posting with the title "Logs 
cannot be created", where logs could not be created (Spill failed). I 
got the hint that I should check privileges, but that was not the 
problem, because I own the folders that were used for this.

I finally found an important hint in a log saying:
12/08/09 15:30:49 WARN mapred.JobClient: Error reading task 
outputhttp://its-cs229.its.uni-kassel.de:50060/tasklog?plaintext=true&attemptid=attempt_201208091516_0001_m_000048_0&filter=stdout
12/08/09 15:30:49 WARN mapred.JobClient: Error reading task 
outputhttp://its-cs229.its.uni-kassel.de:50060/tasklog?plaintext=true&attemptid=attempt_201208091516_0001_m_000048_0&filter=stderr
12/08/09 15:34:34 INFO mapred.JobClient: Task Id : 
attempt_201208091516_0001_m_000055_0, Status : FAILED
java.io.IOException: Spill failed
         at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029)
         at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
         at uni.kassel.macek.rtprep.RetweetMapper.map(RetweetMapper.java:26)
         at uni.kassel.macek.rtprep.RetweetMapper.map(RetweetMapper.java:12)
         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
         at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NumberFormatException: For input string: ""
         at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
         at java.lang.Integer.parseInt(Integer.java:468)
         at java.lang.Integer.parseInt(Integer.java:497)
         at uni.kassel.macek.rtprep.Tweet.getRT(Tweet.java:126)
         at 
uni.kassel.macek.rtprep.TwitterValueGroupingComparator.compare(TwitterValueGroupingComparator.java:47)
         at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1111)
         at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
         at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
         at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
         at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
         at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)



corresponding to the following lines of code within the class 
TwitterValueGroupingComparator:

public class TwitterValueGroupingComparator implements RawComparator<Text> {
...
    public int compare(byte[] text1, int start1, int length1, byte[] text2,
            int start2, int length2) {

        byte[] tweet1 = new byte[length1];// length1-1 (???)
        byte[] tweet2 = new byte[length2];// length2-1 (???)

        System.arraycopy(text1, start1, tweet1, 0, length1);// start1+1 (???)
        System.arraycopy(text2, start2, tweet2, 0, length2);// start2+1 (???)

        Tweet atweet1 = new Tweet(new String(tweet1));
        Tweet atweet2 = new Tweet(new String(tweet2));

        String key1 = atweet1.getAuthor();
        String key2 = atweet2.getAuthor();
        ////////////////////////////////////////////////////////////////
        //THE FOLLOWING LINE IS THE ONE MENTIONED IN THE LOG (47)
        ////////////////////////////////////////////////////////////////
        if (atweet1.getRT() > 0 && !atweet1.getMention().equals(""))
            key1 = atweet1.getMention();
        if (atweet2.getRT() > 0 && !atweet2.getMention().equals(""))
            key2 = atweet2.getMention();

        int realKeyCompare = key1.compareTo(key2);
        return realKeyCompare;
    }

}

Since I take the incoming bytes, rebuild the corresponding CSV string 
from them and tokenize it to get a Tweet, I was fairly sure that the 
problem is caused by the leading bytes that Hadoop puts in front of the 
data being compared. Since I never really understood what Hadoop does 
to the strings before they are passed to the KeyComparator, I simply 
appended all strings to a file in order to see for myself.
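
Printing the slice as a String hides non-printable bytes (they show up 
as "??"), so a hex dump of the first few bytes of each slice may be 
easier to read. A minimal sketch; the class RawSliceDump, the method 
hexPrefix and its maxBytes parameter are hypothetical names and not 
part of the job above:

```java
/** Debug helper: shows the first bytes of a raw key slice as hex. */
final class RawSliceDump {

    /**
     * Returns up to maxBytes bytes of buf[start .. start+length) as
     * space-separated two-digit hex values, e.g. "5e 32 30".
     */
    static String hexPrefix(byte[] buf, int start, int length, int maxBytes) {
        int n = Math.min(length, maxBytes);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative bytes print as 80..ff, not ffffff80..
            sb.append(String.format("%02x", buf[start + i] & 0xff));
        }
        return sb.toString();
    }
}
```

Called as hexPrefix(text1, start1, length1, 4) inside compare(...), 
this would show the exact values of the leading bytes instead of 
whatever character the platform encoding maps them to.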

You can see the results here:

??2009-06-12 07:32:47, davedilbeck, tampabaycom, , , 
http://tinyurl.com/nookw7, 1, 1, Lots of vivid imagery here but it's 
mostly bullshit   Alex Sink June Cleaver or Joan Crawford, null
??2009-06-12 07:32:47, davedilbeck, tampabaycom, , , 
http://tinyurl.com/nookw7, 1, 1, Lots of vivid imagery here but it's 
mostly bullshit   Alex Sink June Cleaver or Joan Crawford, null
I2009-06-12 04:33:19, ntmp, tsukunep, , , , 1, 0, ??????????????????, null
??2009-06-12 07:32:47, davedilbeck, tampabaycom, , , 
http://tinyurl.com/nookw7, 1, 1, Lots of vivid imagery here but it's 
mostly bullshit   Alex Sink June Cleaver or Joan Crawford, null
^2009-06-12 04:33:20, aclouatre, , , , , 0, 0, Bored out of my mind 
Watching food network, null
b2009-06-12 04:33:20, djnewera, adoremii369, , , , 1, 0, LOL WORDUP ANT 
NOTHING LIKE THE HOOD, null


As you can see, there are different leading characters: sometimes it's 
"??", other times it's "b" or "^", etc.

My question is now:
How many bytes do I have to cut off so that I get the original Text as 
a String, the way I put it into the key position of my mapper output? 
What are the concepts behind this?
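
A likely explanation, sketched under assumptions rather than verified 
against this job: Text serializes itself as a variable-length integer 
(vint) holding the byte length, followed by the UTF-8 bytes. The raw 
slice handed to a RawComparator includes that prefix, so the number of 
bytes to skip is not fixed. It is 1 for lengths up to 127 and more 
beyond that. This fits the dump above: "^" is byte 94 and "b" is byte 
98, which match the lengths of those two records as printed, and the 
longer records start with two unprintable bytes. The stdlib-only 
sketch below re-implements the vint logic of Hadoop's WritableUtils for 
non-negative values so that it runs without Hadoop on the classpath. 
The class TextPrefixSketch and the helpers serializeLikeText and 
payload are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch: locate and skip the vint length prefix in front of Text bytes. */
final class TextPrefixSketch {

    /** Size in bytes of a vint, judged from its first byte. */
    static int decodeVIntSize(byte first) {
        if (first >= -112) {
            return 1;                 // small value stored directly
        } else if (first < -120) {
            return -119 - first;      // negative multi-byte value
        }
        return -111 - first;          // positive multi-byte value
    }

    /** Writes a non-negative int in the vint layout (assumption: no negatives needed). */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles lengths only");
        }
        if (value <= 127) {
            out.write(value);
            return;
        }
        int len = -112;
        for (int tmp = value; tmp != 0; tmp >>= 8) {
            len--;                    // one step per payload byte
        }
        out.write(len);               // marker byte, e.g. -113 for one payload byte
        for (int idx = -(len + 112); idx != 0; idx--) {
            out.write((value >> ((idx - 1) * 8)) & 0xff);
        }
    }

    /** Mimics the byte layout of a serialized Text: vint length, then UTF-8. */
    static byte[] serializeLikeText(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    /** Recovers the original string from a raw slice by skipping the prefix. */
    static String payload(byte[] buf, int start, int length) {
        int n = decodeVIntSize(buf[start]);
        return new String(buf, start + n, length - n, StandardCharsets.UTF_8);
    }
}
```

Inside compare(...), the two Tweet strings would then be built from 
payload(text1, start1, length1) and payload(text2, start2, length2) 
instead of from the whole slice. With Hadoop available, 
WritableUtils.decodeVIntSize(text1[start1]) gives the same prefix size, 
and Text.Comparator skips the prefix in the same way before comparing 
bytes.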

Thanks for your help in advance!

Best regards,
Elmar Macek



