hadoop-pig-dev mailing list archives

From "Vikram Oberoi (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1068) COGROUP fails with 'Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple'
Date Fri, 30 Oct 2009 20:47:59 GMT
COGROUP fails with 'Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText,
recieved org.apache.pig.impl.io.NullableTuple'
-----------------------------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-1068
                 URL: https://issues.apache.org/jira/browse/PIG-1068
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.4.0
            Reporter: Vikram Oberoi


The COGROUP in the following script fails during its map phase:

{code}
logs = LOAD '$LOGS' USING PigStorage() AS (ts:int, id:chararray, command:chararray, comments:chararray);

SPLIT logs INTO logins IF command == 'login', all_quits IF command == 'quit';

-- Project login clients and count them by ID.
login_info = FOREACH logins {
    GENERATE id as id,
    comments AS client;
};

logins_grouped = GROUP login_info BY (id, client);

count_logins_by_client = FOREACH logins_grouped {
    generate group.id AS id, group.client AS client, COUNT($1) AS count;
}

-- Get the first quit.
all_quits_grouped = GROUP all_quits BY id;

quits = FOREACH all_quits_grouped {
    ordered = ORDER all_quits BY ts ASC;
    last_quit = LIMIT ordered 1;
    GENERATE FLATTEN(last_quit);
}

-- Now, group all the info together.
joined_session_info = COGROUP quits BY id, count_logins_by_client BY id;

DUMP joined_session_info;
{code}

Here's the stack trace:

{code}
java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:229)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
{code}
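
For context, the exception at the top of the trace appears to come from a check in the map output collector that compares the class of each emitted key with the key class declared for the job. The two types in the message line up with the script: id is a chararray (a text key), while (id, client) is a tuple key. The sketch below only illustrates that kind of check. The classes NullableText, NullableTuple and Collector are simplified, hypothetical stand-ins, not the real Pig or Hadoop classes, and it makes no claim about where in Pig the mismatch is introduced.

```java
import java.io.IOException;

public class KeyTypeCheckSketch {
    // Hypothetical stand-ins for Pig's key wrapper classes (not the real ones).
    static class NullableText {}
    static class NullableTuple {}

    // Hypothetical collector that mimics a class-equality check on map output keys.
    static class Collector {
        private final Class<?> keyClass; // key class declared for the job

        Collector(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        void collect(Object key) throws IOException {
            // Reject any key whose runtime class differs from the declared one.
            if (key.getClass() != keyClass) {
                throw new IOException("Type mismatch in key from map: expected "
                        + keyClass.getName() + ", recieved " + key.getClass().getName());
            }
        }
    }

    // Returns "ok" if the key is accepted, otherwise the exception message.
    static String tryCollect(Class<?> declared, Object key) {
        try {
            new Collector(declared).collect(key);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A text key matches the declared class and is accepted.
        System.out.println(tryCollect(NullableText.class, new NullableText()));
        // A tuple key sent to a collector that expects text keys is rejected.
        System.out.println(tryCollect(NullableText.class, new NullableTuple()));
    }
}
```

Under these assumptions, the second call fails in the same way as the trace: a tuple-keyed record reaches a collector that was set up for text keys.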

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

