pig-user mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: IOException appearing during dump but not illustrate
Date Wed, 08 Dec 2010 22:04:07 GMT
Try explicitly casting argMap#'s' to a chararray?
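
A minimal sketch of what that cast might look like, reusing the relation names from the quoted script below. This is a guess at the cause, not a confirmed diagnosis: a value looked up in an untyped map is treated as a bytearray when the job is planned, while the UDF stores Java Strings in the map, which would explain the NullableBytesWritable vs. NullableText mismatch on the GROUP key.

```pig
-- Cast the map lookup so the declared type of uid matches what the UDF emits.
y3 = foreach y2 generate (chararray)argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;
```

If the cast takes effect, DESCRIBE y3 should report uid as chararray instead of bytearray.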


On Wed, Dec 8, 2010 at 1:53 PM, Kris Coward <kris@melon.org> wrote:

> Hi,
>
> I've recently gotten stumped by a problem where my attempts to dump the
> relations produced by a GROUP command give the following error (though
> illustrating the same relation works fine):
>
> java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
>        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> .
> .
> .
>
> For a little background, the relation that's failing is called y5, and
> is produced by the following string of commands (in grunt):
>
> y2 = foreach y1 generate $0 as timestamp, myudfs.httpArgParse($1) as argMap;
> y3 = foreach y2 generate argMap#'s' as uid, timestamp as timestamp;
> y4 = FILTER y3 BY (uid is not null);
> y5 = GROUP y4 BY uid;
>
> and to get an idea what sort of data is involved, ILLUSTRATE y4 yields:
>
> -----------------------------------------------------------------------------------------------------
> | y1     | timestamp: int | args: bag({tuple_of_tokens: (token: chararray)})                        |
> -----------------------------------------------------------------------------------------------------
> |        | 1265950806     | {(s=1381688313), (u=F68FFA1F655FDF494ABA520D95E1D99E), (ts=1265950805)} |
> -----------------------------------------------------------------------------------------------------
> -----------------------------------------------------------------------------------------------
> | y2     | timestamp: int | argMap: map                                                       |
> -----------------------------------------------------------------------------------------------
> |        | 1265950806     | {u=F68FFA1F655FDF494ABA520D95E1D99E, ts=1265950805, s=1381688313} |
> -----------------------------------------------------------------------------------------------
> --------------------------------------------
> | y3     | uid: bytearray | timestamp: int |
> --------------------------------------------
> |        | 1381688313     | 1265950806     |
> --------------------------------------------
> --------------------------------------------
> | y4     | uid: bytearray | timestamp: int |
> --------------------------------------------
> |        | 1381688313     | 1265950806     |
> --------------------------------------------
>
> The same problem was also produced when the FILTER command was omitted,
> and the relevant chunk of code in myudfs.httpArgParse is:
>
>    StringTokenizer tok = new StringTokenizer((String)pair, "=", false);
>    if (tok.hasMoreTokens() ) {
>        String oKey = tok.nextToken();
>        if (tok.hasMoreTokens() ) {
>            Object oValue = tok.nextToken();
>            output.put(oKey, oValue);
>        } else {
>            output.put(oKey, null);
>        }
>    }
>
> If anyone has any insight into how I could get this to work, that'd really
> help me out.
>
> Thanks,
> Kris
>
> P.S. For those who remember my earlier post about getting httpArgParse
> to compile, I took the advice to ditch the InternalMap in favour of a
> HashMap<String,Object>
>
> --
> Kris Coward                                     http://unripe.melon.org/
> GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
>
