cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-3552) Pig data objects returned by CassandraStorage behave irrationally.
Date Fri, 25 Jan 2013 23:07:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-3552.
-----------------------------------------

    Resolution: Cannot Reproduce

Please try the latest version and see if the issue continues.
                
> Pig data objects returned by CassandraStorage behave irrationally.
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-3552
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3552
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Contrib
>    Affects Versions: 1.0.3
>         Environment: Ubuntu
>            Reporter: Chris Howe
>
> When I try to perform computations on data that I get back from CassandraStorage in Pig, I see inexplicable results.
> For example, on a column family that has UTF8Type as the key validator, I do the following:
> A = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
> B = FOREACH A GENERATE (chararray) key;
> STORE B INTO 'tempfile';
> C = LOAD 'tempfile' AS (key:chararray);
> D1 = FOREACH B GENERATE SUBSTRING(key,0,10);
> D2 = FOREACH C GENERATE SUBSTRING(key,0,10);
> DUMP D1;
> DUMP D2;
> For D1 I get
> ()
> ()
> ()
> ()
> ()
> For D2 I get:
> (a)
> (b x y)
> (b)
> (a b c)
> (a c b)
> Clearly something has gone awry!
> I have tried many workarounds and other functions. TOKENIZE has an entirely different behavior:
> E = FOREACH B GENERATE TOKENIZE(key);
> Ultimately this throws an exception:
> 2011-12-01 15:01:56,007 [Thread-149] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0010
> org.apache.pig.backend.executionengine.ExecException: ERROR 2114: Expected input to be chararray, but got org.apache.pig.data.DataByteArray
> 	at org.apache.pig.builtin.TOKENIZE.exec(TOKENIZE.java:62)
> 	at org.apache.pig.builtin.TOKENIZE.exec(TOKENIZE.java:43)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
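
The exception text quoted above suggests that TOKENIZE checks the runtime type of its input rather than the declared schema, so a field that still holds raw bytes is rejected even after a (chararray) cast in the script. The sketch below is a simplified, self-contained Java model of that kind of check, not Pig's actual source: RawBytes is a hypothetical stand-in for org.apache.pig.data.DataByteArray, and tokenize and toChararray are illustrative names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

public class TokenizeTypeCheck {
    // Hypothetical stand-in for org.apache.pig.data.DataByteArray:
    // raw bytes that carry no string type at runtime.
    static final class RawBytes {
        final byte[] data;
        RawBytes(byte[] data) { this.data = data; }
    }

    // Simplified model of a function that only accepts a chararray,
    // i.e. a java.lang.String at runtime. Anything else is rejected.
    static String[] tokenize(Object field) {
        if (field == null) {
            return null;
        }
        if (!(field instanceof String)) {
            throw new IllegalArgumentException(
                "Expected input to be chararray, but got " + field.getClass().getName());
        }
        return ((String) field).split(" ");
    }

    // Explicit UTF-8 decoding: the conversion that has to happen somewhere
    // before the runtime type check above can pass.
    static String toChararray(RawBytes raw) {
        return new String(raw.data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RawBytes key = new RawBytes("a b c".getBytes(StandardCharsets.UTF_8));
        try {
            tokenize(key); // raw bytes: rejected by the type check
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // After decoding, the same value tokenizes normally.
        System.out.println(tokenize(toChararray(key)).length);
    }
}
```

Run as written, the first call reports a type mismatch and the second, after explicit decoding, yields three tokens.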

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
