cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Silvère Lestang (JIRA) <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name
Date Wed, 22 Jun 2011 14:50:53 GMT
RuntimeException in Pig when using "dump" command on column name
----------------------------------------------------------------

                 Key: CASSANDRA-2810
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8.1
         Environment: Ubuntu 10.10, 32 bits
java version "1.6.0_24"
Brisk beta-2 installed from Debian packages
            Reporter: Silvère Lestang


This bug was previously report on [Brisk bug tracker|https://datastax.jira.com/browse/BRISK-232].

In cassandra-cli:
{code}
[default@unknown] create keyspace Test
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = [{replication_factor:1}];

[default@unknown] use Test;
Authenticated to keyspace: Test

[default@Test] create column family test;

[default@Test] set test[ascii('row1')][long(1)]=integer(35);
set test[ascii('row1')][long(2)]=integer(36);
set test[ascii('row1')][long(3)]=integer(38);
set test[ascii('row2')][long(1)]=integer(45);
set test[ascii('row2')][long(2)]=integer(42);
set test[ascii('row2')][long(3)]=integer(33);

[default@Test] list test;
Using default limit of 100
-------------------
RowKey: 726f7731
=> (column=0000000000000001, value=35, timestamp=1308744931122000)
=> (column=0000000000000002, value=36, timestamp=1308744931124000)
=> (column=0000000000000003, value=38, timestamp=1308744931125000)
-------------------
RowKey: 726f7732
=> (column=0000000000000001, value=45, timestamp=1308744931127000)
=> (column=0000000000000002, value=42, timestamp=1308744931128000)
=> (column=0000000000000003, value=33, timestamp=1308744932722000)

2 Rows Returned.

[default@Test] describe keyspace;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: test
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
{code}
In Pig command line:
{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray,
columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.name, columns.value;

grunt> dump value_test;
{code}
In /var/log/cassandra/system.log, I have severals time this exception:
{code}
INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java (line 551)
Error from attempt_201106210955_0051_m_000000_3: java.lang.RuntimeException: Unexpected data
type -1 found in stream.
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)
{code}
and the request failed.

{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray,
columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.value;

grunt> dump value_test;
{code}

This time, without the column name, it's work (but the value are displayed as char instead
of integer). Result:
{code}
(row1,{(#),($),(&)})
(row2,{(-),(*),(!)})
{code}

Now we do the same test but we set a comparator to the CF.
{code}
[default@Test] create column family test with comparator = 'LongType';

[default@Test] set test[ascii('row1')][long(1)]=integer(35);
set test[ascii('row1')][long(2)]=integer(36);
set test[ascii('row1')][long(3)]=integer(38);
set test[ascii('row2')][long(1)]=integer(45);
set test[ascii('row2')][long(2)]=integer(42);
set test[ascii('row2')][long(3)]=integer(33);

[default@Test] list test;
Using default limit of 100
-------------------
RowKey: 726f7731
=> (column=1, value=35, timestamp=1308748643506000)
=> (column=2, value=36, timestamp=1308748643508000)
=> (column=3, value=38, timestamp=1308748643509000)
-------------------
RowKey: 726f7732
=> (column=1, value=45, timestamp=1308748643510000)
=> (column=2, value=42, timestamp=1308748643512000)
=> (column=3, value=33, timestamp=1308748645138000)

2 Rows Returned.

[default@Test] describe keyspace;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: test
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.LongType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
{code}
{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray,
columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.name, columns.value;

grunt> dump value_test;
{code}
This time it's work as expected (appart from the value displayed as char). Result:
{code}
(row1,{(1),(2),(3)},{(#),($),(&)})
(row2,{(1),(2),(3)},{(-),(*),(!)})
{code}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message