cassandra-commits mailing list archives

From "Mike Schrag (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4421) Support cql3 table definitions in Hadoop InputFormat
Date Tue, 07 May 2013 15:23:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650957#comment-13650957 ]

Mike Schrag commented on CASSANDRA-4421:
----------------------------------------

This is a stock Cassandra 1.2.4 install. For full disclosure, I DID pull your code out of
Cassandra and into our project, and made a few tweaks to build against newer Hadoop libs,
but I'm not actually using Hadoop here. I'm just calling the ColumnFamilyInputFormat from a
simple Java main method, so it's possible I'm skewing something with a busted test, but I
don't THINK so:

{code}
    Configuration conf = new Configuration();

    ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
    ConfigHelper.setInputRpcPort(conf, "9160");
    ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
    ConfigHelper.setInputColumnFamily(conf, "whatever", "branch");
    CQLConfigHelper.setInputCQLPageRowSize(conf, "3");
    //CQLConfigHelper.setInputWhereClauses(conf, "title='A'");

    JobContext jobContext = new JobContextImpl(conf, new JobID());
    TaskAttemptContext context = new TaskAttemptContextImpl(conf, new TaskAttemptID());
    
    ColumnFamilyInputFormat inputFormat = new ColumnFamilyInputFormat();
    List<InputSplit> splits = inputFormat.getSplits(jobContext);
    for (InputSplit split : splits) {
      ColumnFamilySplit columnFamilySplit = (ColumnFamilySplit) split;
      System.out.printf("split: %s\n", split);

      ColumnFamilyRecordReader reader = new ColumnFamilyRecordReader();
      reader.initialize(split, context);
      // now read out all the values...
      while (reader.nextKeyValue()) {

        List<IColumn> keys = reader.getCurrentKey();
        System.out.println("CassandraBulkTest.main: " + ByteBufferUtil.string(keys.get(0).value()));
        Map<ByteBuffer, IColumn> columns = reader.getCurrentValue();
        for (IColumn column : columns.values()) {
          String name  = ByteBufferUtil.string(column.name());
          String value = "skipped"; // column.value() != null ? ByteBufferUtil.string(column.value()) : "null value";
          System.out.println("CassandraBulkTest.main: " + name + "=>" + value);
        }
      }
    }
  }
{code}
                
> Support cql3 table definitions in Hadoop InputFormat
> ----------------------------------------------------
>
>                 Key: CASSANDRA-4421
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4421
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>    Affects Versions: 1.1.0
>         Environment: Debian Squeeze
>            Reporter: bert Passek
>              Labels: cql3
>             Fix For: 1.2.5
>
>         Attachments: 4421.txt
>
>
> Hello,
> I ran into a bug when writing composite column values, which then fail validation on the
> server side.
> This is the setup for reproduction:
> 1. create a keyspace
> create keyspace test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor = 1;
> 2. create a cf via cql (3.0)
> create table test1 (
>     a int,
>     b int,
>     c int,
>     primary key (a, b)
> );
> If I look at the schema in the CLI, I notice that there is no column metadata for
> columns that are not part of the primary key.
> create column family test1
>   with column_type = 'Standard'
>   and comparator = 'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type)'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'Int32Type'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> Please notice the default validation class: UTF8Type
> Now I would like to insert a value > 127 via the Cassandra client (no CQL, part of MR jobs).
> Have a look at the attachment.
> Batch mutate fails:
> InvalidRequestException(why:(String didn't validate.) [test][test1][1:c] failed validation)
> A validator for the column value is fetched in ThriftValidation::validateColumnData, which
> always returns the default validator, which is UTF8Type as described above (the
> ColumnDefinition for the given column name "c" is always null).
> In UTF8Type there is a check for
> if (b > 127)
>    return false;
> Anyway, maybe I'm doing something wrong, but I used CQL 3.0 for table creation. I assigned
> data types to all columns, but I cannot set values for a composite column because the
> default validation class is used.
> I think the schema should know the correct validator even for composite columns. Using
> the default validation class does not make sense.
> Best Regards 
> Bert Passek
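
The validation failure described in the quoted report can be illustrated with a minimal, self-contained sketch. It assumes the int value is serialized as 4 big-endian bytes, and it uses the JDK's strict UTF-8 decoder as a stand-in for Cassandra's UTF8Type validator, which is not reproduced here. The class name Utf8ValidationSketch and its helper methods are hypothetical and not part of Cassandra.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8ValidationSketch {

    // Lay out an int as 4 big-endian bytes (assumed Int32 wire format).
    static byte[] int32Bytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Strict UTF-8 check with the JDK decoder.
    // This is a stand-in for illustration, NOT Cassandra's UTF8Type code.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 127 -> 00 00 00 7F (all ASCII); 128 -> 00 00 00 80; 200 -> 00 00 00 C8
        for (int value : new int[] {127, 128, 200}) {
            System.out.printf("%d -> valid UTF-8: %b%n", value, isValidUtf8(int32Bytes(value)));
        }
    }
}
```

Under these assumptions, 127 passes while 128 and 200 are rejected, because their last byte has the high bit set and does not form a complete UTF-8 sequence. This matches the symptom in the report: an int column value checked by a UTF-8 validator instead of an Int32 one.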

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
