cassandra-user mailing list archives

From marlon hendred <>
Subject Re: Running hadoop jobs over compressed column familes with datastatx
Date Tue, 29 Apr 2014 19:29:12 GMT
I was able to solve the issue. There was another layer of compression
happening in the DAO, in addition to the Snappy compression defined on the
CF. The solution was to extend CassandraStorage and override the getNext()
method. The new implementation calls super.getNext() and inflates the
Tuples where appropriate.
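
For anyone hitting the same thing, here is a minimal sketch of the inflate
step only. It assumes the extra DAO layer is plain zlib/DEFLATE, which may
not match your setup, and the class and method names are hypothetical. The
idea would be to call inflateIfCompressed() on each column value of the
Tuple returned by super.getNext() in the CassandraStorage subclass.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: assumes the application-level compression is zlib/DEFLATE. */
final class ValueInflater {

    /**
     * Inflates zlib-compressed bytes. Returns the input unchanged when it is
     * not a complete, valid zlib stream (a heuristic for "where appropriate").
     */
    static byte[] inflateIfCompressed(byte[] raw) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(raw);
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, raw.length * 2));
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return raw; // truncated input or a stream we cannot decode
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            return raw; // not zlib data, pass it through untouched
        } finally {
            inflater.end();
        }
    }

    /** Counterpart used only to produce sample input for trying out the helper. */
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Falling back to the raw bytes keeps uncompressed columns (row keys, column
names) intact, at the cost of relying on the zlib header check to tell the
two cases apart.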


On Wed, Apr 23, 2014 at 1:39 PM, marlon hendred <> wrote:

> Hi,
> I'm attempting to dump a pig relation of a compressed column family. It's a
> single column whose value is a JSON blob. It's compressed via Snappy
> compression and the value validator is BytesType. After I create the
> relation and dump I get garbage. Here is the describe:
> ColumnFamily: CF
>       Key Validation Class: org.apache.cassandra.db.marshal.TimeUUIDType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       GC grace seconds: 86400
>       Compaction min/max thresholds: 2/32
>       Read repair chance: 0.1
>       DC Local Read repair chance: 0.0
>       Populate IO Cache on flush: false
>       Replicate on write: true
>       Caching: KEYS_ONLY
>       Bloom Filter FP chance: default
>       Built indexes: []
>       Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>       Compression Options:
>         sstable_compression:
> Pig stuff:
> rows = LOAD 'cql://Keyspace/CF' using CqlStorage();
> I've tried to override the schema by adding 'as (key: chararray, col1:
> chararray, value: chararray)', but when I dump this it still looks like
> it's binary.
> Do I need to implement my own CqlStorage() here that uncompresses the
> values, or am I just missing something? I've done some googling but
> haven't seen anything on the subject. Also, I am using DataStax
> Enterprise 3.1. Thanks in advance!
> -m
