cassandra-user mailing list archives

From Patrik Modesto
Subject Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse
Date Tue, 25 Jan 2011 13:16:11 GMT
Hi Mick,

attached is a very simple MR job that deletes expired URLs from my
test Cassandra DB. The keyspace looks like this:

Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 2
  Column Families:
    ColumnFamily: Url2
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 0.0/0
      Key cache size / save period: 200000.0/3600
      Memtable thresholds: 4.7015625/1003/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []

In the CF the key is the URL and each row holds some data. My MR job
needs just "expire_date", which is an int64 timestamp. For now I store it
as a string because I use Python and C++ to manipulate the data as well.
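
To illustrate the expiry check on a string-encoded timestamp, here is a
minimal sketch. The class name, the method name and the sample values are
hypothetical, and it assumes the stored string and the current time use
the same unit; it is not taken from the attached job.

```java
public class ExpireCheck {
    // Parse the decimal string stored under "expire_date" and compare it
    // with the current time. Both values are assumed to share one unit.
    static boolean isExpired(String expireDate, long now) {
        return Long.parseLong(expireDate.trim()) < now;
    }

    public static void main(String[] args) {
        long now = 1295961372L; // hypothetical "current time"
        System.out.println(isExpired("1295961371", now));
        System.out.println(isExpired("1295961372", now));
    }
}
```

Long.parseLong throws NumberFormatException on a malformed value, so a
real job would probably want to catch that and skip the row.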

For the MR job to run you need a patch I wrote. You can find it here:

The attached file contains the working version, which clones the key in
the reduce() method. My other approach was:
context.write(ByteBuffer.wrap(key.getBytes(), 0, key.getLength()),
which produces junk keys.
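
A minimal sketch of what I believe is going on, using only java.nio and
no Hadoop or Cassandra classes. The shared array stands in for the backing
array of a reused key object, and the list stands in for a writer that
queues keys before sending them. All names and URLs here are hypothetical,
and the queueing behaviour is an assumption for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyReuseDemo {
    // Stand-in for a reused key object: one backing array, refilled per record.
    static final byte[] shared = new byte[32];
    static int length;

    static void set(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(b, 0, shared, 0, b.length);
        length = b.length;
    }

    static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    // Queue one key per record and read them back only afterwards.
    // copy=false wraps the shared array; copy=true clones the bytes first.
    static List<String> collect(boolean copy) {
        List<ByteBuffer> queued = new ArrayList<>();
        for (String url : new String[] {"http://a.example/", "http://b.example/"}) {
            set(url);
            queued.add(copy
                ? ByteBuffer.wrap(Arrays.copyOf(shared, length))
                : ByteBuffer.wrap(shared, 0, length));
        }
        List<String> out = new ArrayList<>();
        for (ByteBuffer b : queued) {
            out.add(decode(b));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("wrapped: " + collect(false));
        System.out.println("copied:  " + collect(true));
    }
}
```

In the wrapped case both queued buffers point at the same array, so the
first key is overwritten by the second before it is read. Copying the
bytes before handing them over avoids that.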

Best regards,
