incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject new to github: Casbase: distributed secondary indexes for Cassandra
Date Tue, 06 Sep 2011 15:16:46 GMT
https://github.com/edwardcapriolo/casbase

What is it?

There are many great articles about building secondary indexes for Cassandra,
such as http://www.anuff.com/2011/02/indexing-in-cassandra.html. In a
nutshell, index building boils down to turning a single insert into multiple
inserts to support different types of searches. Casbase attempts to make
this 'friendly' and reusable. It is made friendly by letting the user
define Tables and Indexes; when the insert method is called, Casbase
takes care of updating all the indexes.

String tablename = "ncars";

// Describe the table: one LONG column, "vin", and a BYTES row key.
Table t = new Table();
t.name = tablename;
t.columns.add(new Col("vin".getBytes(), Col.ColType.LONG, false));
t.key = new Col("key".getBytes(), Col.ColType.BYTES, false);

// Describe an ORDERED_BUCKETS index over the "vin" column.
Index i = new Index();
i.name = "vinidx";
i.columns.add("vin".getBytes());
i.it = Index.IndexType.ORDERED_BUCKETS;
i.indexOptions = "3";

db.create(t);

// Insert seven rows; Casbase updates the indexes on each insert.
for (int k = 0; k < 7; k++) {
   Map<byte[], byte[]> cols = new HashMap<byte[], byte[]>();
   cols.put("make".getBytes(), "honda".getBytes());
   cols.put("model".getBytes(), "civic".getBytes());
   cols.put("vin".getBytes(), CasBaseUtil.longToBytes(k));
   db.insert(tablename, ("car" + k).getBytes(), cols);
}
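
To make the "one insert becomes several inserts" idea concrete, here is a
minimal in-memory sketch. It is not Casbase code: the class
FanOutInsertSketch and its methods are hypothetical, plain Java maps stand
in for column families, and only a simple hash-style index on one column is
modeled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Casbase.
class FanOutInsertSketch {
    // Stand-in for the data column family: row key -> (column -> value).
    final Map<String, Map<String, String>> rows = new HashMap<>();
    // Stand-in for the index column family: indexed value -> row keys.
    final Map<String, Set<String>> index = new HashMap<>();
    final String indexedColumn;

    FanOutInsertSketch(String indexedColumn) {
        this.indexedColumn = indexedColumn;
    }

    // One logical insert fans out into a data write plus index maintenance.
    void insert(String rowKey, Map<String, String> cols) {
        Map<String, String> row = rows.computeIfAbsent(rowKey, k -> new HashMap<>());
        String newValue = cols.get(indexedColumn);
        if (newValue != null) {
            // Drop the stale index entry if the indexed value changed.
            String oldValue = row.get(indexedColumn);
            if (oldValue != null && !oldValue.equals(newValue)) {
                Set<String> stale = index.get(oldValue);
                if (stale != null) {
                    stale.remove(rowKey);
                }
            }
            index.computeIfAbsent(newValue, v -> new TreeSet<>()).add(rowKey);
        }
        // Merge the new columns into the existing row.
        row.putAll(cols);
    }

    // Equality lookup served entirely from the index.
    Set<String> lookup(String value) {
        return index.getOrDefault(value, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        FanOutInsertSketch db = new FanOutInsertSketch("make");
        Map<String, String> cols = new HashMap<>();
        cols.put("make", "honda");
        cols.put("model", "civic");
        db.insert("car0", cols);
        System.out.println(db.lookup("honda"));
    }
}
```

A real implementation issues these writes against Cassandra, where the data
write and the index writes are separate mutations, so readers may briefly
see an index entry without its row (or the reverse).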

Casbase is related to/a hybrid of:

https://github.com/edanuff/CassandraIndexedCollections
https://github.com/riptano/Cassandra-EZ-Client
https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29

There are currently two secondary index implementations: HASH and
ORDERED_BUCKETS. The ORDERED_BUCKETS implementation uses composite columns
and sharding to allow !distributed range queries! on an index (i.e.,
something like 'where column > 5 and column < 7').
Dragons: Yes, distributed secondary indexing via ORDERED_BUCKETS involves a
get_slice on N buckets on the read path (you can also use multiget_slice).
Yes, distributed indexes are not "fast" like local indexes are, but they are
what they are.
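
The read path described above can be sketched in memory as follows. This is
an assumption-laden illustration, not the Casbase implementation: the class
OrderedBucketsSketch is hypothetical, each TreeMap stands in for one index
row (bucket) whose sorted composite column names are (value, row key), the
bucket for a write is chosen by hashing the row key, and the bucket count of
3 is only a guess at what indexOptions="3" means.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Casbase.
class OrderedBucketsSketch {
    // Each bucket stands in for one index row with sorted columns.
    private final List<TreeMap<Long, TreeSet<String>>> buckets = new ArrayList<>();

    OrderedBucketsSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new TreeMap<>());
        }
    }

    // Write path: shard by row key so index writes spread over N rows.
    void put(long value, String rowKey) {
        int b = Math.floorMod(rowKey.hashCode(), buckets.size());
        buckets.get(b).computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    // Read path for 'low < value < high': slice every bucket, then merge.
    List<String> range(long lowExclusive, long highExclusive) {
        if (lowExclusive >= highExclusive) {
            return Collections.emptyList();
        }
        TreeMap<Long, TreeSet<String>> merged = new TreeMap<>();
        for (TreeMap<Long, TreeSet<String>> bucket : buckets) {
            // One slice per bucket (the analogue of one get_slice call).
            Map<Long, TreeSet<String>> slice =
                    bucket.subMap(lowExclusive, false, highExclusive, false);
            for (Map.Entry<Long, TreeSet<String>> e : slice.entrySet()) {
                merged.computeIfAbsent(e.getKey(), v -> new TreeSet<>())
                      .addAll(e.getValue());
            }
        }
        // Flatten in ascending value order, then row-key order.
        List<String> out = new ArrayList<>();
        for (TreeSet<String> keys : merged.values()) {
            out.addAll(keys);
        }
        return out;
    }

    public static void main(String[] args) {
        OrderedBucketsSketch idx = new OrderedBucketsSketch(3);
        for (int k = 0; k < 7; k++) {
            idx.put(k, "car" + k);
        }
        System.out.println(idx.range(5, 7));
    }
}
```

The trade-off is visible in range(): every query touches all N buckets, so
more buckets spread the write load further but make each read more expensive.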

Status:
Code is still in an academic phase. It was started within the last week and,
as evidenced by my 50 commits this holiday, it is not stable either. Have fun.
Stay tuned.
