incubator-cassandra-user mailing list archives

From Andreas Wagner <andreas.josef.wag...@googlemail.com>
Subject Best practice to store large keys (>= 64 KByte)
Date Sun, 06 Jul 2014 10:01:05 GMT
Hi Cassandra users,

I'm wondering whether there are any best practices for using large keys 
(>= 64 KByte). I'm aware that Cassandra restricts the key size [1]. 
However, my application requires that some keys may be >= 64 KByte. I'm 
currently trying a simple hash-table solution:

//key BLOB may be >= 64 KByte ("entries" is only a placeholder table name)
CREATE TABLE entries (hashedKey VARINT, key BLOB, value BLOB, PRIMARY KEY (hashedKey));

That is, only the hash values of keys are indexed. If I need to search 
for a key, I do:

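V search(K key) {

    //compute hash for key
    int hashedKey = computeHash(key);
    //retrieve key with this hash from Cassandra (null if no row exists)
    K key_with_same_hash = getKeyWithHash(hashedKey);

    //probe until the stored key matches; an empty slot ends the search
    while (key_with_same_hash != null && !key_with_same_hash.equals(key)) {
        //compute next hash from the previous one, so each probe differs
        hashedKey = resolveHashCollision(key, hashedKey);
        //retrieve key with this new hash
        key_with_same_hash = getKeyWithHash(hashedKey);
    }

    //no row under this hash: the key is not stored
    if (key_with_same_hash == null) return null;
    //found correct hash value for key, now retrieve value for this key
    return getValueWithHash(hashedKey);
}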
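
A minimal in-memory sketch of this lookup, for illustration only. It assumes SHA-256 over an attempt counter plus the key as the probe sequence; the post does not say which hash is used. The class HashedKeyStore and its HashMap are hypothetical stand-ins for the Cassandra table, not driver code.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the hashedKey table; not Cassandra driver code.
final class HashedKeyStore {
    // One row: the full (possibly large) key and its value.
    private static final class Row {
        final byte[] key;
        final byte[] value;
        Row(byte[] key, byte[] value) { this.key = key; this.value = value; }
    }

    // hashedKey (VARINT in the table) -> row
    private final Map<BigInteger, Row> rows = new HashMap<>();

    // Assumption: the probe sequence is SHA-256(attempt || key).
    static BigInteger hash(byte[] key, int attempt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(ByteBuffer.allocate(4).putInt(attempt).array());
            md.update(key);
            return new BigInteger(1, md.digest()); // non-negative integer
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Store under the first probe slot that is empty or already holds this key.
    void put(byte[] key, byte[] value) {
        for (int attempt = 0; ; attempt++) {
            BigInteger h = hash(key, attempt);
            Row existing = rows.get(h);
            if (existing == null || Arrays.equals(existing.key, key)) {
                rows.put(h, new Row(key.clone(), value.clone()));
                return;
            }
        }
    }

    // Follow the same probe sequence; an empty slot means the key is absent.
    byte[] get(byte[] key) {
        for (int attempt = 0; ; attempt++) {
            Row row = rows.get(hash(key, attempt));
            if (row == null) return null;
            if (Arrays.equals(row.key, key)) return row.value;
        }
    }
}
```

Note that with this kind of probing, deleting a row can cut off later entries in the same probe chain, so deletes would need some marker instead of simply removing the row.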

I'm aware that I could also use other hash collision resolution 
strategies, most notably ones that use a map as an additional data 
structure:

//the key2value map holds all keys with this hashedKey
//("entries_by_hash" is only a placeholder table name)
CREATE TABLE entries_by_hash (hashedKey VARINT, key2value map<BLOB, BLOB>,
PRIMARY KEY (hashedKey));

However, as far as I understand, Cassandra and CQL would completely 
materialize the key2value map for each lookup by hashedKey, which is 
not ideal.
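
To make the map-based variant concrete, here is a small in-memory sketch. BucketedStore and its deliberately weak hash are hypothetical and only serve to provoke collisions. The get method fetches the whole bucket before picking one entry, which mirrors the concern about reading the full key2value map.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the map-based table; not Cassandra driver code.
final class BucketedStore {
    // hashedKey -> key2value map holding every key with that hash
    private final Map<Integer, Map<ByteBuffer, byte[]>> rows = new HashMap<>();

    // Deliberately weak hash (assumption for the demo) so keys collide easily.
    static int hash(byte[] key) {
        return key.length % 4;
    }

    void put(byte[] key, byte[] value) {
        // ByteBuffer.wrap gives content-based equals/hashCode for byte arrays.
        rows.computeIfAbsent(hash(key), h -> new HashMap<>())
            .put(ByteBuffer.wrap(key.clone()), value.clone());
    }

    byte[] get(byte[] key) {
        // The entire bucket is looked up first, then searched for the key.
        Map<ByteBuffer, byte[]> bucket = rows.get(hash(key));
        return bucket == null ? null : bucket.get(ByteBuffer.wrap(key));
    }

    // Number of keys sharing this key's hash, i.e. the size of the map read.
    int bucketSize(byte[] key) {
        Map<ByteBuffer, byte[]> bucket = rows.get(hash(key));
        return bucket == null ? 0 : bucket.size();
    }
}
```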

I was also considering splitting the key into fragments of at most 
64 KByte and storing them in a tree, e.g., a binary search tree or a trie.
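
The splitting step could look roughly like the following sketch. KeyFragments is a hypothetical helper, and the limit of 65,535 bytes is an assumption chosen to stay just under 64 KByte. How the fragments are then arranged in a tree or trie is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; only the splitting step is shown.
final class KeyFragments {
    // Assumption: keep each fragment strictly below 64 KByte (65,536 bytes).
    static final int MAX_FRAGMENT_BYTES = 65_535;

    // Split a key into consecutive fragments of at most fragmentSize bytes.
    static List<byte[]> split(byte[] key, int fragmentSize) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        List<byte[]> fragments = new ArrayList<>();
        for (int from = 0; from < key.length; from += fragmentSize) {
            int to = Math.min(key.length, from + fragmentSize);
            fragments.add(Arrays.copyOfRange(key, from, to));
        }
        return fragments;
    }
}
```

Each fragment could then serve as one edge label in a trie, so that keys sharing a long prefix also share the corresponding path.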

Does anyone have experience with this kind of problem?

Thanks for your help
Andreas

[1] http://wiki.apache.org/cassandra/FAQ#max_key_size
