incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject random thoughts for MUCH faster key lookup in cassandra
Date Wed, 29 May 2013 16:49:16 GMT
We recently ran into having too much data in one CF, because LCS can't really run in parallel on one
CF within a single tier. That got me thinking: why doesn't the CF directory have 100 or 1000 subdirectories
(0-999), with cassandra hashing each key to pick the directory it goes in and then putting it in one
of the sstables in that directory? This would lead to:

 1.  Parallel compaction of LCS in a single CF!!!! Yeah, faster compactions, since there
is less to sort in each directory (and the directories can be compacted in parallel too)
 2.  Faster key lookups, since a key hashes to one of the 1000 directories very quickly and
then only needs to be found in one of that directory's sstables, which are sorted (each directory
would hold 1000x fewer sstables than one big CF)
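To make the idea a bit more concrete, here is a rough sketch under stated assumptions. It is a toy in-memory Python model, not Cassandra's storage engine or any real Cassandra API; the names `bucket_for`, `BucketedStore` and `NUM_BUCKETS` are hypothetical, and MD5 is only a stand-in for whatever stable hash would really be used. Each key is routed to one of N buckets (the 0-999 directories), each bucket keeps its own sorted sstables, a lookup only searches the key's bucket, and because buckets never share keys each one can be compacted on its own.

```python
import bisect
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 1000  # hypothetical: the "0-999" directories from the proposal


def bucket_for(key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a row key to a bucket (directory) index using a stable hash."""
    digest = hashlib.md5(key).digest()  # stand-in hash, assumption only
    return int.from_bytes(digest[:8], "big") % num_buckets


class BucketedStore:
    """Toy model: each bucket holds a list of 'sstables' (sorted lists of keys)."""

    def __init__(self, num_buckets: int = NUM_BUCKETS):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]  # bucket -> list of sstables

    def flush(self, keys):
        """Write a batch of keys as one new sorted sstable per touched bucket."""
        per_bucket = {}
        for k in keys:
            per_bucket.setdefault(bucket_for(k, self.num_buckets), []).append(k)
        for b, ks in per_bucket.items():
            self.buckets[b].append(sorted(ks))

    def contains(self, key: bytes) -> bool:
        """Search only the key's bucket, binary-searching each sorted sstable."""
        for sstable in self.buckets[bucket_for(key, self.num_buckets)]:
            i = bisect.bisect_left(sstable, key)
            if i < len(sstable) and sstable[i] == key:
                return True
        return False

    def _compact_bucket(self, b: int) -> None:
        """Merge every sstable in one bucket into a single sorted, de-duplicated one."""
        merged = sorted({k for sst in self.buckets[b] for k in sst})
        self.buckets[b] = [merged] if merged else []

    def compact_all(self, workers: int = 8) -> None:
        """Buckets share no keys, so each compaction job is independent of the others."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(self._compact_bucket, range(self.num_buckets)))


if __name__ == "__main__":
    store = BucketedStore(num_buckets=16)
    store.flush([b"user:1", b"user:2"])
    store.flush([b"user:3", b"user:1"])
    print(store.contains(b"user:3"))
    store.compact_all()
    print(max(len(b) for b in store.buckets))
```

Since no key ever lives in two buckets, compacting one bucket never needs to look at another, which is the property that would let per-directory LCS run in parallel. In CPython the thread pool only illustrates that the jobs are independent; the GIL limits real CPU parallelism for pure-Python work.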

Am I on crack here? Or does that seem like it would be a pretty good direction to go?

Maybe this is only because our system has 98% of its data in one CF, while other systems have
10% of their data in each CF. I still tend to think a lot of people will end up with
80% of their data in one CF and 20% in all the other CFs… isn't the Pareto principle a natural
tendency? And if it is, maybe the above feature should be considered?

Later,
Dean
