cassandra-commits mailing list archives

From "Evgeny Ryabitskiy (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-3545) Fix very low Index Search performance
Date Wed, 30 Nov 2011 22:41:40 GMT
Fix very low Index Search performance
-------------------------------------

                 Key: CASSANDRA-3545
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3545
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.0.5, 1.0.4
            Reporter: Evgeny Ryabitskiy
            Priority: Critical
             Fix For: 1.0.6


While performing an index search with value filtering over a large index row (~100k keys per index
value) in chunks of 512-1024 keys, search time is about 8-12 seconds, which is very
slow.

After profiling I got this picture:

60% of search time is spent calculating MD5 hashes with MessageDigest (of course, this is because of
RandomPartitioner).
33% of search time (half of all MD5 calculation time) is the double MD5 calculation needed to
compare two row keys while rotating the index row to startKey (when performing the search query
for the next chunk).
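
To illustrate why a token comparison costs two hashes, here is a minimal sketch. It assumes a token is the absolute value of the MD5 digest read as a big integer; the class name Md5TokenSketch and the hashCalls counter are hypothetical and exist only for this illustration, not in Cassandra.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; these are not Cassandra's actual classes.
final class Md5TokenSketch {
    static int hashCalls = 0; // counts MD5 computations, for illustration only

    // Assumed token derivation: MD5 digest interpreted as a non-negative integer.
    static BigInteger token(byte[] key) {
        try {
            hashCalls++;
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    // Comparing two raw keys by token hashes both of them on every call.
    static int compareByToken(byte[] a, byte[] b) {
        return token(a).compareTo(token(b));
    }
}
```

If one side of every comparison is the same startKey, half of those hash computations are repeated work, which would be consistent with the 33% figure above.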

I see several performance improvements:

1) Use a better algorithm to find startKey in the sorted collection, one that is faster than iterating
over all keys. This solution comes first because it is simple, needs only local code changes,
and should solve the problem (speeding up search several times over).

2) Don't calculate the MD5 hash for startKey every time. It is better to compute it once (so search
will be twice as fast).
This also needs only local code changes.

3) Consider something faster than MD5 for hashing (like a TigerRandomPartitioner with a Tiger/128
hash).
This needs research, and maybe that research has already been done.

4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and sorting keys in
index rows. In index rows, keys can be stored and compared with a simple byte comparator.
This solution requires huge code changes.
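
As a rough idea of what a simple byte comparator for the fourth option could look like, here is a sketch of an unsigned lexicographic comparison of raw keys. The class name ByteOrderSketch is hypothetical, and the choice of unsigned ordering is an assumption for illustration.

```java
import java.util.Comparator;

// Hypothetical illustration; not an existing Cassandra comparator.
final class ByteOrderSketch {
    // Orders keys byte by byte, treating each byte as unsigned (0..255).
    // A key that is a strict prefix of another sorts first.
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };
}
```

A comparison like this needs no hashing at all, but the resulting order differs from token order, which is why the change would reach far beyond the comparator itself.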

I'm going to start with the first solution. The other improvements can be handled in follow-up tickets.
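
The first solution, combined with hashing startKey only once, could be sketched as a lower-bound binary search. This is only an illustration under stated assumptions: the keys are held in a list already sorted by token, tokens are derived as the absolute value of the MD5 digest, and the names StartKeySearchSketch, firstAtOrAfter and hashCalls are hypothetical.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch; class and method names are not Cassandra's.
final class StartKeySearchSketch {
    static int hashCalls = 0; // counts MD5 computations, for illustration only

    // Assumed token derivation: MD5 digest interpreted as a non-negative integer.
    static BigInteger token(byte[] key) {
        try {
            hashCalls++;
            return new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    // Returns the index of the first key whose token is >= the token of startKey.
    // keysSortedByToken must already be in ascending token order.
    static int firstAtOrAfter(List<byte[]> keysSortedByToken, byte[] startKey) {
        BigInteger startToken = token(startKey); // hashed once, outside the loop
        int lo = 0;
        int hi = keysSortedByToken.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (token(keysSortedByToken.get(mid)).compareTo(startToken) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

For a row of about 100k keys, a search like this hashes roughly 17 probed keys plus startKey itself, instead of hashing both sides of a comparison for every key visited by a linear scan.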

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
