cassandra-commits mailing list archives

From "Jonathan Ellis (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.
Date Wed, 18 Apr 2012 18:38:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256793#comment-13256793 ]

Jonathan Ellis edited comment on CASSANDRA-3762 at 4/18/12 6:38 PM:
--------------------------------------------------------------------

bq. With this patch we trade whole sequential primary_index read for random I/O with SSTableReader.getPosition()
only for amount saved keys.

I thought Vijay said this sorts the cache first.  In which case we're really doing seq i/o;
we're just skipping the parts of the index that don't contain any cached keys.  Right?
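
To illustrate the point about sorting, here is a minimal Python sketch (not Cassandra code). It assumes a toy index held as two parallel lists, sorted keys and their file offsets; the name offsets_visited is hypothetical, and real Cassandra orders keys by partitioner token rather than by raw key. It only shows that looking up keys in sorted order visits offsets in increasing order, which is what turns the reads into forward-only i/o with gaps.

```python
import bisect

def offsets_visited(index_keys, index_offsets, lookups):
    """Return the index-file offsets touched, in the order they are touched.

    index_keys must be sorted; index_offsets[i] is the offset of index_keys[i].
    Both are stand-ins for a real on-disk primary index.
    """
    visited = []
    for key in lookups:
        i = bisect.bisect_left(index_keys, key)
        if i < len(index_keys) and index_keys[i] == key:
            visited.append(index_offsets[i])
    return visited

index_keys = ["a", "c", "f", "k", "z"]
index_offsets = [0, 40, 95, 130, 210]
cached = ["k", "a", "f"]

# Unsorted lookups jump backwards and forwards in the file.
print(offsets_visited(index_keys, index_offsets, cached))
# Sorted lookups only ever move forward, skipping unneeded entries.
print(offsets_visited(index_keys, index_offsets, sorted(cached)))
```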

bq. If we want to see the optimal solution for all the use cases i think we have to go for
the alternative where we can save the Keycache position to the disk and read it back and what
ever is missing let it fault fill.

I like this idea.  If you have a lot of rows (i.e., a large index) then this is the only thing
that's going to save you from doing a lot of i/o.  Even with seq i/o, reading a small cache
will be much faster than scanning a large index.
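
A rough Python sketch of that alternative, under stated assumptions: the cache is modeled as a dict from (sstable generation, key) to index position, it is persisted as JSON, and read_index is a hypothetical callback standing in for an on-demand index lookup. None of these names or the file format come from the patch; this only shows the shape of "save positions, load them back, fault-fill whatever is missing".

```python
import json
import os

def save_cache(cache, path):
    """Persist {(sstable, key): position} as a list of triples."""
    rows = [[sstable, key, pos] for (sstable, key), pos in cache.items()]
    with open(path, "w") as f:
        json.dump(rows, f)

def load_cache(path):
    """Load saved positions; a missing file just means a cold cache."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {(sstable, key): pos for sstable, key, pos in json.load(f)}

def get_position(cache, sstable, key, read_index):
    """Serve from the cache, or fault-fill from the index on a miss."""
    pos = cache.get((sstable, key))
    if pos is None:
        pos = read_index(sstable, key)  # hypothetical slow path: index i/o
        if pos is not None:
            cache[(sstable, key)] = pos
    return pos
```

With this shape, startup cost is proportional to the size of the saved cache rather than to the size of the index, and misses are paid for lazily on first read.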

The only downside I see is the question of how much churn your sstables will experience between
save and load.  If you have a small data set that is constantly being overwritten, for instance,
you could basically invalidate the whole cache.  But it's quite possible that just reducing
the cache save period is adequate to address this.  So I think we should give this a try.
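
To make the churn concern concrete, a small Python sketch, assuming saved entries are keyed by (sstable generation, key) and that entries for sstables no longer live at load time are simply dropped. prune_stale and the data layout are hypothetical, not taken from the patch.

```python
def prune_stale(saved, live_sstables):
    """Keep only entries whose sstable still exists; report how many were lost."""
    live = set(live_sstables)
    kept = {k: pos for k, pos in saved.items() if k[0] in live}
    return kept, len(saved) - len(kept)

saved = {(1, "a"): 10, (1, "b"): 52, (2, "c"): 7}

# Light churn: sstable 2 was compacted away, most of the cache survives.
print(prune_stale(saved, live_sstables=[1, 3]))
# Heavy churn: everything was rewritten into sstable 4, nothing survives.
print(prune_stale(saved, live_sstables=[4]))
```

The shorter the interval between saves, the fewer sstables will have been replaced by the time the cache is loaded, so the fraction dropped should shrink.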
                
      was (Author: jbellis):
    bq. With this patch we trade whole sequential primary_index read for random I/O with SSTableReader.getPosition()
only for amount saved keys.

I thought Vijay said this sorts the cache first.  In which case we're really doing seq i/o,
we're just skipping parts that don't have any keys.  Right?

bq. If we want to see the optimal solution for all the use cases i think we have to go for
the alternative where we can save the Keycache position to the disk and read it back and what
ever is missing let it fault fill.

I like this idea.  If you have a lot of rows (i.e., a large index) then this is the only thing
that's going to save you from doing a lot of i/o.  Even with seq i/o, reading a small cache
will be much faster than scanning a large index.

The only downside I see is the question of how much churn your sstables will experience between
save, and load.  If you have a small data set that is constantly being overwritten for instance,
you could basically invalidate the whole cache.  But, it's quite possible that just reducing
cache save period is adequate to address this.
                  
> AutoSaving KeyCache and System load time improvements.
> ------------------------------------------------------
>
>                 Key: CASSANDRA-3762
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-SavedKeyCache-load-time-improvements.patch
>
>
> CASSANDRA-2392 saves the index summary to disk, but when we have a saved cache we
> will still scan through the index to get the data out.
> We might be able to separate this from SSTR.load and let it load the index summary; once
> all the SSTables are loaded, we might be able to check the bloom filter and do random I/O on
> fewer indexes to populate the KeyCache.


        

Mime
View raw message