cassandra-commits mailing list archives

From "Michael Harris (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-4023) Batch reading BloomFilters on startup
Date Fri, 09 Mar 2012 03:38:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225777#comment-13225777
] 

Michael Harris edited comment on CASSANDRA-4023 at 3/9/12 3:37 AM:
-------------------------------------------------------------------

My $0.02 is that it may be helpful to batch reads.  I'm not sure whether the underlying stream
used to read the bloom filters already reads a large chunk and caches it, but if it doesn't, it
could help to read 64K or 1M or whatever you feel is appropriate (maybe configurable?) into a
buffer and grab the longs out of that, instead of just calling ois.readLong() for each one.  This
doesn't completely fix the problem of disk contention, but it might cause larger sequential reads
to be submitted to the disk, which might then behave more nicely.
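
A minimal sketch of the batching idea above, assuming the filter's words are stored as consecutive big-endian longs (the layout DataOutputStream.writeLong() produces). The names BatchedLongReader, readLongs and CHUNK_BYTES are made up for illustration and are not Cassandra's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class BatchedLongReader {
    // Hypothetical chunk size (64K); must be a multiple of 8 bytes.
    static final int CHUNK_BYTES = 64 * 1024;

    /** Reads `count` big-endian longs, pulling up to CHUNK_BYTES from the stream per read. */
    static long[] readLongs(InputStream in, int count) throws IOException {
        DataInputStream din = new DataInputStream(in);
        long[] words = new long[count];
        byte[] chunk = new byte[CHUNK_BYTES];
        int filled = 0;
        while (filled < count) {
            // Number of longs to take in this pass: at most one chunk's worth.
            int n = Math.min(count - filled, CHUNK_BYTES / Long.BYTES);
            din.readFully(chunk, 0, n * Long.BYTES);
            // Decode the whole chunk at once instead of one readLong() per word.
            ByteBuffer.wrap(chunk, 0, n * Long.BYTES).asLongBuffer().get(words, filled, n);
            filled += n;
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Round trip: write longs one at a time, read them back in batches.
        long[] src = {1L, -2L, Long.MAX_VALUE, 0L};
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        for (long w : src) dos.writeLong(w);
        long[] got = readLongs(new ByteArrayInputStream(bos.toByteArray()), src.length);
        System.out.println(Arrays.equals(src, got));
    }
}
```

If the stream turns out not to be buffered already, simply wrapping it in a java.io.BufferedInputStream with a large buffer size may get most of the same benefit with less code.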

The specific example I'm thinking of here: it looks like deserialization of LegacyBloomFilter
(perhaps what 0.8 uses?) is just a single ois.readObject() for a BitSet, and that's it.  Whereas
for BloomFilter (the new version?), deserialization is a tight loop of readLong() calls.
The same goes for serialization, FWIW.  Not that using Java serialization for LTS is necessarily a
good idea, but it may be happier for the disk.
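
For the serialization side mentioned above, a comparable sketch, again under the assumption that the words are written as consecutive big-endian longs. BatchedLongWriter, writeLongs and CHUNK_BYTES are hypothetical names, not anything in the Cassandra code base:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class BatchedLongWriter {
    // Hypothetical chunk size (64K); must be a multiple of 8 bytes.
    static final int CHUNK_BYTES = 64 * 1024;

    /** Writes all words as big-endian longs, one chunk-sized write() call at a time. */
    static void writeLongs(OutputStream out, long[] words) throws IOException {
        ByteBuffer chunk = ByteBuffer.allocate(CHUNK_BYTES);
        int written = 0;
        while (written < words.length) {
            int n = Math.min(words.length - written, CHUNK_BYTES / Long.BYTES);
            chunk.clear();
            // Encode n longs into the backing array through a long view of the buffer.
            chunk.asLongBuffer().put(words, written, n);
            out.write(chunk.array(), 0, n * Long.BYTES);
            written += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write in batches, then read back with plain readLong() to check the format.
        long[] src = {42L, -1L, Long.MIN_VALUE};
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeLongs(bos, src);
        DataInputStream din = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        for (long expected : src) {
            System.out.println(din.readLong() == expected);
        }
    }
}
```

The bytes produced are the same as a loop of writeLong() calls would produce, so a reader that still uses readLong() can consume them unchanged.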
                
      was (Author: mharris):
    My $0.02 is that it may be helpful to batch reads.  Not sure if the underlying stream
used in reading the bloom filters reads a large chunk and caches it, but if not, it could
help to instead of just calling ois.readLong(), you read 64K or 1M or whatever you feel is
appropriate (maybe configurable?) into a buffer and grab the longs out of those.  This doesn't
completely fix the problem of disk contention, but it might cause larger sequential reads
to be submitted to the disk, which then might behave nicer?
                  
> Batch reading BloomFilters on startup
> -------------------------------------
>
>                 Key: CASSANDRA-4023
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4023
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Joaquin Casares
>              Labels: datastax_qa
>
> The difference of startup times between a 0.8.7 cluster and 1.0.7 cluster with the same
> amount of data is 4x greater in 1.0.7.
> It seems as though 1.0.7 loads the BloomFilter through a series of reading longs out
> in a multithreaded process while 0.8.7 reads the entire object.
> Perhaps we should update the new BloomFilter to do reading in batch as well?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
