cassandra-commits mailing list archives

From "Brandon Williams (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3943) Too many small size sstables after loading data using sstableloader or BulkOutputFormat increases compaction time.
Date Thu, 23 Feb 2012 15:38:48 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214808#comment-13214808 ]

Brandon Williams commented on CASSANDRA-3943:
---------------------------------------------

bq. As the number of nodes in the cluster increases, the size of each sstable loaded onto a Cassandra node decreases

This is unavoidable, since each sstable now contains data for more ranges, each of which belongs to a specific replica range in the ring.

bq. Such small sstables take too long to compact (minor compaction)

Assuming SizeTieredCompactionStrategy, increasing the maximum compaction threshold may help to some degree, so that the nodes compact more tiny sstables at a time.
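
To illustrate why a higher maximum threshold can help, here is a rough sketch in Python. It is a deliberately simplified model and not Cassandra's actual compaction logic: it ignores size-based bucketing and the minimum threshold, and simply assumes each compaction merges at most max_threshold sstables. The function name merge_rounds and the numbers used are hypothetical.

```python
import math


def merge_rounds(sstable_count, max_threshold):
    """Count merge rounds needed to reduce sstable_count sstables to one.

    Simplified model (an assumption, not Cassandra's real algorithm):
    every round groups the remaining sstables into batches of at most
    max_threshold and merges each batch into a single sstable.
    """
    if max_threshold < 2:
        raise ValueError("max_threshold must be at least 2")
    rounds = 0
    while sstable_count > 1:
        sstable_count = math.ceil(sstable_count / max_threshold)
        rounds += 1
    return rounds


# Hypothetical load: 5000 tiny sstables on one node.
for threshold in (32, 128):
    print(f"max threshold {threshold}: {merge_rounds(5000, threshold)} rounds")
```

Under this model, fewer rounds means the same data is rewritten fewer times, which is the effect the suggestion above is aiming for.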

bq. Is there a solution to this in existing versions, or will it be fixed in a future version?

I'm open to ideas, but have no plans as there is no clear solution.
                
> Too many small size sstables after loading data using sstableloader or BulkOutputFormat increases compaction time.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3943
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3943
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop, Tools
>    Affects Versions: 0.8.2, 1.1.0
>            Reporter: Samarth Gahire
>            Assignee: Brandon Williams
>            Priority: Minor
>              Labels: bulkloader, hadoop, sstableloader, streaming, tools
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When we create sstables using SimpleUnsortedWriter or BulkOutputFormat, the size of each sstable created is close to the configured buffer size.
> After loading, however, the sstables created on the cluster nodes have a size of roughly
> {code}( (sstable_size_before_loading) * replication_factor ) / No_Of_Nodes_In_Cluster{code}
> As the number of nodes in the cluster increases, the size of each sstable loaded onto a Cassandra node decreases. Such small sstables take much longer to compact (minor compaction) than relatively large sstables.
> One solution we have tried is to increase the buffer size when generating sstables, but as the buffer size increases, so does the time taken to generate the sstables. Is there a solution to this in existing versions, or will it be fixed in a future version?
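
As a worked example of the size formula in the description, the following Python sketch evaluates it for a few cluster sizes. The function name loaded_sstable_size and the input values (256 MB source sstables, replication factor 3) are hypothetical and chosen only for illustration; the formula itself is the reporter's approximation.

```python
def loaded_sstable_size(size_before_loading, replication_factor, node_count):
    """Approximate per-node sstable size after loading.

    Implements the approximation from the issue description:
    (sstable_size_before_loading * replication_factor) / No_Of_Nodes_In_Cluster
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return size_before_loading * replication_factor / node_count


# Hypothetical inputs: 256 MB source sstables, replication factor 3.
for nodes in (6, 12, 48):
    size_mb = loaded_sstable_size(256, 3, nodes)
    print(f"{nodes} nodes: about {size_mb:g} MB per loaded sstable")
```

With these assumed inputs, going from 6 to 48 nodes shrinks each loaded sstable by a factor of eight, which matches the trend the report describes.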

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
