cassandra-user mailing list archives

From "qihuang.zheng"<qihuang.zh...@fraudmetrix.cn>
Subject Re: Data.db too large and after sstableloader still large
Date Fri, 13 Nov 2015 01:11:52 GMT
Thanks, Rob. We use spark-cassandra-connector to read data from the table and then repartition it.
Nodes with large data files make this task run too slowly, sometimes taking several hours, which
is unacceptable, while nodes with small files finish quickly.
So if sstableloader could split the data into smaller files and balance them across all nodes,
our Spark job could run quickly.
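
For reference, a minimal sketch of this read-then-repartition pattern with spark-cassandra-connector;
the contact host, keyspace, table, and partition count below are placeholders, not values from this thread:

  import org.apache.spark.{SparkConf, SparkContext}
  import com.datastax.spark.connector._

  // Placeholder host, keyspace, table, and partition count.
  val conf = new SparkConf()
    .setAppName("repartition-example")
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  // The connector maps each Spark partition to a Cassandra token range,
  // so a node holding unusually large data files yields slow partitions.
  val rows = sc.cassandraTable("my_keyspace", "my_table")

  // Redistribute rows evenly across executors before the expensive work.
  val balanced = rows.repartition(200)
  println(balanced.count())

The repartition() shuffles rows so later stages are evenly loaded, even when the
underlying token ranges (and SSTable sizes) are skewed across nodes.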

Thanks, qihuang.zheng


Original message
From: Robert Coli <rcoli@eventbrite.com>
To: user@cassandra.apache.org
Sent: Friday, November 13, 2015, 04:04
Subject: Re: Data.db too large and after sstableloader still large


On Thu, Nov 12, 2015 at 6:44 AM, qihuang.zheng <qihuang.zheng@fraudmetrix.cn> wrote:

question is : why sstableloader can’t balance data file size?


Because it streams ranges from the source SSTable to a distributed set of ranges, especially
if you are using vnodes.


It is a general property of Cassandra's streaming that it results in SSTables that are likely
to differ in size from those that result from flush.


Why are you preoccupied with the sizes of files in the hundreds of megabytes? Why
do you care about this amount of variance in file sizes?


=Rob