cassandra-commits mailing list archives

From Piotr Kołaczkowski (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
Date Mon, 01 Dec 2014 14:21:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229828#comment-14229828 ]

Piotr Kołaczkowski commented on CASSANDRA-7688:
-----------------------------------------------

We only need estimates, not exact values. An error factor of 1.5x is considered an excellent
estimate, and a factor of 3x is still fairly good.
Also note that Spark/Hadoop do many token range scans. Maybe collecting some statistics
on the fly, during the scans (or during compaction), would be viable? And running a full
compaction to get more accurate statistics - why not? You need to do it anyway to get top
speed when scanning data in Spark, because a full table scan does a kind of implicit
compaction anyway, doesn't it?

Also, one more thing - it would be good to have those values per column (sorry for making
it even harder, I know it is not an easy task). At the very least, we should know that a
column is responsible for xx% of the data in the table. Knowing this would make a huge
difference when estimating data size, because we do not always fetch all columns and they
may vary a lot in size (e.g. collections!). Some sampling on insert would probably be enough.
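
To illustrate the per-column idea, here is a minimal sketch in Python. It is not Cassandra
code: the ColumnSizeSampler class, the sample_rate parameter and the estimate_fetch_size
helper are hypothetical names, and rows are assumed to arrive as a mapping from column name
to serialized bytes. It samples a fraction of inserts, tracks each column's share of the
sampled bytes, and scales a table-level size estimate by the share of the columns fetched.

```python
import random
from collections import defaultdict


class ColumnSizeSampler:
    """Hypothetical helper: per-column byte totals over a sample of inserted rows."""

    def __init__(self, sample_rate=0.01, rng=None):
        self.sample_rate = sample_rate          # fraction of inserts to inspect
        self.rng = rng or random.Random()
        self.bytes_by_column = defaultdict(int)
        self.sampled_rows = 0

    def on_insert(self, row):
        """row: dict mapping column name -> serialized value (bytes)."""
        # Skip most inserts so the bookkeeping stays cheap.
        if self.rng.random() >= self.sample_rate:
            return
        self.sampled_rows += 1
        for column, value in row.items():
            self.bytes_by_column[column] += len(value)

    def column_shares(self):
        """Return each column's fraction of all sampled bytes."""
        total = sum(self.bytes_by_column.values())
        if total == 0:
            return {}
        return {c: b / total for c, b in self.bytes_by_column.items()}


def estimate_fetch_size(table_bytes, shares, columns):
    """Scale a table-level size estimate by the share of the fetched columns."""
    return table_bytes * sum(shares.get(c, 0.0) for c in columns)
```

With a small id column and a large collection column, a query that fetches only id would be
estimated at a small fraction of the table size, which is the difference described above.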


> Add data sizing to a system table
> ---------------------------------
>
>                 Key: CASSANDRA-7688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremiah Jordan
>             Fix For: 2.1.3
>
>
> Currently you can't implement something similar to describe_splits_ex purely from
> a native protocol driver.  https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose
> easily getting ownership information to a client in the java-driver.  But you still need the
> data sizing part to get splits of a given size.  We should add the sizing information to a
> system table so that native clients can get to it.
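
As a rough illustration of why the sizing part is needed for "splits of a given size", the
sketch below shows how a client might turn per-token-range size estimates into splits. It is
only a sketch under assumptions: plan_splits is a hypothetical function, the input is assumed
to be a list of (start_token, end_token, estimated_bytes) tuples that do not wrap around the
ring, and data is assumed to be spread evenly within each range.

```python
def plan_splits(ranges, target_bytes):
    """Cut token ranges into sub-ranges of roughly target_bytes each.

    ranges: list of (start_token, end_token, estimated_bytes) tuples
            (hypothetical input; no ring wrap-around is handled).
    Returns a list of (start_token, end_token, estimated_bytes) splits.
    """
    splits = []
    for start, end, size in ranges:
        # At least one split per range, even if it is smaller than the target.
        pieces = max(1, round(size / target_bytes))
        width = end - start
        for i in range(pieces):
            # Integer arithmetic keeps the sub-range boundaries contiguous.
            lo = start + width * i // pieces
            hi = start + width * (i + 1) // pieces
            splits.append((lo, hi, size / pieces))
    return splits
```

Because only estimates are available, the resulting splits are approximate; a size estimate
that is off by some factor makes the splits off by the same factor.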



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
