cassandra-commits mailing list archives

From "mck (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7688) Add data sizing to a system table
Date Fri, 06 Feb 2015 07:46:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308699#comment-14308699 ]

mck edited comment on CASSANDRA-7688 at 2/6/15 7:46 AM:
--------------------------------------------------------

{quote}Can you please elaborate on what the idea is behind storing this info in a system table?{quote}
I'm still curious about this question. It wasn't about the removal of thrift (that's obvious,
although it wasn't obvious that all "metadata" is only exposed via cql, e.g. ControlConnection.refreshSchema(..))
but about the reasoning for backgrounding the computation and for its frequency.

{code}        ScheduledExecutors.optionalTasks.schedule(runnable, 5, TimeUnit.MINUTES);{code}
Why 5 minutes? What's the trade-off here? 
 How do we (everyone) know the computation is expensive enough to warrant backgrounding it?
 And that 5 minutes will give us the best throughput (across c* and its hadoop/spark jobs)?

a) what about putting metrics around the code in SizeEstimatesRecorder.run() so we can get
an idea for future adjustments?
(going a step further could be to get updateSizeEstimates() to diff the old rows with the new
rows and have a metric on change frequency).
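
To illustrate (a), here is a minimal sketch of timing a background task with plain JDK classes. TimedRunnable is a hypothetical name, not an existing Cassandra class, and a real patch would presumably report through Cassandra's own metrics rather than these counters.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical wrapper: records how often a task runs and how long it takes. */
final class TimedRunnable implements Runnable {
    private final Runnable delegate;
    private final LongAdder runs = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    TimedRunnable(Runnable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        try {
            delegate.run();
        } finally {
            // Record even when the delegate throws, so failed runs are counted too.
            totalNanos.add(System.nanoTime() - start);
            runs.increment();
        }
    }

    long runCount() {
        return runs.sum();
    }

    long totalNanos() {
        return totalNanos.sum();
    }

    /** Mean duration per run in nanoseconds, or 0 if the task has not run yet. */
    long meanNanos() {
        long n = runs.sum();
        return n == 0 ? 0 : totalNanos.sum() / n;
    }
}
```

The recorder's runnable could then be wrapped before it is scheduled, and meanNanos() would show whether the computation is expensive enough to justify backgrounding it.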

b) what about making the frequency configurable?
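
A rough sketch of what (b) could look like, assuming the interval is read from a JVM system property. The property name below is made up for illustration and is not an existing Cassandra setting; the 5 minute default mirrors the value in the snippet above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: resolves the recording interval from a system property. */
final class SizeEstimatesInterval {
    // Assumed property name, for illustration only.
    static final String PROPERTY = "example.size_estimates_interval_ms";

    // Same default as the hard-coded 5 minutes.
    static final long DEFAULT_MILLIS = TimeUnit.MINUTES.toMillis(5);

    private SizeEstimatesInterval() {}

    static long intervalMillis() {
        // Long.getLong returns the default when the property is unset or unparseable.
        long configured = Long.getLong(PROPERTY, DEFAULT_MILLIS);
        // Non-positive values make no sense for a delay, so fall back to the default.
        return configured > 0 ? configured : DEFAULT_MILLIS;
    }
}
```

The schedule call would then pass SizeEstimatesInterval.intervalMillis() with TimeUnit.MILLISECONDS instead of the literal 5 and TimeUnit.MINUTES.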


was (Author: michaelsembwever):
{quote}Can you please elaborate on what the idea is behind storing this info in a system table?{quote}
I'm still curious on this question, as it wasn't about the removal of thrift (that's obvious)
but around the reasoning for backgrounding the computation.

{code}        ScheduledExecutors.optionalTasks.schedule(runnable, 5, TimeUnit.MINUTES);{code}
Why 5 minutes? What's the trade-off here? 
 How do we (everyone) know the computation is expensive enough to warrant backgrounding it?
 And that 5 minutes will give us the best throughput (across c* and its hadoop/spark jobs)?

a) what about putting metrics around the code in SizeEstimatesRecorder.run() so we can get
an idea for future adjustments?
(going a step further could be to get updateSizeEstimates() to diff the old rows with the new
rows and have a metric on change frequency).

b) what about making the frequency configurable?

> Add data sizing to a system table
> ---------------------------------
>
>                 Key: CASSANDRA-7688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremiah Jordan
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.1.3
>
>         Attachments: 7688.txt
>
>
> Currently you can't implement something similar to describe_splits_ex purely from
a native protocol driver.  https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose
easily getting ownership information to a client in the java-driver.  But you still need the
data sizing part to get splits of a given size.  We should add the sizing information to a
system table so that native clients can get to it.



