cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices
Date Fri, 15 Oct 2010 14:49:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921390#action_12921390 ]

Stu Hood edited comment on CASSANDRA-1600 at 10/15/10 10:47 AM:
----------------------------------------------------------------

> we already have this problem with the existing get_range_slices and excessively large count values.
In that case, the user is explicitly saying, give me a lot of stuff. The fix (in production)
would be a one-line code change, not the emergency addition of an index.

> it turns out that allowing people to do more powerful/efficient things is the right choice even when it is potentially dangerous
I disagree. http://jsomers.net/blog/it-turns-out

If we're talking about the specific case of ad hoc analytics queries, then we should discuss
them independently, because they really are a whole different beast. For instance, if the
idea here is that you would perform filtering in Cassandra rather than in the Hadoop process,
you are not saving anything but serialization/deserialization (ser/de) time, since the
recommended way to deploy Hadoop is directly on localhost.

      was (Author: stuhood):
    > we already have this problem with the existing get_range_slices and excessively large
count values.
In the case, the user is explicitly saying, give me a lot of stuff. The fix (in production)
would be a one line code change, not the emergency addition of an index.

> it turns out that allowing people to do more powerful/efficient things is the right choice
even when it is potentially dangerous
I disagree. http://jsomers.net/blog/it-turns-out

If we're talking about the specific case of adhoc analytics queries, then we should discuss
them independently, because they really are a whole different beast. For instance, if the
idea here is that you would perform filtering in Cassandra rather than in the Hadoop process,
you are not saving anything but ser/de time, since the recommended way to deploy Hadoop is
directly on localhost.
  
> Merge get_indexed_slices with get_range_slices
> ----------------------------------------------
>
>                 Key: CASSANDRA-1600
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Stu Hood
>             Fix For: 0.7.0
>
>         Attachments: 0001-Add-optional-IndexClause-to-KeyRange-and-serialize-w.patch,
>                      0002-Drop-the-IndexClause.count-parameter.patch,
>                      0003-Execute-RangeSliceCommands-using-scan-when-an-IndexC.patch,
>                      0004-Remove-get_indexed_slices-method.patch,
>                      0005-Update-system-tests-to-use-get_range_slices.patch,
>                      0006-Remove-start_key-from-IndexClause-for-the-start_key-.patch,
>                      0007-Respect-end_key-for-filtered-queries.patch
>
>
> From a comment on 1157:
> {quote}
> IndexClause only has a start key for get_indexed_slices, but it would seem that the reasoning behind using 'KeyRange' for get_range_slices applies there as well, since if you know the range you care about in the primary index, you don't want to continue scanning until you exhaust 'count' (or the cluster).
> Since it would appear that get_indexed_slices would benefit from a KeyRange, why not smash get_(range|indexed)_slices together, and make IndexClause an optional field on KeyRange?
> {quote}
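
For illustration only, the proposal quoted above might be sketched roughly as follows. This is not the Thrift interface or the attached patches: the Python classes only loosely mirror the names KeyRange and IndexClause, the in-memory dict of rows stands in for a column family, keys are assumed to be scanned in sorted order with inclusive bounds, and expressions are limited to equality matches. All of those choices are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]  # hypothetical row: column name -> value


@dataclass
class IndexExpression:
    column_name: str
    value: str  # equality match only, to keep the sketch short


@dataclass
class IndexClause:
    # No start_key or count here: in this sketch both live on KeyRange.
    expressions: List[IndexExpression]


@dataclass
class KeyRange:
    start_key: str
    end_key: str
    count: int = 100
    index_clause: Optional[IndexClause] = None  # optional filter


def get_range_slices(rows: Dict[str, Row], key_range: KeyRange) -> List[Tuple[str, Row]]:
    """Scan keys in sorted order, bounded by the range, filtered if a clause is set."""
    results: List[Tuple[str, Row]] = []
    clause = key_range.index_clause
    for key in sorted(rows):
        if key < key_range.start_key:
            continue
        if key > key_range.end_key:
            break  # stop at end_key instead of scanning until 'count' is exhausted
        row = rows[key]
        if clause is not None and not all(
            row.get(e.column_name) == e.value for e in clause.expressions
        ):
            continue
        results.append((key, row))
        if len(results) >= key_range.count:
            break
    return results
```

With a single entry point, an unfiltered range query simply leaves index_clause unset, and a filtered query stops at end_key rather than continuing to scan for matches.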

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

