cassandra-commits mailing list archives

From "Vijay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
Date Sun, 10 Jun 2012 19:04:43 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292558#comment-13292558 ]

Vijay commented on CASSANDRA-1337:
----------------------------------

{quote}
Can't we reach a state where we have handlers on which we haven't called get() (because they
have not exceeded the concurrency factor)?
{quote}
I don't quite follow the question. Are you talking about nodes that have not responded in time?
The get() method actually waits for the nodes to respond with data.
If so, then yes, we can get to that point, and when we do we might need to
time out the query.

{quote}
On another matter what would be the best strategy to test this both for correctness and speed?
{quote}
You might want to try the Stress tool with different index cardinalities on a multi-node
cluster.

{code}
-C CARDINALITY, --cardinality=CARDINALITY
		Number of unique values stored in columns, default:50
{code}
                
> parallelize fetching rows for low-cardinality indexes
> -----------------------------------------------------
>
>                 Key: CASSANDRA-1337
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
CASSANDRA-1337.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> currently, we read the indexed rows from the first node (in partitioner order); if that
does not have enough matching rows, we read the rows from the next, and so forth.
> we should use the statistics from CASSANDRA-1155 to query multiple nodes in parallel,
such that we have a high chance of getting enough rows w/o having to do another round of queries
(but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough
data or we have fetched from each node).
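
The round-based approach described in the issue can be sketched roughly as follows. This is a minimal illustration in plain Java, not the attached patch and not Cassandra's actual code: the class name, `concurrencyFactor`, `fetch`, and the use of `Callable` to stand in for a per-node index query are all hypothetical, and `estimatedRowsPerRange` stands in for whatever the CASSANDRA-1155 statistics would provide. Each round queries enough ranges in parallel to cover the rows still missing according to the estimate, waits on each handler with a bounded `get()`, and loops if the estimate was too optimistic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and structure are assumptions, not Cassandra internals.
final class ParallelIndexScanSketch {

    // How many ranges to query in one round, given the rows still needed
    // and an estimate of matching rows per range (assumed to come from statistics).
    static int concurrencyFactor(int rowsNeeded, double estimatedRowsPerRange, int rangesLeft) {
        if (estimatedRowsPerRange <= 0) {
            return rangesLeft; // no usable estimate: query everything that is left
        }
        int needed = (int) Math.ceil(rowsNeeded / estimatedRowsPerRange);
        return Math.max(1, Math.min(rangesLeft, needed));
    }

    // Each Callable stands in for one node's (range's) index query.
    static List<String> fetch(List<Callable<List<String>>> ranges,
                              int rowsWanted,
                              double estimatedRowsPerRange,
                              long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        List<String> rows = new ArrayList<>();
        try {
            int next = 0;
            // Keep doing rounds until we have enough rows or have asked every range.
            while (rows.size() < rowsWanted && next < ranges.size()) {
                int batch = concurrencyFactor(rowsWanted - rows.size(),
                                              estimatedRowsPerRange,
                                              ranges.size() - next);
                List<Future<List<String>>> pending = new ArrayList<>();
                for (int i = 0; i < batch; i++) {
                    pending.add(pool.submit(ranges.get(next++)));
                }
                // get() blocks until the response arrives; the timeout bounds the wait
                // and throws TimeoutException if a range does not answer in time.
                for (Future<List<String>> handler : pending) {
                    rows.addAll(handler.get(timeoutMillis, TimeUnit.MILLISECONDS));
                }
            }
        } finally {
            pool.shutdownNow();
        }
        // Trim any surplus rows returned by the last round.
        return rows.size() > rowsWanted ? new ArrayList<>(rows.subList(0, rowsWanted)) : rows;
    }
}
```

In this sketch every handler submitted in a round has `get()` called on it before the next round starts, so a slow range surfaces as a timeout on the query rather than as a handler that is never waited on.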

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
