cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12153) RestrictionSet.hasIN() is slow
Date Fri, 08 Jul 2016 19:29:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368229#comment-15368229 ]

Benjamin Lerer edited comment on CASSANDRA-12153 at 7/8/16 7:28 PM:
--------------------------------------------------------------------

I am the one to blame for the {{stream()}} method. My main concern when I created it was
just to simplify the code.
If we are really looking for speed, I think we should have fields for {{hasIN}},
{{hasEq}} ...
That would move the computation from execution time to preparation time and perform
it only once (if my memory is correct, {{hasIN()}} is called multiple times).
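
A minimal sketch of the field-based idea, to make it concrete. The names {{SimpleRestriction}} and {{CachedFlagsRestrictionSet}} are hypothetical stand-ins and not the actual Cassandra classes; the sketch assumes the set is immutable and built once at preparation time, so the flags can be updated incrementally as restrictions are added.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a restriction (not Cassandra's interface).
final class SimpleRestriction {
    final boolean in;
    final boolean eq;

    SimpleRestriction(boolean in, boolean eq) {
        this.in = in;
        this.eq = eq;
    }
}

// Hypothetical immutable set whose flags are computed when it is built
// (preparation time), so the getters cost nothing at execution time.
final class CachedFlagsRestrictionSet {
    private final List<SimpleRestriction> restrictions;
    private final boolean hasIN;
    private final boolean hasEq;

    private CachedFlagsRestrictionSet(List<SimpleRestriction> restrictions,
                                      boolean hasIN, boolean hasEq) {
        this.restrictions = restrictions;
        this.hasIN = hasIN;
        this.hasEq = hasEq;
    }

    static CachedFlagsRestrictionSet empty() {
        return new CachedFlagsRestrictionSet(Collections.<SimpleRestriction>emptyList(), false, false);
    }

    // Returns a new set; the flags are carried over and OR-ed with the new restriction.
    CachedFlagsRestrictionSet with(SimpleRestriction r) {
        List<SimpleRestriction> copy = new ArrayList<>(restrictions);
        copy.add(r);
        return new CachedFlagsRestrictionSet(copy, hasIN || r.in, hasEq || r.eq);
    }

    boolean hasIN() { return hasIN; }

    boolean hasEq() { return hasEq; }

    int size() { return restrictions.size(); }
}
```

With this shape, calling {{hasIN()}} several times per execution is a plain field read; the only scan-like work happens once, while the set is being assembled.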

bq. Then remove RestrictionSet stream() to discourage this from being reintroduced?

There are two problems associated with the {{stream()}} method: the creation of the {{LinkedHashSet}},
which is used to remove the {{MultiColumnRestriction}} duplicates, and the lambda expressions.
The {{LinkedHashSet}} is unfortunately also created in {{iterator()}}, so removing {{stream()}}
will not solve that problem.
I think we could keep track of whether multi-column restrictions are used and
avoid creating the {{LinkedHashSet}} if they are not.
I have no idea of the cost associated with the use of the lambdas.
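
A rough sketch of the multi-column tracking idea follows. {{Restr}} and {{DedupOnDemandSet}} are hypothetical names, and the sketch assumes restrictions are stored once per column in a sorted map, so that only a multi-column restriction can appear more than once among the values. It also shows {{hasIN()}} written as a plain loop, which needs no de-duplication because duplicates cannot change the answer.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.TreeMap;

// Hypothetical restriction: 'columns' lists the columns it applies to.
final class Restr {
    final String[] columns;
    final boolean in;

    Restr(boolean in, String... columns) {
        this.in = in;
        this.columns = columns;
    }

    boolean isMultiColumn() { return columns.length > 1; }
}

// Hypothetical set that only builds a LinkedHashSet when duplicates are possible.
final class DedupOnDemandSet implements Iterable<Restr> {
    // Column name -> restriction; a multi-column restriction is stored once per column.
    private final TreeMap<String, Restr> byColumn = new TreeMap<>();
    // Set as soon as any multi-column restriction is added (conservative: never reset).
    private boolean hasMultiColumn;

    void add(Restr r) {
        for (String column : r.columns)
            byColumn.put(column, r);
        hasMultiColumn |= r.isMultiColumn();
    }

    @Override
    public Iterator<Restr> iterator() {
        Collection<Restr> values = byColumn.values();
        // Without multi-column restrictions every value is distinct,
        // so the map's own iterator can be used directly.
        return hasMultiColumn ? new LinkedHashSet<>(values).iterator() : values.iterator();
    }

    boolean hasIN() {
        // Plain loop: no stream, no lambda, no intermediate collection.
        for (Restr r : byColumn.values())
            if (r.in)
                return true;
        return false;
    }
}
```

The flag is deliberately never cleared, so the worst case is an unnecessary {{LinkedHashSet}}, never a missed duplicate.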


was (Author: blerer):
I am the one to blame for the {{stream()}} method. My main concern, when I created it, was
just to simplify the code.
If we are really looking for speed, I think that we should have some field variables for {{hasIN}},
{{hasEq}} ...
It will move the computation at preparation time rather than at execution time and will perform
it only once (if my memory is correct {{hasIN()}} is called multiple times).

bq. Then remove RestrictionSet stream() to discourage this from being reintroduced?

There is 2 problems associated to the {{stream()}} method. The creation of the {{LinkedHashSet}}
which is used to remove the duplicates {{MultiColumnRestrictions}} and the Lambda expressions.
The {{{LinkedHashSet}} is unfortunatly also created in {{iterator()}} so removing {{stream()}
will not solve that problem. 
I think, we could keep track of the fact that multicolumn restrictions are used or not and
avoid creating the {{LinkedHashSet}} if they are not used.
I have no idea of the cost associated to the use of the lambda.

> RestrictionSet.hasIN() is slow
> ------------------------------
>
>                 Key: CASSANDRA-12153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that {{RestrictionSet.hasIN()}}
> was responsible for about 1% of the time.  It looks like it's mostly slow because it creates
> a new LinkedHashSet (which is expensive to init) and uses streams.  This can be replaced with
> a simple for loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
