cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4478) Make index_interval be measured in kb (instead of number of keys)
Date Tue, 27 Nov 2012 08:59:57 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504465#comment-13504465 ]

Sylvain Lebresne commented on CASSANDRA-4478:
---------------------------------------------

bq. What if instead we make index_interval be CQL3 rows instead of partitions?

I'm not sure I see much benefit in that over measuring it in bytes. Namely:
# that doesn't make tuning easier. What the index_interval represents is how much of the index
file you will need to read at most to find the indexed block you are looking for. So it
does feel to me that having this size in bytes is *ideal*. In particular, even if CQL3
rows vary less in size than internal ones, their size is still not constant and depends on
the table.
# it will be more complicated/less efficient to implement in practice with the current code
because the index summary is built from the index file, and the index file doesn't currently
have enough information to count CQL3 rows.
# a CQL3 row count might be fairly meaningless for thrift users.
# currently we still have two nested levels of indexing: the internal rows and, inside that, the
column index. They are in the same file now, but they are not merged together. In that
situation, I'm not really sure counting CQL3 rows makes any sense at all (of course, we could
merge the two levels of indexing together, but that's not a small/simple patch, whereas this ticket
is more straightforward while still putting us in a situation that is probably good enough
for a while).
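
To illustrate the first point, here is a minimal sketch (not Cassandra's actual code; the function names and the list-of-entry-sizes model are made up for the example) comparing an index summary sampled every N keys with one sampled every N bytes. With a key-based interval, the amount of index file to scan between two summary entries grows with the entry size; with a byte-based interval, it stays bounded by roughly the interval plus one entry.

```python
from typing import List, Sequence


def summary_by_key_count(entry_sizes: Sequence[int], index_interval: int) -> List[int]:
    """Sample the file offset of every index_interval-th index entry."""
    offsets = []
    pos = 0
    for i, size in enumerate(entry_sizes):
        if i % index_interval == 0:
            offsets.append(pos)
        pos += size
    return offsets


def summary_by_bytes(entry_sizes: Sequence[int], interval_bytes: int) -> List[int]:
    """Sample an entry once at least interval_bytes have passed since the last sample."""
    offsets = []
    pos = 0
    last = None
    for size in entry_sizes:
        if last is None or pos - last >= interval_bytes:
            offsets.append(pos)
            last = pos
        pos += size
    return offsets


def max_scan(offsets: Sequence[int], total_size: int) -> int:
    """Largest number of index-file bytes between two consecutive summary entries."""
    bounds = list(offsets) + [total_size]
    return max(b - a for a, b in zip(bounds, bounds[1:]))


# Hypothetical index file: 8 small entries (20 bytes) followed by 8 large ones (500 bytes).
sizes = [20] * 8 + [500] * 8
total = sum(sizes)

by_keys = summary_by_key_count(sizes, index_interval=4)
by_bytes = summary_by_bytes(sizes, interval_bytes=512)

print(by_keys, max_scan(by_keys, total))
print(by_bytes, max_scan(by_bytes, total))
```

In this toy model the key-based summary oversamples the small entries and undersamples the large ones, while the byte-based one keeps the worst-case scan below interval_bytes plus the largest entry size.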
                
> Make index_interval be measured in kb (instead of number of keys)
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-4478
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4478
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: 4478-incomplete.txt
>
>
> Currently, index_interval is measured in number of keys: how many keys before adding an
entry to the index summary. After CASSANDRA-2319, each index entry also contains the columns
index for the row, so an index entry can be a bit bigger and of differing sizes. Measuring in
number of keys is thus sub-optimal and difficult to tune, since you might want a different
setting depending on whether your rows are big or small, but the setting is global.
> So we should move to measuring the interval in bytes.

