cassandra-commits mailing list archives

From "Alex Liu (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7280) Hadoop support not respecting cassandra.input.split.size
Date Tue, 14 Oct 2014 19:31:34 GMT


Alex Liu commented on CASSANDRA-7280:

cassandra.input.split.size is used to partition rows by partition key; it does not affect
native paging. Native internal paging has its own page size, which can be set by ""
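
To illustrate the distinction drawn above, here is a toy model in Python. It is not Cassandra or Hadoop code: the function names `make_splits` and `read_split` are hypothetical, and rows are stood in for by integers. In a real cluster the split boundaries come from the server, so this only sketches the idea that the split size decides which rows a task owns, while the page size only decides how many of those rows are fetched per round trip.

```python
from typing import Iterator, List


def make_splits(row_keys: List[int], split_size: int) -> List[List[int]]:
    """Group rows into splits of at most split_size rows (one split per task)."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [row_keys[i:i + split_size] for i in range(0, len(row_keys), split_size)]


def read_split(split: List[int], page_size: int) -> Iterator[List[int]]:
    """Yield the rows of one split page by page; paging never changes the split."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for i in range(0, len(split), page_size):
        yield split[i:i + page_size]


# Hypothetical data: ten rows, a split size of 4 and a page size of 3.
rows = list(range(10))
splits = make_splits(rows, split_size=4)
first_split_pages = list(read_split(splits[0], page_size=3))
```

In this model, changing `page_size` changes how many fetches a task makes but never how many rows it traverses; only `split_size` bounds that.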

> Hadoop support not respecting cassandra.input.split.size
> --------------------------------------------------------
>                 Key: CASSANDRA-7280
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Jeremy Hanna
> Long ago (0.7), I tried to set the cassandra.input.split.size property and never really
> got it to respect that property.  However the batch size was useful for what I needed to affect
> the timeouts.
> Now with the cql record reader and the native paging, users can specify queries potentially
> using allow filtering clauses.  The input split size is more important because the server
> may have to scan through many many records to get matching records.  If the user can effectively
> set the input split size, then that gives a hard limit on how many records it will traverse.
> Currently it appears to be overriding the property, perhaps in the client.describe_splits_ex
> method on the server side.
> It can be argued that users shouldn't be using allow filtering clauses in their cql in
> the first place.  However it is still a bug that the input split size is not honored.

This message was sent by Atlassian JIRA
