cassandra-user mailing list archives

From "Shenghua(Daniel) Wan" <wansheng...@gmail.com>
Subject Re: CqlInputFormat and retired CqlPagingInputFormat create lots of connections to query the server
Date Thu, 29 Jan 2015 02:47:48 GMT
That's the C* (Cassandra) default setting (num_tokens: 256 per node). My version is 2.0.11. Check your cassandra.yaml.
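A back-of-the-envelope check of the numbers in this thread: num_tokens is a per-node setting, so 3 nodes with 256 tokens each give a ring of 768 token ranges, not 256. Because the returned splits do not span vnodes, each range yields at least one split (and so one mapper). The Java sketch below only illustrates that arithmetic; `VnodeSplitEstimate`, `tokenRanges` and `minSplits` are hypothetical names, and it assumes every node uses the same num_tokens value and treats the split size as an approximate rows-per-split count.

```java
// Hypothetical estimate of the minimum number of input splits (mappers)
// when splits never span vnodes. Assumes all nodes share one num_tokens
// value and that split size is roughly "rows per split".
public class VnodeSplitEstimate {

    // num_tokens is configured per node, so the ring holds nodes * numTokens ranges.
    static long tokenRanges(int nodes, int numTokens) {
        return (long) nodes * numTokens;
    }

    // Each range contributes at least one split, more if its estimated
    // row count exceeds the requested split size (ceiling division).
    static long minSplits(int nodes, int numTokens, long rowsPerRange, long splitSize) {
        long perRange = Math.max(1, (rowsPerRange + splitSize - 1) / splitSize);
        return tokenRanges(nodes, numTokens) * perRange;
    }

    public static void main(String[] args) {
        // 3 nodes with 256 tokens each; small ranges that each fit in one split.
        System.out.println(tokenRanges(3, 256));              // 768
        System.out.println(minSplits(3, 256, 1_000, 65_536)); // 768
    }
}
```

Lowering the split size can only increase this number; it cannot push it below the range count, which is why the split-size property alone does not reduce the connection count with vnodes.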
On Jan 28, 2015 4:53 PM, "Huiliang Zhang" <zhlntu@gmail.com> wrote:

> If you are using replication factor 1 and 3 Cassandra nodes, 256 virtual
> nodes should be evenly distributed across the 3 nodes, so there should be
> 256 virtual nodes in total. But in your experiment you saw 3*257 mappers. Is
> that because of the setting cassandra.input.split.size=3, and not because of
> the node count of 3? Otherwise, I am confused about why there are 256
> virtual nodes on every Cassandra node.
>
> On Wed, Jan 28, 2015 at 12:29 AM, Shenghua(Daniel) Wan <
> wanshenghua@gmail.com> wrote:
>
>> I ran another experiment and verified that 3*257 mappers (1 of the 257
>> ranges is effectively null) were indeed created.
>>
>> Thanks mck for the information!
>>
>> On Wed, Jan 28, 2015 at 12:17 AM, mck <mck@apache.org> wrote:
>>
>>> Shenghua,
>>>
>>> > The problem is that the user might only want all the data via a
>>> > "select *"-like statement. It seems that 257 connections to query the
>>> > rows are necessary. However, is there any way to avoid 257 concurrent
>>> > connections?
>>>
>>>
>>> Your reasoning is correct.
>>> The number of connections should be tunable via the
>>> "cassandra.input.split.size" property. See
>>> ConfigHelper.setInputSplitSize(..)
>>>
>>> The problem is that vnodes completely trash this, since the splits
>>> returned don't span across vnodes.
>>> There's an issue out for this –
>>> https://issues.apache.org/jira/browse/CASSANDRA-6091
>>> but part of the problem is that the Thrift code involved here is
>>> getting rewritten¹ to be pure CQL.
>>>
>>> In the meantime you can override CqlInputFormat and manually re-merge
>>> splits whose location sets match, so as to better honour
>>> inputSplitSize and return to a more reasonable number of connections.
>>> We do this, using code similar to this patch
>>> https://github.com/michaelsembwever/cassandra/pull/2/files
>>>
>>> ~mck
>>>
>>> ¹ https://issues.apache.org/jira/browse/CASSANDRA-8358
>>>
>>
>>
>>
>> --
>>
>> Regards,
>> Shenghua (Daniel) Wan
>>
>
>
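mck's workaround of re-merging splits whose location sets match can be sketched in plain Java. This is a hypothetical, self-contained illustration, not the code from the linked patch and not Cassandra's API: `Split`, `MergedSplit` and `merge` are stand-in names, and it assumes each split carries an estimated row count. It groups splits by their replica set, then packs each group into merged splits of roughly `targetRows` rows at most.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: Split/MergedSplit are stand-ins, NOT Cassandra classes.
public class SplitMerger {

    static final class Split {
        final String range;          // token range label
        final Set<String> locations; // replica hosts for this range
        final long estimatedRows;    // assumed row estimate for the range

        Split(String range, Set<String> locations, long estimatedRows) {
            this.range = range;
            this.locations = new TreeSet<>(locations); // equal host sets compare equal
            this.estimatedRows = estimatedRows;
        }
    }

    static final class MergedSplit {
        final Set<String> locations;
        final List<String> ranges = new ArrayList<>();
        long estimatedRows;

        MergedSplit(Set<String> locations) {
            this.locations = locations;
        }
    }

    // Groups splits with identical location sets, then packs each group
    // into merged splits of at most about targetRows estimated rows.
    static List<MergedSplit> merge(List<Split> splits, long targetRows) {
        Map<Set<String>, List<Split>> byLocation = new LinkedHashMap<>();
        for (Split s : splits) {
            byLocation.computeIfAbsent(s.locations, k -> new ArrayList<>()).add(s);
        }
        List<MergedSplit> result = new ArrayList<>();
        for (Map.Entry<Set<String>, List<Split>> e : byLocation.entrySet()) {
            MergedSplit current = null;
            for (Split s : e.getValue()) {
                // Start a new merged split if adding this one would exceed the target.
                if (current == null || (current.estimatedRows > 0
                        && current.estimatedRows + s.estimatedRows > targetRows)) {
                    current = new MergedSplit(e.getKey());
                    result.add(current);
                }
                current.ranges.add(s.range);
                current.estimatedRows += s.estimatedRows;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("node1");
        Set<String> b = Set.of("node2");
        List<Split> splits = List.of(
                new Split("r1", a, 100), new Split("r2", b, 100),
                new Split("r3", a, 100), new Split("r4", a, 100));
        for (MergedSplit m : merge(splits, 250)) {
            System.out.println(m.locations + " " + m.ranges + " rows=" + m.estimatedRows);
        }
    }
}
```

With the sample input this turns four per-range splits into three merged ones ([node1] r1+r3, [node1] r4, [node2] r2). A real CqlInputFormat subclass would also need a split type and record reader that iterate over several token ranges; that part is omitted here.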
