cassandra-user mailing list archives

From "Shenghua(Daniel) Wan" <wansheng...@gmail.com>
Subject Re: CqlInputFormat and retired CqlPagingInputFormat create lots of connections to query the server
Date Wed, 28 Jan 2015 08:29:16 GMT
I ran another experiment and verified that 3*257 mappers were indeed
created (effectively, 1 of the 257 ranges is null).

Thanks, mck, for the information!
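
The following is a simplified, hypothetical model in plain Java (not Cassandra code) of why the mapper count follows the number of token ranges rather than the split size. It illustrates the point in mck's quoted reply that splits do not span vnodes. It assumes each token range is cut into chunks of roughly splitSize rows and that one range always yields at least one split. The class and method names (SplitCountModel, splitsForRange, totalSplits) and the row counts are made up for illustration.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model (not Cassandra code): estimates how many input splits,
 * and therefore map tasks, a job gets when each token range is split on its own.
 */
public class SplitCountModel {

    /**
     * Splits for one token range: ceil(estimatedRows / splitSize), but never
     * fewer than one, because in this model a split cannot span two ranges.
     */
    static long splitsForRange(long estimatedRows, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        return Math.max(1, (estimatedRows + splitSize - 1) / splitSize);
    }

    /** Total splits summed over every token range. */
    static long totalSplits(List<Long> rowsPerRange, long splitSize) {
        long total = 0;
        for (long rows : rowsPerRange) {
            total += splitsForRange(rows, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        // One large range holding 1,000,000 rows (no vnodes).
        List<Long> single = List.of(1_000_000L);
        // Roughly the same rows spread over 256 small vnode ranges (3,906 each).
        List<Long> vnodes = Collections.nCopies(256, 1_000_000L / 256);

        System.out.println(totalSplits(single, 65_536));    // 16
        System.out.println(totalSplits(vnodes, 65_536));    // 256
        // Raising the split size helps the single range but not the vnodes.
        System.out.println(totalSplits(single, 1_000_000)); // 1
        System.out.println(totalSplits(vnodes, 1_000_000)); // 256
    }
}
```

In this model, a larger split size shrinks the split count for one big range. With 256 vnode ranges the count never drops below 256, so tuning the split size alone cannot reduce the number of mappers or connections.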

On Wed, Jan 28, 2015 at 12:17 AM, mck <mck@apache.org> wrote:

> Shenghua,
>
> > The problem is that the user might just want all of the data via a
> > "select *"-like statement. It seems that 257 connections to query the
> > rows are necessary. However, is there any way to avoid 257 concurrent
> > connections?
>
>
> Your reasoning is correct.
> The number of connections should be tunable via the
> "cassandra.input.split.size" property. See
> ConfigHelper.setInputSplitSize(..)
>
> The problem is that vnodes completely trash this, since the splits
> returned don't span across vnodes.
> There's an open issue for this
> (https://issues.apache.org/jira/browse/CASSANDRA-6091), but part of the
> problem is that the Thrift code involved here is being rewritten¹ to
> use pure CQL.
>
> In the meantime, you can override CqlInputFormat and manually re-merge
> splits whose location sets match, so as to better honour
> inputSplitSize and return to a more reasonable number of connections.
> We do this using code similar to this patch:
> https://github.com/michaelsembwever/cassandra/pull/2/files
>
> ~mck
>
> ¹ https://issues.apache.org/jira/browse/CASSANDRA-8358
>
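
Below is a minimal sketch of the split re-merging idea mck describes, written in plain Java (16+, for records) with no Cassandra or Hadoop dependencies. It is not the code from the linked patch. The Split record, the merge method and the targetRows parameter are hypothetical stand-ins for whatever a real CqlInputFormat subclass would use. The sketch groups splits by their exact location set, then packs each group into merged splits of at most about targetRows estimated rows. A single split that is already larger than targetRows is kept as-is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not the linked patch): merge small per-vnode splits
 * that share the same replica location set into larger splits.
 */
public class SplitMerger {

    /** Simplified split: token-range labels, replica hosts, estimated row count. */
    record Split(List<String> ranges, Set<String> locations, long estimatedRows) {}

    static List<Split> merge(List<Split> splits, long targetRows) {
        // Group splits by their exact location set, keeping encounter order.
        Map<Set<String>, List<Split>> byLocation = new LinkedHashMap<>();
        for (Split s : splits) {
            byLocation.computeIfAbsent(s.locations(), k -> new ArrayList<>()).add(s);
        }

        List<Split> merged = new ArrayList<>();
        for (Map.Entry<Set<String>, List<Split>> group : byLocation.entrySet()) {
            List<String> ranges = new ArrayList<>();
            long rows = 0;
            for (Split s : group.getValue()) {
                // Emit the current merged split before adding s would exceed targetRows.
                if (!ranges.isEmpty() && rows + s.estimatedRows() > targetRows) {
                    merged.add(new Split(ranges, group.getKey(), rows));
                    ranges = new ArrayList<>();
                    rows = 0;
                }
                ranges.addAll(s.ranges());
                rows += s.estimatedRows();
            }
            if (!ranges.isEmpty()) {
                merged.add(new Split(ranges, group.getKey(), rows));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> ab = Set.of("10.0.0.1", "10.0.0.2");
        Set<String> bc = Set.of("10.0.0.2", "10.0.0.3");
        List<Split> input = List.of(
                new Split(List.of("r1"), ab, 4_000),
                new Split(List.of("r2"), bc, 4_000),
                new Split(List.of("r3"), ab, 4_000),
                new Split(List.of("r4"), ab, 4_000),
                new Split(List.of("r5"), bc, 4_000));
        for (Split s : merge(input, 10_000)) {
            System.out.println(s.ranges() + " rows=" + s.estimatedRows());
        }
    }
}
```

Running main prints `[r1, r3] rows=8000`, `[r4] rows=4000` and `[r2, r5] rows=8000`: five per-range splits become three. Grouping only on an identical location set keeps every merged split local to the same replicas. Plugging this into a real input format would also need a split type and record reader that can iterate several token ranges per split, which is not shown here.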



-- 

Regards,
Shenghua (Daniel) Wan
