cassandra-user mailing list archives

From Javier Pareja <pareja.jav...@gmail.com>
Subject Re: Reply: Time serial column family design
Date Tue, 17 Apr 2018 22:00:14 GMT
Hi David,

Could you describe why you chose to include the create date in the
partition key? If the vin alone partitions the data well enough, meaning
that the size (number of rows x size of row) of each partition is less than
100MB, then remove the date and just use the create_time, because the date
is already included in that column anyway.

For example, if columns "a" and "b" (from your table) each hold at most 256
UTF-8 characters, then at roughly 2 bytes per character you can have approx
100MB / (2*256*2 bytes) = 100,000 rows per partition. You can actually have
many more, but you don't want to go much higher for performance reasons.
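
To make that arithmetic easy to redo with your own numbers, here is a small
sketch in Python. The function name and parameters are made up for
illustration, the 2 bytes per character is the same rough assumption as
above, and it ignores the clustering column and per-row storage overhead.

```python
def max_rows_per_partition(target_bytes, text_columns, max_chars, bytes_per_char):
    """Rough upper bound on rows per partition for a target partition size."""
    # Worst-case row size: every text column filled to its maximum length.
    row_bytes = text_columns * max_chars * bytes_per_char
    return target_bytes // row_bytes


rows = max_rows_per_partition(
    target_bytes=100 * 1000 * 1000,  # the 100MB partition target
    text_columns=2,                  # columns "a" and "b"
    max_chars=256,
    bytes_per_char=2,                # assumed average, not a measured value
)
print(rows)  # 97656, i.e. roughly 100,000
```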

If this is not enough you could use create_month instead of create_date,
for example, to reduce the partition size while not being too granular.
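
As a rough illustration of the two bucket granularities, the sketch below
derives an integer day bucket (like your create_date) and an integer month
bucket from the same timestamp. The helper names and the yyyymm layout for
create_month are my assumptions, not something from your schema.

```python
from datetime import datetime, timezone


def day_bucket(ts):
    """Integer yyyymmdd bucket, matching the create_date values in the thread."""
    return ts.year * 10000 + ts.month * 100 + ts.day


def month_bucket(ts):
    """Integer yyyymm bucket (assumed layout for a create_month column)."""
    return ts.year * 100 + ts.month


ts = datetime(2018, 4, 16, 9, 30, tzinfo=timezone.utc)
print(day_bucket(ts), month_bucket(ts))  # 20180416 201804
```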


On Tue, 17 Apr 2018, 22:17 Nate McCall, <nate@thelastpickle.com> wrote:

> Your table design will work fine as you have appropriately bucketed by an
> integer-based 'create_date' field.
>
> Your goal for this refactor should be to remove the "IN" clause from your
> code. This will move the rollup of multiple partition keys being retrieved
> into the client instead of relying on the coordinator assembling the
> results. You have to do more work and add some complexity, but the
> trade-off will be much higher performance as you are removing the single
> coordinator as the bottleneck.
>
> On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni <xiangfei.ni@cm-dt.com>
> wrote:
>
>> Hi Nate,
>>
>>     Thanks for your reply!
>>
>>     Is there another way to design this table to meet this requirement?
>>
>>
>>
>> Best Regards,
>>
>>
>>
>> 倪项菲 / David Ni
>>
>> 中移德电网络科技有限公司
>>
>> Virtue Intelligent Network Ltd, co.
>>
>> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>>
>> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>>
>>
>>
>> *From:* Nate McCall <nate@thelastpickle.com>
>> *Sent:* April 17, 2018 7:12
>> *To:* Cassandra Users <user@cassandra.apache.org>
>> *Subject:* Re: Time serial column family design
>>
>>
>>
>>
>>
>> Select * from test where vin = 'ZD41578123DSAFWE12313' and create_date in
>> (20180416, 20180415, 20180414, 20180413, 20180412, ...);
>>
>> But this makes the CQL query very long, and I don't know whether there
>> is a limit on the length of a CQL statement.
>>
>> Please give me some advice. Thanks in advance.
>>
>>
>>
>> Using the SELECT ... IN syntax means that:
>>
>> - the driver will not be able to route the queries to the nodes which
>> have the partition
>>
>> - a single coordinator must scatter-gather the query and results
>>
>>
>>
>> Break this up into a series of single statements using the executeAsync
>> method and gather the results via something like Futures in Guava or
>> similar.
>>
>
>
>
> --
> -----------------
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
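
For reference, the approach Nate describes in the quoted text (one
single-partition statement per create_date, issued concurrently, with the
results gathered in the client) could be sketched roughly as below. This is
Python rather than the Java executeAsync/Guava combination he mentions, and
it is only a sketch under assumptions: run_query is a hypothetical callable
standing in for whatever your driver uses to execute one statement,
fake_run_query is a stub so the example runs without a cluster, and the %s
placeholders are just one common parameter style.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

# One partition per statement: no IN clause on create_date.
QUERY = "SELECT * FROM test WHERE vin = %s AND create_date = %s"


def date_buckets(start, end):
    """Integer yyyymmdd buckets from start to end, inclusive."""
    days = (end - start).days
    return [int((start + timedelta(days=i)).strftime("%Y%m%d"))
            for i in range(days + 1)]


def fetch_range(run_query, vin, start, end, max_workers=8):
    """Run one query per day bucket concurrently and merge rows client-side."""
    buckets = date_buckets(start, end)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_query, QUERY, (vin, b)) for b in buckets]
        rows = []
        for future in futures:  # iterate in submit order to keep buckets ordered
            rows.extend(future.result())
    return rows


def fake_run_query(query, params):
    """Stub for illustration only: returns one fake row per partition."""
    vin, bucket = params
    return [{"vin": vin, "create_date": bucket}]


rows = fetch_range(fake_run_query, "ZD41578123DSAFWE12313",
                   date(2018, 4, 12), date(2018, 4, 16))
print(len(rows))  # 5, one row per day bucket from the stub
```

With a real driver you would pass its own execute call as run_query, or use
its native async API in place of the thread pool, so that each statement can
be routed to a replica that owns the partition.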
