incubator-cassandra-user mailing list archives

From Nikolay Mihaylov <n...@nmmm.nu>
Subject Re: Random Distribution, yet Order Preserving Partitioner
Date Fri, 23 Aug 2013 07:48:49 GMT
> It can handle some millions of columns, but not more like 10M. I mean, a
> request for such a row concentrates on a particular node, so the
> performance degrades.

> I also had idea for semi-ordered partitioner - instead of single MD5, to
> have two MD5's.

It works for us with a wide row of about 40-50 M columns, but with lots of
problems.

My research with get_count() shows the first minor problems at 14-15K columns
in a row, and then it just gets worse.
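
A minimal sketch of one possible reading of the two-MD5 idea quoted above: the token is a pair (MD5 of a group part, MD5 of an item part), so every key that shares the group part lands in one contiguous stretch of the ring. This is plain Python, not Cassandra code. The (group, item) key split and the names `md5_int` and `semi_ordered_token` are hypothetical and used only for illustration.

```python
import hashlib
from typing import Tuple


def md5_int(text: str) -> int:
    """Interpret the 128-bit MD5 digest of text as an unsigned integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def semi_ordered_token(group: str, item: str) -> Tuple[int, int]:
    """Hypothetical two-part token: the group hash decides the region of the
    ring, the item hash decides the position inside that region."""
    return (md5_int(group), md5_int(item))


# Illustrative keys only; the split into (country, city) is an assumption.
keys = [
    ("Canada", "Toronto"),
    ("US", "Boston"),
    ("Canada", "Calgary"),
    ("US", "Austin"),
    ("Canada", "Montreal"),
]

# Walking the ring means visiting keys in token order.
ring_order = sorted(keys, key=lambda k: semi_ordered_token(*k))

for group, item in ring_order:
    print(group, item)  # all rows of one country appear next to each other
```

Because a whole group sits in one contiguous token range, a very large group ends up on few nodes, which is the imbalance mentioned in the thread. Note that in this reading the items inside a group come back in hash order, not in alphabetical order.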




On Fri, Aug 23, 2013 at 2:47 AM, Takenori Sato <tsato@cloudian.com> wrote:

> Hi Nick,
>
> > token and key are not same. it was like this long time ago (single MD5
> > assumed single key)
>
> True. That reminds me of making a test with the latest 1.2 instead of our
> current 1.0!
>
> > if you want ordered, you probably can arrange your data in a way so you
> > can get it in ordered fashion.
>
> Yeah, we have done for a long time. That's called a wide row, right? Or a
> compound primary key.
>
> It can handle some millions of columns, but not more like 10M. I mean, a
> request for such a row concentrates on a particular node, so the
> performance degrades.
>
> > I also had idea for semi-ordered partitioner - instead of single MD5,
> > to have two MD5's.
>
> Sounds interesting. But, we need a fully ordered result.
>
> Anyway, I will try with the latest version.
>
> Thanks,
> Takenori
>
>
> On Thu, Aug 22, 2013 at 6:12 PM, Nikolay Mihaylov <nmmm@nmmm.nu> wrote:
>
>> my five cents -
>> token and key are not same. it was like this long time ago (single MD5
>> assumed single key)
>>
>> if you want ordered, you probably can arrange your data in a way so you
>> can get it in ordered fashion.
>> For example, long ago I had a single column family with a single key and
>> about 2-3 M columns. I do not suggest doing it this way, because it is the
>> wrong way, but it makes the idea easy to understand.
>>
>> I also had idea for semi-ordered partitioner - instead of single MD5, to
>> have two MD5's.
>> then you can get semi-ordered ranges, e.g. you get ordered all cities in
>> Canada, all cities in US and so on.
>> However, this way things may get pretty unbalanced.
>>
>> Nick
>>
>>
>>
>>
>>
>> On Thu, Aug 22, 2013 at 11:19 AM, Takenori Sato <tsato@cloudian.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to implement a custom partitioner that evenly distributes,
>>> yet preserves order.
>>>
>>> The partitioner returns a BigInteger token, as RandomPartitioner does,
>>> while it builds the decorated key from a string, as
>>> OrderPreservingPartitioner does.
>>> * for now, since IPartitioner<T> does not support different types for
>>> token and key, BigInteger is simply converted to string
>>>
>>> Then, I played around with cassandra-cli. As expected, in my 3 nodes
>>> test cluster, get/set worked, but list(get_range_slices) didn't.
>>>
>>> This came from an attempt to overcome the scalability limits of wide rows.
>>> So, I want to make it work!
>>>
>>> I am aware that some efforts are required to make get_range_slices work.
>>> But are there any other critical problems? For example, it seems there is
>>> an assumption that token and key are the same. If this is throughout the
>>> whole C* code, this partitioner is not practical.
>>>
>>> Or have you tried something similar?
>>>
>>> I would appreciate your feedback!
>>>
>>> Thanks,
>>> Takenori
>>>
>>
>>
>
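
A toy model of the trade-off in the original question at the bottom of this thread: tokens are MD5-derived integers, as with RandomPartitioner, so rows spread roughly evenly and a lookup by key always routes to the same node, but walking the ring in token order does not return keys in sorted order. This is plain Python, not Cassandra code. The three evenly spaced node tokens, the `row-NNN` keys, and the names `token` and `owner` are assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 128  # an MD5 digest is 128 bits wide


def token(key: str) -> int:
    """Map a row key to an integer token via MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


# Hypothetical 3-node ring: each node owns the tokens up to and including
# its own token.
NODE_TOKENS = [RING_SIZE // 3, 2 * (RING_SIZE // 3), RING_SIZE - 1]


def owner(key: str) -> int:
    """Index of the node whose token range contains the key's token."""
    return bisect_left(NODE_TOKENS, token(key))


keys = ["row-%03d" % i for i in range(30)]

scan_in_token_order = sorted(keys, key=token)  # what a ring walk would see
scan_in_key_order = sorted(keys)               # what an ordered result needs

print(scan_in_token_order[:5])
print(scan_in_key_order[:5])
print([sum(1 for k in keys if owner(k) == n) for n in range(3)])  # rows per node
```

In this model, a fully ordered result would need either a sort after the scan or a scan that follows key order instead of token order.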
