cassandra-user mailing list archives

From Kant Kodali <k...@peernova.com>
Subject Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?
Date Sat, 15 Oct 2016 06:30:00 GMT
"Robert said he could treat safely 10 15GB partitions at his presentation"
This sounds like there is a row limit too, not only a column limit?

If I am reading this correctly, "10 15GB partitions" means 10 partitions
(i.e. 10 row keys, which is too small), with each partition 15GB in size
(that is roughly 15 million columns, where each column holds about 1KB of
data).
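A quick back-of-the-envelope check of that arithmetic (assuming, as above, roughly 1KB of data per column):

```python
# Back-of-the-envelope check of the partition-size arithmetic above.
GB = 1024 ** 3
KB = 1024

partition_bytes = 15 * GB   # one 15 GB partition, per Robert's figure
column_bytes = 1 * KB       # assumed ~1 KB of data per column

columns_per_partition = partition_bytes // column_bytes
print(columns_per_partition)  # 15728640, i.e. ~15 million columns
```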





On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope.ono@gmail.com> wrote:

> Thanks to CASSANDRA-11206, I think we can have much larger partitions than
> before 3.6.
> (Robert said he could treat safely 10 15GB partitions at his presentation.
> https://www.youtube.com/watch?v=N3mGxgnUiRY)
>
> But is there still a 2B column limit in the Cassandra code?
> If so, out of curiosity, I'd like to know where the bottleneck is. Could
> anyone let me know about it?
>
> Thanks, Yasuharu.
>
>
> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxguru@gmail.com>:
>
>> The "2 billion column limit" is press-clipping "puffery". The statement
>> seemingly became popular because of a highly trafficked story in which
>> a tech reporter embellished a statement to make a splashy article.
>>
>> The effect is something like this:
>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/
>>
>> Iced tea does not cause kidney stones! Cassandra does not store rows with
>> 2 billion columns! It is just not true.
>>
>>
>>
>>
>>
>>
>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <kant@peernova.com> wrote:
>>
>>> Well, 1) I have not sent it to the postgresql mailing lists, and 2) I
>>> thought this was an open-ended question, as it can involve ideas from
>>> everywhere, including the Cassandra Java driver mailing lists. So sorry
>>> if that bothered you for some reason.
>>>
>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.hoxha@gmail.com>
>>> wrote:
>>>
>>>> Also, I'm not sure, but I don't think it's "cool" to write to multiple
>>>> lists in the same message (based on the postgresql mailing list rules).
>>>> For example, I'm not subscribed to those lists, and now the messages are
>>>> separated.
>>>>
>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <dorian.hoxha@gmail.com>
>>>> wrote:
>>>>
>>>>> There are some issues working with larger partitions.
>>>>> HBase doesn't do what you say! You also have to be careful in HBase
>>>>> not to create large rows! But since rows are globally sorted, you can
>>>>> easily sort between them and keep rows small.
>>>>>
>>>>> In my opinion, the Cassandra people are wrong when they say "globally
>>>>> sorted is the devil!", while fb/google/etc actually use globally-sorted
>>>>> storage most of the time! You have to be careful, though (just like
>>>>> with random partitioning).
>>>>>
>>>>> Can you tell us what rowkey1, page1, and col(x) actually are? Maybe
>>>>> there is a way.
>>>>> "The most recent" means there's a timestamp in there?
>>>>>
>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <kant@peernova.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I understand Cassandra can have a maximum of 2B columns (cells) per
>>>>>> partition, but in practice some people seem to suggest the magic
>>>>>> number is 100K. Why not automatically create another partition/rowkey
>>>>>> (whenever we reach a safe limit that we consider efficient), with an
>>>>>> auto-incrementing bigint appended as a suffix to the new rowkey, so
>>>>>> that the driver can return the new rowkey, indicating that there is a
>>>>>> new partition, and so on? Now I understand this would involve allowing
>>>>>> partial row key searches, which Cassandra currently doesn't do (but I
>>>>>> believe HBase does), and thinking about token ranges and potentially
>>>>>> many other things.
>>>>>>
>>>>>> My current problem is this:
>>>>>>
>>>>>> I have a row key followed by a bunch of columns (this is not time
>>>>>> series data), and these columns can grow to any number. Since I have a
>>>>>> 100K limit (or whatever the number is; say, some safe limit), I want
>>>>>> to break the partition into levels/pages:
>>>>>>
>>>>>> rowkey1, page1->col1, col2, col3......
>>>>>> rowkey1, page2->col1, col2, col3......
>>>>>>
>>>>>> Now say my Cassandra db is populated with data, my application just
>>>>>> booted up, and I want the most recent value of a certain partition,
>>>>>> but I don't know which page it belongs to since my application just
>>>>>> booted up. How do I solve this in the most efficient way possible in
>>>>>> Cassandra today? I understand I can create MVs or other tables that
>>>>>> hold some auxiliary data, such as the number of pages per partition,
>>>>>> and so on, but that involves the maintenance cost of that other table,
>>>>>> which I really cannot afford because I already have MVs and secondary
>>>>>> indexes for other good reasons. So it would be great if someone could
>>>>>> explain the best way possible as of today with Cassandra. By best way
>>>>>> I mean: is it possible with one request? If yes, then how? If not,
>>>>>> then what is the next best way to solve this?
>>>>>>
>>>>>> Thanks,
>>>>>> kant
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
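The paged-partition scheme discussed in the thread can be sketched without Cassandra at all. The following is a minimal in-memory model (not Cassandra driver code; all names are illustrative): data cells live under a `(rowkey, page)` key, and a small metadata entry records the latest page per rowkey, so a freshly booted reader can find the newest value in two lookups.

```python
# In-memory sketch of the "paged partition" idea from the thread:
# cells are keyed by (rowkey, page), and a tiny metadata entry records
# the latest page per rowkey so a reader that just booted up can find
# the newest value without scanning pages. Names are illustrative.

PAGE_LIMIT = 100_000  # assumed safe column count per partition

data = {}         # (rowkey, page) -> list of (col_name, value)
latest_page = {}  # rowkey -> current page number

def write(rowkey, col, value):
    page = latest_page.get(rowkey, 1)
    if len(data.get((rowkey, page), [])) >= PAGE_LIMIT:
        page += 1                      # roll over to a new "partition"
    latest_page[rowkey] = page
    data.setdefault((rowkey, page), []).append((col, value))

def read_most_recent(rowkey):
    page = latest_page[rowkey]         # lookup 1: metadata
    return data[(rowkey, page)][-1]    # lookup 2: newest cell in newest page

write("rowkey1", "col1", "a")
write("rowkey1", "col2", "b")
print(read_most_recent("rowkey1"))     # ('col2', 'b')
```

In real Cassandra, the metadata lookup would be a second table (or a static column in the same table), which is exactly the maintenance cost the original poster wants to avoid; within a single known partition, a `CLUSTERING ORDER BY ... DESC` with `LIMIT 1` can return the newest cell in one query, but it does not by itself tell you which page is the newest.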
