cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Why does Cassandra need to have a 2B column limit? Why can't we have unlimited?
Date Sat, 15 Oct 2016 11:26:39 GMT
"2) so what is optimal limit in terms of data size?"

--> The usual recommendations for Cassandra 2.1 are:

a. max 100 MB per partition
b. or up to 10,000,000 physical columns per partition (including
clustering columns etc.; see the rough sizing sketch below)
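
A rough back-of-the-envelope sketch of how those two limits interact (the
1 KB average cell size below is just an assumed figure for illustration):

    # Rough sizing math for the 2.1-era guidelines above.
    MAX_PARTITION_BYTES = 100 * 1024 * 1024   # guideline a: ~100 MB
    MAX_PHYSICAL_CELLS = 10_000_000           # guideline b: ~10M cells

    avg_cell_bytes = 1024  # assumed average cell size; adjust for your data

    # Whichever limit is hit first sets the effective per-partition budget.
    cells_by_size = MAX_PARTITION_BYTES // avg_cell_bytes   # ~102,400 cells
    print(min(cells_by_size, MAX_PHYSICAL_CELLS))

With 1 KB cells the 100 MB cap binds long before the 10M-cell cap; the cell
count limit only dominates once average cells shrink below ~10 bytes.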

Recently, with the work of Robert Stupp (CASSANDRA-11206) and also with the
huge enhancement from Michael Kjellman (CASSANDRA-9754), it will become
easier to handle huge partitions in memory, especially with a reduced memory
footprint on the JVM heap.

However, as long as we don't have repair and streaming processes that can
be "resumed" in the middle of a partition, the operational pain will still
be there. The same goes for compaction.



On Sat, Oct 15, 2016 at 12:00 PM, Kant Kodali <kant@peernova.com> wrote:

> 1) It would be great if someone could confirm that there is no limit
> 2) so what is the optimal limit in terms of data size?
>
> Finally, thanks a lot for pointing out all the operational issues!
>
> On Sat, Oct 15, 2016 at 2:39 AM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
>> "But is there still 2B columns limit on the Cassandra code?"
>>
>> --> I remember some one the committer saying that this 2B columns
>> limitation comes from the Thrift era where you're limited to max  2B
>> columns to be returned to the client for each request. It also applies to
>> the max size of each "page" of data
>>
>> Since the introduction of the binary protocol and the paging feature,
>> this limitation does not make sense anymore.
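>>
>> To illustrate that paging feature, here is a minimal sketch using the
>> DataStax Python driver (the contact point, keyspace, table and column
>> names are all hypothetical):
>>
>>     from cassandra.cluster import Cluster
>>     from cassandra.query import SimpleStatement
>>
>>     session = Cluster(['127.0.0.1']).connect('my_keyspace')
>>
>>     # fetch_size caps each page; the driver fetches subsequent pages
>>     # transparently as you iterate, so even a very wide partition is
>>     # never materialized in a single response.
>>     stmt = SimpleStatement(
>>         "SELECT col, value FROM wide_table WHERE rowkey = %s",
>>         fetch_size=5000)
>>     for row in session.execute(stmt, ['rowkey1']):
>>         print(row.col, row.value)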
>>
>> By the way, if your partition is too wide, you'll face other operational
>> issues way before reaching the 2B column limit:
>>
>> - compaction taking a looooong time --> heap pressure --> long GC pauses
>> --> nodes flapping
>> - repair & over-streaming: a repair session failing in the middle
>> forces you to re-send the whole big partition --> the receiving node has a
>> bunch of duplicate data --> pressure on compaction
>> - bootstrapping of new nodes: a failure to stream a partition in the
>> middle forces you to re-send the whole partition from the beginning -->
>> the receiving node has a bunch of duplicate data --> pressure on compaction
>>
>>
>>
>> On Sat, Oct 15, 2016 at 9:15 AM, Kant Kodali <kant@peernova.com> wrote:
>>
>>>  Compacting 10 SSTables, each of which has a 15GB partition, in what
>>> duration?
>>>
>>> On Fri, Oct 14, 2016 at 11:45 PM, Matope Ono <matope.ono@gmail.com>
>>> wrote:
>>>
>>>> Please ignore that part of my sentence.
>>>> To be more precise, maybe I should have said "He could compact 10
>>>> SSTables, each of which has a 15GB partition".
>>>> What I wanted to say is that we can store many more rows (and columns) in a
>>>> partition than before 3.6.
>>>>
>>>> 2016-10-15 15:34 GMT+09:00 Kant Kodali <kant@peernova.com>:
>>>>
>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>> presentation" This sounds like there is there is a row limit too not
>>>>> only columns??
>>>>>
>>>>> If I am reading this correctly 10 15GB partitions  means 10 partitions
>>>>> (like 10 row keys,  thats too small) with each partition of size 15GB.
>>>>> (thats like 15 million columns where each column can have a data of size
>>>>> 1KB).
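>>>>>
>>>>> A quick sanity check of that arithmetic (a sketch; it assumes exactly
>>>>> 1KB per column and ignores per-cell overhead):
>>>>>
>>>>>     partition_bytes = 15 * 1024**3      # one 15GB partition
>>>>>     avg_column_bytes = 1024             # assumed 1KB per column
>>>>>     print(partition_bytes // avg_column_bytes)  # 15728640, ~15 million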
>>>>>
>>>>> On Fri, Oct 14, 2016 at 11:30 PM, Kant Kodali <kant@peernova.com>
>>>>> wrote:
>>>>>
>>>>>> "Robert said he could treat safely 10 15GB partitions at his
>>>>>> presentation" This sounds like there is there is a row limit too
not
>>>>>> only columns??
>>>>>>
>>>>>> If I am reading this correctly 10 15GB partitions  means 10
>>>>>> partitions (like 10 row keys,  thats too small) with each partition
of size
>>>>>> 15GB. (thats like 10 million columns where each column can have a
data of
>>>>>> size 1KB).
>>>>>>
>>>>>> On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope.ono@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks to CASSANDRA-11206, I think we can have much larger partitions
>>>>>>> than before 3.6.
>>>>>>> (Robert said he could safely handle 10 15GB partitions in his
>>>>>>> presentation: https://www.youtube.com/watch?v=N3mGxgnUiRY)
>>>>>>>
>>>>>>> But is there still a 2B column limit in the Cassandra code?
>>>>>>> If so, out of curiosity, I'd like to know where the bottleneck is.
>>>>>>> Could anyone let me know about it?
>>>>>>>
>>>>>>> Thanks, Yasuharu.
>>>>>>>
>>>>>>>
>>>>>>> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxguru@gmail.com>:
>>>>>>>
>>>>>>>> The "2 billion column limit" press clipping "puffery". This
>>>>>>>> statement seemingly became popular because highly traffic
traffic-ed story,
>>>>>>>> in which a tech reporter embellished on a statement to make
a splashy
>>>>>>>> article.
>>>>>>>>
>>>>>>>> The effect is something like this:
>>>>>>>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-ston
>>>>>>>> es-and-the-study-that-never-existed/
>>>>>>>>
>>>>>>>> Iced tea does not cause kidney stones! Cassandra does not
store
>>>>>>>> rows with 2 billion columns! It is just not true.
>>>>>>>>
>>>>>>>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <kant@peernova.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Well, 1) I have not sent it to the postgresql mailing lists, and 2) I
>>>>>>>>> thought this was an open-ended question, as it can involve ideas from
>>>>>>>>> everywhere, including the Cassandra java driver mailing lists, so sorry
>>>>>>>>> if that bothered you for some reason.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <
>>>>>>>>> dorian.hoxha@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Also, I'm not sure, but I don't think it's "cool" to write to
>>>>>>>>>> multiple lists in the same message (based on the postgresql mailing
>>>>>>>>>> lists' rules).
>>>>>>>>>> For example, I'm not subscribed to those, and now the messages are
>>>>>>>>>> separated.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <
>>>>>>>>>> dorian.hoxha@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> There are some issues working with larger partitions.
>>>>>>>>>>> HBase doesn't do what you say! You also have to be careful on
>>>>>>>>>>> HBase not to create large rows! But since they are globally sorted,
>>>>>>>>>>> you can easily sort between them and create small rows.
>>>>>>>>>>>
>>>>>>>>>>> In my opinion, the Cassandra people are wrong when they say
>>>>>>>>>>> "globally sorted is the devil!" while all of fb/google/etc. actually
>>>>>>>>>>> use globally sorted most of the time! You have to be careful though
>>>>>>>>>>> (just like with random partitioning).
>>>>>>>>>>>
>>>>>>>>>>> Can you tell us what rowkey1, page1, col(x) actually are? Maybe
>>>>>>>>>>> there is a way.
>>>>>>>>>>> The most "recent" means there's a timestamp in there?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <kant@peernova.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I understand Cassandra can have a maximum of 2B rows per
>>>>>>>>>>>> partition, but in practice some people seem to suggest the magic
>>>>>>>>>>>> number is 100K. Why not create another partition/rowkey
>>>>>>>>>>>> automatically (whenever we reach a safe limit that we consider
>>>>>>>>>>>> efficient), with an auto-increment bigint appended as a suffix to
>>>>>>>>>>>> the new rowkey, so that the driver can return the new rowkey,
>>>>>>>>>>>> indicating that there is a new partition, and so on? Now, I
>>>>>>>>>>>> understand this would involve allowing partial row key searches,
>>>>>>>>>>>> which Cassandra currently doesn't do (but I believe HBase does),
>>>>>>>>>>>> and thinking about token ranges and potentially many other things.
>>>>>>>>>>>>
>>>>>>>>>>>> My current problem is this:
>>>>>>>>>>>>
>>>>>>>>>>>> I have a row key followed by a bunch of columns (this is not time
>>>>>>>>>>>> series data), and these columns can grow to any number. So, since
>>>>>>>>>>>> I have a 100K limit (or whatever the number is, say some limit), I
>>>>>>>>>>>> want to break the partition into levels/pages (a sketch of this
>>>>>>>>>>>> layout follows below):
>>>>>>>>>>>>
>>>>>>>>>>>> rowkey1, page1 -> col1, col2, col3...
>>>>>>>>>>>> rowkey1, page2 -> col1, col2, col3...
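>>>>>>>>>>>>
>>>>>>>>>>>> For illustration, a minimal sketch of that paging scheme with the
>>>>>>>>>>>> DataStax Python driver (all names are hypothetical, and finding
>>>>>>>>>>>> the "current" page is exactly the open question below):
>>>>>>>>>>>>
>>>>>>>>>>>>     from cassandra.cluster import Cluster
>>>>>>>>>>>>
>>>>>>>>>>>>     session = Cluster(['127.0.0.1']).connect('my_keyspace')
>>>>>>>>>>>>
>>>>>>>>>>>>     # (rowkey, page) together form the partition key, so every
>>>>>>>>>>>>     # page is its own bounded partition.
>>>>>>>>>>>>     session.execute("""
>>>>>>>>>>>>         CREATE TABLE IF NOT EXISTS paged_data (
>>>>>>>>>>>>             rowkey text,
>>>>>>>>>>>>             page   bigint,
>>>>>>>>>>>>             col    text,
>>>>>>>>>>>>             value  blob,
>>>>>>>>>>>>             PRIMARY KEY ((rowkey, page), col)
>>>>>>>>>>>>         )""")
>>>>>>>>>>>>
>>>>>>>>>>>>     # Reading a known page is a single-partition query:
>>>>>>>>>>>>     rows = session.execute(
>>>>>>>>>>>>         "SELECT col, value FROM paged_data "
>>>>>>>>>>>>         "WHERE rowkey = %s AND page = %s", ['rowkey1', 1])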
>>>>>>>>>>>>
>>>>>>>>>>>> Now say my Cassandra db is populated with data, my application
>>>>>>>>>>>> has just booted up, and I want the most recent value of a certain
>>>>>>>>>>>> partition, but I don't know which page it belongs to since my
>>>>>>>>>>>> application just booted up. How do I solve this in the most
>>>>>>>>>>>> efficient way possible in Cassandra today? I understand I can
>>>>>>>>>>>> create MVs or other tables that hold auxiliary data, such as the
>>>>>>>>>>>> number of pages per partition, and so on, but that involves the
>>>>>>>>>>>> maintenance cost of that other table, which I really cannot afford
>>>>>>>>>>>> because I already have MVs and secondary indexes for other good
>>>>>>>>>>>> reasons. So it would be great if someone could explain the best
>>>>>>>>>>>> way possible as of today with Cassandra. By best way, I mean: is
>>>>>>>>>>>> it possible with one request? If yes, then how? If not, what is
>>>>>>>>>>>> the next best way to solve this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> kant
