cassandra-user mailing list archives

From Kant Kodali <k...@peernova.com>
Subject Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?
Date Sat, 15 Oct 2016 07:15:41 GMT
 Compacting 10 sstables, each holding a 15GB partition, takes how long?

On Fri, Oct 14, 2016 at 11:45 PM, Matope Ono <matope.ono@gmail.com> wrote:

> Please disregard that part of my sentence.
> More precisely, I should have said "He could compact 10 sstables, each of
> which holds a 15GB partition".
> What I wanted to say is that we can store many more rows (and columns) in
> a partition than before 3.6.
>
> 2016-10-15 15:34 GMT+09:00 Kant Kodali <kant@peernova.com>:
>
>> "Robert said he could treat safely 10 15GB partitions at his presentation"
>> This sounds like there is there is a row limit too not only columns??
>>
>> If I am reading this correctly 10 15GB partitions  means 10 partitions
>> (like 10 row keys,  thats too small) with each partition of size 15GB.
>> (thats like 15 million columns where each column can have a data of size
>> 1KB).
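As a quick sanity check on the arithmetic in that message (pure illustration, not part of the thread):

```python
# A 15GB partition filled with 1KB columns holds roughly 15 million columns.
partition_bytes = 15 * 1024**3   # 15 GB
column_bytes = 1024              # 1 KB per column
print(partition_bytes // column_bytes)  # -> 15728640, i.e. ~15 million
```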
>>
>> On Fri, Oct 14, 2016 at 11:30 PM, Kant Kodali <kant@peernova.com> wrote:
>>
>>> "Robert said he could treat safely 10 15GB partitions at his
>>> presentation" This sounds like there is there is a row limit too not
>>> only columns??
>>>
>>> If I am reading this correctly 10 15GB partitions  means 10 partitions
>>> (like 10 row keys,  thats too small) with each partition of size 15GB.
>>> (thats like 10 million columns where each column can have a data of size
>>> 1KB).
>>>
>>>
>>> On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope.ono@gmail.com>
>>> wrote:
>>>
>>>> Thanks to CASSANDRA-11206, I think we can have much larger partitions
>>>> than before 3.6.
>>>> (Robert said in his presentation that he could safely handle ten 15GB
>>>> partitions. https://www.youtube.com/watch?v=N3mGxgnUiRY)
>>>>
>>>> But is there still a 2B-column limit in the Cassandra code?
>>>> If so, out of curiosity, I'd like to know where the bottleneck is.
>>>> Could anyone let me know about it?
>>>>
>>>> Thanks, Yasuharu.
>>>>
>>>>
>>>> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxguru@gmail.com>:
>>>>
>>>>> The "2 billion column limit" press clipping "puffery". This statement
>>>>> seemingly became popular because highly traffic traffic-ed story, in
which
>>>>> a tech reporter embellished on a statement to make a splashy article.
>>>>>
>>>>> The effect is something like this:
>>>>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-ston
>>>>> es-and-the-study-that-never-existed/
>>>>>
>>>>> Iced tea does not cause kidney stones! Cassandra does not store rows
>>>>> with 2 billion columns! It is just not true.
>>>>>
>>>>>
>>>>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <kant@peernova.com>
>>>>> wrote:
>>>>>
>>>>>> Well, 1) I have not sent it to the postgresql mailing lists, and 2) I
>>>>>> thought this was an open-ended question, as it can involve ideas from
>>>>>> everywhere, including the Cassandra Java driver mailing lists. So
>>>>>> sorry if that bothered you for some reason.
>>>>>>
>>>>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.hoxha@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Also, I'm not sure, but I don't think it's "cool" to write to
>>>>>>> multiple lists in the same message (based on the postgresql mailing
>>>>>>> list rules).
>>>>>>> For example, I'm not subscribed to those, and now the messages are
>>>>>>> separated.
>>>>>>>
>>>>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <
>>>>>>> dorian.hoxha@gmail.com> wrote:
>>>>>>>
>>>>>>>> There are some issues working with larger partitions.
>>>>>>>> HBase doesn't do what you say! You also have to be careful on
>>>>>>>> HBase not to create large rows! But since tables are globally
>>>>>>>> sorted, you can easily sort between them and create small rows.
>>>>>>>>
>>>>>>>> In my opinion, the Cassandra people are wrong when they say
>>>>>>>> "globally sorted is the devil!", while fb/google/etc actually use
>>>>>>>> globally-sorted storage most of the time! You have to be careful
>>>>>>>> though (just like with random partitioning).
>>>>>>>>
>>>>>>>> Can you tell us what rowkey1, page1, and col(x) actually are? Maybe
>>>>>>>> there is a way.
>>>>>>>> "The most recent" means there's a timestamp in there?
>>>>>>>>
>>>>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <kant@peernova.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I understand Cassandra can have a maximum of 2B columns per
>>>>>>>>> partition, but in practice some people seem to suggest the magic
>>>>>>>>> number is 100K. Why not automatically create another
>>>>>>>>> partition/rowkey (whenever we reach a limit we consider safe and
>>>>>>>>> efficient), with an auto-incrementing bigint appended as a suffix
>>>>>>>>> to the new rowkey, so that the driver can return the new rowkey,
>>>>>>>>> indicating that there is a new partition, and so on? Now, I
>>>>>>>>> understand this would involve allowing partial row-key searches,
>>>>>>>>> which Cassandra currently doesn't do (but I believe HBase does),
>>>>>>>>> and thinking about token ranges and potentially many other things.
>>>>>>>>> My current problem is this.
>>>>>>>>>
>>>>>>>>> I have a row key followed by a bunch of columns (this is not
>>>>>>>>> time-series data), and these columns can grow to any number.
>>>>>>>>> Since I have a 100K limit (or whatever the number is; say some
>>>>>>>>> limit), I want to break the partition into levels/pages:
>>>>>>>>>
>>>>>>>>> rowkey1, page1 -> col1, col2, col3, ...
>>>>>>>>> rowkey1, page2 -> col1, col2, col3, ...
>>>>>>>>>
>>>>>>>>> Now say my Cassandra db is populated with data, my application
>>>>>>>>> just got booted up, and I want the most recent value of a certain
>>>>>>>>> partition, but I don't know which page it belongs to. How do I
>>>>>>>>> solve this in the most efficient way possible in Cassandra today?
>>>>>>>>> I understand I can create an MV or other tables holding auxiliary
>>>>>>>>> data, such as the number of pages per partition, and so on. But
>>>>>>>>> that involves the maintenance cost of that other table, which I
>>>>>>>>> really cannot afford, because I already have MVs and secondary
>>>>>>>>> indexes for other good reasons. So it would be great if someone
>>>>>>>>> could explain the best way possible as of today with Cassandra.
>>>>>>>>> By best way I mean: is it possible with one request? If yes, then
>>>>>>>>> how? If not, then what is the next best way to solve this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> kant
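[Editor's note: one hedged sketch of the paging problem described above — keep a "head" pointer to the newest page next to the data, so a read needs the pointer plus one page fetch rather than a scan of unknown pages. In Cassandra this could map to, e.g., a static column, but the `PagedStore` class here is purely an in-memory illustration, not a method prescribed in the thread:]

```python
# Hypothetical in-memory model of the rowkey/page layout from the email.
# `head` tracks the highest page per rowkey, so finding the most recent
# value never requires probing pages blindly.

class PagedStore:
    def __init__(self):
        self.pages = {}   # (rowkey, page) -> list of column values
        self.head = {}    # rowkey -> highest page number written so far

    def append(self, rowkey, value, page_size=3):
        page = self.head.get(rowkey, 1)
        bucket = self.pages.setdefault((rowkey, page), [])
        if len(bucket) >= page_size:          # page is full: roll over
            page += 1
            bucket = self.pages.setdefault((rowkey, page), [])
        bucket.append(value)
        self.head[rowkey] = page

    def latest(self, rowkey):
        """One head-pointer lookup, then one read of that page."""
        page = self.head[rowkey]
        return self.pages[(rowkey, page)][-1]

store = PagedStore()
for v in ["col1", "col2", "col3", "col4"]:
    store.append("rowkey1", v)
print(store.latest("rowkey1"))  # -> col4 (rolled over onto page 2)
```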
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
