cassandra-user mailing list archives

From Russell Bradberry <rbradbe...@gmail.com>
Subject Re: Cassandra data model right definition
Date Mon, 03 Oct 2016 16:41:20 GMT

"X-store" refers to how data is stored; in almost every case it refers to
which logical constructs are grouped together physically on disk.  It has
nothing to do with whether a database is relational or not.

Cassandra does, in fact, meet the definition of row-store. However, I would
like to reiterate that it goes beyond that and stores all rows for a
single partition together on disk as well.  Therefore row-store does not do
it justice, which is why I like the term "Partitioned row-store".
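
To make the "all rows for a single partition together" point concrete, here is a minimal Python sketch. It is only an illustration, not Cassandra's storage engine: the sample rows, the function name `partitioned_layout`, and the choice to order partitions by key (Cassandra orders them by token) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows, in arrival order: (partition_key, clustering_key, value).
rows = [
    ("user-b", 2, "b2"),
    ("user-a", 1, "a1"),
    ("user-b", 1, "b1"),
    ("user-a", 2, "a2"),
]

def partitioned_layout(rows):
    """Group rows by partition key, then order each group by clustering key."""
    partitions = defaultdict(list)
    for pk, ck, value in rows:
        partitions[pk].append((ck, value))
    layout = []
    # Assumption: partitions are ordered by key here only to keep the sketch
    # deterministic; a real cluster would order them by token.
    for pk in sorted(partitions):
        for ck, value in sorted(partitions[pk]):
            layout.append((pk, ck, value))
    return layout

layout = partitioned_layout(rows)
for entry in layout:
    print(entry)
```

Each row still keeps its fields together (the row-store part), and all rows sharing a partition key end up adjacent in the output (the partitioned part).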

On Mon, Oct 3, 2016 at 12:37 PM, Benedict Elliott Smith <benedict@apache.org> wrote:

> ... and my response can be summed up as "you are not parsing English
> correctly."  The word "like" does not mean what you think it means in this
> context.  It does not mean "close relative."  It is constrained to the
> similarities expressed, and no others.  You don't seem to be reading any of
> my responses about this, though, so I'm not sure parsing is your issue.
>
> PostgreSQL has had arrays for years, and all RDBMS (pretty much) avoid
> persisting nulls in exactly the same way C* does - encoding their absence
> in the row header.
>
> I empathise with the recent unsubscriber.
>
>
>
> On 3 October 2016 at 15:53, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> My original point can be summed up as:
>>
>> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
>> "like" and "close relative".
>>
>> For the specifics:
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> Let's draw some lines: a relational database is clearly defined.
>>
>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>
>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>> result proven in his seminal work on the relational model, equates the
>> expressive power of relational algebra
>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>> which, lacking recursion, are strictly less powerful than first-order
>> logic <https://en.wikipedia.org/wiki/First-order_logic>).
>>
>> As the relational model started to become fashionable in the early 1980s,
>> Codd fought a sometimes bitter campaign to prevent the term being misused
>> by database vendors who had merely added a relational veneer to older
>> technology. As part of this campaign, he published his 12 rules
>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>> constituted a relational database. This made his position in IBM
>> increasingly difficult, so he left to form his own consulting company with
>> Chris Date and others.
>>
>> Cassandra is not a relational database.
>>
>> I have attempted to illustrate that a "row store" is defined as well.
>> I do not believe Cassandra is a "row store".
>>
>> "Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store""
>>
>> What is the definition of "row store"? Is it a logical construct or a
>> physical one?
>>
>> Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and
>> present it as rows and columns. It seems to pass the litmus test being
>> presented.
>>
>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jon@jonhaddad.com>
>> wrote:
>>
>>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>>> structured by a schema with the data for each row stored together in each
>>> data file. Just because it uses log structured storage, sparse fields, and
>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>> store"
>>>
>>> Postgres added flexible storage through hstore, I don't hear anyone
>>> arguing that it needs to be renamed.
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields
>>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>>> store?
>>>
>>> You're arguing that everything is wrong but you're not proposing an
>>> alternative, which is not productive.
>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxguru@gmail.com>
>>> wrote:
>>>
>>>> Also, every piece of technical information that describes a row store
>>>>
>>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>>
>>>> depicts it like this:
>>>>
>>>> 001:10,Smith,Joe,40000;
>>>> 002:12,Jones,Mary,50000;
>>>> 003:11,Johnson,Cathy,44000;
>>>> 004:22,Jones,Bob,55000;
>>>>
>>>>
>>>>
>>>> They never depict a scenario where the data looks like this on disk:
>>>>
>>>> 001:10,Smith
>>>>
>>>> 001:10,40000;
>>>>
>>>> Which is much closer to how Cassandra *stores* its data.
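
The two layouts contrasted above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Cassandra's actual SSTable format: the records come from the example, while the function names `row_layout` and `cell_layout` and the exact delimiters are hypothetical.

```python
# Records from the example above: (row_id, emp_id, last, first, salary).
records = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
]

def row_layout(records):
    """One entry per row: every field of the row is serialized together."""
    return [
        f"{rid}:{emp},{last},{first},{salary};"
        for rid, emp, last, first, salary in records
    ]

def cell_layout(records):
    """One entry per cell: each value is written with its row key repeated."""
    cells = []
    for rid, emp, *values in records:
        for value in values:
            cells.append(f"{rid}:{emp},{value};")
    return cells

print(row_layout(records))
print(cell_layout(records))
```

The first function produces one entry per row; the second produces one entry per value, each carrying the row's key again.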
>>>>
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>>>> benedict@apache.org> wrote:
>>>>
>>>> Absolutely.  A "partitioned row store" is exactly what I would call
>>>> it.  As it happens, our README thinks the same, which is fantastic.
>>>>
>>>> I thought I'd take a look at the rest of our cohort, and didn't get far
>>>> before disappointment.  HBase literally calls itself a
>>>> "*column-oriented* store" - which is so totally wrong it's
>>>> simultaneously hilarious and tragic.
>>>>
>>>> I guess we can't blame the wider internet for
>>>> misunderstanding/misnaming us poor "wide column stores" if even one of the
>>>> major examples doesn't know what it, itself, is!
>>>>
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 21:47, Jonathan Haddad <jon@jonhaddad.com>
>>>> wrote:
>>>>
>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>> store" which usually needs some extra explanation but is more accurate than
>>>> "column family" or whatever other thrift era terminology people still use.
>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduyhai@gmail.com>
>>>> wrote:
>>>>
>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>> table. This definition is closer to CQL and has some academic background
>>>> (distributed hash table).
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>> benedict@apache.org> wrote:
>>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> thrift users no longer think they have a schema (though they do), and
>>>> thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here:
>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>> joaquin@thelastpickle.com> wrote:
>>>>
>>>> Hi Mehdi,
>>>>
>>>> I can help clarify a few things.
>>>>
>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>>> million columns.
>>>>
>>>> Cassandra partitions data to certain nodes based on the partition
>>>> key(s), but does provide the option of setting zero or more clustering
>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>> key.
>>>>
>>>> When writing to Cassandra, you will need to provide the full primary
>>>> key. When reading from Cassandra, however, you only need to provide the
>>>> full partition key.
>>>>
>>>> When you only provide the partition key for a read operation, you're
>>>> able to return all columns that exist on that partition with low latency.
>>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>>
>>>> Consider the schema:
>>>>
>>>> CREATE TABLE foo (
>>>>   bar uuid,
>>>>   boz uuid,
>>>>   baz timeuuid,
>>>>   data1 text,
>>>>   data2 text,
>>>>   PRIMARY KEY ((bar, boz), baz)
>>>> );
>>>>
>>>>
>>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>>> optionally data*, if it's relevant for that CQL row. If you choose not to
>>>> define a data* field for a particular CQL row, then nothing is stored or
>>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>>
>>>> However, all writes to the same bar/boz will end up on the same
>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>> not a partition key is stored as a column, including clustering keys (this
>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>
>>>> In this way you can get fast responses for all activity for bar/boz
>>>> either over time, or for a specific time, with roughly the same number of
>>>> disk seeks, with varying lengths on the disk scans.
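
The read and write rules described for the foo table can be modelled with a toy in-memory class. This is a sketch under stated assumptions and not a Cassandra client: the class name `FooTable`, its methods, and the use of plain strings and integers in place of uuid and timeuuid values are all hypothetical.

```python
import bisect
from collections import defaultdict

class FooTable:
    """Toy in-memory model of the foo table; not a Cassandra client."""

    def __init__(self):
        # (bar, boz) -> list of (baz, columns) kept sorted by baz
        self._partitions = defaultdict(list)

    def write(self, bar, boz, baz, **data):
        """A write must carry the full primary key; data columns are optional."""
        if bar is None or boz is None or baz is None:
            raise ValueError("full primary key (bar, boz, baz) is required")
        # Only columns that were actually supplied are kept.
        columns = {k: v for k, v in data.items() if v is not None}
        partition = self._partitions[(bar, boz)]
        keys = [k for k, _ in partition]
        i = bisect.bisect_left(keys, baz)
        if i < len(partition) and partition[i][0] == baz:
            partition[i][1].update(columns)  # same primary key: upsert
        else:
            partition.insert(i, (baz, columns))

    def read(self, bar, boz, baz_from=None, baz_to=None):
        """A read needs the full partition key; the baz range is optional."""
        partition = self._partitions.get((bar, boz), [])
        return [
            (baz, columns) for baz, columns in partition
            if (baz_from is None or baz >= baz_from)
            and (baz_to is None or baz <= baz_to)
        ]

t = FooTable()
t.write("p1", "q1", 2, data1="second")
t.write("p1", "q1", 1, data1="first", data2="x")
t.write("p2", "q1", 1)  # no data* columns supplied
print(t.read("p1", "q1"))
print(t.read("p1", "q1", baz_from=2))
```

All rows for one (bar, boz) pair live in a single sorted list, so reading everything or only a slice of baz values starts from the same lookup and differs only in how much of the list is scanned.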
>>>>
>>>> Hope that helps!
>>>>
>>>> Joaquin Casares
>>>> Consultant
>>>> Austin, TX
>>>>
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com>
>>>> wrote:
>>>>
>>>> Cassandra is a Wide Column Store:
>>>> http://db-engines.com/en/system/Cassandra
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.bada@dbi-services.com>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have a theoretical question:
>>>> - Is Apache Cassandra really a column store?
>>>> A column store means storing the data as columns rather than as rows.
>>>>
>>>> In fact C* stores the data as rows, and the data is partitioned by row key.
>>>>
>>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>>>> true for you also?
>>>>
>>>> Many thanks in advance for your reply
>>>>
>>>> Best Regards
>>>> Mehdi Bada
>>>> ----
>>>>
>>>> *Mehdi Bada* | Consultant
>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>> 96 15
>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>> mehdi.bada@dbi-services.com
>>>> www.dbi-services.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
