cassandra-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: Cassandra data model right definition
Date Mon, 03 Oct 2016 16:45:37 GMT
I've met clients that read the Cassandra docs and then said in a big
meeting "it's just like a relational database, it has tables just like
sqlserver/oracle."

I'm not putting words in other people's mouths either, but I've heard that
said enough times to want to puke. Do the docs claim Cassandra is
relational? They absolutely don't make that claim, but the docs play
loosey goosey with terminology. The end result is that it confuses new users
who aren't experts, or technology managers who try to make a case for
Cassandra.

We can make all the excuses we want, but that doesn't change the fact that the
docs aren't user friendly. Writing great documentation is tough and most
developers hate it. It's cuz we suck at it. There I said it, "we SUCK at
writing user friendly documentation". As many people have pointed out, it's
not unique to Cassandra. 80% of the tech docs out there suck, starting with
IBM at the top.

Saying the docs suck isn't an indictment of anyone, it's just the reality
of writing good documentation.

On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> Nobody is claiming Cassandra is relational. I'm not sure why that keeps
> coming up.
> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
>> My original point can be summed up as:
>>
>> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
>> "like" and "close relative".
>>
>> For the specifics:
>>
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>>
>> Let's draw some lines: a relational database is clearly defined.
>>
>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>
>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>> result proven in his seminal work on the relational model, equates the
>> expressive power of relational algebra
>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>> which, lacking recursion, are strictly less powerful than first-order
>> logic <https://en.wikipedia.org/wiki/First-order_logic>).
>>
>> As the relational model started to become fashionable in the early 1980s,
>> Codd fought a sometimes bitter campaign to prevent the term being misused
>> by database vendors who had merely added a relational veneer to older
>> technology. As part of this campaign, he published his 12 rules
>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>> constituted a relational database. This made his position in IBM
>> increasingly difficult, so he left to form his own consulting company with
>> Chris Date and others.
>>
>> Cassandra is not a relational database.
>>
>> I have attempted to illustrate that a "row store" is defined as well.
>> I do not believe Cassandra is a "row store".
>>
>>
>>
>> "Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store""
>>
>> What is the definition of "row store"? Is it a logical construct or a
>> physical one?
>>
>> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
>> present it as rows and columns. It seems to pass the litmus test being
>> presented.
>>
>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jon@jonhaddad.com>
>> wrote:
>>
>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>> structured by a schema with the data for each row stored together in each
>> data file. Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store"
>>
>> Postgres added flexible storage through hstore, I don't hear anyone
>> arguing that it needs to be renamed.
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> You're arguing that everything is wrong but you're not proposing an
>> alternative, which is not productive.
>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <edlinuxguru@gmail.com>
>> wrote:
>>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
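
The contrast Ed draws between the two layouts can be made concrete with a small Python sketch. It is purely illustrative: the function names and the `None` convention for an unset field are assumptions, and the output only mimics the textual examples quoted above, not any real on-disk format of Cassandra or any other database.

```python
# Toy illustration only: neither function reflects a real on-disk format.
# Each tuple is (row id, key, last name, first name, salary).
rows = [
    (1, 10, "Smith", "Joe", 40000),
    (2, 12, "Jones", "Mary", 50000),
]

def row_oriented(rows):
    """One record per row: the row id followed by all of its values."""
    return [
        f"{rid:03d}:" + ",".join(str(v) for v in values) + ";"
        for rid, *values in rows
    ]

def cell_oriented(rows):
    """One record per (row, column) cell; unset (None) cells are skipped."""
    cells = []
    for rid, key, *values in rows:
        for value in values:
            if value is not None:
                cells.append(f"{rid:03d}:{key},{value};")
    return cells

print(row_oriented(rows))
print(cell_oriented(rows))
```

In the second layout a sparse row simply produces fewer records, which is the property the "row store" disagreement in this thread keeps circling around.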
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith
>> <benedict@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <jon@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduyhai@gmail.com> wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith
>> <benedict@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares
>> <joaquin@thelastpickle.com> wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>   boz uuid,
>>   baz timeuuid,
>>   data1 text,
>>   data2 text,
>>   PRIMARY KEY ((bar, boz), baz)
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you choose not to
>> set a data* field for a particular CQL row, then nothing is stored or
>> allocated on disk for it. But I wouldn't consider that caveat to be "schema-less".
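
A rough way to picture the write rule described above is a toy in-memory model of the `foo` table, sketched here in Python. The `write` helper and the nested-dict layout are assumptions made for illustration; this is not how Cassandra implements writes, it only mirrors the two stated properties: the full primary key is mandatory, and data columns that are not supplied take up no space.

```python
import uuid

# Toy in-memory model of the thread's `foo` table; not Cassandra's implementation.
PARTITION_KEY = ("bar", "boz")
CLUSTERING_KEY = ("baz",)
DATA_COLUMNS = ("data1", "data2")

def write(table, **cols):
    """Store one CQL row; every primary key column must be supplied."""
    missing = [c for c in PARTITION_KEY + CLUSTERING_KEY if c not in cols]
    if missing:
        raise ValueError(f"missing primary key columns: {missing}")
    pk = tuple(cols[c] for c in PARTITION_KEY)
    ck = tuple(cols[c] for c in CLUSTERING_KEY)
    # Only the data columns actually supplied become cells; nothing is
    # reserved for the ones left out.
    cells = {c: cols[c] for c in DATA_COLUMNS if c in cols}
    table.setdefault(pk, {}).setdefault(ck, {}).update(cells)

table = {}
bar, boz = uuid.uuid4(), uuid.uuid4()
# uuid1() is time-based, standing in for a CQL timeuuid.
write(table, bar=bar, boz=boz, baz=uuid.uuid1(), data1="hello")
```

Calling `write(table, bar=bar, boz=boz, data1="x")` without `baz` raises `ValueError` in this sketch, and the stored row holds a `data1` cell but no `data2` entry at all.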
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
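
The statement that all writes for the same bar/boz land on the same replica set can be sketched with a simplified token ring in Python. Everything here is an assumption for illustration: MD5 stands in for the real partitioner hash, each hypothetical node owns a single token, and replicas are just the next nodes clockwise, which ignores racks, data centers, and virtual nodes.

```python
import bisect
import hashlib

def token(partition_key):
    """Map a partition key to an integer token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def replicas(partition_key, ring, replication_factor=3):
    """Walk the ring clockwise from the key's token, taking distinct nodes."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key))
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical five-node cluster; each node owns one token on the ring.
ring = sorted((token(name), name) for name in ("n1", "n2", "n3", "n4", "n5"))
key = ("bar-1", "boz-1")
print(replicas(key, ring))
```

Because the replica list depends only on the partition key and the ring, every write carrying the same (bar, boz) pair resolves to the same nodes regardless of its baz value.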
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
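
The read pattern described here, one lookup by partition key followed by a scan whose length depends on the time range, can be approximated with a sorted list per partition. This Python sketch is a toy model under stated assumptions: integer timestamps stand in for timeuuid values, and the `write` and `read` helpers are hypothetical names rather than any driver API.

```python
import bisect

# Toy model: each partition keeps its CQL rows sorted by the clustering key.
partitions = {}

def write(bar, boz, baz, data):
    """Insert a row into its partition, keeping rows ordered by baz."""
    rows = partitions.setdefault((bar, boz), [])
    bisect.insort(rows, (baz, data))

def read(bar, boz, start=None, end=None):
    """Return rows of one partition, optionally limited to start <= baz < end."""
    rows = partitions.get((bar, boz), [])
    lo = 0 if start is None else bisect.bisect_left(rows, (start,))
    hi = len(rows) if end is None else bisect.bisect_left(rows, (end,))
    return rows[lo:hi]

# Integer timestamps stand in for timeuuid values in this sketch.
for ts in (10, 20, 30, 40):
    write("b1", "z1", ts, f"event@{ts}")

print(read("b1", "z1"))          # whole partition
print(read("b1", "z1", 20, 40))  # a slice of the clustering range
```

Both calls locate the partition the same way; only the length of the slice that is scanned differs, which matches the "same number of seeks, varying scan lengths" description.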
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com>
>> wrote:
>>
>> Cassandra is a Wide Column Store:
>> http://db-engines.com/en/system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.bada@dbi-services.com>
>> wrote:
>>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> A column store means storing the data as columns rather than as rows.
>>
>> In fact C* stores the data as rows, and the data is partitioned by row key.
>>
>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>> true for you also?
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada
>> ----
>>
>> *Mehdi Bada* | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.bada@dbi-services.com
>> www.dbi-services.com
>>
>>
>>
>>
>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
>> team
>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>
>>
>>
>>
>>
>>
>>
>>
