cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Cassandra data model right definition
Date Mon, 03 Oct 2016 13:40:03 GMT
Also, every piece of technical information that describes a row store

http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems

depicts it like this:

001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;
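The two references above contrast a row store with a column store. Here is a minimal Python sketch of both serializations, using the four records shown here. It assumes the second field is an employee id and the rest are last name, first name and salary. The string format only mirrors the example; it is not a real on-disk encoding.

```python
# Toy employee table from the example: (row id, EmpId, last name, first name, salary).
# The field meanings are assumed labels for illustration only.
ROWS = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
    ("003", 11, "Johnson", "Cathy", 44000),
    ("004", 22, "Jones", "Bob", 55000),
]


def row_oriented(rows):
    """Serialize record by record: each row's values sit next to each other."""
    return "".join(
        f"{rid}:{','.join(str(v) for v in values)};" for rid, *values in rows
    )


def column_oriented(rows):
    """Serialize column by column: each column's values sit next to each other."""
    n_fields = len(rows[0]) - 1  # every field except the row id
    return "".join(
        ",".join(f"{row[i + 1]}:{row[0]}" for row in rows) + ";"
        for i in range(n_fields)
    )
```

In the first string, reading one employee touches one contiguous run. In the second, summing all salaries touches one contiguous run.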



They never depict a scenario where the data looks like this on disk:

001:10,Smith

001:10,40000;

Which is much closer to how Cassandra *stores* its data.



On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <benedict@apache.org> wrote:

> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* store",
> which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad <jon@jonhaddad.com> wrote:
>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which often needs some extra explanation but is more accurate than
>> "column family" or whatever other Thrift-era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduyhai@gmail.com> wrote:
>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>> table. This definition is closer to CQL and has some academic background
>>> (distributed hash table).
>>>
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>> benedict@apache.org> wrote:
>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> Thrift users still think they don't have a schema (though they do), and
>>>> Thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <joaquin@thelastpickle.com> wrote:
>>>>
>>>>> Hi Mehdi,
>>>>>
>>>>> I can help clarify a few things.
>>>>>
>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>>> can have 2 billion columns, but in practice it shouldn't have more than
>>>>> 100 million columns.
>>>>>
>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>>> key.
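Placing data by partition key works roughly like this. The partition key is hashed to a token, and the nodes owning that token range hold the partition. Below is a simplified sketch of that idea. The four-node ring, the node names and the MD5-based hash are assumptions for illustration; Cassandra's default Murmur3Partitioner uses a different hash and token range, and MD5 is only a convenient stand-in here. Replica selection walks the ring clockwise, similar to SimpleStrategy.

```python
import bisect
import hashlib

# Hypothetical four-node ring: (token, node) pairs sorted by token. Values are illustrative.
RING = [(2**125, "node-a"), (2**126, "node-b"), (3 * 2**125, "node-c"), (2**127, "node-d")]
TOKEN_SPACE = 2**127


def token(partition_key):
    """Hash all partition-key columns together into a token (MD5 stand-in, not Murmur3)."""
    raw = "|".join(str(part) for part in partition_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest(), "big") % TOKEN_SPACE


def replicas(partition_key, rf=3):
    """First node whose token >= the key's token, then the next rf - 1 nodes clockwise."""
    tokens = [t for t, _ in RING]
    start = bisect.bisect_left(tokens, token(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]
```

Only the partition key feeds the hash. Every CQL row sharing a partition key therefore lands on the same replicas, whatever its clustering key.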
>>>>>
>>>>> When writing to Cassandra, you will need to provide the full primary
>>>>> key; however, when reading from Cassandra, you only need to provide the
>>>>> full partition key.
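The rule above can be written as two checks. This is a hypothetical sketch that uses the key of the foo table defined further down, PRIMARY KEY ((bar, boz), baz). It models only the simple case: secondary indexes and ALLOW FILTERING are not modelled.

```python
# Key columns of the foo table: PRIMARY KEY ((bar, boz), baz).
PARTITION_KEY = ("bar", "boz")
CLUSTERING_KEY = ("baz",)


def check_write(values):
    """A write must bind every primary-key column: partition key plus clustering key."""
    missing = [c for c in PARTITION_KEY + CLUSTERING_KEY if c not in values]
    if missing:
        raise ValueError(f"write missing primary key columns: {missing}")


def check_read(restrictions):
    """A partition read must bind the full partition key; clustering columns are optional."""
    missing = [c for c in PARTITION_KEY if c not in restrictions]
    if missing:
        raise ValueError(f"read missing partition key columns: {missing}")
```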
>>>>>
>>>>> When you only provide the partition key for a read operation, you're
>>>>> able to return all columns that exist on that partition with low latency.
>>>>> These columns are displayed as "CQL rows" to make it easier to reason
>>>>> about.
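The "CQL rows" view can be sketched as a map of partitions, each holding its rows in clustering order. This is the "partitioned row store" shape described elsewhere in this thread. The class and method names are hypothetical. It is an in-memory toy, not Cassandra's storage engine.

```python
from collections import defaultdict


class PartitionedRowStore:
    """Toy model: partition key -> {clustering key -> non-key columns}."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def write(self, partition_key, clustering_key, **columns):
        # Writes to the same primary key merge into the same CQL row (upsert-like).
        self.partitions[partition_key].setdefault(clustering_key, {}).update(columns)

    def read_partition(self, partition_key):
        """Return every CQL row in one partition, ordered by clustering key."""
        rows = self.partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows)]
```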
>>>>>
>>>>> Consider the schema:
>>>>>
>>>>> CREATE TABLE foo (
>>>>>   bar uuid,
>>>>>   boz uuid,
>>>>>   baz timeuuid,
>>>>>   data1 text,
>>>>>   data2 text,
>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>> );
>>>>>
>>>>>
>>>>> When you write to Cassandra you will need to send bar, boz, and baz,
>>>>> and optionally data*, if it's relevant for that CQL row. If you choose not
>>>>> to define a data* field for a particular CQL row, then nothing is stored
>>>>> or allocated on disk. But I wouldn't consider that caveat to be
>>>>> "schema-less".
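Sparse storage is different from having no schema, and a short sketch shows why. Only columns that were actually given produce cells, but column names and types are still checked against the table definition. Mapping CQL text to Python str is an assumption of the sketch. Omitting a column is what is modelled here. Explicitly writing null is a different case (Cassandra records a tombstone for it) and is not modelled.

```python
# Non-key columns of the foo table; CQL text is modelled as Python str (an assumption).
SCHEMA = {"data1": str, "data2": str}


def cells_for_write(columns):
    """Cells a write would persist: only the non-key columns that were actually given.

    Unknown columns or wrong types are rejected, because the table still has a schema.
    """
    cells = {}
    for name, value in columns.items():
        if name not in SCHEMA:
            raise KeyError(f"unknown column: {name}")
        if not isinstance(value, SCHEMA[name]):
            raise TypeError(f"{name} expects {SCHEMA[name].__name__}")
        cells[name] = value
    return cells
```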
>>>>>
>>>>> However, all writes to the same bar/boz will end up on the same
>>>>> Cassandra replica set (a configurable number of nodes) and be stored in the
>>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>>> not a partition key is stored as a column, including clustering keys (this
>>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
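Before 3.0, each CQL row was roughly flattened into individual cells whose names combined the clustering value with the column name, so the clustering key was repeated in every cell. The sketch below shows only that naming idea with plain tuples. It does not reproduce the real SSTable encoding, and the sample values in the test are made up.

```python
def legacy_cells(partition_rows):
    """Flatten CQL rows into ((clustering value, column name), value) cells.

    partition_rows maps a clustering value (baz) to that row's non-key columns.
    Cells come out sorted by clustering value, then column name.
    """
    cells = []
    for baz in sorted(partition_rows):
        for name in sorted(partition_rows[baz]):
            cells.append(((baz, name), partition_rows[baz][name]))
    return cells
```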
>>>>>
>>>>> In this way you can get fast responses for all activity for bar/boz
>>>>> either over time, or for a specific time, with roughly the same number of
>>>>> disk seeks, with varying lengths on the disk scans.
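A partition's rows sit in clustering order. So "all activity over time" and "activity at a specific time" are both a locate step followed by a scan inside one partition. Here is a sketch of that access pattern. It assumes integer timestamps stand in for the timeuuid baz values and a sorted in-memory list stands in for the on-disk order. The class and method names are hypothetical.

```python
import bisect


class Partition:
    """One bar/boz partition: rows kept sorted by clustering value (a timestamp here)."""

    def __init__(self):
        self._keys = []  # sorted clustering values
        self._rows = {}  # clustering value -> non-key columns

    def insert(self, ts, row):
        if ts not in self._rows:
            bisect.insort(self._keys, ts)
        self._rows[ts] = row

    def time_slice(self, start=None, end=None):
        """Rows with start <= ts < end; omitted bounds mean the partition's edges."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if end is None else bisect.bisect_left(self._keys, end)
        return [(ts, self._rows[ts]) for ts in self._keys[lo:hi]]
```

The locate step costs the same whether the slice is one row or the whole partition. Only the length of the scan varies, which matches the point above.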
>>>>>
>>>>> Hope that helps!
>>>>>
>>>>> Joaquin Casares
>>>>> Consultant
>>>>> Austin, TX
>>>>>
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com> wrote:
>>>>>
>>>>>> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>>>>>>
>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>> <https://twitter.com/calonso>
>>>>>>
>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.bada@dbi-services.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a theoretical question:
>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>> A column store means storing the data as columns rather than as rows.
>>>>>>>
>>>>>>> In fact, C* stores the data as rows, and the data is partitioned by
>>>>>>> row key.
>>>>>>>
>>>>>>> Ultimately, for me, Cassandra is a row-oriented, schema-less DBMS.
>>>>>>> Is that true for you as well?
>>>>>>>
>>>>>>> Many thanks in advance for your reply
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Mehdi Bada
>>>>>>> ----
>>>>>>>
>>>>>>> *Mehdi Bada* | Consultant
>>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
>>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>>> mehdi.bada@dbi-services.com
>>>>>>> www.dbi-services.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
