cassandra-user mailing list archives

From Joaquin Casares <joaq...@thelastpickle.com>
Subject Re: Cassandra data model right definition
Date Fri, 30 Sep 2016 17:09:10 GMT
Hi Mehdi,

I can help clarify a few things.

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row (a
partition, in CQL terms) can have 2 billion columns, but in practice it
shouldn't have more than 100 million columns.

Cassandra distributes data across nodes based on the partition key(s), and
also lets you define zero or more clustering keys. Together, the partition
key(s) and clustering key(s) form the primary key.

When writing to Cassandra, you will need to provide the full primary key.
When reading from Cassandra, however, you only need to provide the full
partition key.

When you only provide the partition key for a read operation, you're able
to return all columns that exist on that partition with low latency. These
columns are displayed as "CQL rows" to make it easier to reason about.
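
As a rough sketch of such a read against the foo table defined in the schema in this message (the UUID literals are made-up placeholder values, and the table is assumed to live in the current keyspace):

```cql
-- Both partition key columns (bar and boz) are restricted, so Cassandra
-- can locate the single partition that holds these CQL rows.
SELECT bar, boz, baz, data1, data2
FROM foo
WHERE bar = 123e4567-e89b-12d3-a456-426655440000
  AND boz = 123e4567-e89b-12d3-a456-426655440001;
```

Every CQL row returned shares the same bar/boz and differs only in baz and the data* values.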

Consider the schema:

CREATE TABLE foo (
  bar uuid,
  boz uuid,
  baz timeuuid,
  data1 text,
  data2 text,
  PRIMARY KEY ((bar, boz), baz)
);


When you write to Cassandra you will need to send bar, boz, and baz and
optionally data*, if it's relevant for that CQL row. If you choose not to
set a data* field for a particular CQL row, then nothing is stored or
allocated on disk for that field. But I wouldn't consider that caveat to be
"schema-less".
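
For illustration, two writes to the same partition might look like the following. This is only a sketch: the UUID literals and text values are made up, and now() is used simply as a convenient way to generate a timeuuid for baz.

```cql
-- Full primary key (bar, boz, baz) plus only data1; data2 is left unset,
-- so no data2 value is written for this CQL row.
INSERT INTO foo (bar, boz, baz, data1)
VALUES (123e4567-e89b-12d3-a456-426655440000,
        123e4567-e89b-12d3-a456-426655440001,
        now(),
        'first event');

-- Same bar/boz, a new baz, and both data columns set.
INSERT INTO foo (bar, boz, baz, data1, data2)
VALUES (123e4567-e89b-12d3-a456-426655440000,
        123e4567-e89b-12d3-a456-426655440001,
        now(),
        'second event',
        'extra detail');
```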

However, all writes to the same bar/boz will end up on the same Cassandra
replica set (a configurable number of nodes) and be stored in the same
place(s) on disk within the SSTable(s). And on disk, each field that's not
a partition key is stored as a column, including clustering keys (this is
optimized in Cassandra 3+, but now we're getting deep into internals).

In this way you can get fast responses for all activity for bar/boz, either
over a time range or at a specific time, with roughly the same number of
disk seeks and disk scans of varying length.
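
The two access patterns could be sketched roughly like this. The UUIDs, the timeuuid, and the dates are placeholder values; minTimeuuid() and maxTimeuuid() are the built-in CQL functions for bounding a timeuuid column by timestamp.

```cql
-- Activity for one bar/boz over a time range: a slice of the partition
-- along the clustering key baz.
SELECT baz, data1, data2
FROM foo
WHERE bar = 123e4567-e89b-12d3-a456-426655440000
  AND boz = 123e4567-e89b-12d3-a456-426655440001
  AND baz > minTimeuuid('2016-09-01 00:00+0000')
  AND baz < maxTimeuuid('2016-09-30 00:00+0000');

-- Activity for one bar/boz at a specific time: the full primary key,
-- which identifies exactly one CQL row.
SELECT baz, data1, data2
FROM foo
WHERE bar = 123e4567-e89b-12d3-a456-426655440000
  AND boz = 123e4567-e89b-12d3-a456-426655440001
  AND baz = 50554d6e-29bb-11e5-b345-feff819cdc9f;
```

Both queries go to the same partition; they differ only in how much of it is scanned.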

Hope that helps!

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com

On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com> wrote:

> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.bada@dbi-services.com>
> wrote:
>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> Column store means storing the data as columns rather than as rows.
>>
>> In fact C* stores the data as rows, and data is partitioned by row key.
>>
>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>> true for you also?
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada
>> ----
>>
>> *Mehdi Bada* | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
>> 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.bada@dbi-services.com
>> www.dbi-services.com
>>
>
>
