cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Cassandra data model right definition
Date Fri, 30 Sep 2016 20:39:54 GMT
Then:
Physically: a data store structured as a log-structured merge of SSTables
(see https://cloud.google.com/bigtable/).
Now:
One of the changes made in Apache Cassandra 3.0 is a relatively important
refactor of the storage engine
<https://issues.apache.org/jira/browse/CASSANDRA-8099>.
I say refactor because the basics have not changed: data is still inserted
in a memtable which gets flushed over time to an sstable, with compaction
baby-sitting the set of sstables on disk, and reads use both memtable and
sstables to retrieve results. But the internal structure of the objects
manipulated in those phases has changed, and that entails a significant
amount of refactoring in the code. The principal motivation is that the new
storage engine more directly manipulates the structure that is exposed
through CQL, and knowing that structure at the storage engine level has
many advantages: some features are easier to add and the engine has more
information to optimize.

http://www.datastax.com/2015/12/storage-engine-30
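
To make the memtable / sstable / compaction flow above concrete, here is a
toy sketch in Python. It is only an illustration under loud assumptions: the
class name TinyLSM and the count-based flush_threshold are hypothetical, real
flushes are driven by size and time, and real reads reconcile cells by write
timestamp and handle tombstones, none of which is modeled here.

```python
class TinyLSM:
    """Toy log-structured merge store: one memtable plus immutable sorted runs.

    Not Cassandra's code; a simplified model of the write/flush/read/compact cycle.
    """

    def __init__(self, flush_threshold=2):
        # Hypothetical trigger: flush after this many distinct keys.
        self.flush_threshold = flush_threshold
        self.memtable = {}   # newest writes, mutable, in memory
        self.sstables = []   # flushed runs, oldest first, never modified in place

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable run and start a new one.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads consult the memtable first, then the runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            if key in run:
                return run[key]
        return None

    def compact(self):
        # Merge all runs into one; later (newer) runs overwrite older values.
        merged = {}
        for run in self.sstables:
            merged.update(run)
        self.sstables = [dict(sorted(merged.items()))] if merged else []
```

A read for a key that was written twice returns the newer value whether it
still sits in the memtable or has already been flushed, and compaction only
reduces the number of runs a read has to look at.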

Then:
An RPC abstraction over the data, with methods like get_slice which selected
columns from a single 'row key'
Now:
A query-based abstraction over the data, with queries like SELECT * FROM
table WHERE x=y, in which most language features work over single
'partitions'
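
The two abstractions can be contrasted with a small Python model. This is a
sketch, not either API: PartitionedTable and its methods are hypothetical
names, get_slice here takes a simplified (row_key, start, finish) argument
list rather than the real Thrift signature, and select stands in for a CQL
SELECT restricted to one partition key.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table: partition key -> {clustering key -> row columns}."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, **columns):
        # Writes are upserts: columns are merged into any existing row.
        row = self.partitions[partition_key].setdefault(clustering_key, {})
        row.update(columns)

    def get_slice(self, row_key, start, finish):
        """'Then': a slice of columns from a single row key (simplified)."""
        partition = self.partitions.get(row_key, {})
        return [(ck, partition[ck]) for ck in sorted(partition)
                if start <= ck <= finish]

    def select(self, partition_key):
        """'Now': roughly SELECT * FROM table WHERE partition_key = ?"""
        partition = self.partitions.get(partition_key, {})
        return [{"clustering_key": ck, **partition[ck]}
                for ck in sorted(partition)]
```

Both calls touch exactly one partition; the difference is that the query
form hands back rows with named columns instead of a raw column slice.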

And 3(?) implementations of secondary-index-like things:
Secondary Indexes
Materialized Views
SasiIndex

These add query functionality, typically by storing an index (or a
secondary form of the data) in a way optimized for a given kind of query.
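
The general idea of "storing a secondary form optimized for a query" can be
sketched as an inverted map kept in step with the base data. This is a
generic illustration only and says nothing about how Secondary Indexes,
Materialized Views or SasiIndex are actually implemented; IndexedTable and
its method names are hypothetical.

```python
from collections import defaultdict


class IndexedTable:
    """Toy base table plus an inverted map on one non-key column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # primary key -> row dict
        self.index = defaultdict(set)   # indexed value -> set of primary keys

    def upsert(self, primary_key, row):
        # Drop the stale index entry first so the index never points at old values.
        old = self.rows.get(primary_key)
        if old is not None and self.indexed_column in old:
            self.index[old[self.indexed_column]].discard(primary_key)
        self.rows[primary_key] = dict(row)
        if self.indexed_column in row:
            self.index[row[self.indexed_column]].add(primary_key)

    def find_by_indexed_value(self, value):
        # Answer "WHERE indexed_column = value" without scanning every row.
        return [self.rows[pk] for pk in sorted(self.index.get(value, ()))]
```

The cost is paid on the write path: every upsert maintains the extra
structure so that the read can go straight to the matching keys.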






On Fri, Sep 30, 2016 at 1:52 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> I used to present Cassandra as a NoSQL datastore with "distributed" tables.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith
> <benedict@apache.org> wrote:
>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> Thrift users still think they have no schema (though they do), and
>> Thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares
>> <joaquin@thelastpickle.com> wrote:
>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key; when reading from Cassandra, however, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>   boz uuid,
>>>   baz timeuuid,
>>>   data1 text,
>>>   data2 text,
>>>   PRIMARY KEY ((bar, boz), baz)
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you choose not to
>>> set a data* field for a particular CQL row, then nothing is stored or
>>> allocated on disk for it. But I wouldn't consider that caveat to be
>>> "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com>
>>> wrote:
>>>
>>>> Cassandra is a Wide Column Store
>>>> http://db-engines.com/en/system/Cassandra
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.bada@dbi-services.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a theoretical question:
>>>>> - Is Apache Cassandra really a column store?
>>>>> Column store means storing the data as columns rather than as rows.
>>>>>
>>>>> In fact C* stores the data as rows, and data is partitioned by row key.
>>>>>
>>>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is
>>>>> that true for you also?
>>>>>
>>>>> Many thanks in advance for your reply
>>>>>
>>>>> Best Regards
>>>>> Mehdi Bada
>>>>> ----
>>>>>
>>>>> *Mehdi Bada* | Consultant
>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>>> 96 15
>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>> mehdi.bada@dbi-services.com
>>>>> www.dbi-services.com
>>>>>
>>>>
>>>>
>>>
>>
>
