cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Cassandra data model right definition
Date Sat, 01 Oct 2016 15:31:58 GMT
https://github.com/apache/cassandra

Row store <http://wiki.apache.org/cassandra/DataModel> means that like
relational databases, Cassandra organizes data by rows and columns. The
Cassandra Query Language (CQL) is a close relative of SQL.

I generally do not know what to say about these high level
"oversimplifications" like "firewalls block hackers". Are there "firewalls"
or do they mean IP routers with layer 4 packet inspections and layer 3
Access Control Lists?

We say (and I catch myself doing it all the time) "like relational
databases" often, as if all relational databases work alike. A columnar
store like HP Vertica is a relational database. MySQL has different storage
engines; does MyISAM work like InnoDB?

Google Docs organizes data by rows and columns as well. You can wrap any
storage system in an API that makes it look like rows and columns.
Microsoft LINQ can enumerate your network cards and query them
https://msdn.microsoft.com/en-us/library/bb308959.aspx , but that really does
not make your network cards a "row store".
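
To make the point concrete, here is a minimal Python sketch. The sample
documents, field names, and the helpers flatten() and as_table() are all made
up for illustration and come from no real driver or API. It presents nested
documents as rows and columns without the underlying storage being a row
store in any sense.

```python
# Hypothetical nested documents, standing in for data held in a document store.
documents = [
    {"host": "web1", "nic": {"name": "eth0", "mtu": 1500}},
    {"host": "web2", "nic": {"name": "eth0", "mtu": 9000}, "rack": "r7"},
]


def flatten(doc, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def as_table(docs):
    """Present documents as (columns, rows); missing fields become None."""
    flat_docs = [flatten(d) for d in docs]
    columns = sorted({key for d in flat_docs for key in d})
    rows = [tuple(d.get(c) for c in columns) for d in flat_docs]
    return columns, rows


columns, rows = as_table(documents)
for row in [tuple(columns)] + rows:
    print(row)
```

The "table" exists only in the presentation layer. The two documents do not
even share a structure, which is why the first row has to be padded with None.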

"Theoretically a row can have 2 billion columns, but in practice it
shouldn't have more than 100 million columns."
In practice (in my experience) the number is much lower than 100 million,
and if the data is actually deleted and re-added frequently, the number of
live columns (rows, whatever) you can use happily is even lower.


I believe on twitter (I am unable to find the tweet) someone was trying to
convince me Cassandra was a "columnar analytic database".  ROFL

I believe telling someone it is a "row store", "like a database", is not a
good idea. They might walk away content with that explanation. You are
setting them up to walk into an anti-pattern, like a case where the user is
attempting to write and delete 1 row and 1 column 6 billion times a day.
Then you end up explaining to them
http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached


and how the Cassandra storage model is not "like a relational database".
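
A toy Python model of why this hurts, offered only as a rough sketch. The
ToyPartition class, its method names, and the gc_grace parameter are
assumptions made up for illustration, not Cassandra's actual storage engine.
The only idea it borrows is that a delete is recorded as a marker (a
tombstone) that reads must scan past until compaction is allowed to purge it.

```python
class ToyPartition:
    """Append-only log of cells; a delete appends a tombstone marker."""

    TOMBSTONE = object()

    def __init__(self):
        self.log = []  # (timestamp, column, value) entries, oldest first

    def write(self, ts, column, value):
        self.log.append((ts, column, value))

    def delete(self, ts, column):
        # Nothing is removed here; the delete is just another write.
        self.log.append((ts, column, self.TOMBSTONE))

    def _latest(self):
        """Newest (timestamp, value) per column."""
        latest = {}
        for ts, column, value in self.log:
            if column not in latest or ts >= latest[column][0]:
                latest[column] = (ts, value)
        return latest

    def read(self):
        """Return (live_columns, tombstones_scanned) for a full read."""
        live = {c: v for c, (_, v) in self._latest().items()
                if v is not self.TOMBSTONE}
        scanned = sum(1 for _, _, v in self.log if v is self.TOMBSTONE)
        return live, scanned

    def compact(self, now, gc_grace):
        """Keep the newest entry per column; purge tombstones older than gc_grace."""
        self.log = [(ts, c, v) for c, (ts, v) in self._latest().items()
                    if not (v is self.TOMBSTONE and now - ts > gc_grace)]


# Queue-like workload (an assumption): write a column, then delete it, 1000 times.
p = ToyPartition()
for i in range(1000):
    p.write(2 * i, "c%d" % i, "x")
    p.delete(2 * i + 1, "c%d" % i)
p.write(2000, "keep", "v")

live, scanned = p.read()
print(len(live), scanned)   # 1 live column, 1000 tombstones scanned

p.compact(now=2001, gc_grace=500)
live_after, scanned_after = p.read()
print(len(live_after), scanned_after)   # 1 live column, 250 tombstones remain
```

One live value costs a scan over a thousand markers, and compaction only
helps for tombstones that are already older than the grace period. A
relational "row store" mental model predicts none of this.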

On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo <edlinuxguru@gmail.com>
wrote:

> I can iterate over JSON data stored in mongo and present it as a table
> with rows and columns. It does not make mongo a rowstore.
>
> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
>> The problem with calling it a row store:
>>
>> https://en.wikipedia.org/wiki/Row_(database)
>>
>> In the context of a relational database
>> <https://en.wikipedia.org/wiki/Relational_database>, a *row*—also called
>> a record <https://en.wikipedia.org/wiki/Record_(computer_science)> or
>> tuple <https://en.wikipedia.org/wiki/Tuple>—represents a single,
>> implicitly structured data <https://en.wikipedia.org/wiki/Data> item in
>> a table <https://en.wikipedia.org/wiki/Table_(database)>. In simple
>> terms, a database table can be thought of as consisting of *rows* and
>> columns <https://en.wikipedia.org/wiki/Column_(database)> or fields
>> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
>> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row in a
>> table represents a set of related data, and every row in the table has the
>> same structure.
>>
>> When you have static columns, and rows with maps and lists, it is hard to
>> argue that every row has the same structure. Physically, at the storage
>> layer, they do not have the same structure, and logically, when accessing
>> the data, they barely have the same structure, since a static column merely
>> appears inside each row without actually being contained in it.
>>
>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <jon@jonhaddad.com>
>> wrote:
>>
>>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>>> which usually needs some extra explanation but is more accurate than
>>> "column family" or whatever other thrift era terminology people still use.
>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduyhai@gmail.com>
>>> wrote:
>>>
>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>> tables. This definition is closer to CQL and has some academic background
>>>> (distributed hash table).
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>> benedict@apache.org> wrote:
>>>>
>>>>> Cassandra is not a "wide column store" anymore.  It has a schema.
>>>>> Only thrift users no longer think they have a schema (though they do),
>>>>> and thrift is being deprecated.
>>>>>
>>>>> I really wish everyone would kill the term "wide column store" with
>>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>>
>>>>> Not only that, but people don't even seem to realise the term "column
>>>>> store" existed long before "wide column store" and the latter is often
>>>>> abbreviated to the former, as here:
>>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>>
>>>>> Since it no longer applies, let's all agree as a community to forget
>>>>> this awful nomenclature ever existed.
>>>>>
>>>>>
>>>>>
>>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>>> joaquin@thelastpickle.com> wrote:
>>>>>
>>>>>> Hi Mehdi,
>>>>>>
>>>>>> I can help clarify a few things.
>>>>>>
>>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>>>> can have 2 billion columns, but in practice it shouldn't have more
>>>>>> than 100 million columns.
>>>>>>
>>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>>> keys. Together, the partition key(s) and clustering key(s) form the
>>>>>> primary key.
>>>>>>
>>>>>> When writing to Cassandra, you will need to provide the full primary
>>>>>> key; however, when reading from Cassandra, you only need to provide
>>>>>> the full partition key.
>>>>>>
>>>>>> When you only provide the partition key for a read operation, you're
>>>>>> able to return all columns that exist on that partition with low
>>>>>> latency. These columns are displayed as "CQL rows" to make it easier
>>>>>> to reason about.
>>>>>> Consider the schema:
>>>>>>
>>>>>> CREATE TABLE foo (
>>>>>>   bar uuid,
>>>>>>   boz uuid,
>>>>>>   baz timeuuid,
>>>>>>   data1 text,
>>>>>>   data2 text,
>>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>>> );
>>>>>>
>>>>>>
>>>>>> When you write to Cassandra you will need to send bar, boz, and baz
>>>>>> and optionally data*, if it's relevant for that CQL row. If you chose
>>>>>> not to define a data* field for a particular CQL row, then nothing is
>>>>>> stored nor allocated on disk. But I wouldn't consider that caveat to be
>>>>>> "schema-less".
>>>>>>
>>>>>> However, all writes to the same bar/boz will end up on the same
>>>>>> Cassandra replica set (a configurable number of nodes) and be stored
>>>>>> on the same place(s) on disk within the SSTable(s). And on disk, each
>>>>>> field that's not a partition key is stored as a column, including
>>>>>> clustering keys (this is optimized in Cassandra 3+, but now we're
>>>>>> getting deep into internals).
>>>>>>
>>>>>> In this way you can get fast responses for all activity for bar/boz
>>>>>> either over time, or for a specific time, with roughly the same number
>>>>>> of disk seeks, with varying lengths on the disk scans.
>>>>>>
>>>>>> Hope that helps!
>>>>>>
>>>>>> Joaquin Casares
>>>>>> Consultant
>>>>>> Austin, TX
>>>>>>
>>>>>> Apache Cassandra Consulting
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Cassandra is a Wide Column Store
>>>>>>> http://db-engines.com/en/system/Cassandra
>>>>>>>
>>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>>> <https://twitter.com/calonso>
>>>>>>>
>>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <
>>>>>>> mehdi.bada@dbi-services.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have a theoretical question:
>>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>>> A column store means storing the data as columns rather than as
>>>>>>>> rows.
>>>>>>>>
>>>>>>>> In fact C* stores the data as rows, and the data is partitioned by
>>>>>>>> row key.
>>>>>>>>
>>>>>>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS....
>>>>>>>> Is it true for you also???
>>>>>>>>
>>>>>>>> Many thanks in advance for your reply
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Mehdi Bada
>>>>>>>> ----
>>>>>>>>
>>>>>>>> *Mehdi Bada* | Consultant
>>>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 |
>>>>>>>> Fax: +41 32 422 96 15
>>>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>>>> mehdi.bada@dbi-services.com
>>>>>>>> www.dbi-services.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *⇒ dbi services is recruiting Oracle & SQL Server experts! – Join
>>>>>>>> the team
>>>>>>>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
