cassandra-user mailing list archives

From Christof Bornhoevd <cbornho...@gmail.com>
Subject Unsubscribe
Date Mon, 03 Oct 2016 15:08:54 GMT
Unsubscribe

On Monday, October 3, 2016, Benedict Elliott Smith <benedict@apache.org>
wrote:

> While that sentence leaves a lot to be desired (for me because it confers
> a different meaning on row store), it doesn't say "Cassandra is like a
> RDBMS" - it says "like an RDBMS, it organises data by rows and columns" -
> i.e., in this regard only it is like an RDBMS, not more generally.
>
> I believe it was meant to help people, especially those afraid of the
> NoSQL thrift world, understand that it still uses the basic concept of
> rows and columns they are used to.  I agree it could be improved to
> minimise the chance of misreading it, and I'm certain contributions would
> be welcome here.
>
> I don't personally want to get bogged down in analysing every piece of
> text anyone has ever written, so I'll bow out of further discussion on
> this.  These phrases may all be suboptimal, but they are certainly
> defensible.  Column store is not, that's all I wanted to contribute here.
>
>
>
>
>
> On 1 October 2016 at 19:35, Peter Lin <woolfel@gmail.com> wrote:
>
>> I'll second Ed's comment.
>>
>> The documentation should be more careful when using phrases such as "like
>> relational databases". When we look at the history of relational databases,
>> people expect certain things like ACID transactions, primary/foreign key
>> constraints, query planners, joins and relational algebra. Clearly
>> Cassandra's storage engine does not follow most of those principles, for
>> good reason.
>>
>> The term "row oriented storage" would be more descriptive and appropriate.
>> It avoids conflating Cassandra's storage engine with "traditional" relational
>> storage engines. Those of us that have spent over a decade using IBM DB2,
>> Oracle, SQL Server and Sybase tend to think of relational databases in a
>> certain way. If we go back to 1998, most RDBMS storage engines had a max row
>> size limit. Databases like Sybase before version 9 preferred RAW disk for
>> optimal performance. I can go on and on, but there's no point really.
>>
>> Cassandra's storage engine is "row oriented", but it's not relational in
>> the RDBMS sense. We do everyone a huge disservice by using confusing
>> terminology and then making fun of those who get confused. No one wins when
>> that happens. At the end of the day, what differentiates Cassandra's
>> storage engine is that it supports static and dynamic columns, which
>> traditional RDBMSs don't support today. Calling Cassandra storage
>> "distributed tables" doesn't really help, in my biased opinion.
>>
>> For example, if you tell a SQL Server or Oracle RAC admin "cassandra uses
>> distributed tables" they might answer "so what, SQL Server and Oracle can
>> do that too." The difference is that with an RDBMS the partitioning is
>> optional and requires more work to configure. With Cassandra you can have
>> everything in 1 node, which means there is only 1 partition and it is no
>> different to 1 instance of SQL Server. Where you win is when you need to
>> add 2 more nodes: Cassandra makes this easier, whereas with SQL Server and
>> Oracle you have to do a little bit more work. I've lost count of how many
>> times I've explained NoSQL databases to RDBMS admins and had to explain
>> that the official docs are stupid.
>>
>>
>>
>> On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>
>>> https://github.com/apache/cassandra
>>>
>>> Row store <http://wiki.apache.org/cassandra/DataModel> means that like
>>> relational databases, Cassandra organizes data by rows and columns. The
>>> Cassandra Query Language (CQL) is a close relative of SQL.
>>>
>>> I generally do not know what to say about these high level
>>> "oversimplifications" like "firewalls block hackers". Are there "firewalls"
>>> or do they mean IP routers with layer 4 packet inspections and layer 3
>>> Access Control Lists?
>>>
>>> We say (and I catch myself doing it all the time) "like relational
>>> databases" often, as if all relational databases work alike. A columnar
>>> store like HP Vertica is a relational database. MySQL has different
>>> storage engines; does MyISAM work like InnoDB?
>>>
>>> Google docs organizes data by rows and columns as well. You can wrap any
>>> storage system into an API that makes it look like rows and columns.
>>> Microsoft LINQ can enumerate your network cards and query them
>>> https://msdn.microsoft.com/en-us/library/bb308959.aspx , but that really
>>> does not make your network cards a "row store".
>>>
>>> "Theoretically a row can have 2 billion columns, but in practice it
>>> shouldn't have more than 100 million columns."
>>> In practice (in my experience) the number is much lower than 100
>>> million, and if the data is actually deleted and re-added frequently, the
>>> number of live columns (rows, whatever) you can use happily is even lower.
>>>
>>>
>>> I believe on twitter (I am unable to find the tweet) someone was trying
>>> to convince me Cassandra was a "columnar analytic database".  ROFL
>>>
>>> I believe telling someone it is a "row store" "like a database" is not a
>>> good idea. They might walk away content with that explanation. You are
>>> setting them up to walk into an anti-pattern, like a case where the user
>>> is writing and deleting 1 row and 1 column 6 billion times a day.
>>> Then you end up explaining to them
>>> http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
>>> and how the cassandra storage model is not "like a relational database".
>>>
>>> On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>
>>>> I can iterate over JSON data stored in mongo and present it as a table
>>>> with rows and columns. It does not make mongo a rowstore.
>>>>
>>>> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>
>>>>> The problem with calling it a row store:
>>>>>
>>>>> https://en.wikipedia.org/wiki/Row_(database)
>>>>>
>>>>> In the context of a relational database
>>>>> <https://en.wikipedia.org/wiki/Relational_database>, a *row*, also
>>>>> called a record
>>>>> <https://en.wikipedia.org/wiki/Record_(computer_science)> or tuple
>>>>> <https://en.wikipedia.org/wiki/Tuple>, represents a single, implicitly
>>>>> structured data <https://en.wikipedia.org/wiki/Data> item in a table
>>>>> <https://en.wikipedia.org/wiki/Table_(database)>. In simple terms, a
>>>>> database table can be thought of as consisting of *rows* and columns
>>>>> <https://en.wikipedia.org/wiki/Column_(database)> or fields
>>>>> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
>>>>> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row
>>>>> in a table represents a set of related data, and every row in the table
>>>>> has the same structure.
>>>>>
>>>>> When you have static columns and rows with maps and lists, it is hard
>>>>> to argue that every row has the same structure. Physically at the storage
>>>>> layer they do not have the same structure, and logically when accessing
>>>>> the data they barely have the same structure, as the static column is just
>>>>> appearing inside each row it is actually not contained in.
>>>>>
>>>>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:
>>>>>
>>>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>>>> store" which usually needs some extra explanation but is more accurate
>>>>>> than "column family" or whatever other thrift era terminology people
>>>>>> still use.
>>>>>>
>>>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduyhai@gmail.com> wrote:
>>>>>>
>>>>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>>>>> tables. This definition is closer to CQL and has some academic
>>>>>>> background (distributed hash table).
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <benedict@apache.org> wrote:
>>>>>>>
>>>>>>>> Cassandra is not a "wide column store" anymore.  It has a schema.
>>>>>>>> Only thrift users still think they have no schema (though they do),
>>>>>>>> and thrift is being deprecated.
>>>>>>>>
>>>>>>>> I really wish everyone would kill the term "wide column store" with
>>>>>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>>>>>> row-oriented", and a "column store" means literally the opposite of
>>>>>>>> this.
>>>>>>>>
>>>>>>>> Not only that, but people don't even seem to realise the term
>>>>>>>> "column store" existed long before "wide column store" and the latter
>>>>>>>> is often abbreviated to the former, as here:
>>>>>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>>>>>
>>>>>>>> Since it no longer applies, let's all agree as a community to
>>>>>>>> forget this awful nomenclature ever existed.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 30 September 2016 at 18:09, Joaquin Casares <joaquin@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mehdi,
>>>>>>>>>
>>>>>>>>> I can help clarify a few things.
>>>>>>>>>
>>>>>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a
>>>>>>>>> row can have 2 billion columns, but in practice it shouldn't have
>>>>>>>>> more than 100 million columns.
>>>>>>>>>
>>>>>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>>>>>> key(s), but does provide the option of setting zero or more
>>>>>>>>> clustering keys. Together, the partition key(s) and clustering key(s)
>>>>>>>>> form the primary key.
>>>>>>>>>
>>>>>>>>> When writing to Cassandra, you will need to provide the full
>>>>>>>>> primary key; however, when reading from Cassandra, you only need to
>>>>>>>>> provide the full partition key.
>>>>>>>>>
>>>>>>>>> When you only provide the partition key for a read operation,
>>>>>>>>> you're able to return all columns that exist on that partition with
>>>>>>>>> low latency. These columns are displayed as "CQL rows" to make it
>>>>>>>>> easier to reason about.
>>>>>>>>>
>>>>>>>>> Consider the schema:
>>>>>>>>>
>>>>>>>>> CREATE TABLE foo (
>>>>>>>>>   bar uuid,
>>>>>>>>>   boz uuid,
>>>>>>>>>   baz timeuuid,
>>>>>>>>>   data1 text,
>>>>>>>>>   data2 text,
>>>>>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>>>>>> );
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> When you write to Cassandra you will need to send bar, boz, and
>>>>>>>>> baz and optionally data*, if it's relevant for that CQL row. If you
>>>>>>>>> choose not to define a data* field for a particular CQL row, then
>>>>>>>>> nothing is stored nor allocated on disk. But I wouldn't consider that
>>>>>>>>> caveat to be "schema-less".
>>>>>>>>>
>>>>>>>>> However, all writes to the same bar/boz will end up on the same
>>>>>>>>> Cassandra replica set (a configurable number of nodes) and be stored
>>>>>>>>> on the same place(s) on disk within the SSTable(s). And on disk, each
>>>>>>>>> field that's not a partition key is stored as a column, including
>>>>>>>>> clustering keys (this is optimized in Cassandra 3+, but now we're
>>>>>>>>> getting deep into internals).
>>>>>>>>>
>>>>>>>>> In this way you can get fast responses for all activity for
>>>>>>>>> bar/boz either over time, or for a specific time, with roughly the
>>>>>>>>> same number of disk seeks, with varying lengths on the disk scans.
>>>>>>>>>
>>>>>>>>> Hope that helps!
>>>>>>>>>
>>>>>>>>> Joaquin Casares
>>>>>>>>> Consultant
>>>>>>>>> Austin, TX
>>>>>>>>>
>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>
>>>>>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com> wrote:
>>>>>>>>>
>>>>>>>>>> Cassandra is a Wide Column Store
>>>>>>>>>> http://db-engines.com/en/system/Cassandra
>>>>>>>>>>
>>>>>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>>>>>> <https://twitter.com/calonso>
>>>>>>>>>>
>>>>>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.bada@dbi-services.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have a theoretical question:
>>>>>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>>>>>> A column store means storing the data as columns rather than as
>>>>>>>>>>> rows.
>>>>>>>>>>>
>>>>>>>>>>> In fact C* stores the data as rows, and the data is partitioned
>>>>>>>>>>> by row key.
>>>>>>>>>>>
>>>>>>>>>>> So, for me, Cassandra is a row-oriented, schema-less
>>>>>>>>>>> DBMS. Is that true for you also?
>>>>>>>>>>>
>>>>>>>>>>> Many thanks in advance for your reply
>>>>>>>>>>>
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Mehdi Bada
>>>>>>>>>>> ----
>>>>>>>>>>>
>>>>>>>>>>> *Mehdi Bada* | Consultant
>>>>>>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
>>>>>>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>>>>>>> mehdi.bada@dbi-services.com
>>>>>>>>>>> www.dbi-services.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *⇒ dbi services is recruiting Oracle & SQL Server experts! Join the team
>>>>>>>>>>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
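
Editorial sketch: the "partitioned row store" model discussed in the thread (a partition key that locates the data, clustering keys that order rows inside the partition) can be illustrated with a toy in-memory model. This is not Cassandra's implementation or any driver API; the class name PartitionedRowStore and its write/read methods are hypothetical, and plain integers stand in for the timeuuid column baz of the example table foo.

```python
class PartitionedRowStore:
    """Toy model (hypothetical, not Cassandra code) of table foo with
    PRIMARY KEY ((bar, boz), baz)."""

    def __init__(self):
        # partition key (bar, boz) -> {clustering key baz -> {column: value}}
        self._partitions = {}

    def write(self, bar, boz, baz, **data):
        # A write needs the full primary key: partition key plus clustering key.
        # Columns that are not supplied are simply absent; nothing is stored.
        partition = self._partitions.setdefault((bar, boz), {})
        partition.setdefault(baz, {}).update(data)

    def read(self, bar, boz, baz_from=None, baz_to=None):
        # A read needs only the full partition key. Rows come back ordered by
        # the clustering key, optionally restricted to an inclusive range.
        partition = self._partitions.get((bar, boz), {})
        rows = []
        for baz in sorted(partition):
            if baz_from is not None and baz < baz_from:
                continue
            if baz_to is not None and baz > baz_to:
                continue
            rows.append({"bar": bar, "boz": boz, "baz": baz, **partition[baz]})
        return rows


store = PartitionedRowStore()
store.write("a", "b", 2, data1="x")   # data2 left unset for this row
store.write("a", "b", 1, data2="y")   # same partition, earlier clustering key
store.write("c", "d", 1, data1="z")   # a different partition

for row in store.read("a", "b"):
    print(row)
```

Reading with only ("a", "b") returns both rows of that partition in baz order, which mirrors the "CQL rows" view of a partition described above; passing baz_from or baz_to narrows the scan within the same partition.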
