cassandra-user mailing list archives

From selcuk mart <ad...@hostingdevi.com>
Subject Re: Cassandra data model right definition
Date Fri, 14 Oct 2016 13:44:33 GMT
unsubscribe


On 3.10.2016 at 16:25, Edward Capriolo wrote:
> The phrase is defensible, but that is the root of the problem. Take 
> for example a skateboard.
>
> "A skateboard is like a bike because it has wheels and you ride on it."
>
> That is true and defensibly true. :) However with not much more text 
> you can accurately describe what it is, as opposed to something it is 
> almost like.
>
> "A skateboard is a thin piece of wood on top of four small wheels that 
> you stand on and ride"
>
> The old Cassandra statement was something to the effect of
> "with the storage model of Bigtable and the consistency model of
> Dynamo". This accurately described the system and referenced
> specific known quantities (Bigtable/Dynamo) for which white papers
> exist for further reading.
>
> On Mon, Oct 3, 2016 at 6:24 AM, Benedict Elliott Smith 
> <benedict@apache.org <mailto:benedict@apache.org>> wrote:
>
>     While that sentence leaves a lot to be desired (for me because it
>     confers a different meaning on row store), it doesn't say
>     "Cassandra is like a RDBMS" - it says "like an RDBMS, it organises
>     data by rows and columns" - i.e., in this regard only it is like
>     an RDBMS, not more generally.
>
>     I believe it was meant to help people, especially those afraid of
>     the NoSQL thrift world, understand that it still uses the basic
>     concept of rows and columns they are used to.  I agree it could
>     be improved to minimise the chance of misreading it, and I'm
>     certain contributions would be welcome here.
>
>     I don't personally want to get bogged down in analysing every
>     piece of text anyone has ever written, so I'll bow out of further
>     discussion on this.  These phrases may all be suboptimal, but they
>     are certainly defensible.  Column store is not, that's all I
>     wanted to contribute here.
>
>
>
>
>
>     On 1 October 2016 at 19:35, Peter Lin <woolfel@gmail.com
>     <mailto:woolfel@gmail.com>> wrote:
>
>         I'll second Ed's comment.
>
>         The documentation should be more careful when using phrases
>         such as "like relational databases". When we look at the
>         history of relational databases, people expect certain things
>         like ACID transactions, primary/foreign key constraints, query
>         planners, joins and relational algebra. Clearly Cassandra's
>         storage engine does not follow most of those principles, for
>         good reason.
>
>         The term row oriented storage would be more descriptive and
>         appropriate. It avoids conflating Cassandra storage engine
>         with "traditional" relational storage engines. Those of us
>         that have spent over a decade using IBM DB2, Oracle, Sql
>         Server and Sybase tend to think of relational databases in a
>         certain way. If we go back to 1998, most RDBMS storage engines
>         had a max row size limit. Databases like Sybase before version
>         9 preferred RAW disk for optimal performance. I can go on and
>         on, but there's no point really.
>
>         Cassandra's storage engine is "row oriented", but it's not
>         relational in RDBMS sense. We do everyone a huge disservice by
>         using confusing terminology and then making fun of those who
>         get confused. No one wins when that happens. At the end of the
>         day, what differentiates Cassandra's storage engine is that it
>         supports static and dynamic columns, which traditional RDBMSs
>         don't support today. Calling Cassandra storage "distributed
>         tables" doesn't really help, in my biased opinion.
>
>         For example, if you tell a SqlServer or Oracle RAC admin
>         "cassandra uses distributed tables" they might answer "so
>         what, sql server and oracle can do that too." The difference
>         is with RDBMS the partitioning is optional and requires more
>         work to configure. Whereas with Cassandra you can have
>         everything in 1 node, which means there is only 1 partition
>         and no different to 1 instance of sql server. Where you win is
>         when you need to add 2 more nodes, Cassandra makes this easier
>         whereas with SqlServer and Oracle you have to do a little bit
>         more work. I've lost count of how many times I've explained
>         NoSQL databases to RDBMS admins and had to explain that the
>         official docs are stupid.
>
>
>
>         On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo
>         <edlinuxguru@gmail.com <mailto:edlinuxguru@gmail.com>> wrote:
>
>             https://github.com/apache/cassandra
>             <https://github.com/apache/cassandra>
>
>             Row store
>             <http://wiki.apache.org/cassandra/DataModel> means that
>             like relational databases, Cassandra organizes data by
>             rows and columns. The Cassandra Query Language (CQL) is a
>             close relative of SQL.
>
>             I generally do not know what to say about these high level
>             "oversimplifications" like "firewalls block hackers". Are
>             there "firewalls" or do they mean IP routers with layer 4
>             packet inspections and layer 3 Access Control Lists?
>
>             We say (and I catch myself doing it all the time) "like
>             relational databases" often as if all relational databases
>             work alike. A columnar store like HP Vertica is a
>             relational database. MySQL has different storage engines:
>             does MyISAM work like InnoDB?
>
>             Google docs organizes data by rows and columns as well.
>             You can wrap any storage system in an API that makes
>             it look like rows and columns. Microsoft LINQ can
>             enumerate your network cards and query them
>             https://msdn.microsoft.com/en-us/library/bb308959.aspx
>             <https://msdn.microsoft.com/en-us/library/bb308959.aspx> ,
>             that really does not make your network cards a "row store"
>
>             "Theoretically a row can have 2 billion columns, but in
>             practice it shouldn't have more than 100 million columns."
>             In practice (in my experience) the number is much lower
>             than 100 million, and if the data is actually deleted and
>             re-added frequently, the number of live columns (rows,
>             whatever) you can use happily is even lower.
>
>
>             I believe on twitter (I am unable to find the tweet)
>             someone was trying to convince me Cassandra was a
>             "columnar analytic database".  ROFL
>
>             I believe telling someone it is a "row store", "like a
>             database", is not a good idea. They might walk away content
>             with that explanation. You are setting them up to walk
>             into an anti-pattern, like a case where the user is
>             attempting to write and delete 1 row and 1 column 6
>             billion times a day. Then you end up explaining to them
>             http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
>             <http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached>
>
>
>             and how the cassandra storage model is not "like a
>             relational database".
>
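The tombstone cost described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `TombstonePartition`, `TOMBSTONE`, and `read_first_live` are hypothetical names, not Cassandra APIs, and the model uses distinct clustering keys (a queue-like pattern) rather than one repeatedly rewritten row so that the effect is easy to see. In a real cluster, tombstones are only purged by compaction after `gc_grace_seconds` has elapsed; here `compact()` drops them immediately.

```python
# Toy model (not Cassandra code): deletes leave markers that reads must skip.
TOMBSTONE = object()  # hypothetical marker standing in for a deletion record


class TombstonePartition:
    def __init__(self):
        self.rows = {}  # clustering key -> value, or TOMBSTONE

    def write(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        # A delete does not remove anything; it records a tombstone.
        self.rows[key] = TOMBSTONE

    def read_first_live(self):
        """Return (key, value, tombstones_scanned) for the first live row."""
        scanned = 0
        for key in sorted(self.rows):
            if self.rows[key] is TOMBSTONE:
                scanned += 1
                continue
            return key, self.rows[key], scanned
        return None, None, scanned

    def compact(self):
        # Simplification: purge all tombstones at once.
        self.rows = {k: v for k, v in self.rows.items() if v is not TOMBSTONE}


p = TombstonePartition()
for i in range(1000):
    p.write(i, f"v{i}")
for i in range(999):
    p.delete(i)
print(p.read_first_live())  # the read skips 999 tombstones to find one live row
```

The point of the sketch is that the read does work proportional to the deleted rows, which is the opposite of what "like a relational database" would lead someone to expect.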
>             On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo
>             <edlinuxguru@gmail.com <mailto:edlinuxguru@gmail.com>> wrote:
>
>                 I can iterate over JSON data stored in mongo and
>                 present it as a table with rows and columns. It does
>                 not make mongo a rowstore.
>
>                 On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo
>                 <edlinuxguru@gmail.com <mailto:edlinuxguru@gmail.com>>
>                 wrote:
>
>                     The problem with calling it a row store:
>
>                     https://en.wikipedia.org/wiki/Row_(database)
>                     <https://en.wikipedia.org/wiki/Row_%28database%29>
>
>                     In the context of a relational database
>                     <https://en.wikipedia.org/wiki/Relational_database>,
>                     a *row*—also called a record
>                     <https://en.wikipedia.org/wiki/Record_%28computer_science%29>
>                     or
>                     tuple
>                     <https://en.wikipedia.org/wiki/Tuple>—represents a
>                     single, implicitly structured data
>                     <https://en.wikipedia.org/wiki/Data> item in a
>                     table
>                     <https://en.wikipedia.org/wiki/Table_%28database%29>.
>                     In simple terms, a database table can be thought
>                     of as consisting of /rows/ and columns
>                     <https://en.wikipedia.org/wiki/Column_%28database%29> or
>                     fields
>                     <https://en.wikipedia.org/wiki/Field_%28computer_science%29>.^[1]
>                     <https://en.wikipedia.org/wiki/Row_%28database%29#cite_note-1>
>                      Each row in a table represents a set of related
>                     data, and every row in the table has the same
>                     structure.
>
>                     When you have static columns and rows with maps,
>                     and lists, it is hard to argue that every row has
>                     the same structure. Physically at the storage
>                     layer they do not have the same structure and
>                     logically when accessing the data they barely have
>                     the same structure, as the static column merely
>                     appears inside each row without actually being
>                     contained in it.
>
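The point about static columns can be sketched with a toy model. This is an illustration, not Cassandra's storage format: `StaticPartition` and its method names are assumptions made up for the example. The static value is stored once per partition and only merged into each row when the rows are read back.

```python
# Toy model (not Cassandra code): a static column lives at partition level.
class StaticPartition:
    def __init__(self, static_columns):
        self.static_columns = dict(static_columns)  # stored once per partition
        self.rows = []  # each row stores only its own regular columns

    def add_row(self, **columns):
        self.rows.append(columns)

    def select_all(self):
        # The static value is merged into every returned row at read time.
        return [{**self.static_columns, **row} for row in self.rows]


p = StaticPartition({"owner": "ed"})
p.add_row(id=1, tags=["a", "b"])  # a row carrying a list
p.add_row(id=2)                   # a row without one
for row in p.select_all():
    print(row)
```

Every returned row shows `owner`, yet no stored row contains it, and the two rows do not share the same set of columns.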
>                     On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad
>                     <jon@jonhaddad.com <mailto:jon@jonhaddad.com>> wrote:
>
>                         +1000 to what Benedict says. I usually call it
>                         a "partitioned row store" which usually needs
>                         some extra explanation but is more accurate
>                         than "column family" or whatever other thrift
>                         era terminology people still use.
>                         On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan
>                         <doanduyhai@gmail.com
>                         <mailto:doanduyhai@gmail.com>> wrote:
>
>                             I used to present Cassandra as a NoSQL
>                             datastore with "distributed" table. This
>                             definition is closer to CQL and has some
>                             academic background (distributed hash table).
>
>
>                             On Fri, Sep 30, 2016 at 7:43 PM, Benedict
>                             Elliott Smith <benedict@apache.org
>                             <mailto:benedict@apache.org>> wrote:
>
>                                 Cassandra is not a "wide column store"
>                                 anymore.  It has a schema. Only thrift
>                                 users no longer think they have a
>                                 schema (though they do), and thrift is
>                                 being deprecated.
>
>                                 I really wish everyone would kill the
>                                 term "wide column store" with fire. 
>                                 It seems to have never meant anything
>                                 beyond "schema-less, row-oriented",
>                                 and a "column store" means literally
>                                 the opposite of this.
>
>                                 Not only that, but people don't even
>                                 seem to realise the term "column
>                                 store" existed long before "wide
>                                 column store" and the latter is often
>                                 abbreviated to the former, as here:
>                                 http://www.planetcassandra.org/what-is-nosql/
>                                 <http://www.planetcassandra.org/what-is-nosql/>
>
>
>                                 Since it no longer applies, let's all
>                                 agree as a community to forget this
>                                 awful nomenclature ever existed.
>
>
>
>                                 On 30 September 2016 at 18:09, Joaquin
>                                 Casares <joaquin@thelastpickle.com
>                                 <mailto:joaquin@thelastpickle.com>> wrote:
>
>                                     Hi Mehdi,
>
>                                     I can help clarify a few things.
>
>                                     As Carlos said, Cassandra is a
>                                     Wide Column Store. Theoretically a
>                                     row can have 2 billion columns,
>                                     but in practice it shouldn't have
>                                     more than 100 million columns.
>
>                                     Cassandra partitions data to
>                                     certain nodes based on the
>                                     partition key(s), but does provide
>                                     the option of setting zero or more
>                                     clustering keys. Together,
>                                     the partition key(s) and
>                                     clustering key(s) form the primary
>                                     key.
>
>                                     When writing to Cassandra, you
>                                     will need to provide the full
>                                     primary key; when reading from
>                                     Cassandra, however, you only need
>                                     to provide the full partition key.
>
>                                     When you only provide the
>                                     partition key for a read
>                                     operation, you're able to return
>                                     all columns that exist on that
>                                     partition with low latency. These
>                                     columns are displayed as "CQL
>                                     rows" to make it easier to reason
>                                     about.
>
>                                     Consider the schema:
>
>                                         CREATE TABLE foo (
>                                           bar uuid,
>                                           boz uuid,
>                                           baz timeuuid,
>                                           data1 text,
>                                           data2 text,
>                                           PRIMARY KEY ((bar, boz), baz)
>                                         );
>
>
>                                     When you write to Cassandra you
>                                     will need to send bar, boz, and
>                                     baz and optionally data*, if it's
>                                     relevant for that CQL row. If you
>                                     choose not to define a data* field
>                                     for a particular CQL row, then
>                                     nothing is stored nor allocated on
>                                     disk. But I wouldn't consider that
>                                     caveat to be "schema-less".
>
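The read and write rules for the `foo` schema quoted above can be sketched as a small in-memory model. This is a minimal sketch under assumptions: `ToyTable` is a hypothetical class, plain strings and integers stand in for the `uuid` and `timeuuid` values, and nothing here reflects how Cassandra actually lays data out on disk.

```python
from collections import defaultdict


# Toy model (not Cassandra code) of PRIMARY KEY ((bar, boz), baz).
class ToyTable:
    def __init__(self):
        # partition key (bar, boz) -> {clustering key baz -> {column -> value}}
        self.partitions = defaultdict(dict)

    def write(self, bar, boz, baz, **data):
        # A write must name the full primary key: partition key plus baz.
        row = self.partitions[(bar, boz)].setdefault(baz, {})
        # Only the data* columns actually supplied are stored.
        row.update({k: v for k, v in data.items() if v is not None})

    def read(self, bar, boz):
        # A read needs only the partition key; rows come back ordered by baz.
        partition = self.partitions.get((bar, boz), {})
        return [(baz, partition[baz]) for baz in sorted(partition)]


t = ToyTable()
t.write("bar1", "boz1", 2, data1="second")
t.write("bar1", "boz1", 1, data2="first")
print(t.read("bar1", "boz1"))
# [(1, {'data2': 'first'}), (2, {'data1': 'second'})]
```

Each stored row holds only the columns it was given, which mirrors the remark that an undefined data* field takes up nothing, while the set of allowed columns is still fixed by the schema.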
>                                     However, all writes to the same
>                                     bar/boz will end up on the same
>                                     Cassandra replica set (a
>                                     configurable number of nodes) and
>                                     be stored on the same place(s) on
>                                     disk within the SSTable(s). And on
>                                     disk, each field that's not a
>                                     partition key is stored as a
>                                     column, including clustering keys
>                                     (this is optimized in Cassandra
>                                     3+, but now we're getting deep
>                                     into internals).
>
>                                     In this way you can get fast
>                                     responses for all activity for
>                                     bar/boz either over time, or for a
>                                     specific time, with roughly the
>                                     same number of disk seeks, with
>                                     varying lengths on the disk scans.
>
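The placement and scan behaviour described above can be sketched roughly as follows. Everything in it is an assumption made for illustration: the node names are invented, MD5 stands in for Cassandra's partitioner, and the modulo-based replica choice is a simplification of a real token ring. It shows two things only: the same bar/boz always maps to the same set of nodes, and a time range within a partition is a contiguous slice of sorted rows.

```python
import hashlib
from bisect import bisect_left, bisect_right

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster


def token(partition_key):
    # Stand-in hash; not the partitioner Cassandra really uses.
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(partition_key, replication_factor=3):
    # Simplified placement: start at a hashed position, take the next N nodes.
    start = token(partition_key) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


def slice_partition(rows, start, end):
    # rows is a list of (clustering_key, value) sorted by clustering key.
    keys = [k for k, _ in rows]
    return rows[bisect_left(keys, start):bisect_right(keys, end)]


rows = [(1, "a"), (2, "b"), (3, "c"), (5, "d")]
print(replicas(("bar1", "boz1")))   # same key, same replica set every time
print(slice_partition(rows, 2, 3))  # [(2, 'b'), (3, 'c')]
print(slice_partition(rows, 3, 3))  # [(3, 'c')]
```

Locating the start of the slice costs about the same whether one row or many are wanted; only the length of the scan that follows changes.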
>                                     Hope that helps!
>
>                                     Joaquin Casares
>                                     Consultant
>                                     Austin, TX
>
>                                     Apache Cassandra Consulting
>                                     http://www.thelastpickle.com
>
>                                     On Fri, Sep 30, 2016 at 11:40 AM,
>                                     Carlos Alonso <info@mrcalonso.com
>                                     <mailto:info@mrcalonso.com>> wrote:
>
>                                         Cassandra is a Wide Column
>                                         Store
>                                         http://db-engines.com/en/system/Cassandra
>                                         <http://db-engines.com/en/system/Cassandra>
>
>                                         Carlos Alonso | Software
>                                         Engineer | @calonso
>                                         <https://twitter.com/calonso>
>
>                                         On 30 September 2016 at 18:24,
>                                         Mehdi Bada
>                                         <mehdi.bada@dbi-services.com
>                                         <mailto:mehdi.bada@dbi-services.com>>
>                                         wrote:
>
>                                             Hi all,
>
>                                             I have a theoretical
>                                             question:
>                                             - Is Apache Cassandra
>                                             really a column store?
>                                             A column store means storing
>                                             the data as columns rather
>                                             than as rows.
>
>                                             In fact C* stores the data
>                                             as rows, and the data is
>                                             partitioned by row key.
>
>                                             So, for me, Cassandra
>                                             is a row-oriented,
>                                             schema-less DBMS. Do you
>                                             see it the same way?
>
>                                             Many thanks in advance for
>                                             your reply
>
>                                             Best Regards
>                                             Mehdi Bada
>                                             ----
>
>                                             *Mehdi Bada* | Consultant
>                                             Phone: +41 32 422 96 00
>                                             <tel:%2B41%2032%20422%2096%2000>
>                                             | Mobile: +41 79 928 75 48
>                                             <tel:%2B41%2079%20928%2075%2048>
>                                             | Fax: +41 32 422 96 15
>                                             <tel:%2B41%2032%20422%2096%2015>
>
>                                             dbi services, Rue de la
>                                             Jeunesse 2, CH-2800 Delémont
>                                             mehdi.bada@dbi-services.com
>                                             <mailto:mehdi.bada@dbi-services.com>
>
>                                             www.dbi-services.com
>                                             <http://www.dbi-services.com>
>
>

-- 
Best regards
Selçuk MART
ONLINE KURUM
Hacettepe Üniversitesi Teknokent,
Üniversiteliler Mah. 1596. Sok.
Safir Blokları, E BLOK 802/A,
Beytepe, Çankaya/ANKARA
Tel: +90 (312) 227 000 5

