cassandra-user mailing list archives

From francesco.tangari....@gmail.com
Subject Re: General questions about Cassandra
Date Sat, 18 Feb 2012 08:51:52 GMT
I suppose he should buy http://shop.oreilly.com/product/0636920010852.do to get an idea
of what Cassandra can and can't do. That's my personal opinion.

--  
francesco.tangari.inf@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, 17 February 2012, at 17:59, Chris Gerken wrote:

> In response to an offline question…
>  
> There are two usage patterns for Cassandra column families, static and dynamic. With
> both approaches you store objects of a given type into a column family.
>  
> With static usage, the object type you're persisting has a single key and each row in
> the column family maps to a single object. The value of an object's key is stored in the row
> key and each of the object's properties is stored in a column whose name is the name of the
> property and whose value is the property value. There are the same number of columns in a
> row as there are non-null property values. This usage is very much like traditional relational
> database usage.
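
A minimal sketch of the static pattern, in Python with a plain dict standing in for a column family. The `User` type, its fields, and `to_static_row` are hypothetical names chosen for illustration, and no Cassandra client API is used. Leaving the key out of the columns is an assumption of the sketch.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class User:
    # Hypothetical object type with a single key (user_id).
    user_id: str
    name: Optional[str] = None
    email: Optional[str] = None


def to_static_row(obj):
    """Map one object to one row: (row_key, {property name: property value})."""
    props = asdict(obj)
    row_key = props.pop("user_id")
    # Only non-null properties become columns.
    columns = {name: value for name, value in props.items() if value is not None}
    return row_key, columns


# A dict stands in for the column family: {row_key: {column_name: value}}.
users_cf = {}
row_key, columns = to_static_row(User("u42", name="Ada"))
users_cf[row_key] = columns
```

Here the row for `u42` ends up with a single `name` column, because `email` is null and is never written.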
>  
> With dynamic usage, the object type to be persisted has two keys (I'll get to composite
> keys in a bit). With this approach the value of an object's primary key is stored as a row
> key and each object is stored in a single column whose name is the value of the object's
> secondary key and whose value is the entire object (serialized into a ByteBuffer). This results
> in persisting potentially many objects in a single row. All of those objects have the same
> primary key and there are as many columns as there are objects with the same primary key.
> An example of this approach is a time series column family in which each row holds weather
> readings for a different city and each column in a row holds all of the weather observations
> for that city at a certain time. The timestamp is used as a column name and an object holding
> all the observations is serialized and stored in the corresponding column value.
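
The weather example can be sketched the same way. This is an in-memory model only: a nested dict stands in for the column family, JSON bytes stand in for the serialized ByteBuffer, and `store_reading` and `read_range` are hypothetical helpers, not part of any Cassandra API. Sorting at read time is used here to mimic columns being kept in column-name order.

```python
import json
from collections import defaultdict

# Column family modeled as {row_key: {column_name: serialized_object}}.
weather_cf = defaultdict(dict)


def store_reading(cf, city, timestamp, observations):
    """Row key = city, column name = timestamp, column value = whole object."""
    cf[city][timestamp] = json.dumps(observations, sort_keys=True).encode("utf-8")


def read_range(cf, city, start, end):
    """Return (timestamp, observations) pairs for one city, ordered by time."""
    row = cf.get(city, {})
    return [
        (ts, json.loads(row[ts].decode("utf-8")))
        for ts in sorted(row)
        if start <= ts <= end
    ]


store_reading(weather_cf, "Austin", 1329487200, {"temp_c": 18.5, "humidity": 40})
store_reading(weather_cf, "Austin", 1329490800, {"temp_c": 19.0, "humidity": 38})
store_reading(weather_cf, "Austin", 1329494400, {"temp_c": 19.5, "humidity": 37})
store_reading(weather_cf, "Dallas", 1329487200, {"temp_c": 16.0, "humidity": 45})

first_two_hours = read_range(weather_cf, "Austin", 1329487200, 1329490800)
```

All three Austin readings live in one row, one column per timestamp, and a time-range read touches only that row.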
>  
> Cassandra is a really powerful database, and it particularly excels performance-wise at
> reading and writing time series data stored in a dynamic column family.
>  
> There are variations of the above patterns. For example, you can use composite types to
> define a row key or column name that is made up of the values of multiple keys.
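
A rough sketch of the composite idea, again with plain Python structures rather than a real client. Tuples stand in for composite row keys and column names. Python compares tuples element by element, which is used here as a stand-in for how a composite comparator would order names. The station codes and the `columns_at` helper are made up for illustration.

```python
# Composite row key: (city, day). Composite column name: (timestamp, station).
weather_cf = {
    ("Austin", "2012-02-17"): {
        (1329490800, "KAUS"): b"reading-c",
        (1329487200, "KATT"): b"reading-b",
        (1329487200, "KAUS"): b"reading-a",
    }
}


def columns_at(row, timestamp):
    """Return the columns whose first name component equals the timestamp."""
    return [(name, row[name]) for name in sorted(row) if name[0] == timestamp]


row = weather_cf[("Austin", "2012-02-17")]
ordered_names = sorted(row)
same_time = columns_at(row, 1329487200)
```

Sorting the names groups the two readings taken at the same timestamp next to each other, ordered by station within that timestamp.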
>  
> I gave a presentation on the topic of Cassandra patterns recently to the Austin Cassandra
> Meetup. You can find my charts there in the archives or posted to my box at the LinkedIn site
> below, or contact me offline.
>  
> To bring this back to the original question: asking for the ability to apply a Java method
> to selected rows makes sense for static column families, but I think the more general need
> is to be able to apply a Java method to selected persisted objects in a column family regardless
> of static or dynamic usage. While I'm on my soapbox, I think this requirement applies to Pig
> support as well.
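
One way to picture "objects, not rows" is a thin layer that yields objects from either layout and applies a function to the selected ones. The sketch below is in Python for brevity rather than Java, runs on in-memory dicts, and every name in it (`iter_static`, `iter_dynamic`, `apply_to_objects`) is hypothetical. It does not show distributed execution.

```python
import json


def iter_static(cf):
    """Static layout: each row is one object."""
    for row_key, columns in cf.items():
        yield {"key": row_key, **columns}


def iter_dynamic(cf):
    """Dynamic layout: each column in a row is one serialized object."""
    for row_key, columns in cf.items():
        for column_name, blob in columns.items():
            obj = json.loads(blob.decode("utf-8"))
            yield {"key": row_key, "secondary_key": column_name, **obj}


def apply_to_objects(fn, objects, selected=lambda obj: True):
    """Apply fn to every selected object, whatever layout it came from."""
    return [fn(obj) for obj in objects if selected(obj)]


static_cf = {"u42": {"name": "Ada"}, "u43": {"name": "Bob"}}
dynamic_cf = {
    "Austin": {
        1329487200: json.dumps({"temp_c": 18.5}).encode("utf-8"),
        1329490800: json.dumps({"temp_c": 19.0}).encode("utf-8"),
    }
}

names = apply_to_objects(lambda o: o["name"].upper(), iter_static(static_cf))
warm = apply_to_objects(
    lambda o: o["secondary_key"],
    iter_dynamic(dynamic_cf),
    selected=lambda o: o["temp_c"] >= 19.0,
)
```

The caller of `apply_to_objects` never needs to know whether an object came from a whole row or from a single column.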
>  
> thx
>  
> Chris Gerken
>  
> chrisgerken@mindspring.com (mailto:chrisgerken@mindspring.com)
> 512.587.5261
> http://www.linkedin.com/in/chgerken
>  
>  
>  
> On Feb 17, 2012, at 10:07 AM, Chris Gerken wrote:
>  
> > Don,
> >  
> > That's a good idea, but you have to be careful not to preclude the use of dynamic
> > column families (e.g. CFs with time-series-like schemas), which is what Cassandra is best at.
> > The right approach is to build your own "ORM"/persistence layer (or generate one with some
> > tools) that can hide the API differences between static and dynamic CFs. Once you're there,
> > Hadoop and Pig both come very close to what you're asking for.
> >  
> > In other words, you should be asking for a means to apply a Java method to selected
> > objects (not rows) that are persisted in a Cassandra column family.
> >  
> > thx
> >  
> > - Chris
> >  
> > Chris Gerken
> >  
> > chrisgerken@mindspring.com (mailto:chrisgerken@mindspring.com)
> > 512.587.5261
> > http://www.linkedin.com/in/chgerken
> >  
> >  
> >  
> > On Feb 17, 2012, at 9:35 AM, Don Smith wrote:
> >  
> > > Are there plans to build some sort of map-reduce framework into Cassandra
> > > and CQL? It seems that users should be able to apply a Java method to selected rows in parallel
> > > on the distributed Cassandra JVMs. I believe Solandra uses such an integration.
> > >  
> > > Don
> > > ________________________________________
> > > From: Alessio Cecchi [alessio@skye.it (mailto:alessio@skye.it)]
> > > Sent: Friday, February 17, 2012 4:42 AM
> > > To: user@cassandra.apache.org (mailto:user@cassandra.apache.org)
> > > Subject: General questions about Cassandra
> > >  
> > > Hi,
> > >  
> > > We have developed software that stores logs from mail servers in MySQL,
> > > but for huge environments we are developing a version that stores this
> > > data in HBase. Once a day, the raw logs are first normalized, so the output
> > > looks like this:
> > >  
> > > username,date of login, IP Address, protocol
> > > username,date of login, IP Address, protocol
> > > username,date of login, IP Address, protocol
> > > [...]
> > >  
> > > and is then inserted into the database.
> > >  
> > > As I was saying, for huge installations (from 1 to 10 million logins
> > > per day, kept for 12 months) we are working with HBase, but I would also
> > > consider Cassandra.
> > >  
> > > The advantage of HBase is MapReduce which makes searching the logs very
> > > fast by splitting the "query" concurrently on multiple hosts.
> > >  
> > > Queries will be launched from a web interface (there will be only a few
> > > requests per day) and the search keys are user and time range.
> > >  
> > > But Cassandra seems less complex to manage and simpler to run, so I want
> > > to evaluate it instead of HBase.
> > >  
> > > My question is: can Cassandra also split a "query" over the cluster like
> > > MapReduce? From what I have read online, Cassandra seems fast at inserting
> > > data but slower than HBase at querying it. Is that really so?
> > >  
> > > We do not want to install Hadoop on top of Cassandra.
> > >  
> > > Any suggestion is welcome :-)
> > >  
> > > --
> > > Alessio Cecchi is:
> > > @ ILS -> http://www.linux.it/~alessice/
> > > on LinkedIn -> http://www.linkedin.com/in/alessice
> > > GNU/Linux Systems Support -> http://www.cecchi.biz/
> > > @ PLUG -> former President, now senator for life, http://www.prato.linux.it
> > > @ LOLUG -> Member http://www.lolug.net
> > >  
> >  
> >  
>  
>  
>  


