cassandra-user mailing list archives

From Tharindu Mathew <mcclou...@gmail.com>
Subject Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS
Date Tue, 30 Aug 2011 18:30:08 GMT
Thanks, Jeremy, for your response. That gives me some encouragement that I
might be on the right track.

I think I need to try out more stuff before coming to a conclusion on Brisk.

For Pig operations over Cassandra, I could only find
http://svn.apache.org/repos/asf/cassandra/trunk/contrib/pig. Are there any
other resources that you can point me to? There seems to be a lack of samples
on this subject.
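
To make the question concrete, here is a rough sketch of the hourly job from
my earlier mail in this thread (get rows, group them by server and hour,
aggregate, write back). It is plain Python over in-memory rows, not Pig and
not my data flow language. The names hour_bucket and aggregate are made up for
illustration, reading from and writing to Cassandra is left out entirely, and
treating CUMULATIVE as a plain sum is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate an epoch-seconds timestamp to the start of its UTC hour."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0)


def aggregate(rows, key_fn):
    """Group rows by key_fn, then sum the counts and average the response time.

    Assumption: CUMULATIVE is treated as a sum over the group and AVG as an
    arithmetic mean. Each row is a dict holding the three measure columns.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)

    result = {}
    for key, members in groups.items():
        result[key] = {
            "RequestCount": sum(r["RequestCount"] for r in members),
            "ResponseCount": sum(r["ResponseCount"] for r in members),
            "MaximumResponseTime": (
                sum(r["MaximumResponseTime"] for r in members) / len(members)
            ),
        }
    return result
```

Calling aggregate(rows, lambda r: (r["ServerName"], hour_bucket(r["timeStamp"])))
would give the per-server, per-hour result. The second stage (grouping by
ServerName only) is the same function with a different key. What I am missing
is a sample showing how to express this in Pig against a column family.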

On Tue, Aug 30, 2011 at 10:56 PM, Jeremy Hanna
<jeremy.hanna1234@gmail.com> wrote:

> FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to
> potentially move to Brisk because of the simplicity of operations there.
>
> Not sure what you mean about the true power of Hadoop.  In my mind the true
> power of Hadoop is the ability to parallelize jobs and send each task to
> where the data resides.  HDFS exists to enable that.  Brisk is just another
> HDFS compatible implementation.  If you're already storing your data in
> Cassandra and are looking to use Hadoop with it, then I would seriously
> consider using Brisk.
>
> That said, Cassandra with Hadoop works fine.
>
> On Aug 30, 2011, at 11:58 AM, Tharindu Mathew wrote:
>
> > Hi Eric,
> >
> > Thanks for your response.
> >
> > On Tue, Aug 30, 2011 at 5:35 PM, Eric Djatsa <djatsaedy@gmail.com> wrote:
> >
> >> Hi Tharindu, try having a look at Brisk(
> >> http://www.datastax.com/products/brisk) it integrates Hadoop with
> >> Cassandra and is shipped with Hive for SQL analysis. You can then install
> >> Sqoop(http://www.cloudera.com/downloads/sqoop/) on top of Hadoop in order
> >> to enable data import/export between Hadoop and MySQL.
> >> Does this sound ok to you ?
> >>
> > These do sound ok. But I was looking at using something from Apache
> > itself.
> >
> > Brisk sounds nice, but I feel that disregarding HDFS and totally
> > switching to Cassandra is not the right thing to do. Just my opinion
> > there. I feel we are not using the true power of Hadoop then.
> >
> > I feel Pig has more integration with Cassandra, so I might take a look
> > there.
> >
> > Whichever I choose, I will contribute the code back to the Apache
> > projects I use. Here's a sample data analysis I do with my language.
> > Maybe, there is no generic way to do what I want to do.
> >
> >
> >
> > <get name="NodeId">
> > <index name="ServerName" start="" end=""/>
> > <!--<index name="nodeId" start="AS" end="FB"/>-->
> > <!--<groupBy index="nodeId"/>-->
> > <granularity index="timeStamp" type="hour"/>
> > </get>
> >
> > <lookup name="Event"/>
> >
> > <aggregate>
> > <measure name="RequestCount" aggregationType="CUMULATIVE"/>
> > <measure name="ResponseCount" aggregationType="CUMULATIVE"/>
> > <measure name="MaximumResponseTime" aggregationType="AVG"/>
> > </aggregate>
> >
> > <put name="NodeResult" indexRow="allKeys"/>
> >
> > <log/>
> >
> > <get name="NodeResult">
> > <index name="ServerName" start="" end=""/>
> > <groupBy index="ServerName"/>
> > </get>
> >
> > <aggregate>
> > <measure name="RequestCount" aggregationType="CUMULATIVE"/>
> > <measure name="ResponseCount" aggregationType="CUMULATIVE"/>
> > <measure name="MaximumResponseTime" aggregationType="AVG"/>
> > </aggregate>
> >
> > <put name="NodeAccumilator" indexRow="allKeys"/>
> >
> > <log/>
> >
> >
> >> 2011/8/29 Tharindu Mathew <mccloud35@gmail.com>
> >>
> >>> Hi,
> >>>
> >>> I have an already running system where I define a simple data flow
> >>> (using a simple custom data flow language) and configure jobs to run
> >>> against stored data. I use quartz to schedule and run these jobs and
> >>> the data exists on various data stores (mainly Cassandra but some data
> >>> exists in RDBMS like mysql as well).
> >>>
> >>> Thinking about scalability and already existing support for standard
> >>> data flow languages in the form of Pig and HiveQL, I plan to move my
> >>> system to Hadoop.
> >>>
> >>> I've seen some efforts on the integration of Cassandra and Hadoop. I've
> >>> been reading up and still am contemplating on how to make this change.
> >>>
> >>> It would be great to hear the recommended approach of doing this on
> >>> Hadoop with the integration of Cassandra and other RDBMS. For example,
> >>> a sample task that already runs on the system is "once in every hour,
> >>> get rows from column family X, aggregate data in columns A, B and C and
> >>> write back to column family Y, and enter details of last aggregated row
> >>> into a table in mysql"
> >>>
> >>> Thanks in advance.
> >>>
> >>> --
> >>> Regards,
> >>>
> >>> Tharindu
> >>>
> >>
> >>
> >>
> >> --
> >> *Eric Djatsa Yota*
> >> *Double degree MsC Student in Computer Science Engineering and
> >> Communication Networks
> >> Télécom ParisTech (FRANCE) - Politecnico di Torino (ITALY)*
> >> *Intern at AMADEUS S.A.S Sophia Antipolis*
> >> djatsaedy@gmail.com
> >> *Tel : 0601791859*
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Tharindu
>
>


-- 
Regards,

Tharindu
