cassandra-user mailing list archives

From scott w <scottbl...@gmail.com>
Subject Re: Advise for choice
Date Fri, 08 Jan 2010 19:11:31 GMT
Good point, although there has been very recent work integrating Solr with
Katta, so you can have your cake and eat it too:

http://developer.yahoo.net/blogs/theater/archives/2009/12/hadoop_bay_area_user_group_session_1.html


On Fri, Jan 8, 2010 at 1:09 AM, Erich Nachbar <erich@nachbar.biz> wrote:

> I can give you a few more data points. For one of my last projects, I
> built the search index of one of the largest IM aggregators. I got
> around 2.5k chat msg/s, keeping 400M messages in my index.
>
> I looked at Solr and while it is very convenient/luxurious, there was
> no way in hell I could scale it this big. I ended up using Katta to
> serve the index with Hadoop to compute my index shards.
>
> While the whole system is batch oriented, I got my latency down to
> 2 min (time for a doc to show up in the index), as long as I got fewer
> than 8k chat messages/s in.
>
> Katta handles replication and node failover (uses Zookeeper) and can
> be scaled easily by adding nodes & increasing the replication factor.
> In comparison to Solr, scale was not one of the things I had to worry about.
>
> Like others have said, unless you provide a lot more specifics it will
> be hard to give you detailed recommendations.
>
> Hope this helps!
> -Erich
>
> On Thu, Jan 7, 2010 at 11:31 PM, Richard Grossman <richiesgr@gmail.com>
> wrote:
> > First, thanks to all for your answers; they really help in checking all
> > the aspects.
> >
> > In fact, the system we want to build has to manage a lot of data, but not
> > in a heavily transactional way. Solr can handle the data but doesn't have
> > a distributed way to serve it. But in my case it's always possible to just
> > duplicate the data; then we can load balance the queries between multiple
> > server instances.
> >
> > We load a large set of data once a week, and all this data is going to
> > be used as is, without modification, update or delete. On this point,
> > loading the data into Solr is very easy: we make a CSV file and that's it,
> > it's inside.
> >
> > The data needs to be structured, but not like a relational database.
> > Obviously Solr doesn't fit the data structure required. It forces us to
> > de-normalize a lot of data and build something like one very, very big
> > table; it also forces us to build very difficult Lucene queries.
> >
> > Query speed is critical because the application is internet oriented and
> > we hope for a lot of queries per minute. On this point, the problem is
> > that with the same amount of data Solr has been faster than Cassandra, but
> > of course the data structure is not the same.
> >
> > It seems in the end we'll go, as Tatu says, with a hybrid solution mixing
> > Solr and Cassandra. I'm not sure it's the best in our case.
> > Thanks
>
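
For reference, the "duplicate the data and load balance the queries between
multiple instances" idea from Richard's message can be sketched roughly as
below. This is only an illustration, not anything from the thread: the class
name, the example.com hosts and the mark_down/mark_up helpers are all
hypothetical, and a real deployment would more likely put an HTTP load
balancer in front of the replicas.

```python
import itertools


class RoundRobinReplicas:
    """Hand out identical read-only replicas in turn, skipping any marked down.

    Hypothetical helper for illustration; it only picks a URL and does not
    send any request itself.
    """

    def __init__(self, urls):
        if not urls:
            raise ValueError("need at least one replica URL")
        self._urls = list(urls)
        self._down = set()
        self._cycle = itertools.cycle(self._urls)

    def mark_down(self, url):
        # Exclude a replica, e.g. after a failed health check.
        self._down.add(url)

    def mark_up(self, url):
        # Put a recovered replica back into rotation.
        self._down.discard(url)

    def next_url(self):
        # Look at each replica at most once per call.
        for _ in range(len(self._urls)):
            url = next(self._cycle)
            if url not in self._down:
                return url
        raise RuntimeError("no replica available")


# Assumed hosts: every instance holds the same weekly-loaded copy of the index.
S1 = "http://solr1.example.com:8983/solr/select"
S2 = "http://solr2.example.com:8983/solr/select"
S3 = "http://solr3.example.com:8983/solr/select"

replicas = RoundRobinReplicas([S1, S2, S3])
picks = [replicas.next_url() for _ in range(4)]
```

Since the data is loaded once a week and never updated in between, the
replicas stay identical and any of them can answer any query, which is what
makes plain round-robin sufficient here.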
