incubator-cassandra-user mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: solandra or pig or....?
Date Wed, 22 Jun 2011 10:38:01 GMT
Well, Solandra is running Cassandra, so you can keep using Cassandra as you do today and
index just some of the data in Solr.

On Jun 22, 2011, at 3:41 AM, Sasha Dolgy <sdolgy@gmail.com> wrote:

> First, thanks everyone for the input.  Appreciate it.  The number
> crunching would already have been completed, and all statistics per
> game defined, and inserted into the appropriate CF/row/cols ...
> 
> So, that being said, Solandra appears to be the right way to go ...
> except, this would require that my current application(s) be rewritten
> to consume Solandra and no longer Cassandra ... "Your application
> isn't aware of Cassandra only Solr." or can I have the best of both
> worlds?  Search is only one aspect of the consumer experience.  If a
> consumer wanted to view a 'card' for a baseball player, all the
> information would be retrieved directly from Cassandra to build that
> card and search wouldn't be required...
> 
> -sd
> 
> On Tue, Jun 21, 2011 at 9:50 PM, Jake Luciani <jakers@gmail.com> wrote:
>> Right,  Solr will not do anything other than basic aggregations (facets) and
>> range queries.
>> On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich <dan.kuebrich@gmail.com>
>> wrote:
>>> 
>>> Solandra is indeed distributed search, not distributed number-crunching.
>>>  As a previous poster said, you could imagine structuring the data in a
>>> series of documents with fields containing playername, teamname, position,
>>> location, day, time, inning, at bat, outcome, etc.  Then you could query to
>>> get a slice of the data that matches your predicate and run statistics on
>>> that subset.
>>> The statistics would have to come from other code (eg. R), but solr will
>>> filter it for you. So, this approach only works if the slices are reasonably
>>> small, but gives you great granularity on search as long as you put all the
>>> info in.  The users of this datastore (or you) must be willing to write
>>> their own simple aggregation functions ("show me only the unique player
>>> names returned by this solr query", "show me the average of field X returned
>>> by this solr query", ...)
>>> If the numbers of results are too great, MR may be the way to go.
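
The "simple aggregation functions" described in the quoted message could look roughly like the sketch below. It assumes the query result has already been fetched and parsed from Solr's JSON response format (documents under `response.docs`) and that the fields involved are single-valued. The helper names, the sample data, and the numeric field `runs` are hypothetical and used only for illustration; `playername` and `teamname` come from the field list in the thread.

```python
from statistics import mean


def docs_from_response(solr_response):
    """Return the list of matching documents from a parsed Solr JSON response."""
    return solr_response.get("response", {}).get("docs", [])


def unique_values(docs, field):
    """Unique values of a single-valued field, in first-seen order.

    Documents that do not contain the field are skipped.
    """
    seen = {}
    for doc in docs:
        if field in doc:
            seen.setdefault(doc[field], None)
    return list(seen)


def average(docs, field):
    """Mean of a numeric field over the documents that have it, or None."""
    values = [doc[field] for doc in docs if field in doc]
    return mean(values) if values else None


# Hypothetical sample shaped like a parsed Solr JSON response.
sample_response = {
    "response": {
        "numFound": 3,
        "docs": [
            {"playername": "Smith", "teamname": "Reds", "runs": 1},
            {"playername": "Jones", "teamname": "Reds", "runs": 3},
            {"playername": "Smith", "teamname": "Reds", "runs": 2},
        ],
    }
}

docs = docs_from_response(sample_response)
players = unique_values(docs, "playername")  # unique player names in the slice
avg_runs = average(docs, "runs")             # average of field X in the slice
```

Because everything here runs on the client after Solr has filtered the data, it only makes sense while the returned slice fits comfortably in memory, which is the caveat the message raises before pointing to MapReduce for larger result sets.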
