incubator-cassandra-user mailing list archives

From Sasha Dolgy <>
Subject Re: solandra or pig or....?
Date Wed, 22 Jun 2011 07:41:35 GMT
First, thanks everyone for the input. I appreciate it. The number
crunching would already have been done: all statistics per game would
be computed and inserted into the appropriate CF/row/columns ...

So, that being said, Solandra appears to be the right way to go ...
except that this would require my current application(s) to be
rewritten to consume Solandra instead of Cassandra ("Your application
isn't aware of Cassandra, only Solr."). Or can I have the best of both
worlds? Search is only one aspect of the consumer experience. If a
consumer wanted to view a 'card' for a baseball player, all the
information needed to build that card would be retrieved directly from
Cassandra, and no search would be required ...
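A minimal sketch of the "best of both worlds" split described above, assuming search only needs to return row keys and the card is then read by key. Everything here (PlayerStore, put, search, card) is a hypothetical in-memory stand-in, not the Solandra or Cassandra client API, and it says nothing about whether Solandra itself supports this arrangement.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerStore:
    # Hypothetical stand-ins, NOT a real client API:
    # `rows` plays the role of a Cassandra column family (row key -> columns),
    # `index` plays the role of a search index (term -> set of row keys).
    rows: dict = field(default_factory=dict)
    index: dict = field(default_factory=dict)

    def put(self, row_key, columns):
        # Write path: store the row, then index each column value
        # (whole value, lowercased, no tokenizing) so it can be searched.
        self.rows[row_key] = columns
        for value in columns.values():
            self.index.setdefault(value.lower(), set()).add(row_key)

    def search(self, term):
        # Search path: returns matching row keys only, like a hit list.
        return sorted(self.index.get(term.lower(), set()))

    def card(self, row_key):
        # Card path: one direct lookup by row key; no search involved.
        return self.rows[row_key]


store = PlayerStore()
store.put("player:42", {"name": "Jane Doe", "team": "Sox", "position": "SS"})
store.put("player:7", {"name": "John Roe", "team": "Sox", "position": "CF"})
for key in store.search("sox"):
    print(key, store.card(key)["name"])
```

The point of the split is that the card path never touches the index, so it keeps working the same way regardless of which search layer sits beside it. The sketch ignores updates: re-putting a key with new values leaves stale index entries behind.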


On Tue, Jun 21, 2011 at 9:50 PM, Jake Luciani <> wrote:
> Right, Solr will not do anything beyond basic aggregations (facets) and
> range queries.
> On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich <>
> wrote:
>> Solandra is indeed distributed search, not distributed number-crunching.
>> As a previous poster said, you could imagine structuring the data as a
>> series of documents with fields containing playername, teamname, position,
>> location, day, time, inning, at bat, outcome, etc. Then you could query to
>> get the slice of the data that matches your predicate and run statistics on
>> that subset.
>> The statistics would have to come from other code (e.g. R), but Solr will
>> filter the data for you. So this approach only works if the slices are
>> reasonably small, but it gives you fine granularity on search as long as
>> you put all the info in. The users of this datastore (or you) must be
>> willing to write their own simple aggregation functions ("show me only the
>> unique player names returned by this Solr query", "show me the average of
>> field X returned by this Solr query", ...).
>> If the result sets are too large, MapReduce (MR) may be the way to go.
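The filter-in-Solr, aggregate-in-client approach from the quoted replies can be sketched as follows, assuming documents were indexed with fields such as playername, teamname, inning and outcome (outcome stored as 1 for a hit, 0 otherwise). This is a hypothetical Python sketch: the field names, the /solr/select path and the sample response are illustrative assumptions. Only the request parameters (q, fq, fl, rows, wt, facet, facet.field) and the response.docs / numFound layout are standard Solr. The made-up sample response omits the facet_counts section a real faceted response would include.

```python
import json
from statistics import mean
from urllib.parse import urlencode


def build_select_query(team, first_inning, last_inning, rows=1000):
    # Standard Solr request parameters: q (main query), fq (filter query,
    # here a range query), fl (fields to return), rows, wt=json, and a
    # facet on playername. Field names and the /solr/select path are
    # assumptions about how the documents were indexed and deployed.
    params = [
        ("q", f"teamname:{team}"),
        ("fq", f"inning:[{first_inning} TO {last_inning}]"),
        ("fl", "playername,outcome"),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", "playername"),
    ]
    return "/solr/select?" + urlencode(params)


def docs_of(response_text):
    # Solr's JSON response writer puts matching documents in response.docs.
    return json.loads(response_text)["response"]["docs"]


def unique_player_names(response_text):
    # "Show me only the unique player names returned by this Solr query."
    return sorted({doc["playername"] for doc in docs_of(response_text)})


def average_field(response_text, field_name):
    # "Show me the average of field X returned by this Solr query."
    # Documents missing the field are skipped; None if nothing matched.
    values = [float(doc[field_name]) for doc in docs_of(response_text)
              if field_name in doc]
    return mean(values) if values else None


def slice_fits(response_text, rows):
    # If numFound exceeds the rows actually returned, the slice is too big
    # to aggregate client-side from a single page of results.
    return json.loads(response_text)["response"]["numFound"] <= rows


sample = json.dumps({"response": {"numFound": 3, "start": 0, "docs": [
    {"playername": "Jane Doe", "outcome": 1},
    {"playername": "John Roe", "outcome": 0},
    {"playername": "Jane Doe", "outcome": 1},
]}})
print(build_select_query("Sox", 1, 5))
print(unique_player_names(sample), average_field(sample, "outcome"))
```

slice_fits is where the "only works if the slices are reasonably small" caveat shows up: once numFound is far larger than what one request can sensibly return, pulling everything into client code stops making sense, and a MapReduce job is the alternative mentioned above.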
