incubator-cassandra-user mailing list archives

From Santiago Basulto <santiago.basu...@gmail.com>
Subject Re: solandra or pig or....?
Date Wed, 22 Jun 2011 10:47:41 GMT
Wouldn't it be useful to store your data somewhere structured
(Cassandra is obviously an option) and then use MapReduce to store
statistics?


2011/6/22 Jake Luciani <jakers@gmail.com>:
> Well solandra is running Cassandra so you can use Cassandra as you do today, but index
> some of the data in solr.
>
> On Jun 22, 2011, at 3:41 AM, Sasha Dolgy <sdolgy@gmail.com> wrote:
>
>> First, thanks everyone for the input.  Appreciate it.  The number
>> crunching would already have been completed, and all statistics per
>> game defined, and inserted into the appropriate CF/row/cols ...
>>
>> So, that being said, Solandra appears to be the right way to go ...
>> except, this would require that my current application(s) be rewritten
>> to consume Solandra and no longer Cassandra ... "Your application
>> isn't aware of Cassandra only Solr." or can I have the best of both
>> worlds?  Search is only one aspect of the consumer experience.  If a
>> consumer wanted to view a 'card' for a baseball player, all the
>> information would be retrieved directly from Cassandra to build that
>> card and search wouldn't be required...
>>
>> -sd
>>
>> On Tue, Jun 21, 2011 at 9:50 PM, Jake Luciani <jakers@gmail.com> wrote:
>>> Right,  Solr will not do anything other than basic aggregations (facets) and
>>> range queries.
>>> On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich <dan.kuebrich@gmail.com>
>>> wrote:
>>>>
>>>> Solandra is indeed distributed search, not distributed number-crunching.
>>>>  As a previous poster said, you could imagine structuring the data in a
>>>> series of documents with fields containing playername, teamname, position,
>>>> location, day, time, inning, at bat, outcome, etc.  Then you could query to
>>>> get a slice of the data that matches your predicate and run statistics on
>>>> that subset.
>>>> The statistics would have to come from other code (e.g. R), but Solr will
>>>> filter the data for you. So, this approach only works if the slices are reasonably
>>>> small, but gives you great granularity on search as long as you put all the
>>>> info in.  The users of this datastore (or you) must be willing to write
>>>> their own simple aggregation functions ("show me only the unique player
>>>> names returned by this solr query", "show me the average of field X returned
>>>> by this solr query", ...)
>>>> If the numbers of results are too great, MR may be the way to go.
>
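
The "simple aggregation functions" Dan mentions in the quoted text could look roughly like the sketch below. It assumes the Solr query has already run and its matching documents are available as a list of plain dicts; the helper names (`unique_values`, `average`), the `pitches` field, and the sample documents are hypothetical, made up only for illustration, and no Solr client call is shown.

```python
from statistics import mean


def unique_values(docs, field):
    """Return the distinct values of `field`, in first-seen order."""
    seen = {}
    for doc in docs:
        if field in doc:
            seen.setdefault(doc[field], None)
    return list(seen)


def average(docs, field):
    """Return the mean of a numeric `field`, or None if no document has it."""
    values = [doc[field] for doc in docs if field in doc]
    return mean(values) if values else None


# Hypothetical result slice, shaped like documents a search query might return.
docs = [
    {"playername": "Player A", "teamname": "Team X", "inning": 1, "pitches": 4},
    {"playername": "Player B", "teamname": "Team X", "inning": 1, "pitches": 6},
    {"playername": "Player A", "teamname": "Team X", "inning": 3, "pitches": 2},
]

players = unique_values(docs, "playername")   # unique player names in the slice
avg_pitches = average(docs, "pitches")        # average of one numeric field
```

Both helpers hold the whole slice in memory, which fits the caveat in the thread: this only works while the result sets stay reasonably small, and MapReduce is the fallback once they do not.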



-- 
Santiago Basulto.-
