incubator-cassandra-user mailing list archives

From Jake Luciani <>
Subject Re: solandra or pig or....?
Date Tue, 21 Jun 2011 19:50:42 GMT
Right, Solr will not do anything other than basic aggregations (facets) and
range queries.
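
For reference, facet counts and range filters in Solr are plain request parameters. Below is a minimal sketch of building such a request. The field names (`outcome`, `hometown`, `date`, `playername`) and the endpoint URL are placeholders assumed for illustration, not taken from this thread; only the parameter names (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`) are standard Solr.

```python
from urllib.parse import urlencode

# Placeholder endpoint; a real Solr/Solandra install will have its own URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query(hometowns, start_iso, end_iso):
    """Build a select URL with a range filter and a facet on one field.

    All field names used here are hypothetical and must match the real schema.
    """
    town_clause = " OR ".join('"%s"' % town for town in hometowns)
    params = [
        ("q", "outcome:grandslam"),                        # main predicate
        ("fq", "hometown:(%s)" % town_clause),             # filter by hometown
        ("fq", "date:[%s TO %s]" % (start_iso, end_iso)),  # range query
        ("facet", "true"),                                 # enable faceting
        ("facet.field", "playername"),                     # count hits per player
        ("rows", "0"),                                     # counts only, no documents
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_query(["Cheektowaga", "Ontario"],
                  "2008-01-01T00:00:00Z", "2008-12-31T23:59:59Z"))
```

The facet on `playername` returns a count per player, which is about as far as the built-in aggregation goes; anything beyond counts has to be computed elsewhere.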

On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich wrote:

> Solandra is indeed distributed search, not distributed number-crunching.
> As a previous poster said, you could imagine structuring the data as a
> series of documents with fields containing playername, teamname, position,
> location, day, time, inning, at bat, outcome, etc. Then you could query to
> get the slice of the data that matches your predicate and run statistics on
> that subset.
> The statistics would have to come from other code (e.g. R), but Solr will
> do the filtering for you. So this approach only works if the slices are
> reasonably small, but it gives you fine granularity on search as long as you
> put all the info in. The users of this datastore (or you) must be willing to
> write their own simple aggregation functions ("show me only the unique player
> names returned by this Solr query", "show me the average of field X returned
> by this Solr query", ...).
> If the number of results is too great, MapReduce (MR) may be the way to go.
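
The two "simple aggregation functions" quoted above could look roughly like this. It is a minimal sketch that assumes the query results have already been fetched and parsed into a list of dicts (the shape of the `docs` list in a Solr JSON response). The function names and sample documents are made up for illustration.

```python
from statistics import mean


def unique_values(docs, field):
    """Return the sorted distinct values of `field` across the result docs."""
    return sorted({doc[field] for doc in docs if field in doc})


def average(docs, field):
    """Return the mean of a numeric `field`, or None if no doc carries it."""
    values = [doc[field] for doc in docs if field in doc]
    return mean(values) if values else None


# Hypothetical result set; real documents would come from the search response.
docs = [
    {"playername": "A. Example", "inning": 3, "outcome": "grandslam"},
    {"playername": "B. Sample", "inning": 7, "outcome": "grandslam"},
    {"playername": "A. Example", "inning": 5, "outcome": "grandslam"},
]

print(unique_values(docs, "playername"))
print(average(docs, "inning"))
```

Both functions hold the whole result set in memory, which is why this only works while the matching slice stays reasonably small.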
> On Tue, Jun 21, 2011 at 3:04 PM, Victor K. wrote:
>> If I may ask, Sasha, what exactly are you trying to achieve using Solr (or
>> Solandra, I guess it's about the same)?
>> From what I understood of your problem, you need to compute statistics
>> on your matches, players, etc. Or do you just want to retrieve information
>> that has already been computed?
>> If it is the first (data aggregation, statistics, etc.), Solr won't be of
>> much use because it is not meant to do statistics. If it is the second,
>> then Solr is just the tool for you.
>> On 6/21/2011 2:47 PM, Sasha Dolgy wrote:
>>> Without getting overly complicated and long winded ... are there
>>> practical references / examples I can review that demonstrate the
>>> cassandra/solandra benefits....i had a quick look at
>>> it wasn't
>>> dead obvious to me....
>>> On Tue, Jun 21, 2011 at 8:19 PM, Jake Luciani wrote:
>>>> Solandra can answer the question you used as an example, and it's more of
>>>> a fit for low-latency ad-hoc reporting than Pig. Pig queries will take
>>>> minutes, not seconds.
>>>> On Tue, Jun 21, 2011 at 12:12 PM, Sasha Dolgy wrote:
>>>>> Folks,
>>>>> Simple question ... Assuming my current use case is the ability to log
>>>>> lots of trivial and seemingly useless sports statistics ... I want a
>>>>> user to be able to query / compare .... For example:
>>>>> --> Show me all baseball players in Cheektowaga and Ontario,
>>>>> California who have hit a grand slam on Tuesdays where it was just a
>>>>> leap year.
>>>>> Each baseball player is represented by a single row in a CF:
>>>>> player_uuid, fullname, hometown, game1, game2, game3, game4
>>>>> Games are UUIDs that are a reference to another row in the same CF
>>>>> that provides information about that game...
>>>>> location, final score, date (unix timestamp or ISO format), and
>>>>> statistics which are represented as a new column timestamp:player_uuid
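
One way to make the "Tuesday in a leap year" part of the example query searchable would be to derive those values from the game date before indexing, so that they become ordinary fields to filter on. This is a minimal sketch, assuming ISO-format dates (YYYY-MM-DD); the helper name and the output field names are hypothetical.

```python
import calendar
from datetime import date

# Explicit English names, so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def derived_date_fields(iso_date):
    """Return extra fields computed from an ISO date string (YYYY-MM-DD)."""
    d = date.fromisoformat(iso_date)
    return {
        "weekday": WEEKDAYS[d.weekday()],      # date.weekday(): Monday is 0
        "leap_year": calendar.isleap(d.year),
    }


print(derived_date_fields("2008-02-26"))
```

With fields like these stored on each document, the original question reduces to a plain filter on hometown, outcome, weekday and leap_year.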
>>>>> I can use PIG, as I understand, to run a query to generate specific
>>>>> information about specific "things" and populate that data back into
>>>>> Cassandra in another CF ... similar to the hypothetical search
>>>>> the information is structured already, i assume PIG is the
>>>>> right tool for the job, but may not be ideal for a web application and
>>>>> enabling ad-hoc queries ... it could take anywhere from 2-....?
>>>>> seconds for that query to generate, populate, and return to the
>>>>> user...?
>>>>> On the other hand, I have started to read about Solr / Solandra /
>>>>> Lucandra .... can this provide similar functionality or better ?  or
>>>>> is it more geared towards full text search and indexing ...
>>>>> I don't want to get into the habit of guessing what my potential users
>>>>> want to search for ... trying to think of ways to offload this to
>>>>> them.
>>>>> --
>>>>> Sasha Dolgy
>>>> --