cassandra-user mailing list archives

From Victor Kabdebon <victor.kabde...@gmail.com>
Subject Re: solandra or pig or....?
Date Tue, 21 Jun 2011 17:26:24 GMT
I can only speak to what I know:

Pig: I have only taken a quick look at it, and some of the people at Twitter
can probably answer better than I can. Pig is not meant for "on demand"
queries: its jobs are quite slow. As you said, you would use it to extract
the relevant information and write it to another CF, from which the
statistics can be retrieved quickly.

Solr is purely a search engine. It can search not only on text but also on
time and other kinds of fields. Real statistics need mathematical operations
that Solr won't provide. It can do simple things in terms of statistics, but
mostly it is a search engine.

Personally, for what you are asking I would use Pig and store the results in
a CF, and I would update those CFs regularly. Simple statistics you can
generate with your favorite language, or with a specialized language such as
R, as long as the data sets are small.
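
To make the "precompute, then read back" idea concrete, here is a minimal
sketch in plain Python rather than Pig. Everything in it is an assumption for
illustration: the PLAYERS and GAMES dictionaries stand in for rows read from
Cassandra, the grand_slams field and the function names are hypothetical, and
"leap year" is taken to mean that the game's own year is a leap year. A real
batch job would write the resulting summary to a separate CF so the web
application only has to do a quick lookup.

```python
import calendar
from datetime import date

# Hypothetical sample data standing in for rows read from Cassandra.
PLAYERS = {
    "p1": {"fullname": "Player One", "hometown": "Cheektowaga"},
    "p2": {"fullname": "Player Two", "hometown": "Ontario, California"},
    "p3": {"fullname": "Player Three", "hometown": "Buffalo"},
}

GAMES = [
    {"date": "2008-03-04", "grand_slams": ["p1", "p3"]},  # Tuesday, leap year
    {"date": "2008-03-05", "grand_slams": ["p2"]},        # Wednesday, leap year
    {"date": "2009-03-03", "grand_slams": ["p2"]},        # Tuesday, not a leap year
]


def is_leap_year_tuesday(iso_date):
    """Return True if the ISO date falls on a Tuesday in a leap year."""
    d = date.fromisoformat(iso_date)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    return d.weekday() == 1 and calendar.isleap(d.year)


def build_summary(players, games, hometowns):
    """Map player id -> dates of qualifying grand slams (the batch step)."""
    summary = {}
    for game in games:
        if not is_leap_year_tuesday(game["date"]):
            continue
        for player_id in game["grand_slams"]:
            if players[player_id]["hometown"] in hometowns:
                summary.setdefault(player_id, []).append(game["date"])
    return summary


# The result is what would be stored in the summary CF and refreshed regularly.
SUMMARY = build_summary(PLAYERS, GAMES, {"Cheektowaga", "Ontario, California"})
```

The slow part (scanning every game) happens in the batch step; answering the
user is then a single read of the precomputed summary.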

Hope it helps,
Victor Kabdebon

2011/6/21 Sasha Dolgy <sdolgy@gmail.com>

> Folks,
>
> Simple question ... Assuming my current use case is the ability to log
> lots of trivial and seemingly useless sports statistics ... I want a
> user to be able to query / compare .... For example:
>
> --> Show me all baseball players in cheektowaga and ontario,
> california who have hit a grandslam on tuesdays where it was just a
> leap year.
>
> Each baseball player is represented by a single row in a CF:
>
> player_uuid, fullname, hometown, game1, game2, game3, game4
>
> Games are UUIDs that reference another row in the same CF
> that provides information about that game...
>
> location, final score, date (unix timestamp or ISO format), and
> statistics which are represented as a new column timestamp:player_uuid
>
> I can use PIG, as I understand, to run a query to generate specific
> information about specific "things" and populate that data back into
> Cassandra in another CF ... similar to the hypothetical search
> above....as the information is structured already, i assume PIG is the
> right tool for the job, but may not be ideal for a web application and
> enabling ad-hoc queries ... it could take anywhere from 2-....?
> seconds for that query to generate, populate, and return to the
> user...?
>
> On the other hand, I have started to read about Solr / Solandra /
> Lucandra .... can this provide similar functionality or better ?  or
> is it more geared towards full text search and indexing ...
>
> I don't want to get into the habit of guessing what my potential users
> want to search for ... trying to think of ways to offload this to
> them.
>
>
>
> --
> Sasha Dolgy
> sasha.dolgy@gmail.com
>
