From: Victor Kabdebon
Date: Tue, 21 Jun 2011 13:26:24 -0400
Subject: Re: solandra or pig or....?
To: user@cassandra.apache.org

I can speak only for what I know.

Pig: I have taken only a quick look at it, and maybe some of the guys from Twitter can answer better than I can on that particular program. Pig is not for "on demand" queries: they are quite slow. As you said, you extract the relevant information and append it to another CF, from which you can retrieve the statistics quickly.

Solr is purely a search engine. It is not only text based but also time based, etc. To do statistics you need mathematical operations, and Solr won't provide that. It can do simple things in terms of statistics, but mostly it is a search engine.

Personally, for what you are asking, I would use Pig and store the results in a CF, and I would update those CFs regularly. Simple statistics you can generate with your favorite language, or with a specialized language such as R, as long as it concerns small data sets.

Hope it helps,
Victor Kabdebon

2011/6/21 Sasha Dolgy <sdolgy@gmail.com>

> Folks,
>
> Simple question ... Assuming my current use case is the ability to log
> lots of trivial and seemingly useless sports statistics ... I want a
> user to be able to query / compare .... For example:
>
> --> Show me all baseball players in cheektowaga and ontario,
> california who have hit a grandslam on tuesdays where it was just a
> leap year.
>
> Each baseball player is represented by a single row in a CF:
>
> player_uuid, fullname, hometown, game1, game2, game3, game4
>
> Games are UUIDs that are a reference to another row in the same CF
> that provides information about that game...
>
> location, final score, date (unix timestamp or ISO format), and
> statistics which are represented as a new column timestamp:player_uuid
>
> I can use PIG, as I understand, to run a query to generate specific
> information about specific "things" and populate that data back into
> Cassandra in another CF ... similar to the hypothetical search
> above....as the information is structured already, I assume PIG is the
> right tool for the job, but may not be ideal for a web application and
> enabling ad-hoc queries ... it could take anywhere from 2-....?
> seconds for that query to generate, populate, and return to the
> user...?
>
> On the other hand, I have started to read about Solr / Solandra /
> Lucandra .... can this provide similar functionality or better? Or
> is it more geared towards full text search and indexing ...
>
> I don't want to get into the habit of guessing what my potential users
> want to search for ... trying to think of ways to offload this to
> them.
>
> --
> Sasha Dolgy
> sasha.dolgy@gmail.com

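As a rough illustration of the approach suggested in this thread (run a batch job, write the results to a separate CF, serve reads from that CF), here is a minimal Python sketch. It is not Pig and does not talk to Cassandra: plain dicts stand in for the column families. The sample rows, the `build_summary` and `lookup` helpers, and the convention that a statistics column's value names the event (for example "grandslam") are all assumptions made for the example. It also reads "leap year" as "the game date falls in a leap year", which is only one possible interpretation of the question.

```python
import calendar
from collections import defaultdict
from datetime import datetime, timezone


def ts(year, month, day):
    """Unix timestamp (UTC midnight) for a calendar date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Hypothetical player rows: player_uuid -> fullname, hometown, game references.
players = {
    "p1": {"fullname": "Al Example", "hometown": "cheektowaga", "games": ["g1", "g2"]},
    "p2": {"fullname": "Bo Sample", "hometown": "ontario, california", "games": ["g1", "g3"]},
    "p3": {"fullname": "Cy Nobody", "hometown": "buffalo", "games": ["g2"]},
}

# Hypothetical game rows. Statistics columns are named "timestamp:player_uuid";
# storing the event name as the column value is an assumption.
games = {
    "g1": {  # Tuesday in a leap year
        "location": "cheektowaga",
        "date": ts(2008, 4, 1),
        "stats": {f"{ts(2008, 4, 1) + 60}:p1": "grandslam",
                  f"{ts(2008, 4, 1) + 120}:p2": "single"},
    },
    "g2": {  # Wednesday in a leap year
        "location": "buffalo",
        "date": ts(2008, 4, 2),
        "stats": {f"{ts(2008, 4, 2) + 60}:p3": "grandslam"},
    },
    "g3": {  # Tuesday, but not in a leap year
        "location": "ontario, california",
        "date": ts(2009, 1, 6),
        "stats": {f"{ts(2009, 1, 6) + 60}:p2": "grandslam"},
    },
}


def is_tuesday_in_leap_year(timestamp):
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return day.weekday() == 1 and calendar.isleap(day.year)  # Monday is 0


def build_summary(players, games):
    """Batch step: map hometown -> sorted names of matching players."""
    summary = defaultdict(set)
    for game in games.values():
        if not is_tuesday_in_leap_year(game["date"]):
            continue
        for column, event in game["stats"].items():
            _, player_id = column.split(":", 1)
            if event == "grandslam" and player_id in players:
                player = players[player_id]
                summary[player["hometown"]].add(player["fullname"])
    return {town: sorted(names) for town, names in summary.items()}


def lookup(summary, hometowns):
    """Serving step: a cheap read against the precomputed summary."""
    return sorted(name for town in hometowns for name in summary.get(town, []))


summary = build_summary(players, games)  # would be rerun on a schedule
print(lookup(summary, ["cheektowaga", "ontario, california"]))
```

The point of the split is that only `build_summary` scans the raw rows; the web application would call something like `lookup` against the summary CF. The trade-off is the one raised in the original question: each summary answers one predetermined question, so genuinely ad-hoc queries still need either a new batch job or a search index.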