From: Jake Luciani
Date: Tue, 21 Jun 2011 14:19:11 -0400
Subject: Re: solandra or pig or....?
To: user@cassandra.apache.org

Solandra can answer the question you used as an example, and it's more of a fit for low-latency ad-hoc reporting than Pig. Pig queries will take minutes, not seconds.

On Tue, Jun 21, 2011 at 12:12 PM, Sasha Dolgy <sdolgy@gmail.com> wrote:
Folks,

Simple question ... Assuming my current use case is the ability to log
lots of trivial and seemingly useless sports statistics ... I want a
user to be able to query / compare .... For example:

--> Show me all baseball players in Cheektowaga and Ontario,
California who have hit a grand slam on Tuesdays where it was just a
leap year.

Each baseball player is represented by a single row in a CF:

player_uuid, fullname, hometown, game1, game2, game3, game4

Games are UUIDs that are a reference to another row in the same CF
that provides information about that game...

location, final score, date (unix timestamp or ISO format), and
statistics which are represented as a new column timestamp:player_uuid
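
For illustration, here is a minimal Python sketch of the example query run against the row layout just described. The `Player` and `Game` classes, the `grand_slam` stat name, and the sample rows are hypothetical stand-ins and not a Cassandra, Pig, or Solandra API. The query is read here as "hit a grand slam in a game played on a Tuesday in a leap year", which is one possible interpretation.

```python
import calendar
from dataclasses import dataclass, field
from datetime import date


# Hypothetical in-memory stand-ins for the rows described above; a real
# deployment would read these from the column family instead.
@dataclass
class Game:
    location: str
    final_score: str
    played_on: date
    # player_uuid -> list of stat names recorded for that player in this game
    stats: dict = field(default_factory=dict)


@dataclass
class Player:
    player_uuid: str
    fullname: str
    hometown: str
    game_ids: list = field(default_factory=list)


def grand_slam_on_leap_year_tuesday(player, games):
    """True if any of the player's games matches all three conditions."""
    for game_id in player.game_ids:
        game = games[game_id]
        if (
            game.played_on.weekday() == 1  # Monday is 0, so Tuesday is 1
            and calendar.isleap(game.played_on.year)
            and "grand_slam" in game.stats.get(player.player_uuid, [])
        ):
            return True
    return False


def find_players(players, games, hometowns):
    """Full names of players from the given hometowns who match the query."""
    wanted = {h.lower() for h in hometowns}
    return [
        p.fullname
        for p in players
        if p.hometown.lower() in wanted
        and grand_slam_on_leap_year_tuesday(p, games)
    ]


# Made-up sample data: 2008-07-01 is a Tuesday in a leap year,
# 2009-07-07 is a Tuesday in a non-leap year.
games = {
    "g1": Game("Buffalo", "5-4", date(2008, 7, 1), {"p1": ["grand_slam"]}),
    "g2": Game("Ontario", "2-1", date(2009, 7, 7), {"p2": ["grand_slam"]}),
}
players = [
    Player("p1", "Alice Example", "Cheektowaga", ["g1"]),
    Player("p2", "Bob Example", "Ontario, California", ["g2"]),
]

print(find_players(players, games, ["Cheektowaga", "Ontario, California"]))
```

Note that this walks every player row and every referenced game, which is the kind of full scan a batch job performs; an index-backed store would aim to answer the same filter without scanning everything.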

I can use PIG, as I understand, to run a query to generate specific
information about specific "things" and populate that data back into
Cassandra in another CF ... similar to the hypothetical search
above ... as the information is structured already, I assume PIG is the
right tool for the job, but may not be ideal for a web application and
enabling ad-hoc queries ... it could take anywhere from 2-....?
seconds for that query to generate, populate, and return to the
user...?

On the other hand, I have started to read about Solr / Solandra /
Lucandra .... can this provide similar functionality or better? or
is it more geared towards full text search and indexing ...

I don't want to get into the habit of guessing what my potential users
want to search for ... trying to think of ways to offload this to
them.



--
Sasha Dolgy
sasha.dolgy@gmail.com



--
http://twitter.com/tjake