From: Jonathan Shook
To: user@cassandra.apache.org
Date: Mon, 3 May 2010 09:35:09 -0500
Subject: Re: Search Sample and Relation question because UDDI as Key

I am only speaking to your second question.

It may be helpful to think of modeling your storage layout in terms of
* lists
* sets
* hash maps
... and certain combinations of these.
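
As a rough illustration of that way of thinking, here is a minimal sketch that uses plain Python containers as stand-ins for storage layouts. Nothing here talks to Cassandra; the names (blog_entries, timeline, tags) and the sample keys are hypothetical and chosen only to match the blog example in the question.

```python
# Plain Python containers standing in for storage layouts.
# All names and sample values below are hypothetical.

# Hash map of hash maps: row key -> {column name -> value}.
blog_entries = {
    "post-uuid-1": {"title": "First post", "author": "arin"},
    "post-uuid-2": {"title": "Second post", "author": "falk"},
}

# List: entries kept sorted by a time-like column name give an ordered view.
timeline = sorted([(20100503, "post-uuid-2"), (20100502, "post-uuid-1")])

# Set: column names with empty values act as set membership.
tags = {"cassandra": {"post-uuid-1": "", "post-uuid-2": ""}}

# A combination: walk the ordered list and look each key up in the hash map.
titles_in_order = [blog_entries[key]["title"] for _, key in timeline]
```

The point of the sketch is only that each access pattern gets its own structure, keyed by whatever you intend to look up.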

Since there are no schema-defined relations, your relations may appear implicit between different views or "copies" of your data. The relationship can be assumed to be explicit to the extent that it is used in that way or even (in some cases) enforced by a boundary layer in your software.

For accessing data by value, you can try to do your bookkeeping (indexing) as you go, by maintaining auxiliary maps directly via your application. Scanning by value is really not a strong point for Cassandra, and in fact is one of the trade-offs made when moving to a DHT (http://en.wikipedia.org/wiki/Distributed_hash_table) data store.
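
To make the bookkeeping idea concrete, here is a minimal in-memory sketch of an application-maintained index for the Author -> BlogEntries case. It is a toy under stated assumptions, not a Cassandra client: BlogStore, put_post, posts_for and posts_by_author are hypothetical names, and the two dictionaries stand in for two column families that the application would write together.

```python
from collections import defaultdict


class BlogStore:
    """Toy stand-in for two column families (all names are hypothetical)."""

    def __init__(self):
        self.blog_entries = {}                    # post key -> columns
        self.posts_by_author = defaultdict(set)   # author -> post keys (auxiliary map)

    def put_post(self, post_key, columns):
        # If the post exists under a different author, drop the stale index entry.
        old = self.blog_entries.get(post_key)
        if old is not None and old["author"] != columns["author"]:
            self.posts_by_author[old["author"]].discard(post_key)
        # Write the data and its index entry together, as the boundary layer would.
        self.blog_entries[post_key] = dict(columns)
        self.posts_by_author[columns["author"]].add(post_key)

    def posts_for(self, author):
        # Lookup by value goes through the index instead of scanning every post.
        keys = self.posts_by_author.get(author, set())
        return [self.blog_entries[k] for k in sorted(keys)]


store = BlogStore()
store.put_post("p1", {"author": "arin", "title": "A"})
store.put_post("p2", {"author": "falk", "title": "B"})
store.put_post("p3", {"author": "arin", "title": "C"})
arin_titles = [p["title"] for p in store.posts_for("arin")]
```

In a real cluster the two writes are not atomic, so the application also has to tolerate (or repair) an index entry that points at a missing or changed post.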

There has been discussion around putting some form of value indexing in at some point in the future, but the plans appear indefinite. Even with this, it would move workload into the hub which may otherwise be better handled in a client node.


On Sun, May 2, 2010 at 4:33 PM, CleverCross | Falk Wolsky <falk.wolsky@clevercross.eu> wrote:
> Hello,
>
> 1) Can you provide a solution or a sample for searching (Column and
> SuperColumn) (full text)?
> What is the way to realize this? Hadoop/MapReduce? Do you see a possibility
> to build/use an index for columns?
>
> Why this: in a given data model we "must" use UUIDs as keys and actually
> have no chance to search values from "Columns"? (or not?)
>
> 2) How can we realize a "relation"?
>
> For example: (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model)
> Arin describes well a simple data model to build a blog. But how can we
> read (filter) all posts from "BlogEntries" by a single author?
> (filter the SuperColumns by a column inside of a SuperColumn)
>
> The "relation" in this example is Author -> BlogEntries...
> To filter the data there is a need to specify a Column/Value combination
> in a "get(...)" function...
>
> I know well that Cassandra is not a "relational database"! But without
> these relations the usage is very "limited" (specialized).
>
> Thanks in advance! - and thx for Cassandra!
> With Hector I build an (Apache) Cocoon transformer...
>
> With kind regards,
> Falk Wolsky
