From: Aaron Morton
To: user@cassandra.apache.org
Subject: Re: Build an index to for join query
Date: Sun, 19 Sep 2010 12:45:48 +1200

In the Cassandra world the best approach is to create one CF with the name and address in it.

Use a super CF with one super col for the user data and one super col for every address they have. Pull the entire row back every time you want to read the data. No need for joins.
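
A rough sketch of that layout, using a nested Python dict as a stand-in for a super CF. This is not Cassandra client code; the super column names ("profile", "address:home", "address:work") and the sample values are made up for illustration. It only shows the shape: one row per user, one super column for the user data, one per address, and a single read that returns everything.

```python
# In-memory stand-in for a super CF: row key -> super column -> column -> value.
# All names and values below are illustrative assumptions, not from the thread.
users = {
    "jsmith": {
        "profile": {"name": "John Smith", "id": "42"},
        "address:home": {"street": "1 Main St", "city": "Wellington"},
        "address:work": {"street": "9 Quay Rd", "city": "Auckland"},
    }
}


def read_user(store, row_key):
    """Fetch the whole row in one read and split it into profile and addresses."""
    row = store.get(row_key, {})
    profile = row.get("profile", {})
    # Every super column prefixed "address:" holds one address.
    addresses = {
        name.split(":", 1)[1]: columns
        for name, columns in row.items()
        if name.startswith("address:")
    }
    return profile, addresses


profile, addresses = read_user(users, "jsmith")
```

Because the name and all addresses live in the same row, there is nothing to join at read time.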

Aaron


On 18 Sep 2010, at 08:56, Alvin UW <alvinuw@gmail.com> wrote:

Thanks Paul,

If we make a CF Name_Address(name, address) rather than an index, we have to maintain it whenever ID_Address(Id, address) or Name_ID(name, id) changes. It also takes up extra space.

In contrast, if Name_Address(name, address) is just an index, we can redirect the query to ID_Address(Id, address) and Name_ID(name, id) without the cost of maintenance.
Does that make sense?
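
To make the maintenance cost concrete, here is a minimal sketch with plain Python dicts standing in for the three CFs. The helper names set_name and set_address are hypothetical, and the linear scan over Name_ID is a simplification; it is only meant to show that every write to a base CF also has to touch the materialised copy.

```python
# Plain dicts standing in for the three column families from the thread.
id_address = {}    # ID_Address: id -> address
name_id = {}       # Name_ID: name -> id
name_address = {}  # Name_Address: name -> address (materialised copy)


def set_name(name, user_id):
    """Write Name_ID and refresh the copy if the address is already known."""
    name_id[name] = user_id
    if user_id in id_address:
        name_address[name] = id_address[user_id]


def set_address(user_id, address):
    """Write ID_Address, then update every name that maps to this id."""
    id_address[user_id] = address
    for name, mapped_id in name_id.items():
        if mapped_id == user_id:
            name_address[name] = address


set_name("jsmith", "42")
set_address("42", "1 Main St")
set_address("42", "9 Quay Rd")  # this write also has to update Name_Address
```

The extra work on each write is the price of a single-lookup read on Name_Address.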

Alvin
 

2010/9/16 Rock, Paul <paul.rock@teamaol.com>
Alvin - assuming I understand what you're after correctly, why not make a CF Name_Address(name, address)? Modifying the Cassandra methods to do the "join" you describe seems like overkill to me...

-Paul

On Sep 15, 2010, at 7:34 PM, Alvin UW wrote:

Hello,

I am going to build an index that joins two CFs.
I treat this index as a CF/SCF, except that I don't materialise it.
Assume we have two tables:
ID_Address(Id, address), Name_ID(name, id)
Then the index is: Name_Address(name, address)

When the application queries Name_Address, it supplies the value of "name".
I want to direct the read to Name_ID to get the "Id" value, then go to ID_Address to
get the "address" value for that "Id". So far, I am considering only the read operation.
This way, the join query is transparent to the user.
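
The read path described above amounts to two key lookups in sequence. A minimal sketch, again with plain dicts in place of the two CFs (the function name get_address_by_name and the sample data are hypothetical, and this runs on the client side rather than inside the Cassandra server):

```python
def get_address_by_name(name_id, id_address, name):
    """Resolve name -> id -> address with two lookups; None if either is missing."""
    user_id = name_id.get(name)
    if user_id is None:
        return None
    return id_address.get(user_id)


# Illustrative sample data for the two base CFs.
name_id = {"jsmith": "42"}        # Name_ID(name, id)
id_address = {"42": "1 Main St"}  # ID_Address(Id, address)

address = get_address_by_name(name_id, id_address, "jsmith")
```

Nothing is stored for Name_Address here, so there is nothing to maintain, but every read costs two round trips instead of one.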

So I think I should find out which methods or classes handle the read path for such a query.
For example, which server-side methods does the Cassandra CLI command "get Keyspace1.Standard2['jsmith']" call?

I noted that CassandraServer listens for client requests and has methods such as get() and get_slice().
Is that the right place to modify to implement my idea?

Thanks.

Alvin

