From: Wayne
To: user@cassandra.apache.org
Subject: Re: buggy secondary indexes?
Date: Mon, 13 Sep 2010 09:18:12 -0400

This is a use case we have been struggling with for a long time. How do we maintain indexes? It is easy to write out a secondary index, even manually, but how does one maintain an index when a value changes? Our only scalable answer has been background processes that churn through everything to verify that the index is current. We have yet to test 0.7 secondary indexes, but even the sample code out there for async triggers does not show how to deal with this. In database terms, this would have to be a before-update trigger that captures the current value and deletes its index entry.

http://maxgrinev.com/2010/07/23/managing-indexes-in-cassandra-using-async-triggers/

This is the main problem with maintaining secondary indexes. We, and I think many others, are waiting for more detailed examples of how best to create secondary indexes that deal with these types of issues. The wiki is still TBD, and we are staying away from 0.7 until it has stabilized and has better examples (what about super columns?).

How has everyone else dealt with this issue?
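To make the "before update" idea concrete, here is a minimal sketch of a hand-maintained email index. It assumes plain Python dicts in place of Cassandra column families, and every name in it (UserStore, put_email, repair_index) is hypothetical; it is not the Thrift API and not how 0.7's built-in indexes work. It only illustrates reading the old value first so its index entry can be deleted, plus a sweep of the kind our background processes run.

```python
from collections import defaultdict


class UserStore:
    """Toy stand-in for a 'User' column family plus a hand-maintained email index.

    Plain dicts replace Cassandra here; every name is hypothetical.
    """

    def __init__(self):
        self.users = {}                      # row key -> {"email": ...}
        self.email_index = defaultdict(set)  # email -> set of row keys

    def put_email(self, key, new_email):
        # "Before update" step: read the current value so that its
        # index entry can be removed before the new one is written.
        old_email = self.users.get(key, {}).get("email")
        if old_email is not None and old_email != new_email:
            self._unindex(old_email, key)
        self.users.setdefault(key, {})["email"] = new_email
        self.email_index[new_email].add(key)

    def remove(self, key):
        # Deleting a row must also delete the index entry that points at it.
        row = self.users.pop(key, None)
        if row and "email" in row:
            self._unindex(row["email"], key)

    def find_by_email(self, email):
        # .get() avoids creating empty index entries on lookup.
        return sorted(self.email_index.get(email, ()))

    def _unindex(self, email, key):
        keys = self.email_index.get(email)
        if keys is None:
            return
        keys.discard(key)
        if not keys:
            del self.email_index[email]

    def repair_index(self):
        """Sweep everything: drop stale entries, add missing ones.

        Returns the number of index entries that had to be fixed.
        """
        fixes = 0
        for email in list(self.email_index):
            for key in list(self.email_index[email]):
                if self.users.get(key, {}).get("email") != email:
                    self._unindex(email, key)
                    fixes += 1
        for key, row in self.users.items():
            email = row.get("email")
            if email is not None and key not in self.email_index.get(email, ()):
                self.email_index[email].add(key)
                fixes += 1
        return fixes
```

In a real cluster the read and the two writes would not be atomic, so concurrent updates could still leave a stale entry behind; that is presumably why a periodic sweep like repair_index remains necessary even with the read-before-write step.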
On Mon, Sep 13, 2010 at 8:59 AM, Petr Odut wrote:

> OK, I'll try it, but this:
>
> "recreate user with different email, then finding user by original email
> returns again that updated user"
> --> remove user "user" && insert new user: "user":{"email":"other@email.com"}
>
> doesn't crash on thrift, it seems to be cassandra issue.
>
> Petr
>
> On Mon, Sep 13, 2010 at 11:26 AM, Mike Peters <cassandra@softwareprojects.com> wrote:
>
>> Sounds like you may need to patch your php thrift
>>
>> See
>> http://www.softwareprojects.com/resources/programming/t-php-thrift-library-for-cassandra-1982.html
>>
>> On 9/13/2010 5:09 AM, Petr Odut wrote:
>>
>> Hi,
>> let's have CF User with indexed column email.
>>
>> Now i insert new user: "user":{"email":"some@email.com"}
>>
>> finding user by email address by get_indexed_slices ... everything works all right
>> updating email value (via batch_mutate) ends with TTransportException (TSocket: timed out reading 4 bytes from localhost:9160)
>> remove user, then try to find user by email again - it returns an empty user "user":{}
>> recreate user with different email, then finding user by original email is successful
>>
>> Hope that last 3 points are buggy behaviour,
>> using cassandra 0.7beta1 + php thrift
>>
>> Petr
>
> --
> Petr Odut [petr.odut@gmail.com]