From: Nate Sammons
To: "user@cassandra.apache.org"
Date: Mon, 7 Nov 2011 13:43:08 -0800
Subject: Secondary index issue, unable to query for records that should be there
Message-ID: <95AD5EB0BCCF284CB0194E8300A23E4A4DE59C67D1@EXVDMBX003-1.exch003intermedia.net>

Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options. Right now I have the following to create my CF using the CLI:

    create column family MyTest with
      key_validation_class = UTF8Type
      and comparator = UTF8Type
      and column_metadata = [
          -- absolute timestamp for this message, also indexed year/month/day/hour/minute
          -- index these as they are low cardinality
          {column_name:messageTimestamp, validation_class:LongType},
          {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
          {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
          {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
          {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
          {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

          ... other non-indexed columns defined

      ];

So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way. Then later I can query from the command line with CQL such as:

    get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc. This generally works; however, at some point queries that I know should return data no longer return any rows.

So for instance, partway through my test (inserting 250K rows), I can query for what should be there and get data back, such as with the above query, but later that same query returns 0 rows. Similarly, a query with fewer clauses in the expression, like this:

    get MyTest where messageYear=2011 and messageMonth=6;

will also return 0 rows.

Any idea what could be going wrong? I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway).

A second question: is what I'm doing insane? I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?)

Thanks,

-nate
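For the insert path described above, here is a minimal sketch of deriving the year/month/day/hour/minute values from the absolute timestamp. It assumes messageTimestamp is milliseconds since the Unix epoch and that the fields are computed in UTC; neither is stated in the message. The helper name time_buckets is hypothetical, and Python is used only for brevity (the client in the message is Java with Hector).

```python
from datetime import datetime, timezone


def time_buckets(epoch_millis: int) -> dict:
    """Split an epoch-millisecond timestamp into the indexed calendar fields.

    Assumption: fields are derived in UTC so that every writer and every
    query agrees on the same hour and day boundaries.
    """
    # Minute resolution is all that is needed, so drop the milliseconds.
    t = datetime.fromtimestamp(epoch_millis // 1000, tz=timezone.utc)
    return {
        "messageTimestamp": epoch_millis,
        "messageYear": t.year,
        "messageMonth": t.month,
        "messageDay": t.day,
        "messageHour": t.hour,
        "messageMinute": t.minute,
    }


# 1306935840000 ms is 2011-06-01 13:44:00 UTC, the values used in the
# example query (year=2011, month=6, day=1, hour=13, minute=44).
fields = time_buckets(1306935840000)
```

Deriving the fields in one fixed zone keeps the values written by the client and the values typed into a query comparable. Whether that has any bearing on the missing rows is not known from the message.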
