incubator-cassandra-user mailing list archives

From Nate Sammons <NSamm...@ften.com>
Subject RE: Secondary index issue, unable to query for records that should be there
Date Tue, 08 Nov 2011 15:45:39 GMT
This is against a single server, not a cluster.  Replication factor for the keyspace is set
to 1, CL is the default for Hector, which I think is QUORUM.
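
For reference, a quorum is floor(RF / 2) + 1 replicas, so with a replication factor of 1 a QUORUM read or write involves exactly one replica. A minimal sketch of that arithmetic follows; the class and method names are illustrative only and are not part of Hector or Cassandra.

```java
// Illustrative helper only; QuorumMath is a hypothetical name, not a Hector or Cassandra class.
final class QuorumMath {
    private QuorumMath() {}

    /** Number of replicas that must respond for a QUORUM operation. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        // Integer division floors, giving floor(RF / 2) + 1.
        return replicationFactor / 2 + 1;
    }
}
```

With RF = 1 this returns 1, so on a single node QUORUM behaves the same as ONE.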

I'm trying to get a simple test together that shows this.  Does anyone know if multiple indexes
like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rkalla@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do
have a ring setup, what is your replication factor set to? Also, what ConsistencyLevel are you
writing with when storing the values?

-R
On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <NSammons@ften.com> wrote:
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several
secondary indexes to try out some options.  Right now I have the following to create my CF
using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      -- absolute timestamp for this message, also indexed year/month/day/hour/minute
      -- index these as they are low cardinality
      {column_name:messageTimestamp, validation_class:LongType},
      {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

                ... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these values on a
Hector ColumnFamilyUpdater instance and update that way.  Then later I can query from the
command line with a query such as:

                get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should return data no
longer return any rows.
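
The bucket calculation described above might look roughly like the sketch below. It is not the code used in this test: the class name MessageTimeFields, the method name, the use of java.time, and the choice of UTC are all assumptions made for illustration. The map keys match the indexed column names in the column family definition, and the values are what would then be set on the updater.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the name and the UTC time zone are assumptions, not taken from the original post.
final class MessageTimeFields {
    private MessageTimeFields() {}

    /** Splits an epoch-millisecond timestamp into the indexed year/month/day/hour/minute values. */
    static Map<String, Integer> fromEpochMillis(long messageTimestamp) {
        ZonedDateTime t = Instant.ofEpochMilli(messageTimestamp).atZone(ZoneOffset.UTC);
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("messageYear", t.getYear());
        fields.put("messageMonth", t.getMonthValue()); // 1 to 12, so June is 6
        fields.put("messageDay", t.getDayOfMonth());
        fields.put("messageHour", t.getHour());
        fields.put("messageMinute", t.getMinute());
        return fields;
    }
}
```

Two details affect whether a later query matches. The time zone used at write time must be the one assumed when choosing query values, and if java.util.Calendar is used instead, Calendar.MONTH is zero-based, so June is stored as 5 unless 1 is added.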

So for instance, part way through my test (inserting 250K rows), a query such as the one
above returns the data that should be there, but later that same query returns 0 rows.
Similarly, a query with fewer clauses, like this:

                get MyTest where messageYear=2011 and messageMonth=6;

also returns 0 rows.


Any idea what could be going wrong?  I'm not getting any exceptions in my client during the
write, and I don't see anything in the logs (no errors anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on CQL queries
with multiple indexed columns is good (does Cassandra intelligently use all available indexes
on these queries?)



Thanks,

-nate

