incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nate Sammons <NSamm...@ften.com>
Subject RE: Secondary index issue, unable to query for records that should be there
Date Tue, 08 Nov 2011 17:19:54 GMT
I restarted with logging turned up to DEBUG, and after quite a bit of logging during startup,
I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


produced the following in the following:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) restricted ranges
for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) scan ranges
are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) Blockfor/repair
is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@7bc203c
from natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) Primary scan clause
is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) Expanding slice
filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute
EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=}
to 808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 ResponseVerbHandler.java (line 44)
Processing response on a callback from 808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) Blockfor/repair
is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@6a25a21d
from natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) Primary scan clause
is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) Expanding slice
filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute
EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=}
to 809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 ResponseVerbHandler.java (line 44)
Processing response on a callback from 809@natebookpro/127.0.1.1



Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" produced a result,
and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) get_slice
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) Command/ConsistencyLevel
is SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest',
superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) Blockfor/repair
is 1/true; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) reading data
locally
DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) LocalReadRunnable
reading SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest',
superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)
DEBUG [ReadStage:37] 2011-11-08 10:11:20,161 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:37] 2011-11-08 10:11:20,162 SliceQueryFilter.java (line 123) collecting 0
of 1000000: data:false:512@1320769510502017
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 1
of 1000000: day:false:4@1320769510502014
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 2
of 1000000: hour:false:4@1320769510502015
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 3
of 1000000: minute:false:4@1320769510502016
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 4
of 1000000: month:false:4@1320769510502013
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 5
of 1000000: year:false:4@1320769510502012
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,170 StorageProxy.java (line 689) Read: 10 ms.



Note that a query for "get IndexTest where minute=49" (which also returns no records) results
in the following logs:


DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,210 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1017) restricted ranges
for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1104) scan ranges
are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 ReadCallback.java (line 77) Blockfor/repair
is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@547d6c11
from natebookpro/127.0.1.1
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 96) Primary scan clause
is minute
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 109) Expanding slice
filter to entire row to cover additional expressions
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute
EQ 49' starting with
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=}
to 462@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:17] 2011-11-08 10:13:40,215 ResponseVerbHandler.java (line 44)
Processing response on a callback from 462@natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 ReadCallback.java (line 77) Blockfor/repair
is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@62132898
from natebookpro/127.0.1.1
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 96) Primary scan clause
is minute
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 109) Expanding slice
filter to entire row to cover additional expressions
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute
EQ 49' starting with
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=}
to 463@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:18] 2011-11-08 10:13:40,219 ResponseVerbHandler.java (line 44)
Processing response on a callback from 463@natebookpro/127.0.1.1





Thanks,

-nate






From: Jake Luciani [mailto:jakers@gmail.com]
Sent: Tuesday, November 08, 2011 8:56 AM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Hi Nate,

Could you try running it with debug enabled on the logs? it will give more insite into what's
going on.

-Jake

On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <NSammons@ften.com<mailto:NSammons@ften.com>>
wrote:
This is against a single server, not a cluster.  Replication factor for the keyspace is set
to 1, CL is the default for Hector, which I think is QUORUM.

I'm trying to get a simple test together that shows this.  Does anyone know if multiple indexes
like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rkalla@gmail.com<mailto:rkalla@gmail.com>]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do
have a ring setup, what is your replicationfactor set to? Also what ConsistencyLevel are you
writing with when storing the values?

-R
On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <NSammons@ften.com<mailto:NSammons@ften.com>>
wrote:
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several
secondary indexes to try out some options.  Right now I have the following to create my CF
using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      -- absolute timestamp for this message, also indexed year/month/day/hour/minute
      -- index these as they are low cardinality
      {column_name:messageTimestamp, validation_class:LongType},
      {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

                ... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these values on a
Hector ColumnFamilyUpdater instance and update that way.  Then later I can query from the
command line with CQL such as:

                get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and
messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should return data no
longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can query for what should
be there and get data back such as the above query, but later that same query returns 0 rows.
 Similarly, with fewer clauses in the expression, like this:

                get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???????
Any idea what could be going wrong?  I'm not getting any exceptions in my client during the
write, and I don't see anything in the logs (no errors anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on CQL queries
with multiple indexed columns is good (does Cassandra intelligently use all available indexes
on these queries?)



Thanks,

-nate




--
http://twitter.com/tjake

Mime
View raw message