cassandra-user mailing list archives

From George Webster <webste...@gmail.com>
Subject Re: question when using SASI indexing
Date Fri, 05 Aug 2016 11:18:13 GMT

Thanks DuyHai,

I would agree but we have not performed any delete operations in over a
month. To me this looks like a potential bug or misconfiguration (on my
end) with SASI.

I say this for a few reasons:
1) we have not performed a delete operation since the indexes were created
2) when I perform a query, against the same table, for the sha256 of an ELF
file I do receive a result.
SELECT * FROM testing.objects WHERE sha256 =
'1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f';

 sha256                                                           | mime
------------------------------------------------------------------+---------------------------------------------------------------------
 1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f | ELF 32-bit MSB  executable, PowerPC or cisco 4500, version 1 (SYSV)

3) If I don't use the SASI index and instead loop through the entries
manually, I get 187 results.
4) When I attempted the same SASI query again today, I again received
inconsistent result counts, between 0 and 7. After a few attempts it again
began to return 0.
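
A manual count like the one in point 3 could be sketched as follows. This is
an illustration, not the script actually used: it assumes the DataStax Python
driver (cassandra-driver) is installed, and the helper names, the contact
point and the case-sensitive prefix match are all assumptions.

```python
from collections import namedtuple
from typing import Iterable

# Stand-in for driver rows, so the counting helper can be tried without a cluster.
Row = namedtuple("Row", ["sha256", "mime"])


def count_mime_prefix(rows: Iterable, prefix: str = "ELF") -> int:
    """Count rows whose `mime` attribute starts with `prefix` (case-sensitive).

    `rows` can be any iterable of objects exposing a `mime` attribute, such as
    the named-tuple rows the driver returns by default. Null mimes are skipped.
    """
    return sum(
        1 for row in rows
        if row.mime is not None and row.mime.startswith(prefix)
    )


def scan_objects(contact_points, keyspace="testing", fetch_size=1000):
    """Yield every row of `objects` with a paged full-table scan (no index).

    The driver is imported lazily so count_mime_prefix works without it.
    """
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(contact_points)
    try:
        session = cluster.connect(keyspace)
        statement = SimpleStatement(
            "SELECT sha256, mime FROM objects", fetch_size=fetch_size
        )
        # Iterating the result set fetches further pages transparently.
        yield from session.execute(statement)
    finally:
        cluster.shutdown()
```

With a hypothetical node at 127.0.0.1, the count would be
count_mime_prefix(scan_objects(["127.0.0.1"])). A full scan like this is slow
on a large table, but it bypasses the index entirely, which is what makes it
useful as a baseline.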

Do you see any errors in my index command?

CREATE CUSTOM INDEX objects_mime_idx ON testing.objects (mime)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'analyzed' : 'true',
    'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'tokenization_enable_stemming' : 'false',
    'tokenization_locale' : 'en',
    'tokenization_normalize_lowercase' : 'true',
    'tokenization_skip_stop_words' : 'true'
};


Some of our SASI indexes are fairly large, as we were testing the ability to
use SASI over Elasticsearch or basic processing through Spark. I will run
some more tests today and see if I can uncover anything.
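
One way such a test could look: run the same LIKE query repeatedly at a fixed
consistency level and tally how many rows each attempt returns. This is only a
sketch. The helper names are made up, the 20 attempts simply mirror the number
mentioned earlier in the thread, and the driver-backed part assumes the
DataStax Python driver and an already connected session.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_result_counts(run_query: Callable[[], Iterable], attempts: int = 20) -> Counter:
    """Run the same query `attempts` times and tally the row count of each run.

    `run_query` is any zero-argument callable returning an iterable of rows,
    so it can wrap a real session or a fake used for testing.
    """
    tally = Counter()
    for _ in range(attempts):
        tally[sum(1 for _row in run_query())] += 1
    return tally


def make_sasi_query(session, consistency_name="QUORUM"):
    """Build a zero-argument callable that runs the LIKE query on `session`.

    `session` is assumed to be a connected cassandra-driver Session; the
    driver is imported lazily so tally_result_counts works without it.
    """
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    statement = SimpleStatement(
        "SELECT sha256 FROM testing.objects WHERE mime LIKE 'ELF%'",
        consistency_level=getattr(ConsistencyLevel, consistency_name),
    )
    return lambda: session.execute(statement)
```

For example, tally_result_counts(make_sasi_query(session, "ONE")) compared
with the same call at "QUORUM" would show whether the spread of counts depends
on the consistency level. A healthy index should put every attempt in a single
bucket.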


On Fri, Aug 5, 2016 at 10:36 AM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> Ok the fact that you see some rows and after a while you see 0 rows means
> that those rows are deleted.
>
> Since SASI only indexes INSERT & UPDATE but not DELETE, management of
> tombstones is left to Cassandra to handle.
>
> It means that if you do an INSERT, you'll have an entry into SASI index
> file but when you do a DELETE, SASI does not remove the entry from its
> index file.
>
> When reading, SASI will give the partition offset to Cassandra and
> Cassandra will fetch the data from SSTables, then realise that there is a
> tombstone and thus return 0 rows.
>
> The only moment those entries will be removed from the SASI index file is
> when your SSTable gets compacted and the data is purged.
>
> The fact that you can see some rows then 0 rows means that some of your
> replicas have missed the tombstones.
>
> "However, after about 20 attempts, all servers started to only return 0
> results. " --> Read-repair kicks in so the tombstones are propagated and
> then you see 0 rows.
>
>
>
> On Tue, Aug 2, 2016 at 10:52 PM, George Webster <webstergd@gmail.com>
> wrote:
>
>> The indexes were written about 1-2 months ago. No data has been added to
>> the servers since the indexes were created. Additionally, the indexes
>> appeared to be stable until I noticed the issue today, which occurred
>> after I made a large query without setting a LIMIT.
>>
>> I set the consistency level and moved the select statement between
>> different nodes. The results remained inconsistent, returning a random
>> number between 0 and 8. It did not appear to make much difference between
>> the different nodes or consistency level. However, after about 20 attempts,
>> all servers started to only return 0 results.
>>
>>
>> Lastly, this appeared in the logs during that time:
>>
>> INFO  [IndexSummaryManager:1] 2016-08-02 22:11:43,245
>> IndexSummaryRedistribution.java:74 - Redistributing index summaries
>>
>> INFO  [OptionalTasks:1] 2016-08-02 22:25:06,508 NoSpamLogger.java:91 -
>> Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>> 1048576 bytes
>>
>> On Tue, Aug 2, 2016 at 6:58 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>
>>> One possible explanation is that you're querying data while the index
>>> files are being built, so the results are different.
>>> The second possible explanation is the consistency level.
>>>
>>> Try the query again using CL = QUORUM, try on several nodes to see if
>>> the results are different
>>>
>>> On Tue, Aug 2, 2016 at 6:32 PM, George Webster <webstergd@gmail.com>
>>> wrote:
>>>
>>>> Hey DuyHai,
>>>> Thank you for your help.
>>>>
>>>> 1) Cassandra version
>>>> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
>>>>
>>>>
>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>
>>>> CREATE CUSTOM INDEX objects_mime_idx ON test.objects (mime)
>>>> USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>> WITH OPTIONS = {'analyzed' : 'true',
>>>> 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>>>> 'tokenization_enable_stemming' : 'false', 'tokenization_locale' : 'en',
>>>> 'tokenization_normalize_lowercase' : 'true',
>>>> 'tokenization_skip_stop_words' : 'true'};
>>>>
>>>>
>>>> 3) Consistency level used for your SELECT
>>>> I am using the default consistency
>>>> cassandra@cqlsh> CONSISTENCY
>>>> Current consistency level is ONE.
>>>>
>>>>
>>>> 4) Replication factor
>>>>
>>>> CREATE KEYSPACE system_distributed WITH REPLICATION = {
>>>> 	'class' : 'org.apache.cassandra.locator.SimpleStrategy',
>>>> 	'replication_factor': '3' }
>>>> AND DURABLE_WRITES = true;
>>>>
>>>>
>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>> created the index when the table already contains some data ?
>>>> I created the indexes after the tables contained data.
>>>>
>>>>
>>>> On Tue, Aug 2, 2016 at 5:22 PM, DuyHai Doan <doanduyhai@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello George
>>>>>
>>>>> Can you provide more details ?
>>>>>
>>>>> 1) Cassandra version
>>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>> 3) Consistency level used for your SELECT
>>>>> 4) Replication factor
>>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>>> created the index when the table already contains some data ?
>>>>>
>>>>> On Tue, Aug 2, 2016 at 4:05 PM, George Webster <webstergd@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey guys and gals,
>>>>>>
>>>>>> I am having a strange issue with Cassandra SASI and I was hoping you
>>>>>> could help solve the mystery. My issue is inconsistency between returned
>>>>>> results and strange log errors.
>>>>>>
>>>>>> The biggest issue is that when I perform a query I am getting back
>>>>>> inconsistent results. The first few times I received between 3 and 7
>>>>>> results, and then I finally received 187 results. At no point in time
>>>>>> did I change the query statement. However, after I received the 187
>>>>>> results, any follow-on queries returned zero results.
>>>>>>
>>>>>> my query:
>>>>>> SELECT *
>>>>>>     FROM test.objects
>>>>>>     WHERE mime LIKE 'ELF%';
>>>>>>
>>>>>> When I look in the system.log file I see the following:
>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>
>>>>>>
>>>>>> When I look in the debug.log file I see the following when zero
>>>>>> results are returned:
>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>
>>>>>> Additionally, I see a lot of errors in the log that state:
>>>>>> INFO  [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91
>>>>>> - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>>>>>> 1048576 bytes
>>>>>> INFO  [OptionalTasks:1] 2016-08-02 15:55:04,387 NoSpamLogger.java:91
>>>>>> - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>>>>>> 1048576 bytes
>>>>>>
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
