cassandra-commits mailing list archives

From "Kai Wang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10583) After bulk loading CQL query on timestamp column returns wrong result
Date Fri, 23 Oct 2015 19:36:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971593#comment-14971593 ]

Kai Wang edited comment on CASSANDRA-10583 at 10/23/15 7:35 PM:
----------------------------------------------------------------

This seems to be related to bulk loading. 

To reproduce:

1. Clone https://github.com/depend/issues/tree/master/CASSANDRA-10583, then build and run it.
The application generates an sstable with 10 rows.
2. Load it into C* with sstableloader.
3. 
{noformat} 
cqlsh:timeseries_test> select * from double_daily;

 tag  | group | timestamp                | value
------+-------+--------------------------+-------
 TEST |     1 | 2002-05-01 04:00:00+0000 |     0
 TEST |     1 | 2002-05-02 04:00:00+0000 |     1
 TEST |     1 | 2002-05-03 04:00:00+0000 |     2
 TEST |     1 | 2002-05-04 04:00:00+0000 |     3
 TEST |     1 | 2002-05-05 04:00:00+0000 |     4
 TEST |     1 | 2002-05-06 04:00:00+0000 |     5
 TEST |     1 | 2002-05-07 04:00:00+0000 |     6
 TEST |     1 | 2002-05-08 04:00:00+0000 |     7
 TEST |     1 | 2002-05-09 04:00:00+0000 |     8
 TEST |     1 | 2002-05-10 04:00:00+0000 |     9

(10 rows)
{noformat} 

4. cqlsh:timeseries_test> select * from double_daily where tag='TEST' and group = 1 and timestamp > '2002-05-01 00:00:00-0400';

 tag  | group | timestamp                | value
------+-------+--------------------------+-------
 TEST |     1 | 2002-05-01 04:00:00+0000 |     0
 TEST |     1 | 2002-05-02 04:00:00+0000 |     1
 TEST |     1 | 2002-05-03 04:00:00+0000 |     2
 TEST |     1 | 2002-05-04 04:00:00+0000 |     3
 TEST |     1 | 2002-05-05 04:00:00+0000 |     4
 TEST |     1 | 2002-05-06 04:00:00+0000 |     5
 TEST |     1 | 2002-05-07 04:00:00+0000 |     6
 TEST |     1 | 2002-05-08 04:00:00+0000 |     7
 TEST |     1 | 2002-05-09 04:00:00+0000 |     8
 TEST |     1 | 2002-05-10 04:00:00+0000 |     9

(10 rows)

5. cqlsh:timeseries_test> select * from double_daily where tag='TEST' and group = 1 and timestamp > '2002-05-02 00:00:00-0400';

 tag | group | timestamp | value
-----+-------+-----------+-------

(0 rows)

I wasn't able to find the "equal" condition that returns everything. But query #5 still
reports that no rows are later than 2002-05-02, which is not true.
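
As a sanity check on what queries #4 and #5 would be expected to return, here is a minimal Python sketch. It is not part of the reproduction project; it only redoes the timestamp arithmetic, assuming the stored values have no sub-second component beyond what cqlsh displays in step 3 and that ">" is a strict comparison. The names rows, parse_bound and expected_matches are made up for this illustration.

```python
from datetime import datetime, timezone

# Timestamps of the 10 loaded rows, as displayed by cqlsh in step 3 (UTC).
# Assumption: no sub-second component is hidden by the display format.
rows = [datetime(2002, 5, day, 4, 0, tzinfo=timezone.utc) for day in range(1, 11)]

def parse_bound(text):
    """Parse a literal such as '2002-05-01 00:00:00-0400' into an aware datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")

def expected_matches(bound_text):
    """Return the rows whose timestamp is strictly later than the bound."""
    bound = parse_bound(bound_text)
    return [ts for ts in rows if ts > bound]

q4 = expected_matches("2002-05-01 00:00:00-0400")  # bound is 2002-05-01 04:00:00+0000
q5 = expected_matches("2002-05-02 00:00:00-0400")  # bound is 2002-05-02 04:00:00+0000
print(len(q4), len(q5))
```

Under those assumptions the sketch counts 9 rows for query #4 and 8 rows for query #5, whereas cqlsh returned 10 and 0.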


was (Author: depend):
This seems to be related to bulk loading. 

To reproduce:

1. Clone https://github.com/depend/issues/tree/master/CASSANDRA-10583, then build and run it.
The application generates an sstable with 10 rows.
2. Load it into C* with sstableloader.
3. cqlsh:timeseries_test> select * from double_daily;

 tag  | group | timestamp                | value
------+-------+--------------------------+-------
 TEST |     1 | 2002-05-01 04:00:00+0000 |     0
 TEST |     1 | 2002-05-02 04:00:00+0000 |     1
 TEST |     1 | 2002-05-03 04:00:00+0000 |     2
 TEST |     1 | 2002-05-04 04:00:00+0000 |     3
 TEST |     1 | 2002-05-05 04:00:00+0000 |     4
 TEST |     1 | 2002-05-06 04:00:00+0000 |     5
 TEST |     1 | 2002-05-07 04:00:00+0000 |     6
 TEST |     1 | 2002-05-08 04:00:00+0000 |     7
 TEST |     1 | 2002-05-09 04:00:00+0000 |     8
 TEST |     1 | 2002-05-10 04:00:00+0000 |     9

(10 rows)

4. cqlsh:timeseries_test> select * from double_daily where tag='TEST' and group = 1 and timestamp > '2002-05-01 00:00:00-0400';

 tag  | group | timestamp                | value
------+-------+--------------------------+-------
 TEST |     1 | 2002-05-01 04:00:00+0000 |     0
 TEST |     1 | 2002-05-02 04:00:00+0000 |     1
 TEST |     1 | 2002-05-03 04:00:00+0000 |     2
 TEST |     1 | 2002-05-04 04:00:00+0000 |     3
 TEST |     1 | 2002-05-05 04:00:00+0000 |     4
 TEST |     1 | 2002-05-06 04:00:00+0000 |     5
 TEST |     1 | 2002-05-07 04:00:00+0000 |     6
 TEST |     1 | 2002-05-08 04:00:00+0000 |     7
 TEST |     1 | 2002-05-09 04:00:00+0000 |     8
 TEST |     1 | 2002-05-10 04:00:00+0000 |     9

(10 rows)

5. cqlsh:timeseries_test> select * from double_daily where tag='TEST' and group = 1 and timestamp > '2002-05-02 00:00:00-0400';

 tag | group | timestamp | value
-----+-------+-----------+-------

(0 rows)

I wasn't able to find the "equal" condition that returns everything. But query #5 still
reports that no rows are later than 2002-05-02, which is not true.

> After bulk loading CQL query on timestamp column returns wrong result
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-10583
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10583
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Datastax Community Edition 2.1.10, Windows 2008 R2, Java x64 1.8.0_60
>            Reporter: Kai Wang
>             Fix For: 3.x, 2.1.x, 2.2.x
>
>
> I have this table:
> {noformat}
> CREATE TABLE test (
>     tag text,
>     group int,
>     timestamp timestamp,
>     value double,
>     PRIMARY KEY (tag, group, timestamp)
> ) WITH CLUSTERING ORDER BY (group ASC, timestamp DESC)
> {noformat}
> First I used CQLSSTableWriter to bulk load a bunch of sstables. Then I ran this query:
> {noformat}
> cqlsh> select * from test where tag = 'MSFT' and group = 1 and timestamp = '2004-12-15 16:00:00-0500';
>  tag  | group | timestamp                | value
> ------+-------+--------------------------+-------
>  MSFT |     1 | 2004-12-15 21:00:00+0000 | 27.11
>  MSFT |     1 | 2004-12-16 21:00:00+0000 | 27.16
>  MSFT |     1 | 2004-12-17 21:00:00+0000 | 26.96
>  MSFT |     1 | 2004-12-20 21:00:00+0000 | 26.95
>  MSFT |     1 | 2004-12-21 21:00:00+0000 | 27.07
>  MSFT |     1 | 2004-12-22 21:00:00+0000 | 26.98
>  MSFT |     1 | 2004-12-23 21:00:00+0000 | 27.01
>  MSFT |     1 | 2004-12-27 21:00:00+0000 | 26.85
>  MSFT |     1 | 2004-12-28 21:00:00+0000 | 26.95
>  MSFT |     1 | 2004-12-29 21:00:00+0000 |  26.9
>  MSFT |     1 | 2004-12-30 21:00:00+0000 | 26.76
> (11 rows)
> {noformat}
> The result is obviously wrong: an equality match on the full primary key should return at most one row, not 11.
> If I run this query:
> {noformat}
> cqlsh> select * from test where tag = 'MSFT' and group = 1 and timestamp = '2004-12-16 16:00:00-0500';
>  tag | group | timestamp | value
> -----+-------+-----------+-------
> (0 rows)
> {noformat}
> In DevCenter I tried to create a similar table and insert a few rows but couldn't reproduce
> this. This may have something to do with the bulk loading process. But still, the fact that
> cqlsh returns data that doesn't match the query is concerning.
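
For reference, the two equality queries in the description can be checked by hand with a short Python sketch. It only compares the timestamp literals against the values cqlsh displayed in the first result set, assuming those values carry no hidden sub-second component; displayed, parse and equal_matches are names invented for this illustration.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S%z"

# Timestamps copied from the first result set in the description (UTC).
displayed = [
    "2004-12-15 21:00:00+0000", "2004-12-16 21:00:00+0000",
    "2004-12-17 21:00:00+0000", "2004-12-20 21:00:00+0000",
    "2004-12-21 21:00:00+0000", "2004-12-22 21:00:00+0000",
    "2004-12-23 21:00:00+0000", "2004-12-27 21:00:00+0000",
    "2004-12-28 21:00:00+0000", "2004-12-29 21:00:00+0000",
    "2004-12-30 21:00:00+0000",
]

def parse(text):
    """Parse a timestamp with a numeric UTC offset into an aware datetime."""
    return datetime.strptime(text, FORMAT)

def equal_matches(literal):
    """Return the displayed timestamps that denote the same instant as the literal."""
    target = parse(literal)
    return [text for text in displayed if parse(text) == target]

first = equal_matches("2004-12-15 16:00:00-0500")
second = equal_matches("2004-12-16 16:00:00-0500")
print(first, second)
```

Each literal corresponds to exactly one displayed timestamp (21:00:00+0000 on the same date), so under these assumptions one row would be expected for each query rather than 11 and 0.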



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
