cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11473) Clustering column value is zeroed out in some query results
Date Fri, 08 Apr 2016 22:24:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233067#comment-15233067
] 

Tyler Hobbs commented on CASSANDRA-11473:
-----------------------------------------

I've been looking at some of the partitions in an sstable that [~longtimer] was able to provide.
 The problem seems to be that there are extra, unexplained bytes at the end of rows.  I hacked
{{nodetool scrub}} to enable dumping the raw bytes for a partition.  Here's an example of
one of the partitions:

{noformat}
00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00
00 00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00
00 00 00 00 00 04 00 00 00 01 51 9C 84 68 B0 20
25 E8 57 F0 C8 01 00 F8 54 1E 52 C9 F8 00 00 00
00 00 F8 54 1E 52 C9 F8 00 00 00 00 04 00 00 00
01
{noformat}

By hand, I deserialized this into the following:

{noformat}
--- start partition header ---

00 17                   - partition key length (short)
00 06 55 48 4E 54 52 43 00
  00 04 00 00 00 00 00
  00 04 00 00 00 00 00  - partition key (composite ('UHNTRC', 0, 0))
7F FF FF FF             - local deletion time (max int)
80 00 00 00 00 00 00 00 - marked for delete at (min long)

-- end partition header ---

-- start row
04                      - flags (only HAS_TIMESTAMP flag present; note that HAS_ALL_COLUMNS is not present)
00                      - clustering block header, unsigned vint (zero b/c no null/empty values)
00 00 01 51 9C 84 68 B0 - clustering (timestamp, 2015-12-13 12:05)
20                      - row body size (unsigned vint)
25                      - previous row size (unsigned vint)

E8 57 F0 C8             - timestamp (unsigned vint, E8 indicates three more bytes)
01                      - columns subset (unsigned vint, small encoding, indicates first column is missing)

00                      - cell flags
F8 54 1E 52 C9 F8       - timestamp (unsigned vint, F8 indicates five more bytes)
00 00 00 00             - cell value (int32, value zero)

00                      - cell flags
F8 54 1E 52 C9 F8       - timestamp (unsigned vint, F8 indicates five more bytes)
00 00 00 00             - cell value (int32, value zero)

-- end row

04 00 00 00             - MYSTERY BYTES
01                      - end of partition marker (?)
{noformat}
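
The header part of this hand decoding can be reproduced with a short script. This is a minimal Python sketch, not Cassandra code: it assumes the layout annotated above (2-byte key length; composite components stored as 2-byte length, value, and a 1-byte end-of-component marker; then an int32 and an int64 for the deletion info). The name {{split_composite}} is a hypothetical helper.

```python
import struct
from datetime import datetime, timezone

# First 47 bytes of the example partition: header plus the start of the row.
RAW = bytes.fromhex(
    "00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00 "
    "00 00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00 "
    "00 00 00 00 00 04 00 00 00 01 51 9C 84 68 B0"
)


def split_composite(key: bytes):
    """Split a composite key into its raw components (assumed layout:
    2-byte length, value, 1-byte end-of-component marker)."""
    parts, pos = [], 0
    while pos < len(key):
        (length,) = struct.unpack_from(">H", key, pos)
        parts.append(key[pos + 2 : pos + 2 + length])
        pos += 2 + length + 1  # skip the end-of-component byte
    return parts


(key_len,) = struct.unpack_from(">H", RAW, 0)
key = RAW[2 : 2 + key_len]
subscriber, unit, sensor = split_composite(key)

# Deletion info follows the key: int32 local deletion time, int64 marked-for-delete-at.
local_deletion_time, marked_for_delete_at = struct.unpack_from(">iq", RAW, 2 + key_len)

# Row: 1 flags byte, 1 clustering block header byte, 8-byte timestamp clustering (ms).
row_start = 2 + key_len + 4 + 8
flags = RAW[row_start]
(millis,) = struct.unpack_from(">q", RAW, row_start + 2)
clustering_time = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

print(key_len, subscriber, int.from_bytes(unit, "big"), int.from_bytes(sensor, "big"))
print(local_deletion_time, marked_for_delete_at)
print(hex(flags), clustering_time.isoformat())
```

The clustering value comes out as 2015-12-13 18:05:02 UTC, which would agree with the 12:05 annotation if that was rendered in a UTC-6 local time.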

A couple of notes:
* The four "MYSTERY BYTES" are what I cannot explain.  After looking over the serialization
code many times, I can't find a good explanation for these.
* There is actually a third column in the schema ("assignment", a timestamp column).  This
is why the "columns subset" byte is 01 instead of 00.
* The two present columns are both ints, not an int and a float as in the schema in the description.
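
For the vint-encoded fields in the breakdown above, the "E8 indicates three more bytes" and "F8 indicates five more bytes" annotations follow from counting the leading 1 bits of the first byte. A minimal Python sketch of that decoding, assuming the remaining bits of the first byte are the high-order bits of the value and the extra bytes follow in big-endian order; {{read_unsigned_vint}} is a hypothetical name, not the actual Cassandra method:

```python
from typing import Tuple


def read_unsigned_vint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the unsigned vint at buf[pos]."""
    first = buf[pos]

    # The number of leading 1 bits in the first byte is the number of extra bytes.
    extra = 0
    probe = first
    while probe & 0x80:
        extra += 1
        probe = (probe << 1) & 0xFF

    # The remaining low bits of the first byte are the top bits of the value.
    value = first & (0xFF >> extra)
    for b in buf[pos + 1 : pos + 1 + extra]:
        value = (value << 8) | b
    return value, 1 + extra


for hex_bytes in ("20", "25", "E8 57 F0 C8", "F8 54 1E 52 C9 F8"):
    value, size = read_unsigned_vint(bytes.fromhex(hex_bytes))
    print(f"{hex_bytes:<20} -> value={value} ({size} bytes)")
```

Under these assumptions {{E8 57 F0 C8}} occupies four bytes and {{F8 54 1E 52 C9 F8}} occupies six, matching the annotations.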

I tried to reproduce this by creating a second table with the same schema.  This is as close
as I could get:

{noformat}
00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00
00 00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00
00 00 00 00 00 04 00 00 00 01 53 F7 D1 63 E1 18
25 E3 3F AD CB 01 00 E4 70 7B FE 00 00 00 00 00
E4 70 7B FE 00 00 00 00 01

-- start partition header

00 17                   - partition key length
00 06 55 48 4E 54 52 43 00 00 04 00 00 00 00
00 00 04 00 00 00 00 00 - partition key
7F FF FF FF             - local deletion time
80 00 00 00 00 00 00 00 - marked for delete at

-- end partition header

-- start row

04                      - flags (only HAS_TIMESTAMP)
00                      - clustering block header
00 00 01 53 F7 D1 63 E1 - clustering (timestamp)
18                      - row body size
25                      - previous unfiltered size

E3 3F AD CB             - timestamp (unsigned vint, E3 indicates three more bytes)
01                      - columns subset

00                      - cell flags
E4 70 7B FE             - timestamp
00 00 00 00             - cell value (int32, value zero)

00                      - cell flags
E4 70 7B FE             - timestamp
00 00 00 00             - cell value (int32, value zero)

-- end row

01                      - end of partition marker
{noformat}

This is almost identical, except that it doesn't have the "mystery bytes".

It's also interesting to note that the next few partitions in Jason's sstable all have the
mystery bytes (I'm guessing all of them do):

{noformat}
Reading row at 0
row 0006524154434152000004000000000000040000000400 is 53 bytes
00 17 00 06 52 41 54 43 41 52 00 00 04 00 00 00 00
00 00 04 00 00 00 04 00 7F FF FF FF 80 00 00 00
00 00 00 00 04 00 00 00 01 4D 9A D8 2C 80 1D 25
00 01 00 F8 54 1E 3C B1 B8 00 00 00 00 00 F8 54
1E 3C B1 B8 00 00 00 00 04 00 00 00 01

Reading row at 78
row 000655484e545243000004000000000000040000000000 is 56 bytes
00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00 00
00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00 00
00 00 00 00 04 00 00 00 01 51 9C 84 68 B0 20 25
E8 57 F0 C8 01 00 F8 54 1E 52 C9 F8 00 00 00 00
00 F8 54 1E 52 C9 F8 00 00 00 00 04 00 00 00 01

Reading row at 159
row 00064a41534b414e000004000000000000040000001700 is 56 bytes
00 17 00 06 4A 41 53 4B 41 4E 00 00 04 00 00 00 00
00 00 04 00 00 00 17 00 7F FF FF FF 80 00 00 00
00 00 00 00 04 00 00 00 01 4D 8C C7 BD 90 20 25
E8 59 5C 10 01 00 F8 54 1E 54 B6 28 00 00 00 00
00 F8 54 1E 54 B6 28 00 00 00 00 04 00 00 00 01

Reading row at 240
row 00064d4d4d534850000004000000000000040000000c00 is 56 bytes
00 17 00 06 4D 4D 4D 53 48 50 00 00 04 00 00 00 00
00 00 04 00 00 00 0C 00 7F FF FF FF 80 00 00 00
00 00 00 00 04 00 00 00 01 52 71 7F 05 C0 20 25
E9 96 7F 90 01 00 F8 54 1F E8 D3 48 00 00 00 00
00 F8 54 1F E8 D3 48 00 00 00 00 04 00 00 00 01

Reading row at 321
row 0006524154434152000004000000000000040000000f00 is 56 bytes
00 17 00 06 52 41 54 43 41 52 00 00 04 00 00 00 00
00 00 04 00 00 00 0F 00 7F FF FF FF 80 00 00 00
00 00 00 00 04 00 00 00 01 4D 9A F5 F1 98 20 25
E9 97 B4 28 01 00 F8 54 1F F0 97 90 00 00 00 00
00 F8 54 1F F0 97 90 00 00 00 00 04 00 00 00 01
{noformat}

Another interesting observation: the "row body size" measurement includes the length of the
{{04 00 00 00}} bytes, as though they are expected.
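
One way to check this is to walk the row fields and compare the final position with the end of the body declared by the "row body size" vint. The following is a minimal Python sketch, assuming the field layout from the hand decoding (8-byte clustering, vint sizes and timestamps, two int32 cells). The function names are hypothetical and this is not the real deserializer.

```python
from typing import Tuple

ORIGINAL = bytes.fromhex(  # partition from the provided sstable
    "00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00 "
    "00 00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00 "
    "00 00 00 00 00 04 00 00 00 01 51 9C 84 68 B0 20 "
    "25 E8 57 F0 C8 01 00 F8 54 1E 52 C9 F8 00 00 00 "
    "00 00 F8 54 1E 52 C9 F8 00 00 00 00 04 00 00 00 "
    "01"
)
REPRO = bytes.fromhex(  # partition from the reproduction attempt
    "00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00 "
    "00 00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00 "
    "00 00 00 00 00 04 00 00 00 01 53 F7 D1 63 E1 18 "
    "25 E3 3F AD CB 01 00 E4 70 7B FE 00 00 00 00 00 "
    "E4 70 7B FE 00 00 00 00 01"
)


def read_unsigned_vint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, bytes_consumed); leading 1 bits give the extra byte count."""
    first, extra = buf[pos], 0
    while (first << extra) & 0x80:
        extra += 1
    value = first & (0xFF >> extra)
    for b in buf[pos + 1 : pos + 1 + extra]:
        value = (value << 8) | b
    return value, 1 + extra


def leftover_in_row_body(raw: bytes, cells: int = 2, value_width: int = 4) -> int:
    """Bytes inside the declared row body that the assumed layout does not explain."""
    key_len = int.from_bytes(raw[0:2], "big")
    pos = 2 + key_len + 4 + 8  # key, local deletion time, marked-for-delete-at
    pos += 1 + 1 + 8           # row flags, clustering block header, clustering
    body_size, n = read_unsigned_vint(raw, pos)
    pos += n
    body_end = pos + body_size
    for _ in range(3):         # previous size, row timestamp, columns subset
        _, n = read_unsigned_vint(raw, pos)
        pos += n
    for _ in range(cells):
        pos += 1               # cell flags
        _, n = read_unsigned_vint(raw, pos)
        pos += n               # cell timestamp
        pos += value_width     # int32 cell value
    return body_end - pos


print("original:", leftover_in_row_body(ORIGINAL))
print("repro:   ", leftover_in_row_body(REPRO))
```

With the two dumps above, the original partition has four unexplained bytes inside its declared row body and the reproduction has none.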

So far I am stumped as to how those bytes got there.  They match the first four bytes in the
start of a row, but that could be a coincidence.
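
The match can be checked mechanically across partitions. This is a rough Python sketch that assumes every partition has the same 23-byte key length, so the row starts at offset 37, and that the last byte is the end-of-partition marker. The dict labels and {{trailer_matches_row_start}} are hypothetical names used only for illustration.

```python
ROW_START = 2 + 0x17 + 4 + 8  # key length field + key + deletion time + delete-at

PARTITIONS = {
    "sstable @0": bytes.fromhex(
        "00 17 00 06 52 41 54 43 41 52 00 00 04 00 00 00 00 "
        "00 00 04 00 00 00 04 00 7F FF FF FF 80 00 00 00 "
        "00 00 00 00 04 00 00 00 01 4D 9A D8 2C 80 1D 25 "
        "00 01 00 F8 54 1E 3C B1 B8 00 00 00 00 00 F8 54 "
        "1E 3C B1 B8 00 00 00 00 04 00 00 00 01"
    ),
    "sstable @78": bytes.fromhex(
        "00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00 00 "
        "00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00 00 "
        "00 00 00 00 04 00 00 00 01 51 9C 84 68 B0 20 25 "
        "E8 57 F0 C8 01 00 F8 54 1E 52 C9 F8 00 00 00 00 "
        "00 F8 54 1E 52 C9 F8 00 00 00 00 04 00 00 00 01"
    ),
    "reproduction": bytes.fromhex(
        "00 17 00 06 55 48 4E 54 52 43 00 00 04 00 00 00 "
        "00 00 00 04 00 00 00 00 00 7F FF FF FF 80 00 00 "
        "00 00 00 00 00 04 00 00 00 01 53 F7 D1 63 E1 18 "
        "25 E3 3F AD CB 01 00 E4 70 7B FE 00 00 00 00 00 "
        "E4 70 7B FE 00 00 00 00 01"
    ),
}


def trailer_matches_row_start(raw: bytes) -> bool:
    """True if the four bytes before the final marker repeat the row's first four bytes."""
    return raw[-1] == 0x01 and raw[-5:-1] == raw[ROW_START : ROW_START + 4]


for name, raw in PARTITIONS.items():
    print(f"{name:<14} {len(raw):>3} bytes  trailer repeats row start: "
          f"{trailer_matches_row_start(raw)}")
```

This is only a heuristic: a partition whose last cell value happened to be {{04 00 00 00}} would also match.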

[~slebresne]  can you look over this and see if you have any ideas?  A fresh set of eyes may
help.

> Clustering column value is zeroed out in some query results
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-11473
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11473
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: debian jessie patch current with Cassandra 3.0.4
>            Reporter: Jason Kania
>            Assignee: Tyler Hobbs
>
> As per a discussion on the mailing list, http://www.mail-archive.com/user@cassandra.apache.org/msg46902.html,
we are encountering inconsistent query results when the following query is run:
> {noformat}
> select "subscriberId","sensorUnitId","sensorId","time" from 
> "sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND "sensorId"=0
ORDER BY "time" LIMIT 10;
> {noformat}
> Invalid Query Results
> {noformat}
> subscriberId    sensorUnitId    sensorId    time
> JASKAN    0    0    2015-05-24 2:09
> JASKAN    0    0    1969-12-31 19:00
> JASKAN    0    0    2016-01-21 2:10
> JASKAN    0    0    2016-01-21 2:10
> JASKAN    0    0    2016-01-21 2:10
> JASKAN    0    0    2016-01-21 2:11
> JASKAN    0    0    2016-01-21 2:22
> JASKAN    0    0    2016-01-21 2:22
> JASKAN    0    0    2016-01-21 2:22
> JASKAN    0    0    2016-01-21 2:22
> {noformat}
> Valid Query Results
> {noformat}
> subscriberId    sensorUnitId    sensorId    time
> JASKAN    0    0    2015-05-24 2:09
> JASKAN    0    0    2015-05-24 2:09
> JASKAN    0    0    2015-05-24 2:10
> JASKAN    0    0    2015-05-24 2:10
> JASKAN    0    0    2015-05-24 2:10
> JASKAN    0    0    2015-05-24 2:10
> JASKAN    0    0    2015-05-24 2:11
> JASKAN    0    0    2015-05-24 2:13
> JASKAN    0    0    2015-05-24 2:13
> JASKAN    0    0    2015-05-24 2:14
> {noformat}
> Running the following yields no rows indicating that the 1969... timestamp is invalid.
> {noformat}
> select "subscriberId","sensorUnitId","sensorId","time" FROM "edgeTransitionIndex" where
"subscriberId"='JASKAN' AND "sensorUnitId"=0 AND "sensorId"=0 and time='1969-12-31 19:00:00-0500';
> {noformat}
> The schema is as follows:
> {noformat}
> CREATE TABLE sensorReading."sensorReadingIndex" (
>     "subscriberId" text,
>     "sensorUnitId" int,
>     "sensorId" int,
>     time timestamp,
>     "classId" int,
>     correlation float,
>     PRIMARY KEY (("subscriberId", "sensorUnitId", "sensorId"), time)
> ) WITH CLUSTERING ORDER BY (time ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> CREATE INDEX classSecondaryIndex ON sensorReading."sensorReadingIndex" ("classId");
> {noformat}
> We were asked to provide our sstables as well but these are very large and would require
some data obfuscation. We are able to run code or scripts against the data on our servers
if that is an option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
