incubator-cassandra-user mailing list archives

From Abraham Sanderson <asander...@lingotek.com>
Subject Re: Problems with subcolumn retrieval after upgrade from 0.6 to 0.7
Date Fri, 22 Apr 2011 17:14:11 GMT
  I did some more sleuthing and found out what's going on.  The 0.6 data
was serialized using the wrong comparator, and as a result, when the data
was imported into 0.7, the presorted wrapper for the subcolumns misbehaves
on some operations (like remove()).  The supercolumn operation gets all the
columns and then filters down to the requested column list/range.  The
remove() fails because the underlying map of column names to IColumn does
not start at the element the comparator expects.  The operation looks at the
first element, and if the comparator returns -1 for compareTo(key,
firstKey), it assumes the key is not in the range.
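
To make the failure mode concrete, here is a minimal sketch in plain Java (not Cassandra code).  It assumes the stored order is unsigned byte-wise, as with BytesType, and that lookups use java.util.UUID.compareTo, as the custom subcolumn comparator quoted below does.  The class name, the two comparator constants, and the use of Collections.binarySearch as a stand-in for the presorted wrapper are all illustrative assumptions; the UUIDs are taken from the debug output further down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

class ComparatorMismatchDemo {
    // Stand-in for BytesType: unsigned, byte-wise order of the 16 UUID bytes.
    static final Comparator<UUID> BYTE_ORDER = (a, b) -> {
        int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return c != 0 ? c
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    };

    // Stand-in for the custom type: UUID.compareTo treats both halves as signed longs.
    static final Comparator<UUID> UUID_ORDER = Comparator.naturalOrder();

    // Mutable sample list so callers can sort it either way.
    static List<UUID> sample() {
        return new ArrayList<>(Arrays.asList(
                UUID.fromString("45d9bce5-be02-489d-b225-63167be4b22c"),
                UUID.fromString("78cfd525-a520-458e-8584-259415b88405"),
                UUID.fromString("aa47bbbf-14f3-4fd7-a993-86533b48d274"),
                UUID.fromString("c44030c1-00cc-46f5-8517-72d1cb37cf12")));
    }

    public static void main(String[] args) {
        List<UUID> stored = sample();
        UUID wanted = UUID.fromString("aa47bbbf-14f3-4fd7-a993-86533b48d274");

        // Data written in byte order, then searched with the UUID comparator.
        stored.sort(BYTE_ORDER);
        int mismatched = Collections.binarySearch(stored, wanted, UUID_ORDER);
        System.out.println("found with mismatched order: " + (mismatched >= 0));

        // Same data re-sorted with the comparator that is used for lookups.
        stored.sort(UUID_ORDER);
        int matched = Collections.binarySearch(stored, wanted, UUID_ORDER);
        System.out.println("found after re-sorting:      " + (matched >= 0));
    }
}
```

The key is present in both cases; only the first lookup misses it, because the search compares against elements that are not where the comparator assumes they are.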

  The reason this happened appears to be a typo in my 0.6 keyspace
definition: instead of CompareSubcolumnsWith (small c) I used the key
CompareSubColumnsWith (big C).  Doh!  I'm assuming that there is no
validation check on the keys in the storage conf file, and since the
subcolumn comparator is optional, the file loaded just fine.  As a result my
subcolumn comparator was ignored and BytesType was used instead.  So the big
question is how the heck do I fix it?  This would be similar to a case where
I change the type of a column, say switch from BytesType to AsciiType in
order to sort it in a different fashion.
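
As an illustration of how a misspelled optional key can slip through, here is a small hypothetical sketch.  It is not the actual 0.6 config loader; the class, the method names, the attribute list, and the comparator class name com.example.TypedUUIDType are assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ConfigKeyCheckDemo {
    // Assumed set of recognized attribute names, for illustration only.
    static final Set<String> KNOWN_KEYS = new HashSet<>(Arrays.asList(
            "Name", "ColumnType", "CompareWith", "CompareSubcolumnsWith"));

    static final String DEFAULT_COMPARATOR = "BytesType";

    // Lenient lookup: an unrecognized key is never read, so the default wins silently.
    static String lenientSubcomparator(Map<String, String> attrs) {
        return attrs.getOrDefault("CompareSubcolumnsWith", DEFAULT_COMPARATOR);
    }

    // Strict lookup: reject any attribute name that is not recognized.
    static String strictSubcomparator(Map<String, String> attrs) {
        for (String key : attrs.keySet()) {
            if (!KNOWN_KEYS.contains(key)) {
                throw new IllegalArgumentException("Unknown attribute: " + key);
            }
        }
        return attrs.getOrDefault("CompareSubcolumnsWith", DEFAULT_COMPARATOR);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("Name", "TranslationsByTarget");
        attrs.put("CompareSubColumnsWith", "com.example.TypedUUIDType"); // note the capital C

        System.out.println("lenient: " + lenientSubcomparator(attrs));
        try {
            strictSubcomparator(attrs);
        } catch (IllegalArgumentException e) {
            System.out.println("strict:  " + e.getMessage());
        }
    }
}
```

With the lenient lookup the typo goes unnoticed and the default comparator is returned, which matches what I believe happened here; a strict check would have failed at load time instead.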

On Tue, Apr 19, 2011 at 4:38 PM, Abraham Sanderson
<asanderson@lingotek.com> wrote:

> Aaron,
>
>   I'll try my best...I'm still trying to make heads or tails of the output
> as well.  The first line is debugging output from me; just printing the
> values for key, supercolumn name, and the wrapper class I've built for the
> subcolumn.  This was prior to 0.7.1, so the key is a String, e.g.
> "80324d09-302b-4093-9708-e509091e5d8f"; the supercolumn is a custom
> serializable object type, the first byte "0F" is for the type, the rest of
> the sequence "AC ED ... 44 45" is byte array which is backing an
> ObjectOutputStream; the subcolumn is my own construct, the name in this case
> is the custom uuid type, represented by the type byte "10" and then the
> bytes of the UUID "78 CF D5 25 A5 20 45 8E 85 84 25 94 15 B8 84 05".  I
> then did a print of the column parent and predicate being used for the
> get_slice command, just to be sure that everything matches.  The methods for
> that are part of cassandra code.  Then I do a print of the
> ColumnOrSuperColumn returned by the slice command.  It looks to me like not
> all the bytes are shown in some of the cases...I looked up the thrift source
> code used by cassandra and it looks like what is displayed is the ByteBuffer
> from position=0 up to the limit, and truncates past the first 128 bytes.
> Hard to tell what is going on with those because of that, but it does look
> like the buffer for the name is actually stopping at the right place...the
> first column in the top example ends in "10 49 5D 01 32 73 0D 48 03 85 09 CA
> F1 AF 6F 60 63" (uuid 495d0132-730d-4803-8509-caf1af6f6063), the next ends
> in "10 78 CF D5 25 A5 20 45 8E 85 84 25 94 15 B8 84 05"(uuid
> 78cfd525-a520-458e-8584-259415b88405).
>
> As you asked, I put in some more debugging to illustrate the bytes returned
> in the column name.  Below is one of the columns that fails:
>
>
> get_slice for key: 80324d09-302b-4093-9708-e509091e5d8f supercolumn:
> 0faced00057372002a6c696e676f74656b2e646f6373746f72652e43617373616e647261446f63756d656e74245461726765749d0b9f071f4cb0410200024900076d5f70686173654c00066d5f6c616e677400124c6a6176612f6c616e672f537472696e673b78700000000174000564655f4445
> subcolumn: [ cf="TranslationsByTarget"
> name="78cfd525-a520-458e-8584-259415b88405"]
> colParent:ColumnParent(column_family:TranslationsByTarget, super_column:0F
> AC ED 00 05 73 72 00 2A 6C 69 6E 67 6F 74 65 6B 2E 64 6F 63 73 74 6F 72 65
> 2E 43 61 73 73 61 6E 64 72 61 44 6F 63 75 6D 65 6E 74 24 54 61 72 67 65 74
> 9D 0B 9F 07 1F 4C B0 41 02 00 02 49 00 07 6D 5F 70 68 61 73 65 4C 00 06 6D
> 5F 6C 61 6E 67 74 00 12 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 74 72 69 6E 67
> 3B 78 70 00 00 00 01 74 00 05 64 65 5F 44 45)
> predicate:SlicePredicate(column_names:[java.nio.HeapByteBuffer[pos=0 lim=17
> cap=17]])
> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05 0C 00 01 0B 00 01
> 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4 B2 2C, value:80 01
> 00 02 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00
> 00 05 0C 00 01 0B 00 01 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16
> 7B E4 B2 2C 0B 00 02 00 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE
> 81 EA 85, timestamp:1301329228377))
> col.getName(): 1045d9bce5be02489db22563167be4b22c
> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05 0C 00 01 0B 00 01
> 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4 B2 2C 0B 00 02 00
> 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA 85 0A 00 03 00 00
> 01 2E FD 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10 52 8F EC B9 EE 94
> 43 31 AA AF AD A9 F7 33 DA DA, value:80 01 00 02 00 00 00 09 67 65 74 5F 73
> 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05 0C 00 01 0B 00 01 00 00 00
> 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4 B2 2C 0B 00 02 00 00 00 11
> 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA 85 0A 00 03 00 00 01 2E FD
> 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10 52 8F EC B9 EE 94 43 31 AA
> AF AD A9 F7 33 DA DA 0B 00 02 00 00 00 11 10..., timestamp:1301329222520))
> col.getName(): 10528fecb9ee944331aaafada9f733dada
> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05 0C 00 01 0B 00 01
> 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4 B2 2C 0B 00 02 00
> 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA 85 0A 00 03 00 00
> 01 2E FD 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10 52 8F EC B9 EE 94
> 43 31 AA AF AD A9 F7 33 DA DA 0B 00 02 00 00 00 11 10..., value:80 01 00 02
> 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05
> 0C 00 01 0B 00 01 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4
> B2 2C 0B 00 02 00 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA
> 85 0A 00 03 00 00 01 2E FD 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10
> 52 8F EC B9 EE 94 43 31 AA AF AD A9 F7 33 DA DA 0B 00 02 00 00 00 11 10...,
> timestamp:1301329262669))
> col.getName(): 10aa47bbbf14f34fd7a99386533b48d274
> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05 0C 00 01 0B 00 01
> 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4 B2 2C 0B 00 02 00
> 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA 85 0A 00 03 00 00
> 01 2E FD 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10 52 8F EC B9 EE 94
> 43 31 AA AF AD A9 F7 33 DA DA 0B 00 02 00 00 00 11 10..., value:80 01 00 02
> 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05
> 0C 00 01 0B 00 01 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4
> B2 2C 0B 00 02 00 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA
> 85 0A 00 03 00 00 01 2E FD 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10
> 52 8F EC B9 EE 94 43 31 AA AF AD A9 F7 33 DA DA 0B 00 02 00 00 00 11 10...,
> timestamp:1301329219744))
> col.getName(): 10c44030c100cc46f5851772d1cb37cf12
> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05 0C 00 01 0B 00 01
> 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4 B2 2C 0B 00 02 00
> 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA 85 0A 00 03 00 00
> 01 2E FD 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10 52 8F EC B9 EE 94
> 43 31 AA AF AD A9 F7 33 DA DA 0B 00 02 00 00 00 11 10..., value:80 01 00 02
> 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 05
> 0C 00 01 0B 00 01 00 00 00 11 10 45 D9 BC E5 BE 02 48 9D B2 25 63 16 7B E4
> B2 2C 0B 00 02 00 00 00 11 10 29 F8 DC 1D 21 D7 49 DF B6 A6 46 B7 CE 81 EA
> 85 0A 00 03 00 00 01 2E FD 44 32 59 00 00 0C 00 01 0B 00 01 00 00 00 11 10
> 52 8F EC B9 EE 94 43 31 AA AF AD A9 F7 33 DA DA 0B 00 02 00 00 00 11 10...,
> timestamp:1301327602293))
> col.getName(): 1078cfd525a520458e8584259415b88405
>
> The name bytes look good to me...the type byte ("10") and then the bytes for
> the UUID.  I looked at the code for Column.getName(); there are some
> utility methods in the thrift sources which return the byte[] subsequence
> from the buffer's position to the buffer's limit.  I admit that I am still
> learning about the internals of cassandra, but why would the returned
> ByteBuffer contain all this extra data?  Shouldn't there be slice() done
> somewhere, if for no other reason than to reduce the opportunity for buffer
> overflow/underflow?  This sequence "67 65 74 5F 73 6C 69 63 65" ==
> "get_slice" in ascii.  Is the ByteBuffer simply wrapping the response from
> thrift, and leaving the hows and whens of extracting the pertinent bytes to
> the application code?
>
> Abe
>
>
> On Tue, Apr 19, 2011 at 3:00 PM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> Can you provide a little more info on what I'm seeing here. When name is
>> shown for the column, are you showing me the entire byte buffer for the name
>> or just up to limit ?
>>
>> Aaron
>>
>>
>> On 20 Apr 2011, at 05:49, Abraham Sanderson wrote:
>>
>> Ok, set up a unit test for the supercolumns which seem to have problems, I
>> posted a few examples below.  As I mentioned, the retrieved bytes for the
>> name and value appear to have additional data; in previous tests the
>> buffer's position, mark, and limit have been verified, and when I call
>> column.getName(), just the bytes for the name itself are properly
>> retrieved(if not I should be getting validation errors for the custom uuid
>> types, correct?).
>>
>> Abe Sanderson
>>
>> get_slice for key: 80324d09-302b-4093-9708-e509091e5d8f supercolumn:
>> 0faced00057372002a6c696e676f74656b2e646f6373746f72652e43617373616e647261446f63756d656e74245461726765749d0b9f071f4cb0410200024900076d5f70686173654c00066d5f6c616e677400124c6a6176612f6c616e672f537472696e673b78700000000174000564655f4445
>> subcolumn: [ cf="TranslationsByTarget"
>> name="78cfd525-a520-458e-8584-259415b88405"]
>> colParent:ColumnParent(column_family:TranslationsByTarget, super_column:0F
>> AC ED 00 05 73 72 00 2A 6C 69 6E 67 6F 74 65 6B 2E 64 6F 63 73 74 6F 72 65
>> 2E 43 61 73 73 61 6E 64 72 61 44 6F 63 75 6D 65 6E 74 24 54 61 72 67 65 74
>> 9D 0B 9F 07 1F 4C B0 41 02 00 02 49 00 07 6D 5F 70 68 61 73 65 4C 00 06 6D
>> 5F 6C 61 6E 67 74 00 12 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 74 72 69 6E 67
>> 3B 78 70 00 00 00 01 74 00 05 64 65 5F 44 45)
>> predicate:SlicePredicate(column_names:[java.nio.HeapByteBuffer[pos=0
>> lim=17 cap=17]])
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 04 0C 00 01 0B 00 01
>> 00 00 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1 AF 6F 60 63, value:80 01
>> 00 02 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00
>> 00 04 0C 00 01 0B 00 01 00 00 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1
>> AF 6F 60 63 0B 00 02 00 00 00 11 10 FC 0A 0D 43 B1 E0 44 F9 96 AA FC EE 41
>> EC 40 7E, timestamp:1301327609539))
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 04 0C 00 01 0B 00 01
>> 00 00 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1 AF 6F 60 63 0B 00 02 00
>> 00 00 11 10 FC 0A 0D 43 B1 E0 44 F9 96 AA FC EE 41 EC 40 7E 0A 00 03 00 00
>> 01 2E FD 2B 7E C3 00 00 0C 00 01 0B 00 01 00 00 00 11 10 78 CF D5 25 A5 20
>> 45 8E 85 84 25 94 15 B8 84 05, value:80 01 00 02 00 00 00 09 67 65 74 5F
>> 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 04 0C 00 01 0B 00 01 00 00
>> 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1 AF 6F 60 63 0B 00 02 00 00 00
>> 11 10 FC 0A 0D 43 B1 E0 44 F9 96 AA FC EE 41 EC 40 7E 0A 00 03 00 00 01 2E
>> FD 2B 7E C3 00 00 0C 00 01 0B 00 01 00 00 00 11 10 78 CF D5 25 A5 20 45 8E 85
>> 84 25 94 15 B8 84 05 0B 00 02 00 00 00 11 10...,
>> timestamp:1301327602293))
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 04 0C 00 01 0B 00 01
>> 00 00 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1 AF 6F 60 63 0B 00 02 00
>> 00 00 11 10 FC 0A 0D 43 B1 E0 44 F9 96 AA FC EE 41 EC 40 7E 0A 00 03 00 00
>> 01 2E FD 2B 7E C3 00 00 0C 00 01 0B 00 01 00 00 00 11 10 78 CF D5 25 A5 20
>> 45 8E 85 84 25 94 15 B8 84 05 0B 00 02 00 00 00 11 10..., value:80 01 00
>> 02 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00
>> 04 0C 00 01 0B 00 01 00 00 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1 AF
>> 6F 60 63 0B 00 02 00 00 00 11 10 FC 0A 0D 43 B1 E0 44 F9 96 AA FC EE 41 EC
>> 40 7E 0A 00 03 00 00 01 2E FD 2B 7E C3 00 00 0C 00 01 0B 00 01 00 00 00 11
>> 10 78 CF D5 25 A5 20 45 8E 85 84 25 94 15 B8 84 05 0B 00 02 00 00 00 11
>> 10..., timestamp:1301327589704))
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 04 0C 00 01 0B 00 01
>> 00 00 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1 AF 6F 60 63 0B 00 02 00
>> 00 00 11 10 FC 0A 0D 43 B1 E0 44 F9 96 AA FC EE 41 EC 40 7E 0A 00 03 00 00
>> 01 2E FD 2B 7E C3 00 00 0C 00 01 0B 00 01 00 00 00 11 10 78 CF D5 25 A5 20
>> 45 8E 85 84 25 94 15 B8 84 05 0B 00 02 00 00 00 11 10..., value:80 01 00
>> 02 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00
>> 04 0C 00 01 0B 00 01 00 00 00 11 10 49 5D 01 32 73 0D 48 03 85 09 CA F1 AF
>> 6F 60 63 0B 00 02 00 00 00 11 10 FC 0A 0D 43 B1 E0 44 F9 96 AA FC EE 41 EC
>> 40 7E 0A 00 03 00 00 01 2E FD 2B 7E C3 00 00 0C 00 01 0B 00 01 00 00 00 11
>> 10 78 CF D5 25 A5 20 45 8E 85 84 25 94 15 B8 84 05 0B 00 02 00 00 00 11
>> 10..., timestamp:1301327594118))
>>
>>
>> get_slice for key: d1c7f6b9-1425-4fab-b074-5574c54cae08 supercolumn:
>> 0faced00057372002a6c696e676f74656b2e646f6373746f72652e43617373616e647261446f63756d656e74245461726765749d0b9f071f4cb0410200024900076d5f70686173654c00066d5f6c616e677400124c6a6176612f6c616e672f537472696e673b78700000000174000564655f4445
>> subcolumn: [ cf="TranslationsByTarget"
>> name="b2f33b97-69f4-45ec-ad87-dd14ee60d719"]
>> colParent:ColumnParent(column_family:TranslationsByTarget, super_column:0F
>> AC ED 00 05 73 72 00 2A 6C 69 6E 67 6F 74 65 6B 2E 64 6F 63 73 74 6F 72 65
>> 2E 43 61 73 73 61 6E 64 72 61 44 6F 63 75 6D 65 6E 74 24 54 61 72 67 65 74
>> 9D 0B 9F 07 1F 4C B0 41 02 00 02 49 00 07 6D 5F 70 68 61 73 65 4C 00 06 6D
>> 5F 6C 61 6E 67 74 00 12 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 74 72 69 6E 67
>> 3B 78 70 00 00 00 01 74 00 05 64 65 5F 44 45)
>> predicate:SlicePredicate(column_names:[java.nio.HeapByteBuffer[pos=0
>> lim=17 cap=17]])
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 04 0F 00 00 0C 00 00 00 02 0C 00 01 0B 00 01
>> 00 00 00 11 10 7C 2F 5D 5B B3 70 42 E1 A6 A2 77 FC 72 14 40 FE, value:80 01
>> 00 02 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 04 0F 00 00 0C 00 00
>> 00 02 0C 00 01 0B 00 01 00 00 00 11 10 7C 2F 5D 5B B3 70 42 E1 A6 A2 77 FC
>> 72 14 40 FE 0B 00 02 00 00 00 11 10 B4 64 74 19 F9 44 4E A3 A5 F9 06 32 67
>> DB 33 19, timestamp:1301324860465))
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 04 0F 00 00 0C 00 00 00 02 0C 00 01 0B 00 01
>> 00 00 00 11 10 7C 2F 5D 5B B3 70 42 E1 A6 A2 77 FC 72 14 40 FE 0B 00 02 00
>> 00 00 11 10 B4 64 74 19 F9 44 4E A3 A5 F9 06 32 67 DB 33 19 0A 00 03 00 00
>> 01 2E FD 01 8C 31 00 00 0C 00 01 0B 00 01 00 00 00 11 10 B2 F3 3B 97 69
>> F4 45 EC AD 87 DD 14 EE 60 D7 19, value:80 01 00 02 00 00 00 09 67 65 74 5F
>> 73 6C 69 63 65 00 00 00 04 0F 00 00 0C 00 00 00 02 0C 00 01 0B 00 01 00 00
>> 00 11 10 7C 2F 5D 5B B3 70 42 E1 A6 A2 77 FC 72 14 40 FE 0B 00 02 00 00 00
>> 11 10 B4 64 74 19 F9 44 4E A3 A5 F9 06 32 67 DB 33 19 0A 00 03 00 00 01 2E
>> FD 01 8C 31 00 00 0C 00 01 0B 00 01 00 00 00 11 10 B2 F3 3B 97 69 F4 45
>> EC AD 87 DD 14 EE 60 D7 19 0B 00 02 00 00 00 11 10...,
>> timestamp:1301325719735))
>>
>>
>> get_slice for key: 18b4acd1-5491-44d3-aaa1-b725f51d1c3b supercolumn:
>> 0faced00057372002a6c696e676f74656b2e646f6373746f72652e43617373616e647261446f63756d656e74245461726765749d0b9f071f4cb0410200024900076d5f70686173654c00066d5f6c616e677400124c6a6176612f6c616e672f537472696e673b787000000001740005706c5f504c
>> subcolumn: [ cf="TranslationsByTarget"
>> name="3da78c49-a8aa-4fdb-8238-1ade458426b5"]
>> colParent:ColumnParent(column_family:TranslationsByTarget, super_column:0F
>> AC ED 00 05 73 72 00 2A 6C 69 6E 67 6F 74 65 6B 2E 64 6F 63 73 74 6F 72 65
>> 2E 43 61 73 73 61 6E 64 72 61 44 6F 63 75 6D 65 6E 74 24 54 61 72 67 65 74
>> 9D 0B 9F 07 1F 4C B0 41 02 00 02 49 00 07 6D 5F 70 68 61 73 65 4C 00 06 6D
>> 5F 6C 61 6E 67 74 00 12 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 74 72 69 6E 67
>> 3B 78 70 00 00 00 01 74 00 05 70 6C 5F 50 4C)
>> predicate:SlicePredicate(column_names:[java.nio.HeapByteBuffer[pos=0
>> lim=17 cap=17]])
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 03 0C 00 01 0B 00 01
>> 00 00 00 11 10 24 D4 2C 7F 2D C3 4A 80 B3 FF 5B A3 77 AF 2E BD, value:80 01
>> 00 02 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00
>> 00 03 0C 00 01 0B 00 01 00 00 00 11 10 24 D4 2C 7F 2D C3 4A 80 B3 FF 5B A3
>> 77 AF 2E BD 0B 00 02 00 00 00 11 10 62 58 73 23 CB 37 4F B5 BD DD BC F5 1E
>> 7F E7 65, timestamp:1301000346861))
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 03 0C 00 01 0B 00 01
>> 00 00 00 11 10 24 D4 2C 7F 2D C3 4A 80 B3 FF 5B A3 77 AF 2E BD 0B 00 02 00
>> 00 00 11 10 62 58 73 23 CB 37 4F B5 BD DD BC F5 1E 7F E7 65 0A 00 03 00 00
>> 01 2E E9 A9 DC ED 00 00 0C 00 01 0B 00 01 00 00 00 11 10 3D A7 8C 49 A8 AA
>> 4F DB 82 38 1A DE 45 84 26 B5, value:80 01 00 02 00 00 00 09 67 65 74 5F 73
>> 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 03 0C 00 01 0B 00 01 00 00 00
>> 11 10 24 D4 2C 7F 2D C3 4A 80 B3 FF 5B A3 77 AF 2E BD 0B 00 02 00 00 00 11
>> 10 62 58 73 23 CB 37 4F B5 BD DD BC F5 1E 7F E7 65 0A 00 03 00 00 01 2E E9
>> A9 DC ED 00 00 0C 00 01 0B 00 01 00 00 00 11 10 3D A7 8C 49 A8 AA 4F DB 82
>> 38 1A DE 45 84 26 B5 0B 00 02 00 00 00 11 10..., timestamp:1301000346885))
>> col: ColumnOrSuperColumn(column:Column(name:80 01 00 02 00 00 00 09 67 65
>> 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 03 0C 00 01 0B 00 01
>> 00 00 00 11 10 24 D4 2C 7F 2D C3 4A 80 B3 FF 5B A3 77 AF 2E BD 0B 00 02 00
>> 00 00 11 10 62 58 73 23 CB 37 4F B5 BD DD BC F5 1E 7F E7 65 0A 00 03 00 00
>> 01 2E E9 A9 DC ED 00 00 0C 00 01 0B 00 01 00 00 00 11 10 3D A7 8C 49 A8 AA
>> 4F DB 82 38 1A DE 45 84 26 B5 0B 00 02 00 00 00 11 10..., value:80 01 00 02
>> 00 00 00 09 67 65 74 5F 73 6C 69 63 65 00 00 00 02 0F 00 00 0C 00 00 00 03
>> 0C 00 01 0B 00 01 00 00 00 11 10 24 D4 2C 7F 2D C3 4A 80 B3 FF 5B A3 77 AF
>> 2E BD 0B 00 02 00 00 00 11 10 62 58 73 23 CB 37 4F B5 BD DD BC F5 1E 7F E7
>> 65 0A 00 03 00 00 01 2E E9 A9 DC ED 00 00 0C 00 01 0B 00 01 00 00 00 11 10
>> 3D A7 8C 49 A8 AA 4F DB 82 38 1A DE 45 84 26 B5 0B 00 02 00 00 00 11 10...,
>> timestamp:1301000346836))
>>
>> On Mon, Apr 18, 2011 at 5:41 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>
>>> Could you provide an example of a get_slice request that failed and
>>> the columns that were returned, so we can see the actual bytes for the super
>>> column and column names.
>>>
>>> Aaron
>>>
>>>
>>> On 19 Apr 2011, at 09:26, Abraham Sanderson wrote:
>>>
>>> I wish it were consistent enough that the answer were simple...  It
>>> varies from just the requested subcolumn to all subcolumns.  It always
>>> does return the columns in order, and the requested column is always one of
>>> the columns returned.   However, the slice start is not consistently in the
>>> same place (like n+1 or n-1).  For example, if I have CF['key']['supercolumn'
>>> ['a','b','c','d','e']], and query for 'c', sometimes I get a slice with 'a',
>>> 'b', 'c', other times it's 'b', 'c', 'd', sometimes 'c', 'd'.  When the
>>> column name is closer to the end of the range ('d' or 'e'), sometimes it
>>> is just a slice with the column.  The sporadic behavior makes me think that
>>> it's a race condition, but the behavior linked to the column range makes me
>>> think I'm overrunning the buffer somewhere.  I at first suspected that I was
>>> inadvertently making modifications to the buffers in application code during
>>> serialization/deserialization, so I did the tests in the cli.  This limits
>>> it to just cassandra/thrift code and my custom types.  Am I missing some
>>> other factor?  While debugging I have noticed that the byte buffers contain
>>> more than they used to; it looks to me like tokens that contain parts of the
>>> thrift response.  I'd see strings like
>>> "???get_slice???Foo??7c2f5d5b-b370-42e1-a6a2-77fc721440fe????"  Is it
>>> possible that I am inadvertently using a reserved token or something on my
>>> supercolumn name and this is screwing with the slice command?
>>>
>>> Abe
>>>
>>> On Mon, Apr 18, 2011 at 2:55 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>>
>>>> When you run the get_slice which columns are returned ?
>>>>
>>>>
>>>> Aaron
>>>>
>>>> On 19 Apr 2011, at 04:12, Abraham Sanderson wrote:
>>>>
>>>> Ok, I made the changes and tried again.  Here is the output before
>>>> modifying my method, using a simple get; I confirmed the same output in
>>>> the cli:
>>>>
>>>> DEBUG [pool-1-thread-2] 2011-04-18 09:37:23,910 CassandraServer.java
>>>> (line 279) get
>>>> DEBUG [pool-1-thread-2] 2011-04-18 09:37:23,911 StorageProxy.java (line
>>>> 322) Command/ConsistencyLevel is SliceByNamesReadCommand(table='DocStore',
>>>> key=64316337663662392d313432352d346661622d623037342d353537346335346361653038,
>>>> columnParent='QueryPath(columnFamilyName='TranslationsByTarget',
>>>> superColumnName='java.nio.HeapByteBuffer[pos=95 lim=211 cap=244]',
>>>> columnName='null')',
>>>> columns=[7c2f5d5b-b370-42e1-a6a2-77fc721440fe,])/ALL
>>>> DEBUG [pool-1-thread-2] 2011-04-18 09:37:23,911 ReadCallback.java (line
>>>> 84) Blockfor/repair is 1/true; setting up requests to localhost/127.0.0.1
>>>> DEBUG [pool-1-thread-2] 2011-04-18 09:37:23,911 StorageProxy.java (line
>>>> 345) reading data locally
>>>> DEBUG [ReadStage:4] 2011-04-18 09:37:23,911 StorageProxy.java (line 450)
>>>> LocalReadRunnable reading SliceByNamesReadCommand(table='DocStore',
>>>> key=64316337663662392d313432352d346661622d623037342d353537346335346361653038,
>>>> columnParent='QueryPath(columnFamilyName='TranslationsByTarget',
>>>> superColumnName='java.nio.HeapByteBuffer[pos=95 lim=211 cap=244]',
>>>> columnName='null')',
>>>> columns=[7c2f5d5b-b370-42e1-a6a2-77fc721440fe,])
>>>> DEBUG [pool-1-thread-2] 2011-04-18 09:37:23,912 StorageProxy.java (line
>>>> 395) Read: 1 ms.
>>>> ERROR [pool-1-thread-2] 2011-04-18 09:37:23,912 Cassandra.java (line
>>>> 2665) Internal error processing get
>>>> java.lang.AssertionError
>>>>         at org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:300)
>>>>         at org.apache.cassandra.thrift.Cassandra$Processor$get.process(Cassandra.java:2655)
>>>>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>>>>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>         at java.lang.Thread.run(Thread.java:636)
>>>>
>>>> And here is the after...it succeeds here but still gives me multiple
>>>> subcolumns in the response.  Same behavior, it seems, I'm just sidestepping
>>>> the original AssertionError:
>>>>
>>>> DEBUG [pool-1-thread-6] 2011-04-18 09:50:26,617 CassandraServer.java
>>>> (line 232) get_slice
>>>> DEBUG [pool-1-thread-6] 2011-04-18 09:50:26,617 StorageProxy.java (line
>>>> 322) Command/ConsistencyLevel is SliceByNamesReadCommand(table='DocStore',
>>>> key=64316337663662392d313432352d346661622d623037342d353537346335346361653038,
>>>> columnParent='QueryPath(columnFamilyName='TranslationsByTarget',
>>>> superColumnName='java.nio.HeapByteBuffer[pos=101 lim=217 cap=259]',
>>>> columnName='null')', columns=[7c2f5d5b-b370-42e1-a6a2-77fc721440fe,])/ALL
>>>> DEBUG [pool-1-thread-6] 2011-04-18 09:50:26,617 ReadCallback.java (line
>>>> 84) Blockfor/repair is 1/true; setting up requests to localhost/127.0.0.1
>>>> DEBUG [pool-1-thread-6] 2011-04-18 09:50:26,617 StorageProxy.java (line
>>>> 345) reading data locally
>>>> DEBUG [ReadStage:3] 2011-04-18 09:50:26,618 StorageProxy.java (line 450)
>>>> LocalReadRunnable reading SliceByNamesReadCommand(table='DocStore',
>>>> key=64316337663662392d313432352d346661622d623037342d353537346335346361653038,
>>>> columnParent='QueryPath(columnFamilyName='TranslationsByTarget',
>>>> superColumnName='java.nio.HeapByteBuffer[pos=101 lim=217 cap=259]',
>>>> columnName='null')', columns=[7c2f5d5b-b370-42e1-a6a2-77fc721440fe,])
>>>> DEBUG [pool-1-thread-6] 2011-04-18 09:50:26,618 StorageProxy.java (line
>>>> 395) Read: 0 ms.
>>>>
>>>> My comparators are relatively simple.  Basically I have a schema that
>>>> requires heterogeneous columns, but I need to be able to deserialize them
>>>> in unique ways.  So there is always a type byte that precedes the bytes of
>>>> the data.  The supercolumn in this case is a general data type, which
>>>> happens to represent a serializable object:
>>>>
>>>>   public void validate(ByteBuffer bytes)
>>>>     throws MarshalException
>>>>   {
>>>>     if(bytes.remaining() == 0)
>>>>       return;
>>>>
>>>>     validateDataType(bytes.get(bytes.position()));
>>>>     return;
>>>>   }
>>>>
>>>>   public int compare(ByteBuffer bytes1, ByteBuffer bytes2)
>>>>   {
>>>>     if (bytes1.remaining() == 0)
>>>>       return bytes2.remaining() == 0 ? 0 : -1;
>>>>     else if (bytes2.remaining() == 0)
>>>>       return 1;
>>>>     else
>>>>     {
>>>>       // compare type bytes
>>>>
>>>>       byte T1 = bytes1.get(bytes1.position());
>>>>       byte T2 = bytes2.get(bytes2.position());
>>>>       if (T1 != T2)
>>>>         return (T1 - T2);
>>>>
>>>>       // compare values
>>>>
>>>>       return ByteBufferUtil.compareUnsigned(bytes1, bytes2);
>>>>     }
>>>>   }
>>>>
>>>> The subcolumn is similar...just a UUID with a type byte prefix:
>>>>
>>>>   public void validate(ByteBuffer bytes)
>>>>     throws MarshalException
>>>>   {
>>>>     if(bytes.remaining() == 0)
>>>>       return;
>>>>
>>>>     validateDataType(bytes.get(bytes.position()));
>>>>     if((bytes.remaining() - 1) == 0)
>>>>       return;
>>>>     else if((bytes.remaining() - 1) != 16)
>>>>       throw new MarshalException("UUID value must be exactly 16 bytes");
>>>>   }
>>>>
>>>>   public int compare(ByteBuffer bytes1, ByteBuffer bytes2)
>>>>   {
>>>>     if (bytes1.remaining() == 0)
>>>>       return bytes2.remaining() == 0 ? 0 : -1;
>>>>     else if (bytes2.remaining() == 0)
>>>>       return 1;
>>>>     else
>>>>     {
>>>>       // compare type bytes
>>>>
>>>>       byte T1 = bytes1.get(bytes1.position());
>>>>       byte T2 = bytes2.get(bytes2.position());
>>>>       if (T1 != T2)
>>>>         return (T1 - T2);
>>>>
>>>>       // compare values
>>>>
>>>>       UUID U1 = getUUID(bytes1, bytes1.position()+1);
>>>>       UUID U2 = getUUID(bytes2, bytes2.position()+1);
>>>>       return U1.compareTo(U2);
>>>>     }
>>>>   }
>>>>
>>>>   static UUID getUUID(ByteBuffer bytes, int pos)
>>>>   {
>>>>     long msBits = bytes.getLong(pos);
>>>>     long lsBits = bytes.getLong(pos+8);
>>>>     return new UUID(msBits, lsBits);
>>>>   }
>>>>
>>>> All of my buffer reads are done by index, the position shouldn't be
>>>> changing at all.
>>>>
>>>> Abe Sanderson
>>>>
>>>> On Sat, Apr 16, 2011 at 5:38 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>>>
>>>>> Can you run the same request as a get_slice naming the column in the
>>>>> SlicePredicate and see what comes back ?
>>>>>
>>>>> Can you reproduce the fault with logging set at DEBUG and send the logs
>>>>> ?
>>>>>
>>>>> Also, whats the compare function like for your custom type ?
>>>>>
>>>>> Cheers
>>>>> Aaron
>>>>>
>>>>>
>>>>> On 16 Apr 2011, at 07:34, Abraham Sanderson wrote:
>>>>>
>>>>> > I'm having some issues with a few of my ColumnFamilies after a
>>>>> > cassandra upgrade/import from 0.6.1 to 0.7.4.  I followed the instructions
>>>>> > to upgrade and everything seemed to work OK...until I got into the
>>>>> > application and noticed some weird behavior.  I was occasionally getting
>>>>> > the following stacktrace in cassandra when I did get operations for a
>>>>> > single subcolumn for some of the Super type CFs:
>>>>> >
>>>>> > ERROR 12:56:05,669 Internal error processing get
>>>>> > java.lang.AssertionError
>>>>> >         at org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:300)
>>>>> >         at org.apache.cassandra.thrift.Cassandra$Processor$get.process(Cassandra.java:2655)
>>>>> >         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>>>>> >         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
>>>>> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>> >         at java.lang.Thread.run(Thread.java:636)
>>>>> >
>>>>> > The assertion that is failing is the check that only one column is
>>>>> > retrieved by the get.  I did some debugging with the cli and a remote
>>>>> > debugger and found a few interesting patterns.  First, the problem does
>>>>> > not seem consistently reproducible.  If one supercolumn is affected,
>>>>> > though, it will happen more frequently for subcolumns that, when sorted,
>>>>> > appear at the beginning of the range.  For columns near the end of the
>>>>> > range, it seems to be more intermittent, and almost never occurs when I
>>>>> > step through the code line by line.  The only factor I can think of that
>>>>> > might cause issues is that I am using custom data types for all
>>>>> > supercolumns and columns.  I originally thought I might be reading past
>>>>> > the end of the ByteBuffer, but I have quadruple-checked that this is not
>>>>> > the case.
>>>>> >
>>>>> > Abe Sanderson
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
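
On the question raised in the quoted messages about why the printed name and value buffers contain bytes such as "get_slice": a ByteBuffer can be a window onto a larger shared array, which is consistent with the HeapByteBuffer[pos=95 lim=211 cap=244] entries in the debug logs.  The sketch below uses only java.nio; the class name, the helper windowBytes, and the eight-byte sample frame are made up for illustration and are not taken from the thrift or cassandra sources.

```java
import java.nio.ByteBuffer;

class BufferWindowDemo {
    // Copy only the bytes between position and limit, without moving the caller's position.
    static byte[] windowBytes(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // shares content, has its own position and limit
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Pretend this is a whole response frame: "get" followed by a 3-byte field and trailing data.
        byte[] frame = {0x67, 0x65, 0x74, 0x10, 0x78, (byte) 0xCF, 0x05, 0x0B};

        // The field is exposed as a window: position=3, limit=6, capacity=8.
        ByteBuffer name = ByteBuffer.wrap(frame, 3, 3);

        System.out.println("backing array length: " + name.array().length);
        System.out.println("window length:        " + windowBytes(name).length);
        System.out.println("position unchanged:   " + name.position());
    }
}
```

Anything that prints or compares the backing array, rather than the position-to-limit window, will see the surrounding frame bytes as well.  Index-based reads relative to position(), as in the comparators quoted above, stay inside the window.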
