incubator-cassandra-user mailing list archives

From Michal Michalski <mich...@opera.com>
Subject Re: Problem with streaming data from Hadoop: DecoratedKey(-1, )
Date Fri, 24 May 2013 10:25:25 GMT
Hmmm... In general it seems that for some reason Cassandra reads an 
invalid value when trying to get the key length (it should be ~100-150, 
but it gets 2048). Based on this value it then reads too much data, and 
when it tries to read the next key's length it picks up some garbage 
and translates it into a gargantuan int that has nothing to do with the 
real data layout in the file, so in the end it crashes.
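
To make it concrete, this is roughly the framing the stream reader 
expects (just a minimal sketch with made-up names, not the actual 
Cassandra code):

import java.io.DataInputStream;
import java.io.IOException;

// Sketch of one streamed row: [2-byte key length][key][8-byte data size][row data]
static void readOneRow(DataInputStream in) throws IOException
{
    int keyLength = in.readUnsignedShort(); // should be ~100-150; here it returns 2048
    byte[] key = new byte[keyLength];
    in.readFully(key);                      // consumes 2048 bytes instead of ~121,
                                            // swallowing the rows that follow
    long dataSize = in.readLong();          // interprets 8 bytes from the middle of some
                                            // row as a size -> e.g. 3831491449718795617
    // ...everything read after this point is misaligned garbage
}

So a single bad length is enough to throw every subsequent read out of 
alignment, which matches the logs below.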

I'm pasting some more logs here (when streaming only one file it 
crashes in a slightly different place in the code than it did in 
production, but given the steps I have to take to reproduce it, I'm 
sure it's the same root cause). I've modified the code a bit to print 
debug messages in a few more places, and I also print the "real" key in 
DecoratedKey:

----- HERE STARTS THE LAST VALID KEY/VALUE PAIR
  INFO [Thread-8] 2013-05-23 13:07:06,766 ByteBufferUtil.java (line 370) 
readShortLength->readUnsignedShort: 121
  INFO [Thread-8] 2013-05-23 13:07:06,766 ByteBufferUtil.java (line 407) 
READ with length 121
  INFO [Thread-8] 2013-05-23 13:07:06,766 IncomingStreamReader.java 
(line 165) DK: DecoratedKey(5664330507961197044404922676062547179, 
*[VALID HEX STRING]*, java.nio.HeapByteBuffer[pos=0 lim=121 cap=121], 
key: 0,daily,20130325,1,domain,uniqueusers,<the rest of the key I can't 
show - total length is 121 as expected and it's 100% valid>)
  INFO [Thread-8] 2013-05-23 13:07:06,766 IncomingStreamReader.java 
(line 169) dataSize: 56
  INFO [Thread-8] 2013-05-23 13:07:06,766 ByteBufferUtil.java (line 370) 
readShortLength->readUnsignedShort: 8
  INFO [Thread-8] 2013-05-23 13:07:06,766 ByteBufferUtil.java (line 407) 
READ with length 8
  INFO [Thread-8] 2013-05-23 13:07:06,766 ByteBufferUtil.java (line 356) 
readWithLength: 9
  INFO [Thread-8] 2013-05-23 13:07:06,766 ByteBufferUtil.java (line 407) 
READ with length 9

----- HERE STARTS THE KEY/VALUE PAIR THAT FAILS
  INFO [Thread-8] 2013-05-23 13:07:06,767 ByteBufferUtil.java (line 370) 
readShortLength->readUnsignedShort: 2048    *<-- 100% wrong, there's no 
key that long in this file!*
  INFO [Thread-8] 2013-05-23 13:07:06,767 ByteBufferUtil.java (line 407) 
READ with length 2048
  INFO [Thread-8] 2013-05-23 13:07:06,768 IncomingStreamReader.java 
(line 165) DK: DecoratedKey(72742576182685307053268326336110565659, 
*[2048-bytes HEX string]*, java.nio.HeapByteBuffer[pos=0 lim=2048 
cap=2048], key: *[HERE'S THE LONG CHUNK - 2048 bytes - OF GARBAGE DATA! 
I can see a few keys & columns here + some strange characters that 
describe the data - key lengths etc.]*)
  INFO [Thread-60] 2013-05-24 12:08:06,756 IncomingStreamReader.java 
(line 169) dataSize: 3831491449718795617    *<-- WOOT?! U MAD?!*
  INFO [Thread-60] 2013-05-24 12:08:07,116 ByteBufferUtil.java (line 
370) readShortLength->readUnsignedShort: 14938
  INFO [Thread-60] 2013-05-24 12:08:07,116 ByteBufferUtil.java (line 
407) READ with length 14938
  INFO [Thread-60] 2013-05-24 12:08:07,116 ByteBufferUtil.java (line 
356) readWithLength: 131080
  INFO [Thread-60] 2013-05-24 12:08:07,117 ByteBufferUtil.java (line 
407) READ with length 131080
  INFO [Thread-60] 2013-05-24 12:08:07,121 ByteBufferUtil.java (line 
370) readShortLength->readUnsignedShort: 27749
  INFO [Thread-60] 2013-05-24 12:08:07,121 ByteBufferUtil.java (line 
407) READ with length 27749
  INFO [Thread-60] 2013-05-24 12:08:07,121 ByteBufferUtil.java (line 
356) readWithLength: 81622009
  INFO [Thread-60] 2013-05-24 12:08:07,186 CompressedInputStream.java 
(line 94) CompressedInputStream->read: 6405671, 6291456, 114215
  INFO [Thread-60] 2013-05-24 12:08:07,186 CompressedInputStream.java 
(line 95) CompressedInputStream compare: 6405671 >= 6291456 && 6405671 < 
6405671  *<-- The assertion that fails*
ERROR [Thread-60] 2013-05-24 12:08:07,221 CassandraDaemon.java (line 
133) Exception in thread Thread[Thread-60,5,main]
java.lang.AssertionError
	at 
org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:98)
	at java.io.InputStream.read(InputStream.java:163)
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at 
org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:94)
	at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:405)
	at 
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
	at 
org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:108)
	at 
org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92)
	at 
org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:249)
	at 
org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:187)
	at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
	at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
	at 
org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
	at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
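
For reference, the assertion that blows up at 
CompressedInputStream.java:98 looks roughly like this (the variable 
names are my guess, so treat them as approximate; the shape of the 
check comes from the debug line above):

// CompressedInputStream.read(), approximately:
assert current >= bufferOffset && current < bufferOffset + buffer.length;

Plugging in the printed values: 6291456 + 114215 = 6405671, so 
current == bufferOffset + buffer.length, i.e. the reader is being asked 
for a byte just past the end of the current decompressed chunk - which 
is exactly what you'd expect if the bogus lengths above made it consume 
too many bytes.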


It gets more interesting... ;-)

M.

On 24.05.2013 10:46, Michal Michalski wrote:
>> Sounds like a nasty heisenbug, can you replace or rebuild the machine?
>
> Heisenbug :D
> (never heard this name before :-) )
>
> I thought so too, but I finally managed to reproduce it locally (it
> requires 3 nodes, one of which needs to have a specific token assigned;
> the rest just have to be present in the cluster; plus the special
> "invalid" file too, of course), so it's probably token/partitioner
> related, but for now I have no bloody idea how. Thus, I'm rather sure
> that rebuilding/replacing the node will not work :-(
>
> I also thought it might be a very weird DeflateCompressor bug causing
> it to produce invalid output for some strange characters (not very
> likely, but I had no better idea ;-) ). However, I tested
> compression/decompression and it's fine.
>
> D'oh!
>
> M.
>
>>
>>   Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 21/05/2013, at 9:36 PM, Michal Michalski <michalm@opera.com> wrote:
>>
>>> I've finally had some time to experiment a bit with this problem (it
>>> occured twice again) and here's what I found:
>>>
>>> 1. So far (three occurrences in total), *when* it happened, it
>>> happened only when streaming to *one* specific C* node (but streaming
>>> to this node works 99.9% of the time too)
>>> 2. It happens with compression turned on
>>> (cassandra.output.compression.class set to
>>> org.apache.cassandra.io.compress.DeflateCompressor, but it doesn't
>>> matter what the chunk length is)
>>> 3. Everything works fine when compression is turned off.
>>>
>>> So it looks like I have a workaround for now, but I don't really
>>> understand the root cause of this problem and what's the "right"
>>> solution if we want to keep using compression.
>>>
>>> Anyway, the thing that interests me the most is why it fails so
>>> rarely and - assuming it's not a coincidence - why only for one C*
>>> node...
>>>
>>> Could it be a DeflateCompressor bug?
>>> Any other ideas?
>>>
>>> Regards,
>>> Michał
>>>
>>>
>>> On 31.03.2013 12:01, aaron morton wrote:
>>>>>   but yesterday one of 600 mappers failed
>>>>>
>>>> :)
>>>>
>>>>>  From what I can understand by looking into the C* source, it seems
>>>>> to me that the problem is caused by an empty (or unexpectedly
>>>>> exhausted?) input buffer (?) causing the token to be set to -1, which
>>>>> is improper for RandomPartitioner:
>>>> Yes, there is a zero-length key, which has a -1 token.
>>>>
>>>>> However, I can't figure out what's the root cause of this problem.
>>>>> Any ideas?
>>>> mmm, the BulkOutputFormat uses an SSTableSimpleUnsortedWriter and
>>>> neither of them checks for zero-length row keys. I would look there
>>>> first.
>>>>
>>>> There is no validation in the AbstractSSTableSimpleWriter, not sure
>>>> if that is by design or an oversight. Can you catch the zero-length
>>>> key in your map job?
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 28/03/2013, at 2:26 PM, Michal Michalski <michalm@opera.com> wrote:
>>>>
>>>>> We're streaming data to Cassandra directly from a MapReduce job
>>>>> using BulkOutputFormat. It's been working for more than a year
>>>>> without any problems, but yesterday one of 600 mappers failed and we
>>>>> got a strange-looking exception on one of the C* nodes.
>>>>>
>>>>> IMPORTANT: It happens on one node and on one cluster only. We've
>>>>> loaded the same data to the test cluster and it worked.
>>>>>
>>>>>
>>>>> ERROR [Thread-1340977] 2013-03-28 06:35:47,695 CassandraDaemon.java
>>>>> (line 133) Exception in thread Thread[Thread-1340977,5,main]
>>>>> java.lang.RuntimeException: Last written key
>>>>> DecoratedKey(5664330507961197044404922676062547179,
>>>>> 302c6461696c792c32303133303332352c312c646f6d61696e2c756e6971756575736572732c633a494e2c433a6d63635f6d6e635f636172726965725f43656c6c4f6e655f4b61726e6174616b615f2842616e67616c6f7265295f494e2c643a53616d73756e675f47542d49393037302c703a612c673a3133)
>>>>> >= current key DecoratedKey(-1, ) writing into
>>>>> /cassandra/production/IndexedValues/production-IndexedValues-tmp-ib-240346-Data.db
>>>>>
>>>>>     at
>>>>> org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>>>>>
>>>>>     at
>>>>> org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:209)
>>>>>
>>>>>     at
>>>>> org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>>>>>
>>>>>     at
>>>>> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>>>>>
>>>>>     at
>>>>> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>>>>>
>>>>>     at
>>>>> org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>>>>>
>>>>>     at
>>>>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>>>>
>>>>>
>>>>>
>>>>>  From what I can understand by looking into the C* source, it seems
>>>>> to me that the problem is caused by an empty (or unexpectedly
>>>>> exhausted?) input buffer (?) causing the token to be set to -1, which
>>>>> is improper for RandomPartitioner:
>>>>>
>>>>> public BigIntegerToken getToken(ByteBuffer key)
>>>>> {
>>>>>     if (key.remaining() == 0)
>>>>>         return MINIMUM;        // Which is -1
>>>>>     return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
>>>>> }
>>>>>
>>>>> However, I can't figure out what's the root cause of this problem.
>>>>> Any ideas?
>>>>>
>>>>> Of course I can't exclude a bug in my code which streams this data,
>>>>> but - as I said - it works when loading the same data to the test
>>>>> cluster (which has a different number of nodes, thus a different
>>>>> token assignment, which might be a factor too).
>>>>>
>>>>> Michał
>>>>
>>>>
>>>
>>
>>

