incubator-cassandra-user mailing list archives

From Michal Michalski <mich...@opera.com>
Subject Re: Problem with streaming data from Hadoop: DecoratedKey(-1, )
Date Fri, 24 May 2013 08:46:17 GMT
> Sounds like a nasty heisenbug, can you replace or rebuild the machine?

Heisenbug :D
(never heard this name before :-) )

I thought so too, but I finally managed to reproduce it locally (it requires 3 nodes, one of which needs to have a specific token assigned; the rest just have to be present in the cluster; plus the special "invalid" file too, of course), so it's probably token/partitioner related, but for now I have no bloody idea how. Thus, I'm rather sure that rebuilding/replacing the node will not work :-(

I also thought it might be a very weird DeflateCompressor bug causing it to produce invalid output for some strange characters (not very likely, but I had no better idea ;-) ). However, I tested compression/decompression and it's fine.
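
(A minimal sketch of such a round-trip check, for reference. It uses the JDK's Deflater/Inflater directly, i.e. the DEFLATE implementation that org.apache.cassandra.io.compress.DeflateCompressor builds on, and a made-up sample key, so it only illustrates the idea rather than the actual test that was run:)

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateRoundTrip
{
    public static void main(String[] args) throws DataFormatException
    {
        // Hypothetical row key with some non-ASCII characters, since the
        // suspicion was that "strange characters" confuse the compressor.
        byte[] original = "0,daily,20130325,1,domain,łódź".getBytes(StandardCharsets.UTF_8);

        // Compress with plain DEFLATE.
        Deflater deflater = new Deflater();
        deflater.setInput(original);
        deflater.finish();
        byte[] compressed = new byte[original.length * 2 + 64];
        int compressedLength = deflater.deflate(compressed);
        deflater.end();

        // Decompress and verify we get exactly the same bytes back.
        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 0, compressedLength);
        byte[] restored = new byte[original.length];
        int restoredLength = inflater.inflate(restored);
        inflater.end();

        boolean ok = restoredLength == original.length
                  && Arrays.equals(original, Arrays.copyOf(restored, restoredLength));
        System.out.println("round-trip ok: " + ok);
    }
}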

D'oh!

M.

>
>   Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/05/2013, at 9:36 PM, Michal Michalski <michalm@opera.com> wrote:
>
>> I've finally had some time to experiment a bit with this problem (it has occurred twice more) and here's what I found:
>>
>> 1. So far (three occurrences in total), *when* it happened, it happened only when streaming to *one* specific C* node (but streaming to this node works fine 99.9% of the time)
>> 2. It happens with compression turned on (cassandra.output.compression.class set to org.apache.cassandra.io.compress.DeflateCompressor; it doesn't matter what the chunk length is)
>> 3. Everything works fine when compression is turned off.
>>
>> So it looks like I have a workaround for now, but I don't really understand the root cause of this problem or what the "right" solution is if we want to keep using compression.
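
(For reference, a rough sketch of how this compression setting is typically toggled on the Hadoop job. Only the property name and the compressor class name come from this thread; the wrapper class and method names are made up for illustration:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper around the property quoted in point 2 above.
public final class BulkLoadCompression
{
    private static final String COMPRESSION_CLASS = "cassandra.output.compression.class";

    // The setup that triggers the problem: stream compressed SSTables.
    public static void enableDeflate(Job job)
    {
        Configuration conf = job.getConfiguration();
        conf.set(COMPRESSION_CLASS, "org.apache.cassandra.io.compress.DeflateCompressor");
    }

    // The workaround described above: leave the property unset, so the
    // SSTables are streamed uncompressed.
    public static void disableCompression(Job job)
    {
        // nothing to set; just make sure enableDeflate() is not called
    }
}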
>>
>> Anyway, the thing that interests me the most is why it fails so rarely and - assuming it's not a coincidence - why only for one C* node...
>>
>> Could it be a DeflateCompressor bug?
>> Any other ideas?
>>
>> Regards,
>> Michał
>>
>>
>> On 31.03.2013 12:01, aaron morton wrote:
>>>>   but yesterday one of 600 mappers failed
>>>>
>>> :)
>>>
>>>> From what I can understand by looking into the C* source, it seems to me that the problem is caused by an empty (or unexpectedly exhausted?) input buffer, causing the token to be set to -1, which is invalid for RandomPartitioner:
>>> Yes, there is a zero-length key, which has a -1 token.
>>>
>>>> However, I can't figure out what's the root cause of this problem.
>>>> Any ideas?
>>> mmm, the BulkOutputFormat uses an SSTableSimpleUnsortedWriter and neither of them checks for zero-length row keys. I would look there first.
>>>
>>> There is no validation in the AbstractSSTableSimpleWriter; not sure if that is by design or an oversight. Can you catch the zero-length key in your map job?
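
(A rough sketch of the kind of guard Aaron suggests, catching the zero-length key in the map job before it reaches BulkOutputFormat. It assumes the usual ByteBuffer key / List<Mutation> value pair for that output format; buildKey() and buildMutations() are hypothetical placeholders for the application-specific record construction:)

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Mutation;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GuardedBulkLoadMapper extends Mapper<LongWritable, Text, ByteBuffer, List<Mutation>>
{
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException
    {
        ByteBuffer key = buildKey(line);

        // A zero-length key decorates to token -1 under RandomPartitioner,
        // which is what shows up as DecoratedKey(-1, ) on the receiving node.
        // Count it and skip it instead of streaming it.
        if (key == null || key.remaining() == 0)
        {
            context.getCounter("bulkload", "empty-row-keys").increment(1);
            return;
        }

        context.write(key, buildMutations(line));
    }

    // Hypothetical placeholders for the application-specific code.
    private ByteBuffer buildKey(Text line) { return null; }
    private List<Mutation> buildMutations(Text line) { return null; }
}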
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 28/03/2013, at 2:26 PM, Michal Michalski <michalm@opera.com> wrote:
>>>
>>>> We're streaming data to Cassandra directly from a MapReduce job using BulkOutputFormat. It's been working for more than a year without any problems, but yesterday one of 600 mappers failed and we got a strange-looking exception on one of the C* nodes.
>>>>
>>>> IMPORTANT: It happens on one node and on one cluster only. We've loaded the same data to a test cluster and it worked.
>>>>
>>>>
>>>> ERROR [Thread-1340977] 2013-03-28 06:35:47,695 CassandraDaemon.java (line 133) Exception in thread Thread[Thread-1340977,5,main]
>>>> java.lang.RuntimeException: Last written key DecoratedKey(5664330507961197044404922676062547179, 302c6461696c792c32303133303332352c312c646f6d61696e2c756e6971756575736572732c633a494e2c433a6d63635f6d6e635f636172726965725f43656c6c4f6e655f4b61726e6174616b615f2842616e67616c6f7265295f494e2c643a53616d73756e675f47542d49393037302c703a612c673a3133) >= current key DecoratedKey(-1, ) writing into /cassandra/production/IndexedValues/production-IndexedValues-tmp-ib-240346-Data.db
>>>> 	at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>>>> 	at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:209)
>>>> 	at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>>>> 	at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>>>> 	at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>>>> 	at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>>>> 	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>>>
>>>>
>>>> From what I can understand by looking into the C* source, it seems to me that the problem is caused by an empty (or unexpectedly exhausted?) input buffer, causing the token to be set to -1, which is invalid for RandomPartitioner:
>>>>
>>>> public BigIntegerToken getToken(ByteBuffer key)
>>>> {
>>>>     if (key.remaining() == 0)
>>>>         return MINIMUM;		// Which is -1
>>>>     return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
>>>> }
>>>>
>>>> However, I can't figure out what's the root cause of this problem.
>>>> Any ideas?
>>>>
>>>> Of course I can't exclude a bug in my code which streams this data, but - as I said - it works when loading the same data to a test cluster (which has a different number of nodes and thus a different token assignment, which might be a factor too).
>>>>
>>>> Michał
>>>
>>>
>>
>
>

