incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Problem with streaming data from Hadoop: DecoratedKey(-1, )
Date Sun, 31 Mar 2013 10:01:58 GMT
>  but yesterday one of 600 mappers failed
>  
:)

> From what I can understand by looking into the C* source, it seems to me that the problem
> is caused by an empty (or surprisingly finished?) input buffer (?) causing the token to be set
> to -1, which is improper for RandomPartitioner:
Yes, there is a zero-length key, which has a -1 token.
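
To illustrate how an empty key ends up with token -1 and why that trips the ordering check in the stack trace, here is a minimal standalone sketch. It is not Cassandra's code: the class EmptyKeyToken and the methods token and outOfOrder are hypothetical names, the MD5-then-absolute-value hashing is an assumption about what FBUtilities.hashToBigInteger does, and the comparison looks at tokens only, whereas the real check compares whole DecoratedKeys.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the partitioner logic quoted in this thread.
final class EmptyKeyToken {
    // Same sentinel as in the quoted getToken(): the minimum token is -1.
    static final BigInteger MINIMUM = BigInteger.valueOf(-1);

    // Assumption: non-empty keys hash to abs(MD5(key)), so they are never negative.
    static BigInteger token(ByteBuffer key) {
        if (key.remaining() == 0)
            return MINIMUM;
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(key.duplicate()); // duplicate() leaves the caller's position untouched
            return new BigInteger(md5.digest()).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Simplified form of the "last written key >= current key" test (tokens only).
    static boolean outOfOrder(BigInteger lastToken, BigInteger currentToken) {
        return lastToken.compareTo(currentToken) >= 0;
    }

    public static void main(String[] args) {
        BigInteger real = token(ByteBuffer.wrap("0,daily".getBytes(StandardCharsets.UTF_8)));
        BigInteger empty = token(ByteBuffer.allocate(0));
        System.out.println("empty key token: " + empty);
        System.out.println("rejected after a real key: " + outOfOrder(real, empty));
    }
}
```

Under these assumptions an empty key always sorts before every real key, so it is rejected whenever it arrives after any other row has been written.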

> However, I can't figure out what's the root cause of this problem.
> Any ideas?
Hmm, BulkOutputFormat uses an SSTableSimpleUnsortedWriter, and neither of them checks for
zero-length row keys. I would look there first.

There is no validation in AbstractSSTableSimpleWriter; I'm not sure if that is by design
or an oversight. Can you catch the zero-length key in your map job?
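
A minimal sketch of such a check, assuming the row key is available as a ByteBuffer just before it is handed to the output format. RowKeyGuard, isEmpty and requireNonEmpty are hypothetical names, not part of Cassandra or Hadoop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for rejecting zero-length row keys inside a map task.
final class RowKeyGuard {
    // A key is unusable if it is missing or has no readable bytes left.
    static boolean isEmpty(ByteBuffer key) {
        return key == null || key.remaining() == 0;
    }

    // Fails fast with some context so the offending input record can be found.
    static ByteBuffer requireNonEmpty(ByteBuffer key, String context) {
        if (isEmpty(key))
            throw new IllegalArgumentException("Zero-length row key from " + context);
        return key;
    }

    public static void main(String[] args) {
        ByteBuffer good = ByteBuffer.wrap("0,daily".getBytes(StandardCharsets.UTF_8));
        System.out.println("good key accepted: " + (requireNonEmpty(good, "example record") == good));
        System.out.println("empty key detected: " + isEmpty(ByteBuffer.allocate(0)));
    }
}
```

Calling requireNonEmpty before each write makes the mapper fail on the bad record instead of the receiving node; logging and skipping the record when isEmpty returns true is the softer alternative.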

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 2:26 PM, Michal Michalski <michalm@opera.com> wrote:

> We're streaming data to Cassandra directly from a MapReduce job using BulkOutputFormat.
> It's been working for more than a year without any problems, but yesterday one of 600 mappers
> failed and we got a strange-looking exception on one of the C* nodes.
> 
> IMPORTANT: It happens on one node and on one cluster only. We've loaded the same data
> to a test cluster and it worked.
> 
> 
> ERROR [Thread-1340977] 2013-03-28 06:35:47,695 CassandraDaemon.java (line 133) Exception in thread Thread[Thread-1340977,5,main]
> java.lang.RuntimeException: Last written key DecoratedKey(5664330507961197044404922676062547179, 302c6461696c792c32303133303332352c312c646f6d61696e2c756e6971756575736572732c633a494e2c433a6d63635f6d6e635f636172726965725f43656c6c4f6e655f4b61726e6174616b615f2842616e67616c6f7265295f494e2c643a53616d73756e675f47542d49393037302c703a612c673a3133) >= current key DecoratedKey(-1, ) writing into /cassandra/production/IndexedValues/production-IndexedValues-tmp-ib-240346-Data.db
> 	at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
> 	at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:209)
> 	at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
> 	at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
> 	at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
> 	at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
> 	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> 
> 
> From what I can understand by looking into the C* source, it seems to me that the problem
> is caused by an empty (or surprisingly finished?) input buffer (?) causing the token to be set
> to -1, which is improper for RandomPartitioner:
> 
> public BigIntegerToken getToken(ByteBuffer key)
> {
>    if (key.remaining() == 0)
>        return MINIMUM;		// Which is -1
>    return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
> }
> 
> However, I can't figure out what's the root cause of this problem.
> Any ideas?
> 
> Of course I can't exclude a bug in my code which streams this data, but, as I said,
> it works when loading the same data to a test cluster (which has a different number of nodes,
> and thus a different token assignment, which might be a factor too).
> 
> Michał

