cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-8682) BulkRecordWriter ends up streaming with non-unique session IDs on large hadoop cluster
Date Thu, 12 Nov 2015 16:10:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Aleksey Yeschenko updated CASSANDRA-8682:
-----------------------------------------
    Fix Version/s: 3.x

> BulkRecordWriter ends up streaming with non-unique session IDs on large hadoop cluster
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8682
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8682
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Erik Forsberg
>             Fix For: 3.x
>
>         Attachments: cassandra-1.2-bulkrecordwriter-sessionid.patch
>
>
> We use BulkOutputFormat extensively to load data from hadoop to Cassandra. We are currently
running Cassandra 1.2.18, but are planning an upgrade of Cassandra to 2.0.X, possibly 2.1.X.
> With Cassandra 1.2 we have problems with the streaming session IDs getting duplicated
when multiple (20+) java processes start streaming at the same time. On the receiving
cassandra node, a single session ID that actually corresponds to several different sending
processes confuses things a lot and leads to aborted connections.
> This would not happen for every process, but often enough to be a problem in a production
environment. Because it was intermittent, it was also a bit tricky to test.
> I suspect this has to do with how UUIDs are generated on the sending (hadoop) side.
With 20+ processes being started concurrently, the clockSeqAndNode part of the uuid1 probably
ended up being exactly the same on all 20 processes.
> I wrote a patch which I unfortunately never submitted at the time, but it's attached
to this issue. The patch constructs a UUID from the map or reduce task ID, which is guaranteed
to be unique per hadoop cluster.
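The attached patch itself is not reproduced in this message, so the following is only a minimal sketch of the general idea, not the patch's actual code. It assumes the task attempt ID is available as a string and derives a name-based UUID from it with the JDK's `UUID.nameUUIDFromBytes`. The class name `TaskSessionId`, the method `sessionIdFor`, and the sample attempt IDs are hypothetical. Note that this yields a version 3 UUID rather than the time-based version 1 UUID produced by UUIDGen; whether the streaming code depends on the UUID version is not stated in the issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class TaskSessionId {
    // Derives a deterministic, name-based (version 3, MD5) UUID from a task attempt ID.
    // The same input always yields the same UUID; different inputs yield different
    // UUIDs unless their MD5 hashes collide, which is extremely unlikely.
    static UUID sessionIdFor(String taskAttemptId) {
        return UUID.nameUUIDFromBytes(taskAttemptId.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical attempt IDs, made up for illustration only.
        UUID a = sessionIdFor("attempt_201501261200_0001_m_000000_0");
        UUID b = sessionIdFor("attempt_201501261200_0001_m_000001_0");
        System.out.println(a);
        System.out.println(b);
        System.out.println("distinct: " + !a.equals(b));
    }
}
```

The appeal of this approach is that uniqueness no longer depends on clocks or random seeds on the sending side, only on the task IDs being unique.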
> I suspect we're going to face the same issue on Cassandra 2.0 and 2.1, even after the
rewrite of the streaming subsystem. Please correct me if I'm wrong, i.e. if there's something
in the new code that will make this a non-issue.
> Now the question is how to address this problem. Possible options that I see after some
code reading:
> 1. Update the patch to apply on 2.0 and 2.1, using the same method (generating the UUID
from the hadoop task ID)
> 2. Modify the UUIDGen code to use the java process pid as clockSeq instead of a random number.
However, getting the pid in java seems less than simple (and remember that this is code that
runs on the hadoop side of things, not inside the cassandra daemon)
> 3. This patch might help:
> {noformat}
> diff --git a/src/java/org/apache/cassandra/utils/UUIDGen.java b/src/java/org/apache/cassandra/utils/UUIDGen.java
> index f385744..ae253ab 100644
> --- a/src/java/org/apache/cassandra/utils/UUIDGen.java
> +++ b/src/java/org/apache/cassandra/utils/UUIDGen.java
> @@ -234,7 +234,7 @@ public class UUIDGen
>  
>      private static long makeClockSeqAndNode()
>      {
> -        long clock = new Random(System.currentTimeMillis()).nextLong();
> +        long clock = new Random().nextLong();
>  
>          long lsb = 0;
>          lsb |= 0x8000000000000000L;                 // variant (2 bits)
> {noformat}
> ..but I don't know the reason System.currentTimeMillis() is being used.
> Opinions?
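
To illustrate the suspicion behind option 3, here is a small sketch of why seeding `java.util.Random` with a millisecond timestamp can give identical values in processes started at the same time. It relies only on the documented property that two `Random` instances built with the same seed produce the same sequence. The class name `SeedCollision`, the method `clockFromSeed`, and the sample timestamp are hypothetical, and the real `makeClockSeqAndNode()` does more than this (only the seeding line appears in the diff above).

```java
import java.util.Random;

public class SeedCollision {
    // Mirrors the seeding pattern on the removed line of the diff:
    // the seed is a wall-clock timestamp in milliseconds.
    static long clockFromSeed(long seedMillis) {
        return new Random(seedMillis).nextLong();
    }

    public static void main(String[] args) {
        // Arbitrary timestamp, standing in for two processes that
        // call System.currentTimeMillis() within the same millisecond.
        long sameMillis = 1447344612000L;
        long first = clockFromSeed(sameMillis);
        long second = clockFromSeed(sameMillis);
        System.out.println("same seed gives same value: " + (first == second));

        // The no-argument constructor picks its own seed, which the JDK
        // documents as very likely to be distinct for each invocation.
        long third = new Random().nextLong();
        long fourth = new Random().nextLong();
        System.out.println("unseeded values: " + third + ", " + fourth);
    }
}
```

The JDK's distinctness statement for `new Random()` is made about invocations of the constructor; how well it holds across separate JVM processes started at the same moment depends on the JVM implementation, so this sketch does not show that option 3 fully removes the collision.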



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
