Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3F999985F for ; Mon, 7 May 2012 21:51:12 +0000 (UTC) Received: (qmail 87767 invoked by uid 500); 7 May 2012 21:51:12 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 87733 invoked by uid 500); 7 May 2012 21:51:12 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 87724 invoked by uid 99); 7 May 2012 21:51:12 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 May 2012 21:51:12 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 May 2012 21:51:10 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 757A943855E for ; Mon, 7 May 2012 21:50:50 +0000 (UTC) Date: Mon, 7 May 2012 21:50:50 +0000 (UTC) From: "Aaron Morton (JIRA)" To: commits@cassandra.apache.org Message-ID: <1966958562.36637.1336427450482.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <1135125940.32608.1336345848014.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Updated] (CASSANDRA-4223) Non Unique Streaming session ID's MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Morton updated CASSANDRA-4223: ------------------------------------ Attachment: NanoTest.java Test for unique nanoTime() results. > Non Unique Streaming session ID's > --------------------------------- > > Key: CASSANDRA-4223 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4223 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.0.9 > Environment: Ubuntu 10.04.2 LTS > java version "1.6.0_24" > Java(TM) SE Runtime Environment (build 1.6.0_24-b07) > Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode) > "Bare metal" servers from https://www.stormondemand.com/servers/baremetal.html > The servers run on a custom hypervisor. > > Reporter: Aaron Morton > Assignee: Aaron Morton > Labels: datastax_qa > Attachments: NanoTest.java, fmm streaming bug.txt > > > I have observed repair processes failing due to duplicate Streaming session ID's. In this installation it is preventing rebalance from completing. I believe it has also prevented repair from completing in the past. > The attached streaming-logs.txt file contains log messages and an explanation of what was happening during a repair operation. it has the evidence for duplicate session ID's. > The duplicate session id's were generated on the repairing node and sent to the streaming node. The streaming source replaced the first session with the second which resulted in both sessions failing when the first FILE_COMPLETE message was received. > The errors were: > {code:java} > DEBUG [MiscStage:1] 2012-05-03 21:40:33,997 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db', action=FILE_FINISHED) > ERROR [MiscStage:1] 2012-05-03 21:40:34,027 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main] > java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db but is null > at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195) > at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58) > at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} > and > {code:java} > DEBUG [MiscStage:2] 2012-05-03 21:40:36,497 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db', action=FILE_FINISHED) > ERROR [MiscStage:2] 2012-05-03 21:40:36,497 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:2,5,main] > java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db but is null > at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195) > at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58) > at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} > I think this is because System.nanoTime() is used for the session ID when creating the StreamInSession objects (driven from StorageService.requestRanges()) . > From the documentation (http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#nanoTime()) > {quote} > This method provides nanosecond precision, but not necessarily nanosecond accuracy. No guarantees are made about how frequently values change. > {quote} > Also some info here on clocks and timers https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks > The hypervisor may be at fault here. But it seems like we cannot rely on successive calls to nanoTime() to return different values. > To avoid message/interface changes on the StreamHeader it would be good to keep the session ID a long. The simplest approach may be to make successive calls to nanoTime until the result changes. We could fail if a certain number of milliseconds have passed. > Hashing the file names and ranges is also a possibility, but more involved. > (We may also want to drop latency times that are 0 nano seconds.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira