Return-Path: X-Original-To: apmail-activemq-dev-archive@www.apache.org Delivered-To: apmail-activemq-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0D59BD185 for ; Fri, 23 Nov 2012 10:33:03 +0000 (UTC) Received: (qmail 54182 invoked by uid 500); 23 Nov 2012 10:33:02 -0000 Delivered-To: apmail-activemq-dev-archive@activemq.apache.org Received: (qmail 54110 invoked by uid 500); 23 Nov 2012 10:33:02 -0000 Mailing-List: contact dev-help@activemq.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@activemq.apache.org Delivered-To: mailing list dev@activemq.apache.org Received: (qmail 54081 invoked by uid 99); 23 Nov 2012 10:33:01 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 23 Nov 2012 10:33:01 +0000 Date: Fri, 23 Nov 2012 10:33:01 +0000 (UTC) From: "Matthias Vetter (JIRA)" To: dev@activemq.apache.org Message-ID: <1640026832.18958.1353666781645.JavaMail.jiratomcat@arcas> In-Reply-To: <1819821430.99777.1352714051445.JavaMail.jiratomcat@arcas> Subject: [jira] [Commented] (AMQ-4171) KahaDB / Journal rotateWriteFIle() sometimes too fast? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/AMQ-4171?page=3Dcom.atlassian.j= ira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D135031= 46#comment-13503146 ]=20 Matthias Vetter commented on AMQ-4171: -------------------------------------- After deeper examination of the above described failure, we still have no s= olution the problem: It is always possible to create the first file db-1.log. But the creation o= f its successor always fails, always with the same exception: file not foun= d, the system cannot find the path specified.=20 The DataFileAppender gets the next DataFile-object (from the method we susp= ected to be too fast) and then this DataFile object=E2=80=98s method openRa= ndomAccessFile is called. At this moment, the physical file is not yet exis= ting, the RandomAccessFile-constructor is supposed to create the file. It d= id so, when db-1.log was created: The path to the files, naturally, wasn=E2= =80=99t changed after that, and the same rights in the file system still ar= e granted.=20 The only difference we see is, that now another file is already present and= is about to be deleted. But should this have such an effect? We also suspected the whitespaces in the absolute path to the file to be a = problem =E2=80=93 but why not in the first case? Even more irritating is that obviously no one else faced such kind of FileN= otFound problem.=20 Finally, the problem seems to occur not on Windows 7, as initially reported= , but on Windows XP. It also didn=E2=80=99t occur on a Solaris machine.=20 Please, has anyone an idea what could be going on here? An comment is welc= ome!!!! =20 > KahaDB / Journal rotateWriteFIle() sometimes too fast? > ------------------------------------------------------ > > Key: AMQ-4171 > URL: https://issues.apache.org/jira/browse/AMQ-4171 > Project: ActiveMQ > Issue Type: Bug > Components: Broker > Affects Versions: 5.6.0 > Environment: Tested on Windows (7) > Reporter: Matthias Vetter > > On very fast test machines we got predictable FileNotFoundExceptions when= a KahaDB data file was to be replaced by its successor. > See this from the log file: > 30.10.2012 21:00:56.359 [ActiveMQ Transport: tcp:///192.168.111.128:3134]= [ ] master-My-Company-S1-kellner20-vm Message ID:kellner20-vm-3125-13516= 18296390-23:1:1:2086:1 sent to queue://My-Company.dxm.request.provisionToTS > 30.10.2012 21:00:56.359 [BrokerService[master-My-Company-S1-kellner20-vm]= Task-197] [ ] My-Company.dxm.request.provisionToTS toPageIn: 1, Inflight= : 0, pagedInMessages.size 0, enqueueCount: 2125, dequeueCount: 2124 > -------------------------------------------------------------------------= ------ > 30.10.2012 21:00:56.390 [ConcurrentQueueStoreAndDispatch] [ ] *** SEVERE= *** > Called from org.apache.activemq.store.kahadb.MessageDatabase.store() > KahaDB failed to store to Journal > -------------------------------------------------------------------------= ------ > java.io.FileNotFoundException: activemq-data\master-My-Company-S1-kellner= 20-vm\KahaDB\db-2.log (The system cannot find the path specified) > =09at java.io.RandomAccessFile.open(Native Method) > =09at java.io.RandomAccessFile.(Unknown Source) > =09at org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.ja= va:70) > =09at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileApp= ender.java:309) > =09at org.apache.kahadb.journal.DataFileAppender$1.run(DataFileAppender.j= ava:188) > 30.10.2012 21:00:56.375 [BrokerService[master-My-Company-S1-kellner20-vm]= Task-197] [ ] org.apache.activemq.broker.region.cursors.QueueStorePrefet= ch@1afd356:My-Company.dxm.request.provisionToTS,batchResetNeeded=3Dfalse,st= oreHasMessages=3Dtrue,size=3D0,cacheEnabled=3Dtrue - fillBatch > 30.10.2012 21:00:56.421 [ConcurrentQueueStoreAndDispatch] [ ] NOTIFICATI= ON: HighWater of SEVERE exceeded (value=3D1, highwater=3D1)! > 30.10.2012 21:00:56.437 [ConcurrentQueueStoreAndDispatch] [ ] Stopping = the broker due to IO exception, java.io.FileNotFoundException: activemq-dat= a\master-My-Company-S1-kellner20-vm\KahaDB\db-2.log (The system cannot find= the path specified) > java.io.FileNotFoundException: activemq-data\master-My-Company-S1-kellner= 20-vm\KahaDB\db-2.log (The system cannot find the path specified) > =09at java.io.RandomAccessFile.open(Native Method) > =09at java.io.RandomAccessFile.(Unknown Source) > =09at org.apache.kahadb.journal.DataFile.openRandomAccessFile(DataFile.ja= va:70) > =09at org.apache.kahadb.journal.DataFileAppender.processQueue(DataFileApp= ender.java:309) > =09at org.apache.kahadb.journal.DataFileAppender$1.run(DataFileAppender.j= ava:188) > ******************** > (end of log file quotation) > ******************** > Having evaluated all possibilities of the failure, we finally patched the= following method of the class Journal:: > synchronized DataFile rotateWriteFile() { > int nextNum =3D !dataFiles.isEmpty() ? dataFiles.getTail().getDat= aFileId().intValue() + 1 : 1; > File file =3D getFile(nextNum); > DataFile nextWriteFile =3D new DataFile(file, nextNum, preferedFi= leLength); > PATCH START try { > =09=09=09Thread.sleep(10000); > =09=09} catch (InterruptedException e) { > =09=09=09e.printStackTrace(); > =09=09} > PATCH END > // actually allocate the disk space > fileMap.put(nextWriteFile.getDataFileId(), nextWriteFile); > fileByFileMap.put(file, nextWriteFile); > dataFiles.addLast(nextWriteFile); > return nextWriteFile; > } > ************* > This helped, the IO Exception never occurred again after this patch. > It seems that the "nextWriteFile"-object was returned, while the allocat= ing of the new Data File wasn't finished yet.=20 > Anyway, it is very strange that we should have detected this problem, and= it never occurred earlier - at least no similar bug (???) seems to have be= en reported. Therefore, a critical examination of possibly deeper causes co= uld be useful. > Best regards,=20 > Matthias Vetter > ATOS, Frankfurt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs For more information on JIRA, see: http://www.atlassian.com/software/jira