activemq-dev mailing list archives

From "Dominic Tootell (JIRA)" <j...@apache.org>
Subject [jira] Created: (AMQ-2478) Too many files open error, after no space left on device occurs; if producer carries on sending messages.
Date Thu, 05 Nov 2009 13:51:52 GMT
Too many files open error, after no space left on device occurs; if producer carries on sending
messages.
---------------------------------------------------------------------------------------------------------

                 Key: AMQ-2478
                 URL: https://issues.apache.org/activemq/browse/AMQ-2478
             Project: ActiveMQ
          Issue Type: Bug
          Components: Broker
    Affects Versions: 5.3.0
         Environment: MacOSX 10.6.1, fusesource broker 5.3.0.4.
            Reporter: Dominic Tootell


The problem seems to be that once the persistence store (disk) has run out of space, if the
producer keeps on sending messages, the broker ends up exhausting the file descriptors
available to its process (default 1024), and you get the error "too many open files".  The only
way to recover is a broker restart.

1) Producer is sending to the broker
2) Disk Space on the broker runs out
3) The producer gets the error:

[2009.11.02 23:05:30] [main] INFO  ProducerTool -  Sent Message:
[18973 : ^@^@OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO...], took: 1ms
[2009.11.02 23:05:30] [main] WARN  ProducerTool -  Error sending
message:18974 : ^@^@OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO...
javax.jms.JMSException: No space left on device
      at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:49)
      at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1255)

4)  The broker gets the error:

{code}
DEBUG Service                        - Error occured while processing
async command: MessageAck {commandId = 53297, responseRequired =
false, ackType = 2, consumerId =
ID:dominic-tootells-macbook-pro.local-57138-1257203010059-0:0:-1:2,
firstMessageId =
ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:17751,
lastMessageId =
ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:17751,
destination = queue://iplayer, transactionId =
TX:ID:dominic-tootells-macbook-pro.local-57138-1257203010059-0:0:17751,
messageCount = 1}, exception: java.io.IOException: No space left on
device
java.io.IOException: No space left on device
       at java.io.RandomAccessFile.setLength(Native Method)
{code}

5) All is good if you spot this and free up some disk space quickly;
both the broker and the producer recover and can carry on.
However, if you don't notice and react quickly enough, and the producer
keeps on sending messages to the broker, then the broker ends up with the
error "too many open files":

{code}
Id = ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:35920,
lastMessageId =
ID:dominic-tootells-macbook-pro.local-57143-1257203033952-0:0:1:1:35920,
destination = queue://iplayer, transactionId =
TX:ID:dominic-tootells-macbook-pro.local-57138-1257203010059-0:0:52674,
messageCount = 1}, exception: java.io.FileNotFoundException:
/Volumes/SSD/data/journal/data-4 (Too many open files)
java.io.FileNotFoundException: /Volumes/SSD/data/journal/data-4 (Too
many open files)
       at java.io.RandomAccessFile.open(Native Method)
       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
       at org.apache.activemq.kaha.impl.async.DataFile.openRandomAccess
{code}


I tried the following combinations:

- no failover protocol
- no sendFailIfNoSpace being sent to the producer
- recreating the producer connection after the error
- no consumer attached to the broker

In the end I attached JProfiler to the broker (via a small junit), and
noticed that upon the "No space left on device" error the number of
File objects and FileDescriptor objects would grow, and not shrink.
Upon looking at the below stack trace:

{code}
Caused by: java.io.IOException: No space left on device
       at java.io.RandomAccessFile.setLength(Native Method)
       at org.apache.activemq.kaha.impl.async.DataFile.openRandomAccessFile(DataFile.java:96)
       at org.apache.activemq.kaha.impl.async.AsyncDataManager.allocateLocation(AsyncDataManager.java:276)
       at org.apache.activemq.kaha.impl.async.DataFileAppender.storeItem(DataFileAppender.java:169)
       at org.apache.activemq.kaha.impl.async.AsyncDataManager.write(AsyncDataManager.java:647)
       at org.apache.activemq.store.amq.AMQPersistenceAdapter.writeCommand(AMQPersistenceAdapter.java:697)
       at org.apache.activemq.store.amq.AMQPersistenceAdapter.writeCommand(AMQPersistenceAdapter.java:693)
       at org.apache.activemq.store.amq.AMQMessageStore.addMessage(AMQMessageStore.java:106)
       at org.apache.activemq.broker.region.Queue.doMessageSend(Queue.java:503)
       at org.apache.activemq.broker.region.Queue.send(Queue.java:480)
       at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:354)
       at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:443)
       at org.apache.activemq.broker.TransactionBroker.send(TransactionBroker.java:224)
       at org.apache.activemq.broker.CompositeDestinationBroker.send(CompositeDestinationBroker.java:95)
       at org.apache.activemq.broker.MutableBrokerFilter.send(MutableBrokerFilter.java:133)
       at org.apache.activemq.broker.TransportConnection.processMessage(TransportConnection.java:455)
       at org.apache.activemq.command.ActiveMQMessage.visit(ActiveMQMessage.java:639)
       at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:308)
       at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:182)
       at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:68)
       at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
       at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:210)
       at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:84)
       at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
       at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:185)
       at java.lang.Thread.run(Thread.java:637)
{code}

I took a look at:

org.apache.activemq.kaha.impl.async.DataFile.openRandomAccessFile(DataFile.java:96):

{code}
    public synchronized RandomAccessFile openRandomAccessFile(boolean appender) throws IOException {
        RandomAccessFile rc = new RandomAccessFile(file, "rw");
        // When we start to write files size them up so that the OS has a chance
        // to allocate the file contigously.
        if (appender) {
            if (length < preferedSize) {
                rc.setLength(preferedSize);
            }
        }
        return rc;
    }
{code}

The problem is the call to rc.setLength(preferedSize): there is no try/catch
block to close the newly opened file in case of an IOException, which
setLength can throw when the filesystem has no free space. Each failed call
therefore leaves one open file descriptor behind.

Changing the method to contain a try/catch, as follows, appears from my testing to fix the
issue (I have tried it on my local broker, and it works).

{code}
    public synchronized RandomAccessFile openRandomAccessFile(boolean appender) throws IOException {
        RandomAccessFile rc = new RandomAccessFile(file, "rw");
        // When we start to write files size them up so that the OS has a chance
        // to allocate the file contigously.
        if (appender) {
            if (length < preferedSize) {
                try {
                    rc.setLength(preferedSize);
                } catch (IOException e) {
                    // Release the descriptor before propagating the failure,
                    // otherwise every failed call leaks one open file.
                    try {
                        rc.close();
                    } catch (Exception closeException) {
                        // Ignored: the original IOException is the one to report.
                    }
                    throw e;
                }
            }
        }
        return rc;
    }
{code}
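
For anyone who wants to see the close-on-failure pattern outside the broker, here is a minimal standalone sketch. It is not ActiveMQ code: the class SafeOpenSketch, the Initializer interface and the closedOnFailure counter are all hypothetical names made up for illustration, and the "No space left on device" failure is simulated by throwing from the initializer rather than by filling a disk.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SafeOpenSketch {

    /** Hypothetical hook standing in for the step that may fail, e.g. setLength(). */
    public interface Initializer {
        void init(RandomAccessFile raf) throws IOException;
    }

    /** Counts handles closed on the failure path; exists only so the demo can observe it. */
    public static int closedOnFailure = 0;

    /** Opens the file and runs the initializer, closing the handle if the initializer fails. */
    public static RandomAccessFile openAndInit(File file, Initializer initializer) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            initializer.init(raf);
            return raf; // success: the caller now owns the handle
        } catch (IOException e) {
            try {
                raf.close();
                closedOnFailure++;
            } catch (IOException closeFailure) {
                // Keep the original failure as the primary exception.
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("datafile-sketch", ".tmp");
        tmp.deleteOnExit();

        // Failure path: the initializer throws, so the handle must be closed for us.
        try {
            openAndInit(tmp, raf -> {
                throw new IOException("No space left on device (simulated)");
            });
        } catch (IOException expected) {
            System.out.println("caught: " + expected.getMessage());
        }

        // Success path: the caller closes the handle, here via try-with-resources.
        try (RandomAccessFile raf = openAndInit(tmp, r -> r.setLength(1024))) {
            System.out.println("length: " + raf.length());
        }
    }
}
```

The only difference from the patch above is that the close failure is attached with addSuppressed instead of being swallowed, which needs Java 7 or later.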


I shall attach a JUnit test. It is hard-coded to write to my small removable disk (/Volumes/SSD/data),
so you will need to change that path; I needed somewhere I could fill the disk up.  The JUnit test
just does the following:

- Producer writes to a persistent queue until the disk fills up, and keeps on going.
 After a while you see the "too many open files" exception.
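
As a side note, a test does not have to wait for the "too many open files" exception to detect the leak: on Unix-like JVMs the open descriptor count can be read from inside the process. The sketch below is not part of the attached JUnit. The class OpenFdProbe and its methods are hypothetical names, and it relies on com.sun.management.UnixOperatingSystemMXBean, which is specific to HotSpot/OpenJDK style JVMs on Unix (the probe returns -1 elsewhere). It holds some handles open on purpose to show that the count moves.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.ArrayList;
import java.util.List;

public class OpenFdProbe {

    /** Returns the number of open file descriptors, or -1 if this JVM does not expose it. */
    public static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Opens {@code handles} files, measures the change in descriptor count, then closes them. */
    public static long deltaWhileHolding(File file, int handles) throws IOException {
        List<RandomAccessFile> held = new ArrayList<>();
        long before = openFileDescriptors();
        try {
            for (int i = 0; i < handles; i++) {
                held.add(new RandomAccessFile(file, "rw"));
            }
            return openFileDescriptors() - before;
        } finally {
            for (RandomAccessFile raf : held) {
                raf.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("fd-probe", ".tmp");
        tmp.deleteOnExit();
        if (openFileDescriptors() < 0) {
            System.out.println("descriptor count not available on this JVM");
            return;
        }
        System.out.println("descriptors gained while holding 10 handles: " + deltaWhileHolding(tmp, 10));
    }
}
```

In a broker test, sampling openFileDescriptors() before and after a burst of failed sends would show whether the count keeps climbing; other threads in the JVM also open and close files, so the numbers are approximate.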

I've looked at trunk

https://svn.apache.org/repos/asf/activemq/trunk/activemq-core/src/main/java/org/apache/activemq/kaha/impl/async/DataFile.java

This has the same code as 5.3.0.4, so I expect it has the same issue.

I'll attach the junit, the patch diff and the patch file.
/dom



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

