hadoop-hdfs-issues mailing list archives

From "Dmytro Molkov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1194) Secondary namenode fails to fetch the image from the primary
Date Tue, 22 Jun 2010 22:24:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881412#action_12881412 ]

Dmytro Molkov commented on HDFS-1194:

We switched to Jetty 6.1.24 and can now checkpoint using the secondary namenode again.

The logs on both nodes show that we are hitting the JVM bug over and over again (Jetty 6.1.24 has instrumentation that gives a better picture of what is happening during the transfer).

So I propose we update Jetty from the currently used 6.1.14 to 6.1.24.

> Secondary namenode fails to fetch the image from the primary
> ------------------------------------------------------------
>                 Key: HDFS-1194
>                 URL: https://issues.apache.org/jira/browse/HDFS-1194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
>         Environment: Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
> Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
> CentOS 5
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
> We just hit the problem described in HDFS-1024 again.
> After further investigation of the underlying CancelledKeyException problems, here are some findings:
> One of the symptoms: the transfer becomes really slow (about 700 KB/s) when I do the fetch using wget. At the same time, disk and network are fine, since I can copy at 50 MB/s using scp.
> I took jstacks of the namenode while the transfer was in progress, and we found that every stack trace has one Jetty thread sitting in this place:
> {code}
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:452)
> 	at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
> 	at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
> 	at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
> 	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> {code}
> Here is the Jetty code that corresponds to this:
> {code}
> // Look for JVM bug 
>                     if (selected==0 && wait>0 && (now-before)<wait/2 && _selector.selectedKeys().size()==0)
>                     {
>                         if (_jvmBug++>5)  // TODO tune or configure this
>                         {
>                             // Probably JVM BUG!
>                             Iterator iter = _selector.keys().iterator();
>                             while(iter.hasNext())
>                             {
>                                 key = (SelectionKey) iter.next();
>                                 if (key.isValid()&&key.interestOps()==0)
>                                 {
>                                     key.cancel();
>                                 }
>                             }
>                             try
>                             {
>                                 Thread.sleep(20);  // tune or configure this
>                             }
>                             catch (InterruptedException e)
>                             {
>                                 Log.ignore(e);
>                             }
>                         } 
>                     }
> {code}
> Based on this, we are clearly hitting a Jetty workaround for a JVM bug in which select() does not behave properly (it returns early with no selected keys).
> There is a Jetty JIRA for this, http://jira.codehaus.org/browse/JETTY-937 (it actually introduces the workaround for the JVM bug that we are hitting).
> They say that the problem was fixed in 6.1.22. A person on that JIRA also says that switching from SelectChannelConnector to SocketConnector helped in their case.
> Since we are hitting the same bug, one option is to adopt the newer Jetty version, which has a better workaround. That might not help if we keep hitting the bug constantly, although the workaround itself might behave better.
> Another approach is to switch to SocketConnector, which would eliminate the problem completely, although I am not sure what other problems it might introduce.
> The Java version we are running is listed under Environment.
> Any thoughts?
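The detection condition in the quoted Jetty code can be pulled out into a small, standalone heuristic for experimenting outside Jetty. The following is a minimal Java sketch under stated assumptions, not Jetty's actual implementation: the class `SpinDetector` and its method `onSelectReturned` are hypothetical names, and the sketch assumes the counter is reset after any select() call that behaves normally (the quoted excerpt does not show where Jetty resets `_jvmBug`). The early-return test (`selected==0`, `wait>0`, elapsed time under `wait/2`, no selected keys) and the `> 5` threshold are taken from the quoted code.

```java
// Hypothetical sketch of the "premature empty select()" heuristic from the
// quoted Jetty SelectorManager code; this is not Jetty's actual class.
public class SpinDetector {
    // Same threshold as the quoted code: trip once the counter exceeds 5.
    private static final int THRESHOLD = 5;

    private int suspiciousReturns = 0;

    /**
     * Records one select() result and reports whether the workaround
     * (cancel idle keys, then sleep briefly) should run.
     *
     * @param selected        return value of Selector.select(wait)
     * @param waitMillis      timeout passed to select()
     * @param elapsedMillis   time select() actually blocked (now - before)
     * @param selectedKeySize size of selector.selectedKeys() after the call
     */
    public boolean onSelectReturned(int selected, long waitMillis,
                                    long elapsedMillis, int selectedKeySize) {
        boolean premature = selected == 0
                && waitMillis > 0
                && elapsedMillis < waitMillis / 2
                && selectedKeySize == 0;
        if (!premature) {
            // Assumption (not shown in the quoted excerpt): a normal select() resets the count.
            suspiciousReturns = 0;
            return false;
        }
        // Post-increment comparison mirrors `_jvmBug++ > 5` in the quoted code.
        return suspiciousReturns++ > THRESHOLD;
    }

    public static void main(String[] args) {
        SpinDetector detector = new SpinDetector();
        int trippedAt = -1;
        for (int i = 1; i <= 10 && trippedAt < 0; i++) {
            // Simulate select(1000) returning 0 after only 1 ms with no keys.
            if (detector.onSelectReturned(0, 1000, 1, 0)) {
                trippedAt = i;
            }
        }
        System.out.println("tripped on call " + trippedAt);
    }
}
```

Run as-is, it reports that the workaround trips on the seventh consecutive premature return, because the post-increment comparison only succeeds once the counter has reached 6. In this sketch, once tripped, every further premature return also trips it until a normal select() resets the count, a pattern consistent with the jstacks above that repeatedly show the acceptor thread inside `Thread.sleep`.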
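When taking repeated jstacks as described above, counting by hand the threads stuck in the workaround gets tedious. The following is a minimal Java sketch, assuming the usual jstack text layout in which each thread's section begins with a line starting with a double quote; the class `JstackScan` and its methods are hypothetical, and the sample dump in `main` is synthetic, trimmed down from the frames quoted in the issue. It flags a thread when its section contains both the `Thread.sleep` frame and the `SelectorManager$SelectSet.doSelect` frame, which is a coarse match rather than a strict check of frame order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: counts threads parked in Jetty's select-loop workaround sleep.
public class JstackScan {
    // Frames that identify the workaround sleep seen in the quoted stack trace.
    private static final String SLEEP_FRAME = "at java.lang.Thread.sleep(";
    private static final String SELECT_FRAME =
            "at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(";

    /** Splits a jstack dump into per-thread sections; each starts with a line beginning with '"'. */
    static List<String> threadSections(String dump) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = null;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                if (current != null) {
                    sections.add(current.toString());
                }
                current = new StringBuilder();
            }
            // Lines before the first thread header (dump preamble) are ignored.
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            sections.add(current.toString());
        }
        return sections;
    }

    /** Counts sections that contain both the sleep frame and the doSelect frame. */
    static int countWorkaroundSleeps(String dump) {
        int count = 0;
        for (String section : threadSections(dump)) {
            if (section.contains(SLEEP_FRAME) && section.contains(SELECT_FRAME)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Synthetic sample dump with one matching thread and one unrelated thread.
        String dump = String.join("\n",
                "\"acceptor-thread\" prio=10 waiting on condition",
                "   java.lang.Thread.State: TIMED_WAITING (sleeping)",
                "\tat java.lang.Thread.sleep(Native Method)",
                "\tat org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:452)",
                "",
                "\"other-thread\" prio=10 runnable",
                "   java.lang.Thread.State: RUNNABLE",
                "\tat java.net.PlainSocketImpl.socketAccept(Native Method)");
        System.out.println("workaround sleeps: " + countWorkaroundSleeps(dump));
    }
}
```

Feeding it the output of each jstack run (for example, read from a file) would show whether an acceptor thread is caught in the 20 ms workaround sleep on every snapshot, as reported in the issue.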

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
