hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8029) org.apache.hadoop.io.nativeio.NativeIO.posixFadviseIfPossible does not handle EINVAL
Date Fri, 01 Mar 2013 02:23:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590183#comment-13590183 ]

Todd Lipcon commented on HADOOP-8029:
-------------------------------------

One thought: in both trunk and branch-1, it seems like we should eventually disable fadvise
after a failure. Otherwise performance is still going to be terrible, because it will log a
WARN for every chunk read. Maybe something like: if an fadvise call fails, we disable it for
the next 60 seconds, so in the worst case we only log once a minute instead of potentially
tens or hundreds of times per second?
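
A minimal sketch of the back-off idea above, not the actual Hadoop code. The class
FadviseBackoff, the Advice interface, and the injected clock are all hypothetical names
introduced for illustration; Advice stands in for the native posix_fadvise call. After a
failure it logs one warning and skips every further attempt until the window has passed.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of Hadoop): disables an optional call for a fixed window after it fails. */
class FadviseBackoff {
    /** Stand-in for the native fadvise call; assumed to throw IOException on failure. */
    interface Advice {
        void run() throws IOException;
    }

    private final long disableMillis;
    private final LongSupplier clockMillis;
    // Long.MIN_VALUE means "never failed", so the first call is always attempted.
    private volatile long disabledUntilMillis = Long.MIN_VALUE;
    private final AtomicLong warnCount = new AtomicLong();

    FadviseBackoff(long disableMillis, LongSupplier clockMillis) {
        this.disableMillis = disableMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns true if the advice was applied, false if it failed or was skipped. */
    boolean adviseIfPossible(Advice advice) {
        long now = clockMillis.getAsLong();
        if (now < disabledUntilMillis) {
            return false; // still inside the back-off window: skip silently, no log line
        }
        try {
            advice.run();
            return true;
        } catch (IOException e) {
            // One warning per failure window instead of one per chunk read.
            disabledUntilMillis = now + disableMillis;
            warnCount.incrementAndGet();
            System.err.println("WARN fadvise failed, disabling for "
                + disableMillis + " ms: " + e.getMessage());
            return false;
        }
    }

    /** Number of warnings logged so far. */
    long warnCount() {
        return warnCount.get();
    }
}
```

A caller would construct it once, for example with new FadviseBackoff(60000L,
System::currentTimeMillis), and wrap each fadvise attempt in adviseIfPossible. The clock is
injected mainly so the window can be tested without sleeping; a monotonic source would be
safer than wall-clock time if the system clock can jump.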
                
> org.apache.hadoop.io.nativeio.NativeIO.posixFadviseIfPossible does not handle EINVAL
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8029
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8029
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.20.205.0
>         Environment: Debian Wheezy 64-bit 
> uname -a = "Linux desktop 3.1.0-1-amd64 #1 SMP Tue Jan 10 05:01:58 UTC 2012 x86_64 GNU/Linux"
> cat /etc/issue = "Debian GNU/Linux wheezy/sid \n \l" 
> /etc/apt/sources.list = " 
> deb http://ftp.us.debian.org/debian/ wheezy main contrib non-free 
> deb-src http://ftp.us.debian.org/debian/ wheezy main contrib non-free 
> deb http://security.debian.org/ wheezy/updates main contrib non-free 
> deb-src http://security.debian.org/ wheezy/updates main contrib non-free 
> deb http://archive.cloudera.com/debian squeeze-cdh3 contrib 
> deb-src http://archive.cloudera.com/debian squeeze-cdh3 contrib" 
> Hadoop-specific configuration (disabled permissions, pseudo-distributed mode, replication
> set to 1), from my own blog post here: http://j.mp/tsVBR4
>            Reporter: Tim Mattison
>         Attachments: HADOOP-8029.001.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When Hadoop's directories reside on tmpfs in Debian Wheezy (and possibly all Linux 3.1
> distros) in an installation that is using the native libraries, fadvise returns EINVAL when
> trying to run a MapReduce job.  Since EINVAL isn't handled, all MapReduce jobs report "Map
> output lost, rescheduling: getMapOutput".
> A full stack trace for this issue looks like this:
> [exec] 12/02/03 09:50:58 INFO mapred.JobClient: Task Id : attempt_201202030949_0001_m_000000_0, Status : FAILED
> [exec] Map output lost, rescheduling: getMapOutput(attempt_201202030949_0001_m_000000_0,0) failed :
> [exec] EINVAL: Invalid argument
> [exec] at org.apache.hadoop.io.nativeio.NativeIO.posix_fadvise(Native Method)
> [exec] at org.apache.hadoop.io.nativeio.NativeIO.posixFadviseIfPossible(NativeIO.java:177)
> [exec] at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:4026)
> [exec] at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> [exec] at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> [exec] at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> [exec] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> [exec] at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:829)
> [exec] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> [exec] at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> [exec] at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> Some logic will need to be implemented to handle EINVAL to properly support all file systems.
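
One way the EINVAL handling asked for in the description could look, as a sketch under
stated assumptions rather than the attached patch. ErrnoException and Advice are
hypothetical stand-ins for the native I/O exception and the posix_fadvise call, defined here
only so the example is self-contained. Because fadvise is only an advisory hint, the sketch
treats EINVAL as "not supported on this file system" and lets the read continue, while any
other error still propagates.

```java
import java.io.IOException;

/** Hypothetical sketch (not Hadoop code): tolerate EINVAL from an advisory call. */
class FadviseErrnoSketch {
    /** Stand-in for a native I/O error that carries an errno name. */
    static class ErrnoException extends IOException {
        final String errno;

        ErrnoException(String errno, String message) {
            super(errno + ": " + message);
            this.errno = errno;
        }
    }

    /** Stand-in for the native fadvise call. */
    interface Advice {
        void run() throws IOException;
    }

    /**
     * Returns true if the advice was applied and false if the file system
     * rejected it with EINVAL. Any other failure is rethrown to the caller.
     */
    static boolean adviseIgnoringEinval(Advice advice) throws IOException {
        try {
            advice.run();
            return true;
        } catch (ErrnoException e) {
            if ("EINVAL".equals(e.errno)) {
                return false; // hint not supported here (e.g. tmpfs in the report); keep serving data
            }
            throw e;
        }
    }
}
```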

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
