Date: Mon, 18 May 2015 12:24:01 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: "error=7, Argument list too long at if number of input file is high"

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547929#comment-14547929 ]

Hadoop QA commented on MAPREDUCE-5965:
--------------------------------------

\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 10s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 42s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 6m 14s | Tests passed in hadoop-streaming. |
| | | | 42m 37s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733519/MAPREDUCE-5965.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 363c355 |
| hadoop-streaming test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/artifact/patchprocess/testrun_hadoop-streaming.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/console |

This message was automatically generated.


> Hadoop streaming throws error if list of input files is high. Error is: "error=7, Argument list too long at if number of input file is high"
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5965
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Arup Malakar
>            Assignee: Arup Malakar
>         Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, MAPREDUCE-5965.patch
>
>
> Hadoop streaming exposes all the key/value pairs in the job conf as environment variables when it forks a process for the streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on the combined size of environment variables and arguments.
> Depending on how long the list of files and their full paths is, this value can become very large. And given that these variables are not even used, they stop the user from running a Hadoop job with a large number of files, even though the job could otherwise run.
> Linux returns E2BIG (error code 7) if the size exceeds that limit, and Java translates it to "error=7, Argument list too long". More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping a variable if its value is longer than a certain length; the downside is that user code which requires the skipped environment variable would then fail. A config variable should therefore be introduced to enable skipping long variables, set to false by default, so the user has to explicitly set it to true to opt in to this behavior. (Both the failure and the proposed skip are sketched after the quoted exception below.)
> Here is the exception:
> {code}
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 17 more
> Caused by: java.lang.RuntimeException: configuration exception
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>     ... 22 more
> Caused by: java.io.IOException: Cannot run program "/data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_000006/./rbenv_runner.sh": error=7, Argument list too long
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
>     ... 23 more
> Caused by: java.io.IOException: error=7, Argument list too long
>     at java.lang.UNIXProcess.forkAndExec(Native Method)
>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>     at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>     ... 24 more
> Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {code}
> Hive does a similar trick: HIVE-2372. I have a patch for this and will submit it soon.
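
To make the quoted failure mode concrete, here is a minimal, self-contained sketch (not part of the issue or the attached patches) that drives ProcessBuilder into the same error=7 on Linux by exporting one oversized environment variable, mimicking a very long mapreduce_input_fileinputformat_inputdir value; the path names and loop count are made up for illustration:

{code}
import java.io.IOException;
import java.util.Map;

public class E2BigDemo {
  public static void main(String[] args) {
    // Build a comma-separated list of fake input paths large enough to
    // exceed the kernel's limit on the size of argv + envp.
    StringBuilder paths = new StringBuilder();
    for (int i = 0; i < 200_000; i++) {
      paths.append("/user/data/input/part-").append(i).append(',');
    }

    ProcessBuilder pb = new ProcessBuilder("/bin/true");
    Map<String, String> env = pb.environment();
    // Same shape as the variable streaming exports from the job conf.
    env.put("mapreduce_input_fileinputformat_inputdir", paths.toString());

    try {
      pb.start();
    } catch (IOException e) {
      // On Linux this prints something like:
      // java.io.IOException: Cannot run program "/bin/true": error=7, Argument list too long
      System.err.println(e);
    }
  }
}
{code}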
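
And here is a rough sketch of the skip-long-values idea from the description. The config key name, length cutoff, and method are hypothetical illustrations only; they are not taken from the attached MAPREDUCE-5965 patches, which may implement this differently:

{code}
import java.util.HashMap;
import java.util.Map;

public class LongEnvVarFilter {
  // Hypothetical names chosen for illustration; not actual Hadoop keys.
  static final String SKIP_LONG_VALUES_KEY = "stream.jobconf.skip.long.values";
  static final int MAX_VALUE_LENGTH = 20 * 1024;

  /**
   * Copies job conf entries into the child environment, optionally dropping
   * values longer than the cutoff. Skipping is off by default, so a user
   * whose streaming code actually reads the long variable must opt in
   * knowingly.
   */
  static Map<String, String> buildChildEnv(Map<String, String> jobConf,
                                           boolean skipLongValues) {
    Map<String, String> env = new HashMap<>();
    for (Map.Entry<String, String> e : jobConf.entrySet()) {
      String value = e.getValue();
      if (skipLongValues && value.length() > MAX_VALUE_LENGTH) {
        continue; // e.g. the full comma-separated input-dir list
      }
      // Streaming exports conf keys with non-alphanumeric characters turned
      // into underscores, e.g. mapreduce.input.fileinputformat.inputdir
      // becomes mapreduce_input_fileinputformat_inputdir.
      env.put(e.getKey().replaceAll("[^A-Za-z0-9]", "_"), value);
    }
    return env;
  }
}
{code}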