Date: Mon, 18 May 2015 12:24:01 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: "error=7, Argument list too long at if number of input file is high"

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547929#comment-14547929 ]

Hadoop QA commented on MAPREDUCE-5965:
--------------------------------------

\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 10s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 42s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 6m 14s | Tests passed in hadoop-streaming. |
| | | | 42m 37s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733519/MAPREDUCE-5965.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 363c355 |
| hadoop-streaming test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/artifact/patchprocess/testrun_hadoop-streaming.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/console |

This message was automatically generated.


> Hadoop streaming throws error if list of input files is high. Error is: "error=7, Argument list too long at if number of input file is high"
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5965
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Arup Malakar
>            Assignee: Arup Malakar
>         Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, MAPREDUCE-5965.patch
>
>
> Hadoop streaming exposes all the key/value pairs in the job conf as environment variables when it forks a process for the streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on the combined size of environment variables and arguments.
> Depending on how long the list of files and their full paths is, this value can become very large. And given that these variables are not even used, they stop the user from running a Hadoop job with a large number of files, even though the job could otherwise run.
> Linux returns E2BIG (error code 7) if the size exceeds that limit, and Java translates it to "error=7, Argument list too long". More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping a variable if its value is longer than a certain length; the downside is that user code which requires the skipped environment variable would then fail. A config variable should therefore be introduced to enable skipping long variables, set to false by default, so the user has to explicitly set it to true to opt in to this behavior. (Both the failure and the proposed skip are sketched after the quoted exception below.)
> Here is the exception:
> {code}
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 17 more
> Caused by: java.lang.RuntimeException: configuration exception
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>     ... 22 more
> Caused by: java.io.IOException: Cannot run program "/data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_000006/./rbenv_runner.sh": error=7, Argument list too long
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
>     ... 23 more
> Caused by: java.io.IOException: error=7, Argument list too long
>     at java.lang.UNIXProcess.forkAndExec(Native Method)
>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>     at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>     ... 24 more
> Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {code}
> Hive does a similar trick: HIVE-2372. I have a patch for this and will submit it soon.
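
To make the quoted failure mode concrete, here is a minimal, self-contained sketch (not part of the issue or the attached patches) that drives ProcessBuilder into the same error=7 on Linux by exporting one oversized environment variable, mimicking a very long mapreduce_input_fileinputformat_inputdir value; the path names and loop count are made up for illustration:

{code}
import java.io.IOException;
import java.util.Map;

public class E2BigDemo {
  public static void main(String[] args) {
    // Build a comma-separated list of fake input paths large enough to
    // exceed the kernel's limit on the size of argv + envp.
    StringBuilder paths = new StringBuilder();
    for (int i = 0; i < 200_000; i++) {
      paths.append("/user/data/input/part-").append(i).append(',');
    }

    ProcessBuilder pb = new ProcessBuilder("/bin/true");
    Map<String, String> env = pb.environment();
    // Same shape as the variable streaming exports from the job conf.
    env.put("mapreduce_input_fileinputformat_inputdir", paths.toString());

    try {
      pb.start();
    } catch (IOException e) {
      // On Linux this prints something like:
      // java.io.IOException: Cannot run program "/bin/true": error=7, Argument list too long
      System.err.println(e);
    }
  }
}
{code}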
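
And here is a rough sketch of the skip-long-values idea from the description. The config key name, length cutoff, and method are hypothetical illustrations only; they are not taken from the attached MAPREDUCE-5965 patches, which may implement this differently:

{code}
import java.util.HashMap;
import java.util.Map;

public class LongEnvVarFilter {
  // Hypothetical names chosen for illustration; not actual Hadoop keys.
  static final String SKIP_LONG_VALUES_KEY = "stream.jobconf.skip.long.values";
  static final int MAX_VALUE_LENGTH = 20 * 1024;

  /**
   * Copies job conf entries into the child environment, optionally dropping
   * values longer than the cutoff. Skipping is off by default, so a user
   * whose streaming code actually reads the long variable must opt in
   * knowingly.
   */
  static Map<String, String> buildChildEnv(Map<String, String> jobConf,
                                           boolean skipLongValues) {
    Map<String, String> env = new HashMap<>();
    for (Map.Entry<String, String> e : jobConf.entrySet()) {
      String value = e.getValue();
      if (skipLongValues && value.length() > MAX_VALUE_LENGTH) {
        continue; // e.g. the full comma-separated input-dir list
      }
      // Streaming exports conf keys with non-alphanumeric characters turned
      // into underscores, e.g. mapreduce.input.fileinputformat.inputdir
      // becomes mapreduce_input_fileinputformat_inputdir.
      env.put(e.getKey().replaceAll("[^A-Za-z0-9]", "_"), value);
    }
    return env;
  }
}
{code}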