Date: Thu, 5 Feb 2015 19:43:34 +0000 (UTC)
From: "Allen Wittenauer (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws "error=7, Argument list too long" if the number of input files is high

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5965:
----------------------------------------
    Status: Open  (was: Patch Available)

Cancelling the patch since it no longer applies.

> Hadoop streaming throws "error=7, Argument list too long" if the number of input files is high
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5965
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Arup Malakar
>            Assignee: Arup Malakar
>         Attachments: MAPREDUCE-5965.patch
>
>
> Hadoop streaming exposes all of the key/value pairs in the job conf as environment variables when it forks the process that runs the streaming code. Unfortunately, the variable mapreduce_input_fileinputformat_inputdir contains the full list of input paths, and Linux limits the combined size of a process's environment variables and arguments.
> Depending on how many input files there are and how long their full paths are, this value can become very large. Since most of these variables are never used by the streaming program, the limit stops users from running jobs with a large number of input files even though the jobs would otherwise run fine.
> Linux returns E2BIG (error code 7) when the combined size exceeds the limit, and Java translates that to "error=7, Argument list too long". More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping any variable whose value exceeds a certain length. The drawback is that user code which actually needs a skipped variable would fail, so the behavior should be guarded by a config variable for skipping long values, set to false by default; the user then has to explicitly set it to true to enable the feature.
> Here is the exception:
> {code}
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 17 more
> Caused by: java.lang.RuntimeException: configuration exception
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>     ... 22 more
> Caused by: java.io.IOException: Cannot run program "/data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_000006/./rbenv_runner.sh": error=7, Argument list too long
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
>     ... 23 more
> Caused by: java.io.IOException: error=7, Argument list too long
>     at java.lang.UNIXProcess.forkAndExec(Native Method)
>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>     at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>     ... 24 more
> Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {code}
> Hive does a similar trick: HIVE-2372. I have a patch for this and will submit it soon.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
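
A minimal sketch of the opt-in "skip long variables" idea described in the issue above. This is not the attached MAPREDUCE-5965.patch; the class name and the two stream.jobconf.* configuration keys are invented here for illustration, and the real streaming code builds its child environment inside PipeMapRed.

{code}
// Hypothetical sketch only -- not the attached MAPREDUCE-5965.patch.
// The class name and the two "stream.jobconf.*" config keys are invented
// for illustration.
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

public class TrimmedEnvBuilder {

  // Opt-in switch: skipping is disabled unless the user sets this to true.
  static final String SKIP_LONG_VALUES_KEY = "stream.jobconf.skip.long.values";
  // Cutoff for a single value's length; 20 KB is an arbitrary illustrative default.
  static final String MAX_VALUE_LENGTH_KEY = "stream.jobconf.max.value.length";
  static final int DEFAULT_MAX_VALUE_LENGTH = 20 * 1024;

  /**
   * Copies job conf entries into a map destined for the child process
   * environment, optionally dropping values longer than the configured
   * limit (for example a huge mapreduce.input.fileinputformat.inputdir list)
   * so that the fork/exec stays under the kernel's E2BIG limit.
   */
  public static Map<String, String> build(Configuration conf) {
    boolean skipLong = conf.getBoolean(SKIP_LONG_VALUES_KEY, false);
    int maxLen = conf.getInt(MAX_VALUE_LENGTH_KEY, DEFAULT_MAX_VALUE_LENGTH);

    Map<String, String> env = new HashMap<>();
    for (Map.Entry<String, String> entry : conf) {
      String value = entry.getValue();
      if (skipLong && value != null && value.length() > maxLen) {
        continue; // skip the oversized value instead of exporting it
      }
      // Streaming exports conf keys with '.' replaced by '_',
      // e.g. mapreduce_input_fileinputformat_inputdir.
      env.put(entry.getKey().replace('.', '_'), value);
    }
    return env;
  }
}
{code}

With the flag left at its default of false nothing changes for existing jobs; a user who hits "error=7, Argument list too long" can opt in, accepting that a streaming script which actually reads a skipped variable would then break, which is the trade-off the description calls out.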