hadoop-mapreduce-issues mailing list archives

From "Evan Pollan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
Date Thu, 02 Feb 2012 15:13:53 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198861#comment-13198861 ]

Evan Pollan commented on MAPREDUCE-3583:
----------------------------------------

Zhihong -- First off, I'm assuming that this particular stime value is not legitimate.  However,
if you search around, there are discussions of how to handle wrapped jiffies values in system-level
code (e.g. [here|http://www.linuxquestions.org/questions/linux-kernel-70/confusion-in-jiffies-wraparound-921739/]).
 

Regardless, I believe these time values are unsigned 64-bit integers in the Linux kernel code,
no?  Why not parse and handle them internally as BigIntegers?

Just be aware that the values can apparently wrap even within an unsigned 64-bit number,
so pay attention to the math.
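
To make the suggestion concrete, here is a minimal sketch of what I mean. It is not the Hadoop code: the class and method names (UnsignedJiffies, parseUnsigned64, elapsed) are made up for illustration, and the wrap handling assumes the counter is a plain unsigned 64-bit value that wrapped at most once between the two readings.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of ProcfsBasedProcessTree.
final class UnsignedJiffies {
    // 2^64: the modulus of an unsigned 64-bit counter.
    static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    /** Parses a decimal field that may exceed Long.MAX_VALUE. */
    static BigInteger parseUnsigned64(String field) {
        BigInteger value = new BigInteger(field.trim());
        if (value.signum() < 0 || value.compareTo(TWO_POW_64) >= 0) {
            throw new NumberFormatException("Not an unsigned 64-bit value: " + field);
        }
        return value;
    }

    /** Ticks elapsed from earlier to later, assuming at most one wrap in between. */
    static BigInteger elapsed(BigInteger earlier, BigInteger later) {
        // BigInteger.mod always returns a non-negative result, so a wrapped
        // reading (later < earlier) still yields the forward distance.
        return later.subtract(earlier).mod(TWO_POW_64);
    }

    public static void main(String[] args) {
        // The value from the stack trace: Long.parseLong rejects it, BigInteger does not.
        BigInteger stime = parseUnsigned64("18446743988060683582");
        System.out.println(stime.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0);

        // A later reading taken after the counter wrapped past 2^64.
        System.out.println(elapsed(stime, parseUnsigned64("100")));
    }
}
```

Plain subtraction of two such readings would go negative after a wrap, which is why the difference is reduced modulo 2^64 here.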
                
> ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.205.0
>         Environment: 64-bit Linux:
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
>            Reporter: Zhihong Yu
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: mapreduce-3583.txt
>
>
> HBase PreCommit builds frequently gave us NumberFormatException.
> From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
> {code}
> 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> java.lang.NumberFormatException: For input string: "18446743988060683582"
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> 	at java.lang.Long.parseLong(Long.java:422)
> 	at java.lang.Long.parseLong(Long.java:468)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
> 	at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
> 	at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
> 	at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> From the Hadoop 0.20.205 source code, it looks like ppid was 18446743988060683582, causing the NFE:
> {code}
>         // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
>          pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
> {code}
> You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
> {code}
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 20
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 16382
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 60000
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 2048
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> 60000
> Running in Jenkins mode
> {code}
> From Nicolas Sze:
> {noformat}
> It looks like the ppid is a 64-bit positive integer, but Java long is signed and so only works with 63-bit positive integers.  In your case,
>   2^64 > 18446743988060683582 > 2^63.
> Therefore, there is a NFE. 
> {noformat}
> I propose changing allProcessInfo to Map<String, ProcessInfo> so that we avoid this problem by not parsing large integers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
