hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card
Date Fri, 08 Mar 2019 12:25:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787841#comment-16787841
] 

Hudson commented on YARN-9265:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16161 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16161/])
YARN-9265. FPGA plugin fails to recognize Intel Processing Accelerator (sunilg: rev de15a66d782094632abd09222b87a01bab8e0f5e)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/FpgaNodeResourceUpdateHandler.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/FPGADiscoveryStrategy.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/TestFpgaDiscoverer.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/SettingsBasedFPGADiscoveryStrategy.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/AoclOutputBasedDiscoveryStrategy.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/DeviceSpecParser.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/IntelFpgaOpenclPlugin.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/FpgaDiscoverer.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/ScriptBasedFPGADiscoveryStrategy.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/fpga/discovery/package-info.java


> FPGA plugin fails to recognize Intel Processing Accelerator Card
> ----------------------------------------------------------------
>
>                 Key: YARN-9265
>                 URL: https://issues.apache.org/jira/browse/YARN-9265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.1.0
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Critical
>             Fix For: 3.3.0
>
>         Attachments: YARN-9265-001.patch, YARN-9265-002.patch, YARN-9265-003.patch, YARN-9265-004.patch,
YARN-9265-005.patch, YARN-9265-006.patch, YARN-9265-007.patch, YARN-9265-008.patch, YARN-9265-009.patch
>
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> --------------------------------------------------------------------
> Device Name:
> acl0
>  
> Package Pat:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>  
> Vendor: Intel Corp
>  
> Physical Dev Name   Status            Information
>  
> pac_a10_f200000     Passed            PAC Arria 10 Platform (pac_a10_f200000)
>                                       PCIe 08:00.0
>                                       FPGA temperature = 79 degrees C.
>  
> DIAGNOSTIC_PASSED
> --------------------------------------------------------------------
>  
> Call "aocl diagnose <device-names>" to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin fails to recognize this and fails with the following message:
> {noformat}
> 2019-01-25 06:46:02,834 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
Using FPGA vendor plugin: org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
Failed to get major-minor number from reading /dev/pac_a10_f300000
> 2019-01-25 06:46:03,252 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
Failed to bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the "Physical
Dev Name", but this is wrong. For example, it thinks that the device file is {{/dev/pac_a10_f300000}}
which is not the case, the actual file is {{/dev/intel-fpga-port.0}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message