hadoop-yarn-issues mailing list archives

From "Devaraj K (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5983) [Umbrella] Support for FPGA as a Resource in YARN
Date Fri, 28 Apr 2017 07:02:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988319#comment-15988319 ]

Devaraj K commented on YARN-5983:

Thanks [~tangzhankun] and [~zyluo] for the design doc and hard work, [~leftnoteasy] for the

The scheduler only considers non-exclusive resource. The exclusive resources may
have extra attributes needs to be matched when scheduling. Not just simply add or
reduce a number. For instance, in our PoC, a FPGA slot in one node may already
have one IP flashed so that the scheduler should try to match this IP attribute to
reuse it.

If you are passing all the attributes of the FPGA resources to the RM scheduler, why do you
want NM-side resource management as well? Can you describe, in abstract terms, which attributes
are passed to the RM and which details are maintained by the NM-side resource management?
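
To make the attribute matching in the quoted text concrete, here is a minimal sketch of preferring a slot that already has the requested IP flashed. It is only an illustration under assumptions: `FpgaSlot`, `pick_slot`, and the slot and IP names are hypothetical and are not YARN classes or anything taken from the design doc.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FpgaSlot:
    # Hypothetical model of one FPGA slot on a node (not a YARN class).
    slot_id: str
    flashed_ip: Optional[str] = None  # IP currently flashed on the slot, if any
    in_use: bool = False              # exclusive resource: one holder at a time


def pick_slot(slots: List[FpgaSlot], wanted_ip: str) -> Optional[FpgaSlot]:
    """Prefer a free slot that already holds wanted_ip, else any free slot."""
    free = [s for s in slots if not s.in_use]
    for slot in free:
        if slot.flashed_ip == wanted_ip:
            return slot  # attribute match: the flashed IP can be reused
    # No match: fall back to the first free slot, which would need a re-flash.
    return free[0] if free else None


slots = [
    FpgaSlot("slot0", flashed_ip="ip_a"),
    FpgaSlot("slot1", flashed_ip="ip_b"),
    FpgaSlot("slot2", flashed_ip="ip_b", in_use=True),
]
chosen = pick_slot(slots, "ip_b")
```

Whether this kind of matching runs in the RM scheduler or on the NM side is exactly the open question above; the sketch does not assume either.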

2. {code:xml}
 Device resource needs additional preparation and isolation before container launch.
For instance, FPGA device may need to download an IP file from a repo then flash to
an allocated FPGA slot.
{code}
Does this need to be done for each container, or can it be done once during cluster installation?
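
As one way to frame the per-container versus one-time question, the sketch below skips the download and flash when the slot already holds the requested IP, so the cost is only paid when the IP changes. Everything here is an assumption for illustration: `prepare_slot`, the `slot_state` map, and the `download`/`flash` callables are hypothetical stand-ins, not APIs from YARN or the design doc.

```python
from typing import Callable, Dict


def prepare_slot(
    slot_state: Dict[str, str],
    slot_id: str,
    wanted_ip: str,
    download: Callable[[str], bytes],
    flash: Callable[[str, bytes], None],
) -> bool:
    """Return True if the slot was flashed, False if its current IP was reused."""
    if slot_state.get(slot_id) == wanted_ip:
        return False                  # already flashed with this IP: nothing to do
    image = download(wanted_ip)       # hypothetical fetch of the IP file from a repo
    flash(slot_id, image)             # hypothetical flash onto the allocated slot
    slot_state[slot_id] = wanted_ip   # remember what the slot now holds
    return True


# Usage with fake download/flash functions that only record their calls.
calls = []
state: Dict[str, str] = {}


def fake_download(ip: str) -> bytes:
    calls.append(("download", ip))
    return b"bitstream"


def fake_flash(slot: str, image: bytes) -> None:
    calls.append(("flash", slot))


first = prepare_slot(state, "slot0", "ip_a", fake_download, fake_flash)
second = prepare_slot(state, "slot0", "ip_a", fake_download, fake_flash)
```

In this sketch the second container that asks for the same IP on the same slot triggers no download or flash.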

3. Can FPGA slots be shared by multiple containers? How do we prevent a container (one with no
FPGA allocation) or application from trying to use FPGA resources which are not allocated to

4. Are any changes to ContainerExecutor needed? How does the application code running in the
container learn which FPGA resource was allocated to it, so that it can access/use the FPGA?

5. What configuration does the user need to set for the application to use FPGA resources?

> [Umbrella] Support for FPGA as a Resource in YARN
> -------------------------------------------------
>                 Key: YARN-5983
>                 URL: https://issues.apache.org/jira/browse/YARN-5983
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: yarn
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>         Attachments: YARN-5983-Support-FPGA-resource-on-NM-side_v1.pdf
> As various big data workload running on YARN, CPU will no longer scale eventually and
heterogeneous systems will become more important. ML/DL is a rising star in recent years,
applications focused on these areas have to utilize GPU or FPGA to boost performance. Also,
hardware vendors such as Intel also invest in such hardware. It is most likely that FPGA will
become popular in data centers like CPU in the near future.
> So YARN as a resource managing and scheduling system, would be great to evolve to support
this. This JIRA proposes FPGA to be a first-class citizen. The changes roughly includes:
> 1. FPGA resource detection and heartbeat
> 2. Scheduler changes
> 3. FPGA related preparation and isolation before launch container
> We know that YARN-3926 is trying to extend current resource model. But still we can leave
some FPGA related discussion here

This message was sent by Atlassian JIRA

