hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
Date Fri, 04 Aug 2017 22:47:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115055#comment-16115055
] 

Wangda Tan edited comment on YARN-6852 at 8/4/17 10:46 PM:
-----------------------------------------------------------

Hi Miklos, really appreciate your thorough reviews, very helpful!

I address most of your comments.

Few items which I haven't addressed in the updated patch.
bq. Why do you have cgroup_cfg_section? You could eliminate it and get it all the time or
just cache cgroups_root.
I still prefer to have it since this can help us get more configs without changing major code
structure.

bq. int input_argv_idx = 0; the first argument is the process name.
Actually the argc and argv are modified in main.c before passed to modules, I removed process
name already:
{code}
+    return handle_gpu_request(&update_cgroups_parameters, "gpu", argc - 1,
+           &argv[1]);
{code}
Please let me know if you have any suggestions to the approach. 

bq. opts->keys = malloc(sizeof(char*) * (argc + 1)); Why argc+1 and not argc-1? 
Updated to argc.

bq. required and has_values could be implemented as a bit array instead of a byte array. Another
option ...
Since container-executor is not a memory-intensive application, I would prefer to spend time
on changing it when it is necessary or there's any safety concerns. :) 

bq. This pattern is C+0x.
I think Varun mentioned this in YARN-6033, it is C99: https://stackoverflow.com/a/330867

bq. arr[idx] = n; There is no overflow check. This could also be exploitable.
This might not be an issue since we have already checked the input string once:
{code}
  for (int i = 0; i < strlen(input); i++) {
    if (input[i] == ',') {
      n_numbers++;
    }
  }
{code}

bq. container_1 is an invalid container id in the unit tests. They will fail. 
Did you mean we should not fail the check? "container_1" is actually an invalid id in YARN.


bq. There is no indentation after namespace ContainerExecutor
I would prefer to not add extra indention for namespace. There're some discussions on SO:
https://stackoverflow.com/questions/713698/c-namespaces-advice

bq. static std::vector<const char*> cgroups_parameters_invoked; I think you should consider
std::string here. No need to malloc later
bq. You do not clean up files in the unit tests, do you? Is there a reason?
(TODO) Will include unit test related changes and clean ups in the next patch.

Updated ver.003 patch. [~miklos.szegedi@cloudera.com], mind to check again?


was (Author: leftnoteasy):
Hi Miklos, really appreciate your thorough reviews, very helpful!

I address most of your comments.

Few items which I haven't addressed in the updated patch.
bq. Why do you have cgroup_cfg_section? You could eliminate it and get it all the time or
just cache cgroups_root.
I still prefer to have it since this can help us get more configs without changing major code
structure.

bq. int input_argv_idx = 0; the first argument is the process name.
Actually the argc and argv are modified in main.c before passed to modules, I removed process
name already:
{code}
+    return handle_gpu_request(&update_cgroups_parameters, "gpu", argc - 1,
+           &argv[1]);
{code}
Please let me know if you have any suggestions to the approach. 

bq. opts->keys = malloc(sizeof(char*) * (argc + 1)); Why argc+1 and not argc-1? 
Updated to argc.

bq. required and has_values could be implemented as a bit array instead of a byte array. Another
option ...
Since container-executor is not a memory-intensive application, I would prefer to spend time
on changing it when it is necessary or there's any safety concerns. :) 

bq. This pattern is C+0x.
I think Varun mentioned this in YARN-6033, it is C99: https://stackoverflow.com/a/330867

bq. arr[idx] = n; There is no overflow check. This could also be exploitable.
This might not be an issue since we have already checked the input string once:
{code}
  for (int i = 0; i < strlen(input); i++) {
    if (input[i] == ',') {
      n_numbers++;
    }
  }
{code}

bq. container_1 is an invalid container id in the unit tests. They will fail. 
Did you mean we should not fail the check? "container_1" is actually an invalid id in YARN.


bq. There is no indentation after namespace ContainerExecutor
I would prefer to not add extra indention for namespace. There're some discussions on SO:
https://stackoverflow.com/questions/713698/c-namespaces-advice

bq. static std::vector<const char*> cgroups_parameters_invoked; I think you should consider
std::string here. No need to malloc later
bq. You do not clean up files in the unit tests, do you? Is there a reason?
(TODO) Will include unit test related changes and clean ups in the next patch.

Updated ver.003 patch.

> [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
> -------------------------------------------------------------------------------
>
>                 Key: YARN-6852
>                 URL: https://issues.apache.org/jira/browse/YARN-6852
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6852.001.patch, YARN-6852.002.patch, YARN-6852.003.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message