hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
Date Mon, 07 Aug 2017 20:09:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117175#comment-16117175
] 

Miklos Szegedi commented on YARN-6852:
--------------------------------------

Thank you [~wangda],
I have a second batch.
I could not find a doc about this, so I ask. Is there a reason to have a disable list of devices,
instead of an enable list?
command-line-parser.c, etc: I think it is more common to have a_b.c style file naming. For
example in the Linux kernel: attribute_container.c
{code}
int handle_gpu_request(update_cgroups_parameters_func func,
143        const char* module_name, int argc, char** argv) {
{code}
Since they are not argv anymore I would use the names paramc, params instead of argc, arg
{code}
int* minor_devices;
{code}
I would call internal_handle_gpu_request, only if minor_devices is initialized. Currently
it may contain any garbage on the stack, if ā€˜eā€™ never hits.
{code}
191          default:
192            fprintf(ERRORFILE,
193              "Unknown option in gpu command character %d %c, optionindex = %d\n",
194              c, c, optind);
195            return -1;
196            break;
{code}
No need to break;
{code}
if (0 == strlen(container_id)) {
{code}
container_id[0]==0
{code}
strncpy(container_id, optarg, MAX_CONTAINER_ID_LEN)
{code}
strncpy does not null terminate the string if the length equals to MAX_CON...
{code}
free(full_path);
{code}
It is a common practice to
{code}
ā€¦ goto cleanup;
cleanup:
free(full_path);
return X;
{code}
-
{code}
  if (!major_number_str || 0 == strlen(major_number_str)) {
{code}
This could be major_number_str[0]==0
{code}
61        // Default major number of Nvidia devices
62        major_device_number = 195;
{code}
It might be nicer to have a #define for this number
{code}
72      if (!allowed_minor_numbers_str || strlen(allowed_minor_numbers_str)) {
73        allowed_minor_numbers = NULL;
{code}
Bug: strlen==0 (!) or allowed_minor_numbers_str[0]==0
{code}
get_section_value return values are not freed in internal_handle_gpu_request
{code}
200      if (0 == strlen(container_id)) {
{code}
This could be container_id[0]==0
{code}
164      int* minor_devices;
{code}
I think this is never freed.
parse_commandline_opts may leak opts, opts->keys and opts-> values when returning -1
on errors.
{code}
41      if (!opts->keys || !opts->values) {
42        fprintf(ERRORFILE, "Failed to malloc keys or values of opts\n");
43        return NULL;
44      }
{code}
You still need to free keys if values is NULL
{code}
51      if (!(known_parameters && required && has_values)) {
{code}
Ideally, this is a check at the beginning of the function.
{code}
64        // make sure param_name start with "--"
65        if (0 != strncmp("--", param_name, 2)) {
66          fprintf(ERRORFILE, "option %s is not started with \"--\"\n", param_name);
67          return NULL;
68        }
82        if (param_idx < 0) {
83          fprintf(ERRORFILE, "cannot find parameter %s from known parameters\n", param_name);
84          return NULL;
85        }
{code}
param_name is leaked.
Is parse_commandline_opts used anywhere? If not, please remove it from this jira.
{code}
24    /*
25     * if all chars in the input str are numbers
26     * return true/false
27     */
28    static int all_numbers(char* input) {
29      int len = strlen(input);
30    
31      for (int i = 0; i < len; i++) {
32        if (input[i] < '0' || input[i] > '9') {
33          return 0;
34        }
35      }
36      return 1;
37    }
{code}
This should be like the code below, otherwise we pass through the string twice:
{code}
24    /*
25     * if all chars in the input str are numbers
26     * return true/false
27     */
28    static int all_numbers(char* input) {
31      for (; input[0] != 0; input++) {
32        if (input[0] < '0' || input[0] > '9') {
33          return 0;
34        }
35      }
36      return 1;
37    }
{code}

{code}
39    int get_numbers_split_by_comma(char* input, int** numbers, size_t* ret_n_numbers) {
40      size_t n_numbers = 1;
41      for (int i = 0; i < strlen(input); i++) {
42        if (input[i] == ',') {
43          n_numbers++;
44        }
45      }
46    
47      (*numbers) = malloc(sizeof(int) * n_numbers);
48      if (!(*numbers)) {
49        return -1;
50      }
51    
52      char* input_cpy = malloc(strlen(input));
53      strcpy(input_cpy, input);
54    
55      char* p = strtok(input_cpy, ",");
56      int idx = 0;
57      while (p != NULL) {
58        int n = atoi(p);
59        (*numbers)[idx] = n;
60        p = strtok(NULL, ",");
61        idx++;
62      }
63    
64      free(input_cpy);
65      *ret_n_numbers = n_numbers;
66    
67      return 0;
68    }
{code}
get_numbers_split_by_comma: Please do not do strlen at every character. It will be an O(n^2)
algorithm from an O(n).
get_numbers_split_by_comma: You may actually return an array with garbage at the end, since
a double comma (,,) will not trigger a new strtok string but your n_numbers code calculates
all commas.
validate_container_id: It does not free input_cpy, if it returns 0 in the middle of the function.

> [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
> -------------------------------------------------------------------------------
>
>                 Key: YARN-6852
>                 URL: https://issues.apache.org/jira/browse/YARN-6852
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6852.001.patch, YARN-6852.002.patch, YARN-6852.003.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message