Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0ED3CD32F for ; Fri, 5 Oct 2012 00:53:48 +0000 (UTC) Received: (qmail 55972 invoked by uid 500); 5 Oct 2012 00:53:47 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 55925 invoked by uid 500); 5 Oct 2012 00:53:47 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 55914 invoked by uid 99); 5 Oct 2012 00:53:47 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 05 Oct 2012 00:53:47 +0000 Date: Fri, 5 Oct 2012 11:53:47 +1100 (NCT) From: "Andy Isaacson (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: <1679886178.3471.1349398427785.JavaMail.jiratomcat@arcas> In-Reply-To: <1888238835.65467.1347397568225.JavaMail.jiratomcat@arcas> Subject: [jira] [Commented] (HDFS-3920) libwebdhfs code cleanup: string processing and using strerror consistently to handle all errors MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469928#comment-13469928 ] Andy Isaacson commented on HDFS-3920: ------------------------------------- Overall, looking good. A few more improvements: {code} - rbuffer->content = realloc(rbuffer->content, rbuffer->offset + size * nmemb + 1); + rbuffer->content = realloc(rbuffer->content, + rbuffer->offset + size * nmemb + 1); if (rbuffer->content == NULL) { {code} At this point we've leaked the previous {{rbuffer->content}} because realloc does not free on error. Better to do something like {code} void *tmp = realloc(buf->content, ...); if (tmp == NULL) { ... return NULL; } buf->content = tmp; {code} {code} + curl = curl_easy_init(); + if (!curl) { + errno = ENOMEM; // curl_easy_init does return error code, + // and most of its errors are caused by malloc() {code} Maybe the comment means to say "does *not* return error code"? It doesn't make very much sense to me currently. {code} + if(followloc == YES) { {code} Style nit: put a space between {{if}} and the (. (Several cases of this problem.) {code} + offset = snprintf(url, length + 1, "%s%s:%d%s%s?op=%s", + protocol, host, nnPort, prefix, path, op); + for (i = 0; i < paraNum; i++) { + if (!paraNames[i] || !paraValues[i]) { + continue; + } + offset += snprintf(url + offset, length - offset + 1, + "&%s=%s", paraNames[i], paraValues[i]); {code} After calling snprintf, you must verify that there is still room left in the buffer and error out if not. If {{length - offset + 1}} becomes negative, you might underflow and write past the end of the buffer. {code} +#define NUM_OF_PERMISSION_BITS 3 +#define NUM_OF_SHORT_VALUE_BITS 5 +#define NUM_OF_LONG_VALUE_BITS 15 {code} Based on reading the code, I think these are intended to be the number of characters to allocate to hold a "%d" formatting of a relevant type. The name should say so -- NUM_PERM_CHAR, NUM_SHORT_CHAR, NUM_LONG_CHAR perhaps? And since they are *always* used with +1 for the NUL, you may as well just include that here, in which case calling it PERM_STR_LEN, SHORT_STR_LEN, LONG_STR_LEN might be better. Also, add a comment explaining where the values come from. If my understanding is correct, then the right value for LONG is 21, not 16, since 2^64 is 20 decimal digits. Combining all of that, this becomes {code} #define PERM_STR_LEN 4 // "644" + one byte for NUL #define SHORT_STR_LEN 6 // 65535 + NUL #define LONG_STR_LEN 21 // 2^64-1 = 18446744073709551615 + NUL {code} {code} + if(!header || header[0] == '\0' || + strncmp(header, "HTTP/", strlen("HTTP/"))) { {code} Simplify this to {{if (!header || strncmp(header, ...}} since strncmp will handle a zero-length-string just fine. {code} - if(strcmp(result,"0")) { //Content-Length should be equal to 0 + if(strncmp(result, "0", 1)) { {code} This will do the wrong thing if result is "01". I think the strcmp version is more correct. {code} + const char *responseCode = "307 TEMPORARY_REDIRECT"; + if(!header || header[0] == '\0' || strncmp(header,"HTTP/", 5)) { ... + } + if(!(tempHeader = strstr(header,responseCode))) { {code} # no need to special case the zero-length-string case # strstr is not right here, it would succeed if given {{"HTTP/1.0 foobar baz 307 TEMPORARY_REDIRECT"}} which should fail. Better to {{strspn(" \t"}} then strncmp. # space after comma. {code} + if(header == '\0' || strncmp(header,"HTTP/", strlen("HTTP/"))) { {code} should be {{if ((header == NULL || strncmp...}}. Do not confuse the pointer {{NULL}} with the char NUL {{'\0'}}. {code} + if(!(strstr(header,responseCode1) && strstr(header, responseCode2))) { {code} More inappropriate use of strstr. {code} + if(arraylen > 0) { + fileStat = realloc(fileStat, sizeof(hdfsFileInfo) * arraylen); + } {code} # need to check for realloc failing # similar to above realloc comment, shouldn't reassign to fileStat without a temporary. {code} +static hdfsFileInfo *json_parse_array(json_t *jobj, hdfsFileInfo *fileStat, + int *numEntries, const char *operation, + int printError) {code} The name {{json_parse_array}} is too generic for something that takes a {{hdfsFileInfo}} argument. Perhaps {{parse_json_fileinfo}} or similar. The function has a very tricky definition -- it reallocates {{fileStat}} and returns the new buffer. That deserves a doc comment at least, or a redesign where the argument is {{hdfsFileInfo **fileStat}}. {code} + void *iter = json_object_iter(jobj); + while(iter) { + key = json_object_iter_key(iter); + value = json_object_iter_value(iter); + switch (json_typeof(value)) { + case JSON_INTEGER: + if(!strcmp(key, "accessTime")) { + fileStat->mLastAccess = json_integer_value(value) / 1000; {code} It's very confusing to have the {{strcmp(key}} inside the {{switch(type)}} code block. This makes the decision flow very hard to follow. Unless the units are obvious, magic numbers like {{/ 1000}} need a comment explaining why the value is being converted (from milliseconds to seconds? or microseconds?). Something like {{// json field contains time in milliseconds, hdfsFileInfo is counted in microseconds}} or similar. {code} + default: + free(fileStat); + fileStat = NULL; + } {code} If we hit this default, the next loop through will segfault. Better to exit the loop explicitly with a return or a test variable or a goto. If this is handling some quirk of {{json_object_iter_value}} then a comment is needed. {code} + "file creation/appending, <%d>:%s.\n", {code} Some of these are missing a space after the {{:}}. I don't really care what format you use, but please be consistent. {code} + newWorkingDir = normalizePath(newWorkingDir); {code} This leaks the old {{newWorkingDir}}. But see below for an alternate design. {code} +/** Replace "//" with "/" in path */ +static char *normalizePath(char *path) +{ + const char *delimiter = "/"; ... + token = strtok_r(path, delimiter, &savePtr); {code} This function would be much simpler if it did not generate the leading {{/}} (which it isn't documented to do) and modified the path in-place using an explicit bytewise loop rather than using {{strtok}}. Something like {code} for (i=j=sawslash=0; i libwebdhfs code cleanup: string processing and using strerror consistently to handle all errors > ----------------------------------------------------------------------------------------------- > > Key: HDFS-3920 > URL: https://issues.apache.org/jira/browse/HDFS-3920 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Jing Zhao > Assignee: Jing Zhao > Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, HDFS-3920-005.patch > > > 1. Clean up code for string processing; > 2. Using strerror consistently for error handling; > 3. Use sprintf to replace decToOctal > 4. other issues requiring fixing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira