hadoop-hdfs-issues mailing list archives

From "Mariappan Asokan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2461) Support HDFS file name globbing in libhdfs
Date Tue, 18 Oct 2011 16:51:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129852#comment-13129852 ]

Mariappan Asokan commented on HDFS-2461:
----------------------------------------

I have given this more thought.  Since the Java globStatus() method already queries the name node
to retrieve the status information, for the sake of efficiency I think we can change the function
signature to return that status directly.  Also, to conform to the existing hdfsListDirectory(),
I decided to return an array of structures rather than an array of pointers.  This makes it possible
to reuse the existing C function hdfsFreeFileInfo().  I also added a path filter function to the
interface.  Filtering will be done in the C implementation.  The prototype of the single function
is described below:
{code:title=hdfs.h}
/**
 * Path filter function prototype.
 * @param pathName path name passed to this function.
 * @return 0 if the path name is to be excluded; a non-zero value otherwise.
 */
typedef int (*PathFilter)(const char * pathName);

/**
 * hdfsGlobStatus - Get status for all HDFS file names that match a glob
 * pattern.  The returned result will be an array of hdfsFileInfo structures.
 * The array is sorted by file names.
 * The function hdfsFreeFileInfo() should be called to free this array and its
 * contents.
 * @param fs The configured filesystem handle.
 * @param globPattern The glob pattern (as supported by the Hadoop implementation) to
 * match file names against.
 * @param filter A path filter function.  If this is NULL, no filtering will be
 * done after glob expansion.
 * @param numEntries pointer to an integer in which the number of entries in the
 * returned array will be returned.  This will be set to -1 in case of error.
 * @return Returns a dynamically-allocated array of hdfsFileInfo structures; if
 * there is no match or an error, a NULL value will be returned.  An error
 * condition can be identified by testing numEntries.
 */
hdfsFileInfo * hdfsGlobStatus(hdfsFS fs, const char *globPattern,
                              PathFilter filter, int *numEntries);
{code}
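
To make the proposed contract concrete, here is a minimal, self-contained sketch of the "glob, then filter, then sort" behaviour.  It is not the libhdfs implementation: SketchFileInfo, sketchGlobStatus() and excludeTmp() are hypothetical names, the in-memory array stands in for the name node, and POSIX fnmatch() stands in for Hadoop's glob matcher (whose syntax differs in places).  The sketch also assumes that numEntries is set to 0 when nothing matches, which the prototype comment does not state explicitly.

```c
#include <assert.h>
#include <fnmatch.h>   /* POSIX; only a stand-in for Hadoop's glob matcher */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for hdfsFileInfo; the real struct has more fields. */
typedef struct {
    const char *name;
    long        size;
} SketchFileInfo;

/* Same shape as the proposed PathFilter typedef. */
typedef int (*PathFilter)(const char *pathName);

static int compareByName(const void *a, const void *b)
{
    const SketchFileInfo *x = a;
    const SketchFileInfo *y = b;
    return strcmp(x->name, y->name);
}

/*
 * Mirrors the proposed return convention: a malloc'ed array of structures
 * sorted by name.  Returns NULL with *numEntries == -1 on error and (by
 * assumption) NULL with *numEntries == 0 when nothing matches.
 */
SketchFileInfo *sketchGlobStatus(const SketchFileInfo *all, int count,
                                 const char *globPattern,
                                 PathFilter filter, int *numEntries)
{
    assert(numEntries != NULL);
    *numEntries = -1;
    if (globPattern == NULL || count < 0 || (count > 0 && all == NULL))
        return NULL;

    SketchFileInfo *result = malloc((size_t)(count > 0 ? count : 1) * sizeof *result);
    if (result == NULL)
        return NULL;

    int n = 0;
    for (int i = 0; i < count; i++) {
        if (fnmatch(globPattern, all[i].name, 0) != 0)
            continue;                        /* did not match the glob */
        if (filter != NULL && filter(all[i].name) == 0)
            continue;                        /* excluded by the path filter */
        result[n++] = all[i];
    }

    *numEntries = n;
    if (n == 0) {
        free(result);
        return NULL;
    }
    qsort(result, (size_t)n, sizeof *result, compareByName);
    return result;
}

/* Example filter: exclude any path name ending in ".tmp". */
int excludeTmp(const char *pathName)
{
    size_t len = strlen(pathName);
    return !(len >= 4 && strcmp(pathName + len - 4, ".tmp") == 0);
}
```

A caller would first check the returned pointer and, when it is NULL, test numEntries to tell "no match" apart from an error.  With the real hdfsGlobStatus() the array would be released with hdfsFreeFileInfo() rather than free().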

If anyone has any comments, please let me know.
Thanks.

                
> Support HDFS file name globbing in libhdfs
> ------------------------------------------
>
>                 Key: HDFS-2461
>                 URL: https://issues.apache.org/jira/browse/HDFS-2461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Mariappan Asokan
>            Priority: Minor
>
> This is to enhance the C API in libhdfs to support HDFS file name globbing.  The proposal
is to keep the new API simple and return a list of matched HDFS path names.  Callers can use
the existing hdfsGetPathInfo() to get additional information on each of the matched paths.  The
following code snippet shows the proposed API enhancements:
> {code:title=hdfs.h}
> /**
>  * hdfsGlob - Get all the HDFS file names that match a glob pattern.  The
>  * returned result will be sorted by the file names.  The last element in the
>  * array is NULL.  The function hdfsFreeGlob() should be called to free this
>  * array and its contents.
>  * @param fs The configured filesystem handle.
>  * @param globPattern The glob pattern to match file names against.  Note that
>  * this is not a POSIX regular expression but rather a POSIX glob pattern.
>  * @return Returns a dynamically-allocated array of strings; if there is no
>  * match, an array with one entry that has a NULL value will be returned.  If
>  * there is an error, NULL will be returned.
>  */
> char ** hdfsGlob(hdfsFS fs, const char *globPattern);
> /**
>  * hdfsFreeGlob - Free up the array returned by hdfsGlob().
>  * @param globResult The array of dynamically-allocated strings returned by
>  * hdfsGlob().
>  */
> void hdfsFreeGlob(char **globResult);
> {code}
> Please comment on the above proposed API.  I will start the implementation and testing.
 However, I need a committer to work with.
> Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
