From: Doug Cutting
Date: Wed, 26 Apr 2006 09:56:28 -0700
To: hadoop-dev@lucene.apache.org
Subject: Re: C API for Hadoop DFS

Devaraj Das wrote:
> Attached is a draft of the C API specification that some of us (in Yahoo)
> have been thinking about. The specification is closely tied to the API
> exported by Hadoop's FileSystem class.
> Will really appreciate any comments, etc. on the specification.

Overall, this looks great! Thanks for working on this!

> /**
>  * dfsFileLocationInfo
>  * used to get the mapping between file blocks and the hostnames where
>  * they are stored. Due to replication, a file block could be stored on
>  * multiple hosts.
>  */
> typedef struct {
>     char **hostname;
>     int numHosts;
> } dfsFileLocationInfo;
>
> /**
>  * dfsStat
>  * used for getting information about a file/directory
>  */
> typedef struct {
>     tObjectKind mKind;  /** file or directory */
>     char *mName;        /* the name of the file */
>     tTime mCreationTime;
>     dfsFileLocationInfo *fileLocationInfo; /*the last element
>                                              in the array is NULL*/
>     long mSize;         /*the size of the file in bytes */
>     bool replicated;    /*whether this file is replicated */
> } dfsFileInfo;
>
> /** return information about a path as a (dynamically allocated) array
>  * of dfsFileInfo.
>  * numEntries is set to the number of elements in the array.
>  * If the path happens to be a file, the array will have just one element.
>  * If the path happens to be a directory, the dfsFileInfo elements in the
>  * array will contain information about the files/sub-dirs within the path.
>  * NULL is returned if the path does not exist or some other error is
>  * encountered. freeDfsFileInfo should be called passing the array and
>  * numEntries when it is no longer needed.
>  */
> dfsFileInfo *dfsGetPathInfo(dfsFS fs, char *path, int *numEntries);

I'm a little confused about the dfsFileLocationInfo.
It exposes more of the filesystem internals than applications require, and
returning full block lists with every directory listing is expensive.
Instead, I think we need the following two functions:

tOffset getBlockSize(dfsFs fs);
char** getHosts(dfsFs fs, char* file, tOffset pos);

getHosts would return an array of the hosts that hold the specified position
in a file. Does that make sense?

> int dfsCopyFromLocalFile(dfsFs fs, char *src, char *dst);
> int dfsCopyToLocalFile(dfsFs fs, char *src, char *dst);
> int dfsMoveFromLocalFile(dfsFs fs, char *src, char *dst);

These are utility methods that could be implemented by user code, i.e., they
are not core methods. That's fine. But perhaps we should add another:

int dfsCopy(dfsFs fs, char* src, char* dst);

Otherwise lots of applications will end up writing this themselves.

Thanks again,

Doug
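
The following minimal C sketch illustrates the proposed getBlockSize/getHosts pair. It shows how an application might walk a file one block at a time and ask which hosts hold each block, for example to count how many blocks are local to a given node. Everything beyond the two proposed signatures is hypothetical:

- the dfsFs and tOffset typedefs;
- the in-memory dfsFsImpl stub and the fake placement table;
- the assumptions that getHosts returns a NULL-terminated array owned by the library and returns NULL for an out-of-range position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the draft API's types. */
typedef int64_t tOffset;
typedef struct {
    tOffset blockSize;
} dfsFsImpl;
typedef dfsFsImpl *dfsFs;

/* Hypothetical replica placement: each block is stored on two hosts. */
static char *block0Hosts[] = { "node1", "node2", NULL };
static char *block1Hosts[] = { "node2", "node3", NULL };
static char *block2Hosts[] = { "node3", "node1", NULL };
static char **placement[] = { block0Hosts, block1Hosts, block2Hosts };
#define NUM_BLOCKS ((tOffset)(sizeof placement / sizeof placement[0]))

/* Proposed call: the filesystem's block size in bytes. */
tOffset getBlockSize(dfsFs fs) {
    assert(fs != NULL);
    return fs->blockSize;
}

/* Proposed call: hosts storing the byte at `pos` in `file`.
 * Assumption: the result is a NULL-terminated array, or NULL if `pos`
 * is out of range. This stub ignores `file` and reads the table above. */
char **getHosts(dfsFs fs, char *file, tOffset pos) {
    (void)file;
    if (pos < 0) {
        return NULL;
    }
    tOffset block = pos / getBlockSize(fs);
    return block < NUM_BLOCKS ? placement[block] : NULL;
}

/* Application code: query one offset per block and count the blocks
 * that have a replica on `host`. */
int countLocalBlocks(dfsFs fs, char *file, tOffset fileSize,
                     const char *host) {
    tOffset blockSize = getBlockSize(fs);
    int local = 0;
    for (tOffset pos = 0; pos < fileSize; pos += blockSize) {
        char **hosts = getHosts(fs, file, pos);
        for (size_t i = 0; hosts != NULL && hosts[i] != NULL; i++) {
            if (strcmp(hosts[i], host) == 0) {
                local++;
                break;
            }
        }
    }
    return local;
}
```

Consider a 64 MB block size and a file exactly two blocks long. In that case countLocalBlocks(fs, "/data/part-0", size, "node2") counts both blocks, because node2 appears in both stubbed replica lists. Since only one offset per block is queried, an application pays for location lookups only when it asks for them. A directory listing does not incur that cost.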
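
A dfsCopy call would spare each application from writing its own buffered copy loop. The minimal sketch below shows that loop. The draft's open/read/write calls are not quoted here, so stdio FILE handles stand in for DFS file handles. The names copyStream and copyPath are hypothetical, as are the 0/-1 return convention and the 64 KB buffer size.

```c
#include <assert.h>
#include <stdio.h>

/* Copy everything from `in` to `out` through a fixed-size buffer.
 * Assumption: stdio FILE handles stand in for DFS file handles.
 * Returns 0 on success, -1 on a read or write error. */
int copyStream(FILE *in, FILE *out) {
    assert(in != NULL && out != NULL);
    char buf[64 * 1024];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            return -1;
        }
    }
    return ferror(in) ? -1 : 0;
}

/* Open both paths, copy, and close, reporting any failure.
 * A hypothetical dfsCopy(fs, src, dst) would do the same against the DFS. */
int copyPath(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");
    if (in == NULL) {
        return -1;
    }
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    int rc = copyStream(in, out);
    if (fclose(out) != 0) {
        rc = -1; /* buffered data may fail to flush on close */
    }
    fclose(in);
    return rc;
}
```

A library-provided dfsCopy could hide the same open, loop, and close-with-error-checking steps behind a single call. It could also copy more efficiently than a client-side loop.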