From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Mon, 10 Apr 2006 22:00:30 +0100 (BST)
Subject: [jira] Commented: (HADOOP-129) FileSystem should not name files with java.io.File

    [ http://issues.apache.org/jira/browse/HADOOP-129?page=comments#action_12373928 ]

Doug Cutting commented on HADOOP-129:
-------------------------------------

> I think we should
> change this to a Hadoop-specific class, e.g. FileName.

Why not URI? What required methods are missing from URI? Conversely, which URI methods do you think might cause problems?

Partially answering my own question: with URIs, we'd have to check that the scheme, host and port matched the FS when implementing each FS method. In other words, given that we need a FileSystem instance to do anything, the scheme, host and port fields of the URI are usually redundant and force us to perform error checking. However, these same fields would be useful when specifying MapReduce input and output directories, on command lines, etc., permitting one to easily specify non-default FileSystem implementations.

Note that I don't think URI buys us interoperability with other systems. So we should only use it if we think it will make writing Hadoop easier, i.e., if it consists of code that we'd mostly need to write anyway.

A side benefit of URI is that it provides a standards-defined filename syntax. We don't have to figure out how to, e.g., escape things, or how backslashes and colons should be treated. We can simply point to a standard.

> I also propose that this class should be versioned, and contain some File-like metadata - for now I'm thinking specifically about creation / modification time.

This works so long as files are write-once. But if they can be appended to or overwritten, then this information could get stale.

> FileSystem should not name files with java.io.File
> --------------------------------------------------
>
> Key: HADOOP-129
> URL: http://issues.apache.org/jira/browse/HADOOP-129
> Project: Hadoop
> Type: Improvement
> Components: fs
> Versions: 0.1.1, 0.1.0
> Reporter: Doug Cutting
> Fix For: 0.2
>
> In Hadoop's FileSystem API, files are currently named using java.io.File. This is confusing, as many methods on that class are inappropriate to call on Hadoop paths. For example, calling isDirectory(), exists(), etc.
> on a java.io.File is not the same as calling FileSystem.isDirectory() or FileSystem.exists() with that same file. Using java.io.File also makes correct operation on Windows difficult, since java.io.File behaves differently on Windows in order to accommodate Windows path names. For example, new File("/foo") is not absolute on Windows, and prints its path as "\\foo", which causes confusion.
>
> To fix this we could replace the uses of java.io.File in the FileSystem API with String, a new FileName class, or perhaps java.net.URI. The advantage of URI is that it can also naturally include the namenode host and port. The disadvantage is that URI does not support tree operations like getParent().
>
> This change will cause a lot of incompatibility. Thus it should probably be made early in a development cycle in order to maximize the time for folks to adapt to it.
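To make the URI trade-offs discussed above concrete, here is a minimal Java sketch of what naming files with java.net.URI might mean for a FileSystem implementation. It shows the scheme/host/port check each FS method would need, a hand-written replacement for the missing getParent(), and URI's standards-defined escaping (a space in a path is quoted as %20). The class UriPathSketch and its methods checkSameFileSystem() and parentOf() are hypothetical illustrations, not Hadoop APIs; only java.net.URI is a real library class here.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch, not a Hadoop API: naming files with java.net.URI. */
public class UriPathSketch {

    /**
     * The per-method error check: the file's scheme, host and port must
     * match those of the FileSystem. A URI without a scheme is assumed
     * (a design choice of this sketch) to belong to this FileSystem.
     */
    public static void checkSameFileSystem(URI fsUri, URI file) {
        if (file.getScheme() == null) {
            return;
        }
        boolean sameScheme = file.getScheme().equalsIgnoreCase(fsUri.getScheme());
        boolean sameHost = file.getHost() == null
                ? fsUri.getHost() == null
                : file.getHost().equalsIgnoreCase(fsUri.getHost());
        boolean samePort = file.getPort() == fsUri.getPort();
        if (!(sameScheme && sameHost && samePort)) {
            throw new IllegalArgumentException(
                    "Wrong FileSystem: " + file + ", expected: " + fsUri);
        }
    }

    /**
     * URI has no getParent(), so this tree operation is written by hand.
     * Returns null for the root, an empty path, an opaque URI, or a
     * single relative segment.
     */
    public static URI parentOf(URI file) {
        String path = file.getPath();            // decoded, e.g. "/user/my dir/x"
        if (path == null || path.isEmpty() || path.equals("/")) {
            return null;
        }
        if (path.endsWith("/")) {                // treat "/a/b/" like "/a/b"
            path = path.substring(0, path.length() - 1);
        }
        int slash = path.lastIndexOf('/');
        if (slash < 0) {
            return null;
        }
        String parentPath = (slash == 0) ? "/" : path.substring(0, slash);
        try {
            // The multi-argument constructor quotes illegal characters
            // (e.g. space -> %20), so escaping follows the URI standard.
            return new URI(file.getScheme(), file.getAuthority(), parentPath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://namenode:9000/");
        URI file = URI.create("hdfs://namenode:9000/user/my%20dir/part-00000");

        checkSameFileSystem(fs, file);           // same scheme, host and port: no exception
        System.out.println(parentOf(file));      // hdfs://namenode:9000/user/my%20dir

        try {
            checkSameFileSystem(fs, URI.create("hdfs://other:9000/user/x"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // different host: rejected
        }
    }
}
```

Treating a scheme-less URI as belonging to the current FileSystem is just one possible choice: it keeps the common case free of the redundant fields, while a fully qualified URI can still name a non-default FileSystem on a command line or as a MapReduce input or output directory.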