hadoop-hdfs-issues mailing list archives

From "Cyril Briquet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-246) Add a method to get file length for Seekable, FSDataInputStream and libhdfs
Date Tue, 03 May 2011 11:59:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028171#comment-13028171 ]

Cyril Briquet commented on HDFS-246:
------------------------------------

Another related use case:

Suppose we want to implement two (somewhat complex) record-reading routines:
one for an HDFS filesystem (through the HDFS API)
and one for a local filesystem (through the java.io API).
(In practice, this use case extends to more than two filesystems,
but let's assume two for simplicity.)

To promote code reuse, the core of the two implementations
can be abstracted into a base class.
This base class is inherited by two subclasses
that provide the concrete implementation of low-level I/O.

The low-level I/O relies on the following methods:

org.apache.hadoop.fs.FSDataInputStream:

public void seek(long pos) throws IOException; // org.apache.hadoop.fs.Seekable interface
public long getPos() throws IOException; // org.apache.hadoop.fs.Seekable interface
public long length() throws IOException; // TODO: does not exist yet (this is what HDFS-246 proposes)
public String readUTF() throws IOException; // java.io.DataInput interface
public int readInt() throws IOException; // java.io.DataInput interface
public void close() throws IOException; // java.io.Closeable interface

java.io.RandomAccessFile:

public void seek(long pos) throws IOException; // no interface
public long getFilePointer() throws IOException; // no interface
public long length() throws IOException; // no interface
public String readUTF() throws IOException; // java.io.DataInput interface
public int readInt() throws IOException; // java.io.DataInput interface
public void close() throws IOException; // java.io.Closeable interface
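A minimal Java sketch of the base-class design described above, assuming hypothetical names (`RecordReaderBase`, `LocalRecordReader`) and a hypothetical record format of consecutive `writeUTF` strings. The shared record-reading logic calls only abstract primitives (`seek`, `getPos`, `length`, `readUTF`); the local subclass maps each one directly onto `java.io.RandomAccessFile`.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: the record-reading logic lives here and
// depends only on abstract low-level I/O primitives.
abstract class RecordReaderBase implements AutoCloseable {
    protected abstract void seek(long pos) throws IOException;
    protected abstract long getPos() throws IOException;
    protected abstract long length() throws IOException;
    protected abstract String readUTF() throws IOException;

    // Reads every record from the current position up to the end of file.
    // Needs length() to know where the data ends.
    public List<String> readRemaining() throws IOException {
        List<String> records = new ArrayList<>();
        while (getPos() < length()) {
            records.add(readUTF());
        }
        return records;
    }

    // Seeks to an absolute offset and reads the single record stored there.
    public String readAt(long offset) throws IOException {
        seek(offset);
        return readUTF();
    }

    @Override
    public abstract void close() throws IOException;
}

// Hypothetical local-filesystem subclass: each primitive delegates to
// java.io.RandomAccessFile, which already offers length().
class LocalRecordReader extends RecordReaderBase {
    private final RandomAccessFile file;

    LocalRecordReader(File f) throws IOException {
        this.file = new RandomAccessFile(f, "r");
    }

    @Override protected void seek(long pos) throws IOException { file.seek(pos); }
    @Override protected long getPos() throws IOException { return file.getFilePointer(); }
    @Override protected long length() throws IOException { return file.length(); }
    @Override protected String readUTF() throws IOException { return file.readUTF(); }
    @Override public void close() throws IOException { file.close(); }
}
```

An HDFS subclass could wrap `FSDataInputStream` the same way for `seek`, `getPos` and `readUTF`, but without a `length()` method on the stream it would have to obtain the length separately, for example via `FileSystem.getFileStatus(Path).getLen()` when the file is opened.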

Considering this use case, the patch proposed by Qi makes a lot of sense (to me, at least :)
It would bring to FSDataInputStream the same length() semantics
that RandomAccessFile already provides.


> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-246
>                 URL: https://issues.apache.org/jira/browse/HDFS-246
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Qi Liu
>            Assignee: Qi Liu
>         Attachments: HADOOP-5143-2.patch, HADOOP-5143.patch, hadoop.patch
>
>
> When opening any seekable file, it should be possible to get the length of the file via the Seekable
> interface, since the seek method should already be able to detect seeking beyond the end of file.
> Such an interface can benefit distributed file systems by saving the network round-trip of FileSystem.getFileStatus(Path).getLen()
> for any open file.
> In libhdfs, such an interface should also be exposed so that native programs can take advantage
> of this change.
> I have the changes locally for all FSInputStream concrete classes. The change can be
> considered trivial, since some of the FSInputStream classes already have a method named getFileLength(),
> or a member field named size/length/end.
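A minimal sketch of the reasoning in the issue description, assuming a hypothetical interface `LengthAwareSeekable` and a hypothetical in-memory stream `ByteArraySeekableInput` (neither is part of Hadoop): a stream that rejects seeks past the end of file must already know its length, so exposing it through `length()` costs nothing extra, whereas asking the filesystem means a separate `getFileStatus(Path).getLen()` call.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical Seekable-style contract extended with the proposed length().
// This is NOT Hadoop's org.apache.hadoop.fs.Seekable interface.
interface LengthAwareSeekable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
    long length() throws IOException;
}

// Hypothetical in-memory stream used only to illustrate the idea.
class ByteArraySeekableInput implements LengthAwareSeekable {
    private final byte[] data;
    private long pos = 0;

    ByteArraySeekableInput(byte[] data) {
        this.data = data;
    }

    // The bounds check needs the length, so the stream already has it.
    @Override
    public void seek(long newPos) throws IOException {
        if (newPos < 0) {
            throw new IOException("Negative seek offset: " + newPos);
        }
        if (newPos > length()) {
            throw new EOFException("Cannot seek past end of file: " + newPos);
        }
        pos = newPos;
    }

    @Override
    public long getPos() {
        return pos;
    }

    // Answered locally from state the stream keeps anyway.
    @Override
    public long length() {
        return data.length;
    }

    // Bytes left after the current position, with no extra lookup.
    long remaining() {
        return length() - pos;
    }
}
```

The design choice mirrors the issue's observation that several FSInputStream classes already keep a length-like field; the sketch simply surfaces that field through the interface.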

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
