hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
Date Thu, 18 Apr 2013 14:51:17 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635220#comment-13635220 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

We also need to specify {{Seekable}}, as the {{FSDataInputStream}} that must be returned
from {{open()}} calls implements it, and the specifics of {{seek(long pos)}} are not completely
defined, consistently implemented, or explicitly tested.

* Some implementation classes validate the range of a seek in the call itself; validation can
also be postponed until the next {{read()}} (which is how POSIX expects it).
* Not every implementation rejects negative seek offsets.
* While {{EOFException}} would be the appropriate exception to raise on going past the end
of the file, it is rarely seen in the source.

Delayed seeks can deliver tangible performance benefits and it would be unwise to demand stricter
validation than {{::lseek()}} or {{::SetFilePointerEx()}}. We ought to say "you can if you
want", and write tests that verify either the seek fails, or the read straight afterwards
fails. 
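
The "either the seek fails, or the read straight afterwards fails" check could be sketched as a small helper. This is only an illustration: {{SeekContractChecks}}, {{IOAction}} and {{failsOnSeekOrNextRead}} are hypothetical names, not part of Hadoop, and the helper takes the two calls as lambdas so it makes no assumption about the stream class under test.

```java
import java.io.IOException;

/** Hypothetical test helper; none of these names exist in Hadoop. */
final class SeekContractChecks {

    /** An operation that may throw IOException, such as seek() or read(). */
    @FunctionalInterface
    interface IOAction {
        void run() throws IOException;
    }

    /**
     * Returns true if an out-of-range seek is rejected either by the seek
     * itself (eager validation) or by the read straight after it (delayed
     * validation). Returns false if both calls complete normally.
     */
    static boolean failsOnSeekOrNextRead(IOAction seek, IOAction nextRead) {
        try {
            seek.run();
        } catch (IOException e) {
            return true; // eager: the seek call itself was rejected
        }
        try {
            nextRead.run();
        } catch (IOException e) {
            return true; // delayed: the failure surfaced on the next read
        }
        return false; // neither call failed, so the contract is violated
    }
}
```

A test would call it along the lines of {{failsOnSeekOrNextRead(() -> in.seek(len + 1), () -> in.read())}} and fail if the result is false.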

== Seekable ==

* When a file is opened, {{getPos()}} MUST equal 0
* Implementations MAY choose not to implement {{seek()}}, and MAY instead throw an {{IOException}}
* A {{seek(L)}} on a closed input stream MUST fail with an {{IOException}}.
* After a successful {{seek(L)}}, {{getPos()==L}} for all L:  {{0 <= L < length(file)}}
* On a {{seek(L)}} with {{L<0}} an exception MUST be thrown. It SHOULD be an {{IOException}}. It MAY
be an {{IllegalArgumentException}} or other {{RuntimeException}}
* On a {{seek(L)}} with {{L>length(file)}}, an {{IOException}} MAY be thrown. It SHOULD be
an {{EOFException}}
* If the {{seek(L)}} does not throw an {{IOException}}, then an {{IOException}} MUST be thrown
on the next {{read()}} operation. It SHOULD be an {{EOFException}}


This is actually a relaxation of the {{Seekable.seek()}} definition, which states "Can't seek
past the end of the file.". The {{RawLocalFileSystem}} on which everything ultimately depends
does support seeking past the end of the file; an exception is raised only on the subsequent
read operation.

* After a {{seek(L)}} with {{L<length(file)}}, {{read()}} returns the byte at position
L in the file.
* After a {{seek(L)}} with {{L==length(file)}}, {{read()}} returns -1
* After a {{seek(L)}} with {{L<length(file)}}, {{read(byte[1],0,1)}} reads the byte at
position L in the file.
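
As a rough illustration of the rules above, here is an in-memory stream that takes the delayed-validation option: a seek past the end is accepted and the following read raises {{EOFException}}. {{InMemorySeekableStream}} is a hypothetical class written for this sketch, not a Hadoop class, and throwing a plain {{IOException}} for a negative offset is one assumption among the permitted choices.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stream (not a Hadoop class) with delayed seek validation. */
final class InMemorySeekableStream {
    private final byte[] data;
    private long pos = 0;           // getPos() MUST equal 0 when the file is opened
    private boolean closed = false;

    InMemorySeekableStream(byte[] data) {
        this.data = data.clone();
    }

    long getPos() {
        return pos;
    }

    void seek(long newPos) throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        if (newPos < 0) {
            throw new IOException("negative seek offset: " + newPos);
        }
        pos = newPos; // offsets past the end are accepted here; the next read reports them
    }

    int read() throws IOException {
        checkReadable();
        if (pos == data.length) {
            return -1; // exactly at the end of the file
        }
        return data[(int) pos++] & 0xff;
    }

    int read(byte[] buf, int off, int len) throws IOException {
        checkReadable();
        if (len == 0) {
            return 0;
        }
        if (pos == data.length) {
            return -1;
        }
        int n = (int) Math.min(len, data.length - pos);
        System.arraycopy(data, (int) pos, buf, off, n);
        pos += n;
        return n;
    }

    void close() {
        closed = true;
    }

    /** Rejects reads on a closed stream or after a seek past the end of the file. */
    private void checkReadable() throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        if (pos > data.length) {
            throw new EOFException("position " + pos + " is past the end of the file");
        }
    }
}
```

An implementation that validates eagerly would instead compare {{newPos}} against the file length inside {{seek()}}; both variants satisfy the rules as written.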

Tests to verify offset validation:
# open a file of length {{file_len > 0}}, verify {{getPos()==0}}
# {{seek(file_len)}}, verify {{getPos()==file_len}}
 If an exception is not raised, call {{read()}} and expect an {{IOException}}
# {{seek(file_len+1)}}, expect an {{EOFException}}
 If an exception is not raised, call {{read()}} and expect the exception then
# {{seek(-1)}}, expect an {{IOException}} immediately.

Then open a file of length {{file_len == 0}}:
 # verify {{getPos()==0}}
 # Verify that {{seek(0)}} succeeds.
 # verify that {{read()}} returns -1.
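
The zero-length checks might look like the following. To keep the sketch runnable without Hadoop on the classpath it uses {{java.io.RandomAccessFile}} as a stand-in for the stream returned by {{open()}}, with {{getFilePointer()}} playing the role of {{getPos()}}; the class name {{EmptyFileSeekCheck}} is made up for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical check; RandomAccessFile stands in for a Hadoop input stream. */
final class EmptyFileSeekCheck {

    /** Runs the three zero-length checks and throws AssertionError on the first failure. */
    static void run() throws IOException {
        Path file = Files.createTempFile("seek-empty", ".bin"); // created with length 0
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            check(in.getFilePointer() == 0, "position must be 0 after open");
            in.seek(0); // must succeed even though the file is empty
            check(in.getFilePointer() == 0, "position must still be 0 after seek(0)");
            check(in.read() == -1, "read() on an empty file must return -1");
        } finally {
            Files.deleteIfExists(file);
        }
    }

    private static void check(boolean ok, String message) {
        if (!ok) {
            throw new AssertionError(message);
        }
    }
}
```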

Test to verify {{seek()}} actually changes the location for future reads.
* verify that after a {{seek()}}, {{read()}} returns the data at the seek location. This must
work for forward and backwards seeks.
* verify that after a {{seek()}}, a {{read(byte[])}} returns the bytes of data at the seek
location. This must work for forward and backwards seeks.
Repeat for very large offsets (e.g. 128KB file), to ensure that filesystems with local caches/buffers
handle longer range seeks correctly.
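
A possible shape for the forward/backward seek test on a 128KB file is sketched below. Again {{java.io.RandomAccessFile}} is only a stand-in for the filesystem stream, and {{SeekReadCheck}}, the offsets and the byte pattern are assumptions made for the example. The pattern repeats every 251 bytes (a prime) so that a read served from the wrong power-of-two-aligned buffer is unlikely to match by accident.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical check; RandomAccessFile stands in for a Hadoop input stream. */
final class SeekReadCheck {
    static final int FILE_LEN = 128 * 1024;

    /** Deterministic byte expected at a given offset of the test file. */
    static byte expectedByte(long offset) {
        return (byte) (offset % 251);
    }

    static void run() throws IOException {
        byte[] data = new byte[FILE_LEN];
        for (int i = 0; i < FILE_LEN; i++) {
            data[i] = expectedByte(i);
        }
        Path file = Files.createTempFile("seek-read", ".bin");
        try {
            Files.write(file, data);
            try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
                // Forward seeks first, then backward seeks.
                long[] offsets = {0, 1024, 100_000, FILE_LEN - 1, 4096, 17};
                for (long off : offsets) {
                    // Single-byte read at the seek location.
                    in.seek(off);
                    check(in.read() == (expectedByte(off) & 0xff), "read() at " + off);
                    // Array read at the same location; readFully fills the whole buffer.
                    in.seek(off);
                    byte[] buf = new byte[(int) Math.min(256, FILE_LEN - off)];
                    in.readFully(buf);
                    for (int j = 0; j < buf.length; j++) {
                        check(buf[j] == expectedByte(off + j), "array read at " + (off + j));
                    }
                }
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    private static void check(boolean ok, String message) {
        if (!ok) {
            throw new AssertionError(message);
        }
    }
}
```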

                
> Define Semantics of FileSystem and FileContext more rigorously
> --------------------------------------------------------------
>
>                 Key: HADOOP-9371
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9371
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 1.2.0, 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch, HadoopFilesystemContract.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The semantics of {{FileSystem}} and {{FileContext}} are not completely defined in terms of
> # core expectations of a filesystem
> # consistency requirements.
> # concurrency requirements.
> # minimum scale limits
> Furthermore, methods are not defined strictly enough in terms of their outcomes and failure modes.
> The requirements and method semantics should be defined more strictly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
