hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
Date Wed, 17 Feb 2016 13:50:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150522#comment-15150522 ]

Steve Loughran commented on HADOOP-9565:
----------------------------------------


I've been thinking about this, and wondering whether we could get better extensibility by providing
a lookup operation where you ask about a specific method and get back a set of enum values:

{code}

getOperationSemantics("create") = <SUPPORTED, O_1, CONSISTENT, ATOMIC, SYNCHRONOUS, PERSISTENT>   // HDFS: change is visible, operation with check for existence is atomic
getOperationSemantics("create") = <SUPPORTED, O_1>   // s3

getOperationSemantics("append") = <SUPPORTED, PERSISTENT>   // HDFS: supported but no consistency guarantees
getOperationSemantics("append") = <>   // s3 doesn't support append

getOperationSemantics("delete") = <SUPPORTED, O_1, CONSISTENT, ATOMIC, SYNCHRONOUS, PERSISTENT>   // HDFS
getOperationSemantics("delete") = <SUPPORTED, O_N, PERSISTENT>   // maybe async

getOperationSemantics("rename") = <SUPPORTED, O_1, CONSISTENT, ATOMIC, SYNCHRONOUS, PERSISTENT>   // HDFS
getOperationSemantics("rename") = <SUPPORTED, CLIENT_SIDE, O_N, SYNCHRONOUS, PERSISTENT>   // s3

getOperationSemantics("OutputStream.close") = <SUPPORTED, O_N, ATOMIC, SYNCHRONOUS, PERSISTENT>   // s3
getOperationSemantics("OutputStream.close") = <SUPPORTED, O_1, ATOMIC, CONSISTENT, SYNCHRONOUS, PERSISTENT>   // HDFS
getOperationSemantics("OutputStream.write") = <SUPPORTED, O_N, PERSISTENT>  // HDFS
getOperationSemantics("OutputStream.write") = <SUPPORTED, O_1>  // s3

getOperationSemantics("OutputStream.flush") = <SUPPORTED, O_N, SYNCHRONOUS, PERSISTENT>   // HDFS
getOperationSemantics("OutputStream.flush") = <SUPPORTED, NO_OP, O_1>   // s3 won't fail on the call, but it doesn't do anything


{code}

I know it's potentially much more complex, especially for clients, but it does expose all
the information apps may possibly need.
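
To make the idea concrete, here is a minimal Java sketch of such a lookup. Everything in it is an assumption for illustration: the {{SemanticsTable}} class, the {{Semantic}} enum and the {{declare}} helper are hypothetical names and are not part of any existing Hadoop API. The flag names and the s3 values are copied from the listing above.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical lookup table; not part of any existing Hadoop FileSystem API. */
class SemanticsTable {

    /** Flag names taken from the listing above. */
    enum Semantic {
        SUPPORTED, O_1, O_N, CONSISTENT, ATOMIC, SYNCHRONOUS, PERSISTENT, CLIENT_SIDE, NO_OP
    }

    private final Map<String, Set<Semantic>> table = new HashMap<>();

    /** Registers the flags for one operation name; returns this for chaining. */
    SemanticsTable declare(String operation, Semantic first, Semantic... rest) {
        table.put(operation, Collections.unmodifiableSet(EnumSet.of(first, rest)));
        return this;
    }

    /** Operations that were never declared yield the empty set, like "append" on s3. */
    Set<Semantic> getOperationSemantics(String operation) {
        return table.getOrDefault(operation, Collections.<Semantic>emptySet());
    }

    /** A few of the s3 rows from the listing above. */
    static SemanticsTable s3Example() {
        return new SemanticsTable()
            .declare("create", Semantic.SUPPORTED, Semantic.O_1)
            .declare("rename", Semantic.SUPPORTED, Semantic.CLIENT_SIDE, Semantic.O_N,
                     Semantic.SYNCHRONOUS, Semantic.PERSISTENT)
            .declare("OutputStream.flush", Semantic.SUPPORTED, Semantic.NO_OP, Semantic.O_1);
    }

    public static void main(String[] args) {
        SemanticsTable s3 = s3Example();
        System.out.println("rename -> " + s3.getOperationSemantics("rename"));
        System.out.println("append -> " + s3.getOperationSemantics("append"));
    }
}
```

Keying on a string keeps the set of operations open-ended, which is where the extensibility would come from. The cost is that a misspelled operation name is indistinguishable from an unsupported one.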

Example: dfsclient can look for rename being O_1 and !CLIENT_SIDE; if not, it bypasses
rename and writes directly.

Another example: code trying to use create(overwrite=false) for locking could check and
fail if "create" wasn't atomic/persistent (i.e. check and create are atomic, and the result is visible
to all).
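
The two client-side checks above could be sketched roughly as follows. This is only an illustration under stated assumptions: {{SemanticsChecks}}, {{canCommitByRename}} and {{requireLockableCreate}} are hypothetical names, and the flag sets passed in {{main}} are the HDFS and s3 values from the listing earlier in this comment.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical client-side checks; none of these names exist in Hadoop. */
class SemanticsChecks {

    /** Flag names taken from the listing earlier in this comment. */
    enum Semantic {
        SUPPORTED, O_1, O_N, CONSISTENT, ATOMIC, SYNCHRONOUS, PERSISTENT, CLIENT_SIDE, NO_OP
    }

    /** Commit via rename only if it is O(1) and not emulated on the client. */
    static boolean canCommitByRename(Set<Semantic> rename) {
        return rename.contains(Semantic.SUPPORTED)
            && rename.contains(Semantic.O_1)
            && !rename.contains(Semantic.CLIENT_SIDE);
    }

    /** create(overwrite=false) is only usable as a lock if it is atomic and persistent. */
    static void requireLockableCreate(Set<Semantic> create) {
        Set<Semantic> needed =
            EnumSet.of(Semantic.SUPPORTED, Semantic.ATOMIC, Semantic.PERSISTENT);
        if (!create.containsAll(needed)) {
            throw new UnsupportedOperationException(
                "create is not atomic/persistent, cannot be used for locking: " + create);
        }
    }

    public static void main(String[] args) {
        Set<Semantic> hdfsRename = EnumSet.of(Semantic.SUPPORTED, Semantic.O_1,
            Semantic.CONSISTENT, Semantic.ATOMIC, Semantic.SYNCHRONOUS, Semantic.PERSISTENT);
        Set<Semantic> s3Rename = EnumSet.of(Semantic.SUPPORTED, Semantic.CLIENT_SIDE,
            Semantic.O_N, Semantic.SYNCHRONOUS, Semantic.PERSISTENT);
        System.out.println("HDFS commit by rename: " + canCommitByRename(hdfsRename));
        System.out.println("s3 commit by rename: " + canCommitByRename(s3Rename));

        Set<Semantic> s3Create = EnumSet.of(Semantic.SUPPORTED, Semantic.O_1);
        try {
            requireLockableCreate(s3Create);
        } catch (UnsupportedOperationException e) {
            System.out.println("s3 lock check failed: " + e.getMessage());
        }
    }
}
```

Failing fast in the lock check is a deliberate choice in this sketch: a lock built on a non-atomic create would appear to work while silently providing no mutual exclusion.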



> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
>                 Key: HADOOP-9565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9565
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3, fs/swift
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Thomas Demoor
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are really blobstores, with different atomicity and consistency guarantees, by adding a {{Blobstore}} interface to them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that all blobstores implement a server-side copy operation as a substitute for rename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
