beam-commits mailing list archives

From "Luke Cwik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2283) Consider using actual URIs instead of Strings/ResourceIds in relation to FileSystems
Date Fri, 26 May 2017 18:05:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026609#comment-16026609 ]

Luke Cwik commented on BEAM-2283:
---------------------------------

Michael Lucky (adude3141@gmail.com) reported that the local file system and HDFS return different match results for the same specs, particularly glob patterns:

On the local file system:
{code}
baseDir/someDir: [Success{status=OK, getMetadata=[Metadata{resourceId=/baseDir/someDir/, sizeBytes=102, isReadSeekEfficient=true}]}]
baseDir/someDir/: [Success{status=OK, getMetadata=[Metadata{resourceId=/baseDir/someDir/, sizeBytes=102, isReadSeekEfficient=true}]}]
baseDir/someDir/testFileAA: [Success{status=OK, getMetadata=[Metadata{resourceId=/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
baseDir/someDir/testFile*: [Success{status=OK, getMetadata=[Metadata{resourceId=/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
baseDir/*: [Failure{status=NOT_FOUND, getException=java.io.FileNotFoundException: No files found for spec: /baseDir/*.}]
baseDir/*/: [Failure{status=NOT_FOUND, getException=java.io.FileNotFoundException: No files found for spec: /baseDir/*.}]
baseDir/*/testFileAA: [Success{status=NOT_FOUND, getMetadata=[]}]
baseDir/**: [Success{status=OK, getMetadata=[Metadata{resourceId=/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
baseDir/*/*: [Success{status=NOT_FOUND, getMetadata=[]}]
baseDir/*/testFile*: [Success{status=NOT_FOUND, getMetadata=[]}]
{code}

On HDFS:
{code}
baseDir/someDir: [Success{status=OK, getMetadata=[]}]
baseDir/someDir/: [Success{status=OK, getMetadata=[]}]
baseDir/someDir/testFileAA: [Success{status=OK, getMetadata=[Metadata{resourceId=hdfs://localhost:58835/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
baseDir/someDir/testFile*: [Success{status=OK, getMetadata=[Metadata{resourceId=hdfs://localhost:58835/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
baseDir/*: [Success{status=OK, getMetadata=[]}]
baseDir/*/: [Success{status=OK, getMetadata=[]}]
baseDir/*/testFileAA: [Success{status=OK, getMetadata=[Metadata{resourceId=hdfs://localhost:58835/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
baseDir/**: [Success{status=OK, getMetadata=[]}]
baseDir/*/*: [Success{status=OK, getMetadata=[Metadata{resourceId=hdfs://localhost:58835/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
baseDir/*/testFile*: [Success{status=OK, getMetadata=[Metadata{resourceId=hdfs://localhost:58835/baseDir/someDir/testFileAA, sizeBytes=10, isReadSeekEfficient=true}]}]
{code}
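For comparison, java.nio's glob syntax is one common convention: `*` matches within a single path segment and `**` may cross directory boundaries. It is not necessarily what either Beam file system implements. The sketch below applies java.nio's `PathMatcher` to the test file's relative path from the report. The class name `GlobSemantics` is hypothetical, and the sketch assumes the JVM's default file system.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;

// Hypothetical demo class: evaluates the reported specs under java.nio glob rules only.
public class GlobSemantics {
  // Returns true if the relative path matches the glob under java.nio rules,
  // where '*' stays within one path segment and '**' may cross separators.
  public static boolean matches(String glob, String path) {
    FileSystem fs = FileSystems.getDefault();
    PathMatcher matcher = fs.getPathMatcher("glob:" + glob);
    return matcher.matches(fs.getPath(path));
  }

  public static void main(String[] args) {
    String file = "baseDir/someDir/testFileAA";
    String[] specs = {
      "baseDir/someDir/testFile*",
      "baseDir/*",
      "baseDir/*/testFileAA",
      "baseDir/**",
      "baseDir/*/*",
      "baseDir/*/testFile*",
    };
    for (String spec : specs) {
      System.out.println(spec + " -> " + matches(spec, file));
    }
  }
}
```

Under these rules every listed spec except `baseDir/*` matches `baseDir/someDir/testFileAA`. Measured against that convention, the local results find the file through `baseDir/**` but not through `baseDir/*/testFileAA`, `baseDir/*/*`, or `baseDir/*/testFile*`. HDFS finds it through those three specs but not through `baseDir/**`.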

> Consider using actual URIs instead of Strings/ResourceIds in relation to FileSystems
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-2283
>                 URL: https://issues.apache.org/jira/browse/BEAM-2283
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core, sdk-java-extensions, sdk-java-gcp, sdk-py
>            Reporter: Luke Cwik
>
> We treat path strings like URIs, since we expect them to have a scheme component and to be able to resolve a parent/child, but we fail to treat them as URIs in the internal implementation because our string versions don't go through URI normalization. This brings up a few issues:
> * The cost of implementing and maintaining ResourceIds instead of having users use a standard URI implementation. Using URIs would only require FileSystems to be able to take a string and give back a URI (so that they can provide custom implementations if they extend the concept of URIs with scheme-specific extensions).
> * The myriad of bugs that will come up because of improper usage of URI-like strings and the assumptions associated with them (like https://issues.apache.org/jira/browse/BEAM-2277).
> Note that swapping to URIs adds complexity because:
> * Resolving URIs with glob expressions needs to be handled carefully.
> * FileSystems may need to implement a complicated type instead of ResourceId.
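The description's point about normalization and parent/child resolution can be illustrated with `java.net.URI`, one standard URI implementation that follows RFC 2396 resolution rules. This is a sketch using only the JDK, not Beam's ResourceId code. The class name `UriResolution` and the `gs://bucket` paths are hypothetical. Note how a trailing slash on the base decides whether the child lands inside `dir` or next to it.

```java
import java.net.URI;

// Hypothetical demo class using only java.net.URI (not Beam's ResourceId API).
public class UriResolution {
  // Normalizes a URI string: drops "." segments and collapses "segment/.." pairs.
  public static String normalize(String spec) {
    return URI.create(spec).normalize().toString();
  }

  // Resolves a relative child reference against a base URI (RFC 2396-style merge).
  public static String resolve(String base, String child) {
    return URI.create(base).resolve(child).toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize("gs://bucket/a/./b/../c.txt")); // gs://bucket/a/c.txt
    // With a trailing slash, the base behaves like a directory.
    System.out.println(resolve("gs://bucket/dir/", "child.txt")); // gs://bucket/dir/child.txt
    // Without one, the last segment is replaced, so the child becomes a sibling of "dir".
    System.out.println(resolve("gs://bucket/dir", "child.txt")); // gs://bucket/child.txt
  }
}
```

Code that builds child paths by concatenating strings has to make this directory-versus-file decision itself. That is one way the URI-like-string bugs mentioned in the description can arise.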

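To see why resolving URIs that contain glob expressions needs care, consider how `java.net.URI` parses common glob metacharacters. This sketch uses only the JDK, not any Beam API. The class name `GlobInUri` and the `gs://` specs are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: shows how java.net.URI treats glob metacharacters.
public class GlobInUri {
  // Returns the path component parsed from spec, or null if spec is not a legal URI.
  public static String pathOf(String spec) {
    try {
      return new URI(spec).getPath();
    } catch (URISyntaxException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // '*' is a legal path character, so this glob survives parsing intact.
    System.out.println(pathOf("gs://bucket/dir/*.txt")); // /dir/*.txt
    // '?' starts the query component, so the glob is silently truncated.
    System.out.println(pathOf("gs://bucket/dir/file?.txt")); // /dir/file
    // '{' is not a legal URI character, so parsing fails outright.
    System.out.println(pathOf("gs://bucket/dir/{a,b}.txt")); // null
  }
}
```

A URI-based API would therefore need some convention, such as percent-encoding, for carrying these characters through parsing. The issue leaves that open.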


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
