hadoop-hdfs-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11909) Ozone: KSM : Support for simulated file system operations
Date Thu, 01 Jun 2017 22:38:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033835#comment-16033835 ]

Mingliang Liu commented on HDFS-11909:

Thanks [~anu] for the design doc. Nice discussion here. I think [~stevel@apache.org] got pinged,
so he may chime in as well.

I am not sure how useful it is to differentiate files from directories in the API, with getSimulatedFiles
and getSimulatedDirectories. In a file system they are all FileStatus. When we want to implement
listStatus, the most convenient API would be a single call that returns all paths under
a path, along with an attribute indicating whether each one is a dir or a file. Otherwise we will end
up calling 2 APIs here, which is less favored.
This makes sense. Alternatively, one single API (maybe listKeys) could return a response containing
both top-level files and "directories". One motivation for separating files and directories at
the API level is to support millions of children. In that case each response is truncated
at the client's request (using {{prev_key}} and {{max_keys}}), so a single response contains only part
of the listing, and we issue paging requests to get all the children of a directory. So not sending
unneeded information seems a good idea if we only need files or only directories. When I discussed
this with Anu offline, one specific use case was the non-recursive {{FileSystem::listFiles()}} API, which
needs top-level files only. Separate APIs would save the effort of returning directories, which are useless
in that case. For recursive {{FileSystem::listFiles()}}, we can simply use {{getSimulatedTree}}
on demand at the client side. We can think about other use cases that may benefit from separate
APIs. If a single listKeys works just fine for those cases, I'd prefer a single API as
well, which is indeed simpler and favored.
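
A minimal sketch of the single-API option with paging, to make the trade-off concrete. Everything here is hypothetical: {{PagedListingSketch}}, {{Child}}, {{listAll}} and {{listFilesOnly}} are illustrative names, the in-memory sorted map merely stands in for the server, and none of it is the actual KSM API. Only the {{prev_key}} / {{max_keys}} idea and the per-entry dir/file attribute come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; these names are NOT the KSM API.
final class PagedListingSketch {

    /** One child of a "directory": a name plus a dir/file attribute. */
    static final class Child {
        final String name;
        final boolean isDirectory;

        Child(String name, boolean isDirectory) {
            this.name = name;
            this.isDirectory = isDirectory;
        }
    }

    // In-memory stand-in for the server: children of one directory, sorted by name.
    private final TreeMap<String, Boolean> children = new TreeMap<>();

    void add(String name, boolean isDirectory) {
        children.put(name, isDirectory);
    }

    /** Returns at most maxKeys children strictly after prevKey (null means "from the start"). */
    List<Child> listKeys(String prevKey, int maxKeys) {
        if (maxKeys <= 0) {
            throw new IllegalArgumentException("maxKeys must be positive");
        }
        Map<String, Boolean> tail =
            (prevKey == null) ? children : children.tailMap(prevKey, false);
        List<Child> page = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : tail.entrySet()) {
            if (page.size() >= maxKeys) {
                break;
            }
            page.add(new Child(e.getKey(), e.getValue()));
        }
        return page;
    }

    /** Client-side paging loop: keep asking until a short page signals the end. */
    List<Child> listAll(int maxKeys) {
        List<Child> all = new ArrayList<>();
        String prev = null;
        while (true) {
            List<Child> page = listKeys(prev, maxKeys);
            all.addAll(page);
            if (page.size() < maxKeys) {
                return all;
            }
            prev = page.get(page.size() - 1).name;
        }
    }

    /** Non-recursive listFiles(): directories are fetched, then dropped by the client. */
    List<String> listFilesOnly(int maxKeys) {
        List<String> files = new ArrayList<>();
        for (Child c : listAll(maxKeys)) {
            if (!c.isDirectory) {
                files.add(c.name);
            }
        }
        return files;
    }
}
```

In this sketch {{listFilesOnly}} still transfers every directory entry before discarding it, which is the wasted work that separate file and directory calls would avoid.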

FWIW, currently all glob and path filtering in HDFS is executed on the client side. I think this
_may be_ because we prefer reducing load on the NameNode over saving network bandwidth in cases where
most of the returned files and directories are filtered out. In an ideal world, there would be a
server-side path filter. We can talk about this later if it's too crazy.
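
As a rough illustration of client-side filtering, the sketch below applies a glob to a listing that has already been returned in full. It uses only the JDK's {{java.nio.file.PathMatcher}}, not Hadoop's own glob code, and {{ClientSideFilterSketch}} and {{filter}} are hypothetical names chosen for this example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: filters an already-fetched listing on the client.
final class ClientSideFilterSketch {

    /** Keeps only the names that match the glob; every name was already sent over the wire. */
    static List<String> filter(List<String> listing, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> kept = new ArrayList<>();
        for (String name : listing) {
            if (matcher.matches(Paths.get(name))) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

The server does no matching work here, but the entries that fail the glob still consume bandwidth, which is the trade-off mentioned above.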

We certainly could do that, or for the purpose of these APIs we will treat one or more / as a single
slash; either one will work.
Having a strict naming convention is not a bad thing to me. It's clear without sacrificing important
use cases.
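
The two options can be sketched side by side. This is only an illustration under assumed semantics: {{SlashNormalizeSketch}}, {{collapseSlashes}} and {{isStrict}} are hypothetical helpers, not part of the proposal or of any Ozone API.

```java
// Hypothetical sketch of the two naming options discussed above.
final class SlashNormalizeSketch {

    /** Lenient option: treat any run of '/' as a single slash. */
    static String collapseSlashes(String key) {
        return key.replaceAll("/{2,}", "/");
    }

    /** Strict option: a key is acceptable only if it has no repeated slashes. */
    static boolean isStrict(String key) {
        return !key.contains("//");
    }
}
```

With the strict option a key such as {{a//b}} would be rejected up front, while the lenient option would silently map it to {{a/b}}.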

> Ozone: KSM :  Support for simulated file system operations
> ----------------------------------------------------------
>                 Key: HDFS-11909
>                 URL: https://issues.apache.org/jira/browse/HDFS-11909
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>         Attachments: simulation-file-system.pdf
> This JIRA adds a proposal that makes it easy to implement OzoneFileSystem. This makes
> the directory and file list operations simpler.

