hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anatoli Shein (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
Date Thu, 25 Aug 2016 18:33:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437416#comment-15437416

Anatoli Shein commented on HDFS-10754:

Thank for the review, [~bobhansen].

I have addressed your comments as follows:

* The new recursive methods should not use future/promises internally. That blocks one of
the asio threads waiting for more data; if a consumer tried to do one of these with a single
thread in the threadpool, it would deadlock waiting for the subtasks to complete, but they'd
all get queued up behind the initial handler.

Instead, whenever they're all done (request_count == 0), the last one out the door (the handler
that dropped the request_count to 0) should call into the consumer's handler directly with
the final status. If any of the other threads has received an error, all of the subsequent
deliveries to the handler should return false, telling find "I'm going to report an error
anyway, so don't bother recursing any more." It's good to wait until request_count==0, even
in an error state, so the consumer doesn't have any lame duck requests queued up to take care

Also because all of this is asynchronous, you can't allocate the lock and state variables
on the stack. When the consumer calls SetOwner('/', true, handler), the function is going
to return as soon as the find operation is kicked off, destroying all of the elements on the
stack. We'll need to create a little struct for SetOwner that is maintained with a shared_ptr
and cleaned up when the last request is done.

(/) I removed futures and promises from recursive methods, and created a small struct to keep
the state. Now it is purely async.

Minor points:

* In find, perhaps recursion_counter is a bit of a misnomer at this point. It's more outstanding_requests,
since for big directories, we'll have more requests without recursing.
(/) Done.

* Perhaps FindOperationState is a better name than CurrentState, and SharedFindState is better
than just SharedState (since we might have many shared states in the FileSystem class).
(/) Done.

* In CurrentState, perhaps "depth" is more accurate than position?
(/) Yes, I changed it to "depth" now.

* Do we support a globbing find without recursion? Can I find "/dir?/path*/" "*.db", and not
have it recurse to the sub-directories of path*?
(/) POSIX find supports globbing without recursion by setting maxdepth to zero. I added maxdepth
functionality for our tool also.

* Can we push the shims and state into the .cpp file and keep them out of the itnerface (even
if private)?
(/) Done.

> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod
and hdfs_find.
> -----------------------------------------------------------------------------------------------------------
>                 Key: HDFS-10754
>                 URL: https://issues.apache.org/jira/browse/HDFS-10754
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>         Attachments: HDFS-10754.HDFS-8707.000.patch, HDFS-10754.HDFS-8707.001.patch,
HDFS-10754.HDFS-8707.002.patch, HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch,
HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, HDFS-10754.HDFS-8707.007.patch,
HDFS-10754.HDFS-8707.008.patch, HDFS-10754.HDFS-8707.009.patch, HDFS-10754.HDFS-8707.010.patch

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

View raw message