hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bob Hansen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
Date Wed, 24 Aug 2016 22:10:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435809#comment-15435809
] 

Bob Hansen edited comment on HDFS-10754 at 8/24/16 10:10 PM:
-------------------------------------------------------------

Thanks for your hard work, [~anatoli.shein].  I'm sorry to have to keep dragging you back,
but...

The new recursive methods should not use future/promises internally.  That blocks one of the
asio threads waiting for more data; if a consumer tried to do one of these with a single thread
in the threadpool, it would deadlock waiting for the subtasks to complete, but they'd all
get queued up behind the initial handler.

Instead, whenever they're all done (request_count == 0), the last one out the door (the handler
that dropped the request_count to 0) should call into the consumer's handler directly with
the final status.  If any of the other threads has received an error, all of the subsequent
deliveries to the handler should return false, telling find "I'm going to report an error
anyway, so don't bother recursing any more."  It's good to wait until request_count==0, even
in an error state, so the consumer doesn't have any lame duck requests queued up to take care
of?

Also because all of this is asynchronous, you can't allocate the lock and state variables
on the stack.  When the consumer calls SetOwner('/', true, handler), the function is going
to return as soon as the find operation is kicked off, destroying all of the elements on the
stack.  We'll need to create a little struct for SetOwner that is maintained with a shared_ptr
and cleaned up when the last request is done.



Minor points:
In find, perhaps recursion_counter is a bit of a misnomer at this point.  It's more outstanding_requests,
since for big directories, we'll have more requests without recursing.

Perhaps FindOperationState is a better name than CurrentState, and SharedFindState is better
than just SharedState (since we might have many shared states in the FileSystem class).

In CurrentState, perhaps "depth" is more accurate than position?

Do we support a globbing find without recursion?  Can I find "/dir?/path*/" "\*.db", and not
have it recurse to the sub-directories of path*?

Can we push the shims and state into the .cpp file and keep them out of the itnerface (even
if private)?

We're very close, now.



was (Author: bobhansen):
Thanks for your hard work, [~anatoli.shein].  I'm sorry to have to keep dragging you back,
but...

The new recursive methods should not use future/promises internally.  That blocks one of the
asio threads waiting for more data; if a consumer tried to do one of these with a single thread
in the threadpool, it would deadlock waiting for the subtasks to complete, but they'd all
get queued up behind the initial handler.

Instead, whenever they're all done (request_count == 0), the last one out the door (the handler
that dropped the request_count to 0) should call into the consumer's handler directly with
the final status.  If any of the other threads has received an error, all of the subsequent
deliveries to the handler should return false, telling find "I'm going to report an error
anyway, so don't bother recursing any more."  It's good to wait until request_count==0, even
in an error state, so the consumer doesn't have any lame duck requests queued up to take care
of?

Also because all of this is asynchronous, you can't allocate the lock and state variables
on the stack.  When the consumer calls SetOwner('/', true, handler), the function is going
to return as soon as the find operation is kicked off, destroying all of the elements on the
stack.  We'll need to create a little struct for SetOwner that is maintained with a shared_ptr
and cleaned up when the last request is done.



Minor points:
In find, perhaps recursion_counter is a bit of a misnomer at this point.  It's more outstanding_requests,
since for big directories, we'll have more requests without recursing.

Perhaps FindOperationState is a better name than CurrentState, and SharedFindState is better
than just SharedState (since we might have many shared states in the FileSystem class).

In CurrentState, perhaps "depth" is more accurate than position?

Do we support a globbing find without recursion?  Can I find "/dir?/path*/" "*.db", and not
have it recurse to the sub-directories of path*?

Can we push the shims and state into the .cpp file and keep them out of the itnerface (even
if private)?

We're very close, now.


> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod
and hdfs_find.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10754
>                 URL: https://issues.apache.org/jira/browse/HDFS-10754
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>         Attachments: HDFS-10754.HDFS-8707.000.patch, HDFS-10754.HDFS-8707.001.patch,
HDFS-10754.HDFS-8707.002.patch, HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch,
HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, HDFS-10754.HDFS-8707.007.patch,
HDFS-10754.HDFS-8707.008.patch, HDFS-10754.HDFS-8707.009.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message