hadoop-hdfs-issues mailing list archives

From "Sean Mackrory (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
Date Tue, 19 Sep 2017 15:37:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171883#comment-16171883 ]

Sean Mackrory edited comment on HDFS-10702 at 9/19/17 3:36 PM:
---------------------------------------------------------------

The assumption of this feature is that an application is responsible for knowing when a dataset
is stable enough to work on, and that the application accepts any failures or inaccuracies
resulting from changes that happen after the minimum transaction ID. There are obviously
cases where that's not reasonable, but like I said above, this isn't intended for every situation.
That said, I'd be all for testing the sequence you described to verify exactly how it fails
and that it doesn't bring all of HDFS down with it - just the client. But if a file is deleted
after the specified transaction ID and the application tries to access it, returning an exception
would be the correct behavior, IMO.
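
As a rough illustration of the behavior argued for above, here is a minimal sketch of a standby
view that serves a read only once it has applied the caller's minimum transaction ID, and that
reports a file deleted after that ID with an ordinary exception. Everything in it (StandbyView,
applyCreate, applyDelete, getFileInfo, the txid counter) is a hypothetical toy model and is not
taken from the HDFS-10702 patches or from any HDFS API.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model; none of these names come from HDFS or the HDFS-10702 patches. */
class StaleReadSketch {

    /** Toy standby namespace: maps a path to the txid at which it was created. */
    static final class StandbyView {
        private final Map<String, Long> files = new HashMap<>();
        private long lastAppliedTxId = 0;

        void applyCreate(String path) { files.put(path, ++lastAppliedTxId); }

        void applyDelete(String path) { files.remove(path); ++lastAppliedTxId; }

        long lastAppliedTxId() { return lastAppliedTxId; }

        /**
         * Serves the read only if the standby has applied at least minTxId.
         * A path that no longer exists is reported to the caller as an exception.
         */
        String getFileInfo(String path, long minTxId) throws FileNotFoundException {
            if (lastAppliedTxId < minTxId) {
                throw new IllegalStateException(
                    "standby at txid " + lastAppliedTxId + " is behind required " + minTxId);
            }
            Long createdAt = files.get(path);
            if (createdAt == null) {
                throw new FileNotFoundException(path);
            }
            return path + "@" + createdAt;
        }
    }

    public static void main(String[] args) {
        StandbyView standby = new StandbyView();
        standby.applyCreate("/data/part-0");
        long minTxId = standby.lastAppliedTxId(); // the application decides the dataset is stable here
        standby.applyDelete("/data/part-0");      // a change after the minimum transaction ID
        try {
            standby.getFileInfo("/data/part-0", minTxId);
        } catch (FileNotFoundException e) {
            // Only this caller sees the failure; the toy namespace itself is unaffected.
            System.out.println("read failed in the client only: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the failure surfaces as an exception in the caller; how the
real client and NameNode behave in that sequence is exactly what the proposed test would verify.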

I was actually wondering if what you meant was the block locations were out of date because
the file had been re-replicated in a different configuration due to cluster health issues,
or decommissioning. Cluster state is distinct from an application knowing when it's safe to
assume that a dataset is finalized, so that complicates the assumption somewhat.

But if it's just a clearly stated assumption that this feature transfers responsibility for
knowing that a dataset is complete to the client application, and we test that accessing a deleted
file fails in a correct manner, would that address your concerns, [~mingma]?


was (Author: mackrorysd):
The assumption of this feature is that an application is responsible for knowing when a dataset
is stable enough to work on, and that the application accepts any failures or inaccuracies
resulting from changes that happen after the minimum transaction ID. That said, I'd be
all for testing the scenario above to verify exactly how it fails and that it doesn't bring
all of HDFS down with it - just the client. But if a file is deleted after the specified transaction
and the application tries to access it, returning an exception would be the correct behavior.

I was actually wondering if what you meant was the block locations were out of date because
the file had been re-replicated in a different configuration due to cluster health issues,
or decommissioning. Cluster state is distinct from an application knowing when it's safe to
assume that a dataset is finalized, so that complicates the assumption somewhat.

But if it's just a clearly stated assumption that this feature transfers responsibility for
knowing that a dataset is complete to the client application, and we test that accessing a deleted
file fails in a correct manner, would that address your concerns, [~mingma]?

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jiayi Zhou
>            Assignee: Sean Mackrory
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, HDFS-10702.003.patch,
>                      HDFS-10702.004.patch, HDFS-10702.005.patch, HDFS-10702.006.patch,
>                      HDFS-10702.007.patch, HDFS-10702.008.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing any metadata
> operation, which means the active NameNode could be a bottleneck for scalability. One way to
> solve this problem is to send read-only operations to the Standby NameNode. The disadvantage
> is that the read might be stale.
> Here, I'm thinking of adding a Client API to enable/disable stale reads from the Standby, which
> gives the client the power to set the staleness restriction.
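
The routing idea in the description can be sketched roughly as follows, assuming the staleness
restriction is expressed as a minimum transaction ID as in the comment above. StaleReadRouter,
enableStaleRead, disableStaleRead and route are hypothetical names chosen for illustration; they
are not the API in the attached patches, and the sketch ignores how a client would learn the
standby's last applied transaction ID.

```java
/** Hypothetical sketch; none of these names come from the HDFS-10702 patches. */
class StaleReadRouter {
    enum Target { ACTIVE, STANDBY }

    private boolean staleReadEnabled = false;
    private long minTxId = 0;

    /** Opt in to stale reads, requiring the standby to have applied at least minTxId. */
    void enableStaleRead(long minTxId) {
        if (minTxId < 0) {
            throw new IllegalArgumentException("minTxId must be >= 0");
        }
        this.staleReadEnabled = true;
        this.minTxId = minTxId;
    }

    void disableStaleRead() { staleReadEnabled = false; }

    /**
     * Writes, and all operations while stale reads are disabled, go to the active NameNode.
     * Read-only operations go to the standby only if it meets the staleness restriction.
     */
    Target route(boolean readOnly, long standbyLastAppliedTxId) {
        if (!readOnly || !staleReadEnabled) {
            return Target.ACTIVE;
        }
        return standbyLastAppliedTxId >= minTxId ? Target.STANDBY : Target.ACTIVE;
    }

    public static void main(String[] args) {
        StaleReadRouter router = new StaleReadRouter();
        System.out.println(router.route(true, 120L));  // ACTIVE: stale reads not enabled
        router.enableStaleRead(100L);
        System.out.println(router.route(true, 120L));  // STANDBY: standby has passed txid 100
        System.out.println(router.route(true, 90L));   // ACTIVE: standby is too far behind
        System.out.println(router.route(false, 120L)); // ACTIVE: not a read-only operation
    }
}
```

Falling back to the active NameNode when the standby is behind is one possible choice in this
sketch; failing the call instead would be equally consistent with the description.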



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

