hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
Date Mon, 21 Nov 2016 23:26:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685075#comment-15685075 ]

Zhe Zhang commented on HDFS-10702:

Thanks for the discussion Ming, Sean, Andrew.

bq. Refreshing the metadata for a table or partition is a very RPC heavy operation. This is
typically done when some new data has been written to HDFS. So, an ingest application would
write the data, call getSyncInfo, then refresh metadata using the txid from getSyncInfo.
Agreed that in this use case, the designed approach should work. But do Hive/Impala usually
have several seconds of delay between ingestion and querying? Actually, a more common use case
for us is one where data ingestion and consumption belong to different apps. I guess in that
use case, the ingestion app should send the txID to the consumer?
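
To make the txID handoff between separate apps concrete, here is a minimal sketch in Python. It is a toy model, not HDFS code: `ToyNameNode`, `apply`, and `read_with_txid` are hypothetical names invented for illustration, and reading `last_applied_txid` from the active node only stands in for the proposed getSyncInfo call, whose real signature is not shown in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: tracks the last applied txid."""
    last_applied_txid: int = 0
    files: dict = field(default_factory=dict)

    def apply(self, txid: int, path: str, data: str) -> None:
        """Apply one namespace edit and advance the transaction ID."""
        self.files[path] = data
        self.last_applied_txid = txid


def read_with_txid(path, min_txid, standby, active):
    """Read from the standby only if it has applied edits up to min_txid;
    otherwise fall back to the active node."""
    if standby.last_applied_txid >= min_txid:
        return "standby", standby.files.get(path)
    return "active", active.files.get(path)


active = ToyNameNode()
standby = ToyNameNode()
path = "/warehouse/t/part-0"

# Ingestion app: write via the active node, then record the txid.
# (Assumption: this plays the role of calling getSyncInfo after the write.)
active.apply(101, path, "rows")
ingest_txid = active.last_applied_txid

# Consumer app: receives ingest_txid out of band, before the standby caught up.
source_before, data_before = read_with_txid(path, ingest_txid, standby, active)

# The standby tails the edit log and applies the same edit.
standby.apply(101, path, "rows")
source_after, data_after = read_with_txid(path, ingest_txid, standby, active)
```

In this model the consumer never sees data older than what the ingestion app wrote, because the txID it was handed acts as a lower bound on how fresh the standby must be before it is allowed to serve the read.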

bq. For apps that do not cache input streams, they can call getSyncInfo at job submission
time, then pass this to the job's tasks. Since a couple seconds typically passes between submission
and execution, we should be able to offload a lot from the SbNN.
This is also a good use case. _Acquiring syncInfo_ will become a standard operation at
job startup (done by workflow managers like Oozie or Azkaban), similar to acquiring a delegation
token from the NN.
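
The job-startup pattern can be sketched the same way. This is a hedged illustration only: `JobConf`, `submit_job`, and `pick_namenode` are hypothetical names, and the txid values are made up. The point is that the sync point is acquired once at submission and carried in the job configuration, much like a delegation token, so each task can decide locally whether the standby is fresh enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConf:
    """Hypothetical job configuration carrying the sync point to every task."""
    job_id: str
    sync_txid: int  # assumption: the value getSyncInfo returned at submission


def submit_job(job_id: str, active_txid: int) -> JobConf:
    """Workflow manager step: capture the sync point once per job."""
    return JobConf(job_id=job_id, sync_txid=active_txid)


def pick_namenode(conf: JobConf, standby_txid: int) -> str:
    """Task-side decision: use the standby only if it has reached the sync point."""
    return "standby" if standby_txid >= conf.sync_txid else "active"


conf = submit_job("job-1", active_txid=500)

# Illustrative standby txids observed by three tasks as they start.
standby_txid_seen_by_task = [498, 500, 507]
choices = [pick_namenode(conf, t) for t in standby_txid_seen_by_task]
```

Only the first task, which starts before the standby has reached the sync point, goes to the active node; tasks that start after the standby has caught up are offloaded.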

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jiayi Zhou
>            Assignee: Jiayi Zhou
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, HDFS-10702.003.patch,
HDFS-10702.004.patch, HDFS-10702.005.patch, HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
> Currently, clients must always talk to the active NameNode when performing any metadata
operation, which means active NameNode could be a bottleneck for scalability. One way to solve
this problem is to send read-only operations to Standby NameNode. The disadvantage is that
it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from Standby which
gives Client the power to set the staleness restriction.

