hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11334) [SPS]: NN switch and rescheduling movements can lead to have more than one coordinator for same file blocks
Date Tue, 18 Apr 2017 08:25:41 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-11334:
----------------------------
    Attachment: HDFS-11334-HDFS-10285-04.patch

Attached another patch that fixes only the related checkstyle warning.

Note: the {{(max allowed is 150)}} checkstyle warning is a pre-existing case, so I am not fixing
it in this patch.

> [SPS]: NN switch and rescheduling movements can lead to have more than one coordinator for same file blocks
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11334
>                 URL: https://issues.apache.org/jira/browse/HDFS-11334
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Rakesh R
>             Fix For: HDFS-10285
>
>         Attachments: HDFS-11334-HDFS-10285-00.patch, HDFS-11334-HDFS-10285-01.patch,
>                      HDFS-11334-HDFS-10285-02.patch, HDFS-11334-HDFS-10285-03.patch,
>                      HDFS-11334-HDFS-10285-04.patch
>
>
> I am summarizing here the scenarios that Rakesh and I discussed offline.
> We need to handle a couple of cases:
> # NN switch - after a switch, the NN starts scheduling afresh for all files. Meanwhile, old co-ordinators may continue their movement work and send results back, which could leave the NN SPS unable to tell which result is the right one.
>   *NEED TO HANDLE*
> # DN disconnected past heartbeat expiry - if a DN stays disconnected for a long time (longer than the heartbeat expiry), the NN removes that node. After the SPS Monitor times out, the NN may retry the files that were scheduled to that DN by finding a new co-ordinator. If the old DN reconnects after the NN has rescheduled, the NN may receive different results from different co-ordinators.
>   *NEED TO HANDLE*
> # NN restart - should be the same as point 1.
> # DN disconnect - when a DN disconnects and reconnects immediately (before heartbeat expiry), there should not be any issues.
>   *NEED NOT HANDLE*, but we can think of more scenarios if anything is missing.
> # DN restart - if a DN restarts, it cannot send any results because it loses everything. After the NN SPS Monitor times out, the NN will retry.
>   *NEED NOT HANDLE*, but we can think of more scenarios if anything is missing.
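
To make the two *NEED TO HANDLE* cases concrete, here is a minimal sketch of one way a NN-side component could tell a current co-ordinator's result from a stale one. It is not taken from the attached patches: the class name CoordinatorTracker, its methods, and the use of a per-file tracking id and a DN UUID string are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the HDFS-11334 patch): remembers, per file
 * tracking id, which co-ordinator DN the NN most recently assigned, so that
 * results arriving from an older co-ordinator can be recognised and dropped.
 */
class CoordinatorTracker {
  /** trackId -> UUID of the co-ordinator most recently assigned by the NN. */
  private final Map<Long, String> currentCoordinator = new HashMap<>();

  /** Called whenever the NN schedules or reschedules movements for a file. */
  synchronized void assign(long trackId, String coordinatorUuid) {
    // A reschedule overwrites the previous owner, making it stale.
    currentCoordinator.put(trackId, coordinatorUuid);
  }

  /** Called after an NN switch or restart, when scheduling starts afresh. */
  synchronized void clear() {
    currentCoordinator.clear();
  }

  /**
   * Returns true only if the reporting DN is the co-ordinator currently
   * expected for this file. Stale or unknown reporters are rejected.
   */
  synchronized boolean acceptResult(long trackId, String reportingDnUuid) {
    String expected = currentCoordinator.get(trackId);
    if (expected == null || !expected.equals(reportingDnUuid)) {
      return false;
    }
    // The result is consumed once; a duplicate report is rejected.
    currentCoordinator.remove(trackId);
    return true;
  }
}
```

In this sketch, case 2 corresponds to assign() being called a second time for the same tracking id with a new DN, after which a late report from the old DN is rejected. Case 1 corresponds to clear() followed by fresh assign() calls, so results from co-ordinators chosen before the switch no longer match unless the same DN is selected again.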



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

