hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8771) If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode could not send another RPC calls to Journalnodes
Date Tue, 22 Sep 2015 09:22:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902285#comment-14902285 ]

Vinayakumar B commented on HDFS-8771:
-------------------------------------

I think it's a good idea to make the purge asynchronous so that it does not block write requests.

Some comments about the patch:
1. {{void purgeDataOlderThan(final long minTxIdToKeep) throws IOException {}}
This method no longer throws any exception, so the {{throws}} clause can be removed.
2. {{setUncaughtExceptionHandler(UncaughtExceptionHandlers.systemExit())}}
I think shutting down the entire JN on an IOException during purge may not be good. During purge,
the only call that can result in an IOE is {{FileUtil.listFiles(dir)}}, which might be due to a disk
error. Since this exception cannot be propagated back to the NN, I feel it would be better to
handle it inside {{call()}} and log a WARN. Let further synchronous write requests handle the
IOE as required. For any other exception, letting the JN shut down is okay.
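
As a rough illustration of point 2, here is a minimal, self-contained sketch of a purge task that catches IOException inside {{call()}} and logs a warning, while letting any other exception propagate. It is not the patch itself: the {{Purger}} interface, the {{purgeTask}} helper and the use of java.util.logging are assumptions made only for this example, and the real code would use the JN's own storage class and logger.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

public class PurgeTaskSketch {
    private static final Logger LOG = Logger.getLogger("PurgeTaskSketch");

    /** Hypothetical stand-in for the JN storage purge (not a Hadoop type). */
    interface Purger {
        void purge(long minTxIdToKeep) throws IOException;
    }

    /**
     * Wraps a purge so that an IOException is logged as a warning instead of
     * escaping the task. Returns true on success, false if an IOException was
     * swallowed. Unchecked exceptions are deliberately not caught here.
     */
    static Callable<Boolean> purgeTask(Purger purger, long minTxIdToKeep) {
        return () -> {
            try {
                purger.purge(minTxIdToKeep);
                return true;
            } catch (IOException ioe) {
                LOG.log(Level.WARNING,
                    "Purge of data older than txid " + minTxIdToKeep + " failed", ioe);
                return false;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // Dedicated executor so the purge runs off the request-handling path.
        ExecutorService purgeExecutor = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> ok = purgeExecutor.submit(purgeTask(txid -> { }, 100L));
            Future<Boolean> failed = purgeExecutor.submit(purgeTask(txid -> {
                throw new IOException("simulated disk error");
            }, 200L));
            System.out.println("first purge succeeded: " + ok.get());
            System.out.println("second purge succeeded: " + failed.get());
        } finally {
            purgeExecutor.shutdown();
        }
    }
}
```

With this shape, a disk error during purge only produces a WARN, and the next synchronous write request is the one that surfaces the IOE to the NN.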

[~andrew.wang] / [~jingzhao], do you want to take a look here?

> If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode could not send another
RPC calls to Journalnodes
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8771
>                 URL: https://issues.apache.org/jira/browse/HDFS-8771
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Takuya Fukudome
>            Assignee: Kanaka Kumar Avvaru
>         Attachments: HDFS-8771-01.patch, HDFS-8771-02.patch, HDFS-8771-03.patch
>
>
> In our cluster, the edits accidentally became huge (about 50 GB) and our JournalNodes' disks
were busy, so {{purgeLogsOlderThan}} took more than 30 seconds. If {{IPCLoggerChannel#purgeLogsOlderThan}}
takes too long, the NameNode cannot send other RPC calls to the JournalNodes because {{o.a.h.hdfs.qjournal.client.IPCLoggerChannel}}'s
executor is single-threaded. This will cause the NameNode to shut down.
> I think IPCLoggerChannel#purgeLogsOlderThan should not block other RPC calls such as sendEdits.
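
The blocking described above can be reproduced with plain java.util.concurrent, independent of Hadoop. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, the "purge" is simulated with a latch, and the 200 ms wait is an arbitrary stand-in for an RPC timeout. It checks whether a "sendEdits" task can complete while a slow "purge" is still running, first with one shared single-thread executor and then with a dedicated purge executor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SingleThreadBlockingSketch {

    /** Returns true if the simulated sendEdits finished while the simulated purge was still running. */
    static boolean sendEditsRunsDuringPurge(ExecutorService rpcExecutor,
                                            ExecutorService purgeExecutor) throws Exception {
        CountDownLatch purgeStarted = new CountDownLatch(1);
        CountDownLatch purgeMayFinish = new CountDownLatch(1);

        // Simulated slow purge: occupies its thread until released.
        Future<?> purge = purgeExecutor.submit(() -> {
            purgeStarted.countDown();
            purgeMayFinish.await();
            return null;
        });
        purgeStarted.await();

        // Simulated sendEdits, queued while the purge is in progress.
        Future<String> sendEdits = rpcExecutor.submit(() -> "edits sent");
        boolean completedDuringPurge;
        try {
            sendEdits.get(200, TimeUnit.MILLISECONDS);
            completedDuringPurge = true;
        } catch (TimeoutException e) {
            completedDuringPurge = false; // stuck behind the purge in the queue
        }

        // Release the purge and drain both tasks before returning.
        purgeMayFinish.countDown();
        purge.get();
        sendEdits.get();
        return completedDuringPurge;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        ExecutorService rpc = Executors.newSingleThreadExecutor();
        ExecutorService purgeOnly = Executors.newSingleThreadExecutor();
        try {
            System.out.println("shared executor: "
                + sendEditsRunsDuringPurge(shared, shared));
            System.out.println("dedicated purge executor: "
                + sendEditsRunsDuringPurge(rpc, purgeOnly));
        } finally {
            shared.shutdown();
            rpc.shutdown();
            purgeOnly.shutdown();
        }
    }
}
```

With the shared executor, the sendEdits task waits in the queue behind the purge, which mirrors the behaviour reported here. Moving the slow work to its own thread is one way to keep other calls flowing.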



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
