hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8870) Lease is leaked on write failure
Date Tue, 18 Aug 2015 19:49:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701862#comment-14701862 ]

Daryn Sharp commented on HDFS-8870:
-----------------------------------

A combination of errors occurred. Pipelines were frequently breaking because the cluster
erroneously "thought" it was full. Mis-accounting bugs in the RBW reserved space and storage
report contributed to the problem, but almost-full clusters will exhibit the same problems.
When the write fails, a thread leaks and continues to renew the lease on the defunct file.
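A minimal Java sketch of the leak pattern described above, assuming a simplified client that keeps renewing the lease for every file in a registry of "open" files. The class, method names, and paths are hypothetical and are not the actual DFSClient or lease renewer code; the point is only that cleanup done solely on the success path leaves a failed file registered, while a `finally` block unregisters it either way.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of client-side lease renewal; NOT the real HDFS code. */
public class LeaseLeakSketch {

    /** Files this client believes it is still writing; each one gets renewed. */
    static final Set<String> openFiles = ConcurrentHashMap.newKeySet();

    /** One renewal pass: returns how many leases would be renewed. */
    static int renewOnce() {
        return openFiles.size();
    }

    /** Simulated write that always fails, e.g. the pipeline breaks on "no space". */
    static void writeBlocks(String path) throws IOException {
        throw new IOException("pipeline failed for " + path);
    }

    /** Leaky version: the file is unregistered only when the write succeeds. */
    static boolean leakyWrite(String path) {
        openFiles.add(path);
        try {
            writeBlocks(path);
            openFiles.remove(path); // cleanup only on the success path
            return true;
        } catch (IOException e) {
            return false; // failure reported, but the path is still renewed forever
        }
    }

    /** Fixed version: the file is unregistered on success and failure alike. */
    static boolean safeWrite(String path) {
        openFiles.add(path);
        try {
            writeBlocks(path);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            openFiles.remove(path); // stop renewing once the stream is dead
        }
    }

    public static void main(String[] args) {
        leakyWrite("/hypothetical/logs/node_a");
        System.out.println("after leaky write, renewals per pass: " + renewOnce()); // 1
        openFiles.clear();
        safeWrite("/hypothetical/logs/node_b");
        System.out.println("after safe write, renewals per pass: " + renewOnce()); // 0
    }
}
```

In this model the leaked entry is renewed on every pass until something external clears it, which mirrors the renewer threads that only stop when the token expires.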

Didn't seem like a big deal until we saw it in long-running daemons. Then it was the NMs.
Consider log aggregation pipelines breaking: NMs leak dozens or hundreds of renewer threads
each, across thousands of nodes, and the NN ends up with an insane number of open connections
nearing your "this will never happen" fd limit, clogged with worthless renewals. Now it gets
good. The renewer threads won't abort until the token expires. Oh, you don't have security
enabled? Then there is no token to expire, so better restart your NMs, hdfs proxies, oozies,
DNs (webhdfs writes), hbase region servers, etc.

I'm swamped, so if you want to wait till 2.6.2, I'm ok.

> Lease is leaked on write failure
> --------------------------------
>
>                 Key: HDFS-8870
>                 URL: https://issues.apache.org/jira/browse/HDFS-8870
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: HDFS
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Daryn Sharp
>
> Creating this ticket on behalf of [~daryn]
> We've seen this in one of our clusters. When a long-running process has a write failure,
> the lease is leaked and gets renewed until the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
