hadoop-hdfs-issues mailing list archives

From "Liang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5461) fallback to non-ssr(local short circuit reads) while oom detected
Date Thu, 07 Nov 2013 03:21:17 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Liang Xie updated HDFS-5461:

    Attachment: HDFS-5461.txt

bq.  It's because each open stream holds a buffer, and we have hundreds of open streams?
I am not 100% sure, but I think you are right. This OOM is easy to reproduce when there are
many open storefiles being read (e.g. when compaction sometimes can't catch up).

Oh, I see. It seems the fallback is only meaningful for configurations like mine: a big Xmx
and a small MaxDirectMemorySize :)

I attached a patch that adds more logging about the in-use and pooled direct buffer sizes.
In my opinion, this could be useful when switching the log level to "trace" online while an
OOM is occurring. The patch also adds a simple try/catch fallback for OOM without introducing
any config value; to me, this seems the more reasonable approach :)
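
To illustrate the try/catch idea, here is a minimal sketch. It is not the code in HDFS-5461.txt: the class, the Reader interface, and openReader are hypothetical names made up for this example, and the two suppliers stand in for whatever actually opens a short-circuit or a regular reader.

```java
import java.util.function.Supplier;

// Hypothetical sketch only; none of these names come from HDFS.
final class SsrFallbackSketch {

    /** Stand-in for a block reader; kind() just reports which path was taken. */
    interface Reader {
        String kind();
    }

    /**
     * Tries the short-circuit path first. If creating it throws
     * OutOfMemoryError (e.g. "Direct buffer memory"), falls back to the
     * non-short-circuit path instead of failing the read.
     */
    static Reader openReader(Supplier<Reader> shortCircuit, Supplier<Reader> remote) {
        try {
            return shortCircuit.get();
        } catch (OutOfMemoryError oom) {
            // A real implementation would log the in-use/pooled sizes here.
            return remote.get();
        }
    }

    public static void main(String[] args) {
        // Simulate direct memory exhaustion on the short-circuit path.
        Reader r = openReader(
            () -> { throw new OutOfMemoryError("Direct buffer memory"); },
            () -> () -> "remote");
        System.out.println(r.kind());
    }
}
```

The fallback needs no new config value, which matches the intent above; the trade-off is that catching OutOfMemoryError broadly could also mask a heap OOM, so a real change would probably want to narrow the catch to the allocation site.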

> fallback to non-ssr(local short circuit reads) while oom detected
> -----------------------------------------------------------------
>                 Key: HDFS-5461
>                 URL: https://issues.apache.org/jira/browse/HDFS-5461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>         Attachments: HDFS-5461.txt
> Currently, the DirectBufferPool used by the ssr feature does not seem to have an upper-bound
> limit other than the direct memory VM option, so there is a risk of hitting a direct memory
> OOM. See HBASE-8143 for an example.
> IMHO, maybe we could improve it a bit:
> 1) detect OOM, or reaching a configured upper limit, in the caller, then fall back to non-ssr
> 2) add a new metric for the raw direct memory size currently consumed.
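
For item 2, one possible source for such a number is the JVM's own "direct" buffer pool bean (java.lang.management.BufferPoolMXBean, available since Java 7). The sketch below only shows how to read it; the class and method names are hypothetical, and it reports JVM-wide direct memory, not what DirectBufferPool alone holds.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical sketch only; not part of HDFS or of the attached patch.
final class DirectMemoryGauge {

    /**
     * Returns the bytes the JVM reports as used by its "direct" buffer pool,
     * or -1 if the pool is not found or the value is unknown.
     */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // hold 1 MiB of direct memory
        System.out.println("direct bytes used: " + directMemoryUsed());
        System.out.println("buffer capacity:   " + buf.capacity()); // keeps buf reachable
    }
}
```

A value like this could be compared against the limit set by -XX:MaxDirectMemorySize to decide when to stop using ssr, which would cover the "configured upper limit" part of item 1 as well.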

