hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: File descriptor leak, possibly new in CDH5.7.0
Date Mon, 23 May 2016 20:07:22 GMT
How hard would it be to change the below, if only temporarily? (Trying to get a
datapoint or two to act on; the short-circuit code hasn't changed that we know
of... perhaps the scan chunking facility in 1.1 has some side effect we've not
noticed up to now.)

If you up the scan caching to something bigger, does it lower the rate of FD
leak creation?

If you cache the blocks, assuming it does not blow the cache for others,
does that make a difference?
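Concretely, I mean something like the below against the Scan you posted (just a
sketch; the numbers are for the experiment only, not a recommendation):

  import org.apache.hadoop.hbase.client.Scan;

  Scan scan = new Scan();
  scan.setCaching(5000);      // e.g. 10x the current 500: fewer next() round trips per map task
  scan.setCacheBlocks(true);  // only if it won't evict hot blocks for other workloads
  // ...rest of the Scan setup unchanged

If either change moves the rate at which FDs pile up, that narrows where to
look.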

Hang on... will be back in a sec... just sending this in the meantime...

St.Ack

On Mon, May 23, 2016 at 12:20 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> For reference, the Scan backing the job is pretty basic:
>
> Scan scan = new Scan();
> scan.setCaching(500); // probably too small for the datasize we're dealing with
> scan.setCacheBlocks(false);
> scan.setScanMetricsEnabled(true);
> scan.setMaxVersions(1);
> scan.setTimeRange(startTime, stopTime);
>
> Otherwise it is using the out-of-the-box TableInputFormat.
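> For completeness, the wiring is just the stock TableMapReduceUtil path,
> roughly like the sketch below (the mapper, output classes, and the conf
> variable here are placeholders, not our real names):
>
>   import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>   import org.apache.hadoop.io.NullWritable;
>   import org.apache.hadoop.mapreduce.Job;
>
>   Job job = Job.getInstance(conf, "my-table-1 scan");
>   TableMapReduceUtil.initTableMapperJob(
>       "my-table-1",        // table to scan
>       scan,                // the Scan above
>       MyMapper.class,      // placeholder mapper class
>       NullWritable.class,  // placeholder map output key class
>       NullWritable.class,  // placeholder map output value class
>       job);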
>
>
>
> On Mon, May 23, 2016 at 3:13 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
> > I've forced the issue to happen again. netstat takes a while to run on
> > this host while it's happening, but I do not see an abnormal amount of
> > CLOSE_WAIT (compared to other hosts).
> >
> > I forced more than the usual number of regions for the affected table onto
> > the host to speed up the process. File descriptors are now growing quite
> > rapidly, about 8-10 per second.
> >
> > This is what lsof looks like, multiplied by a couple thousand:
> >
> > COMMAND   PID  USER   FD      TYPE             DEVICE    SIZE/OFF    NODE NAME
> > java    23180 hbase  DEL    REG               0,16             3848784656 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_1702253823
> > java    23180 hbase  DEL    REG               0,16             3847643924 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_1614925966
> > java    23180 hbase  DEL    REG               0,16             3847614191 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_888427288
> >
> > The only thing that varies is the integer at the end.
> >
> > > Anything about the job itself that is holding open references or
> > > throwing away files w/o closing them?
> >
> > The MR job does a TableMapper directly against HBase, which as far as I
> > know uses the HBase RPC and does not hit HDFS directly at all. Is it
> > possible that a long-running scan (one with many, many next() calls) could
> > keep some references to HDFS open for the duration of the overall scan?
> >
> >
> > On Mon, May 23, 2016 at 2:19 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> >
> >> We run MR against many tables in all of our clusters; they mostly have
> >> similar schema definitions, though they vary in terms of key length,
> >> number of columns, etc. This is the only cluster and only table we've
> >> seen leak so far. It's probably the table with the biggest regions that
> >> we MR against, though it's hard to verify that (anyone in engineering
> >> can run such a job).
> >>
> >> dfs.client.read.shortcircuit.streams.cache.size = 256
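> >> (That is just the default value; to double-check what a given classpath
> >> actually resolves it to, a quick sketch like this works, assuming
> >> hdfs-site.xml is on the classpath:)
> >>
> >>   import org.apache.hadoop.conf.Configuration;
> >>
> >>   Configuration conf = new Configuration();
> >>   conf.addResource("hdfs-site.xml");  // assumes hdfs-site.xml is on the classpath
> >>   int streams = conf.getInt("dfs.client.read.shortcircuit.streams.cache.size", 256);
> >>   System.out.println("shortcircuit streams cache size = " + streams);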
> >>
> >> Our typical FD count is around 3000. When this hadoop job runs, it can
> >> climb up to our limit of over 30k if we don't act; it is a gradual
> >> build-up over the course of a couple of hours. When we move the regions
> >> off or kill the job, the FDs gradually go back down at roughly the same
> >> pace. It forms a graph in the shape of a pyramid.
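> >> (If it helps to correlate, one way to log that same count from inside the
> >> JVM, on a HotSpot JVM on Linux, is roughly:)
> >>
> >>   import java.lang.management.ManagementFactory;
> >>   import com.sun.management.UnixOperatingSystemMXBean;
> >>
> >>   UnixOperatingSystemMXBean os =
> >>       (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
> >>   System.out.println("open FDs = " + os.getOpenFileDescriptorCount()
> >>       + ", max = " + os.getMaxFileDescriptorCount());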
> >>
> >> We don't use CM; we mostly use the default *-site.xml. We haven't
> >> overridden anything related to this. The configs between CDH5.3.8 and
> >> CDH5.7.0 are identical for us.
> >>
> >> On Mon, May 23, 2016 at 2:03 PM Stack <stack@duboce.net> wrote:
> >>
> >>> On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> >>>
> >>> > Hey everyone,
> >>> >
> >>> > We are noticing a file descriptor leak that is only affecting nodes in
> >>> > our cluster running 5.7.0, not those still running 5.3.8.
> >>>
> >>>
> >>> Translation: roughly hbase-1.2.0+hadoop-2.6.0 vs
> >>> hbase-0.98.6+hadoop-2.5.0.
> >>>
> >>>
> >>> > I ran an lsof against an affected regionserver, and noticed that there
> >>> > were 10k+ unix sockets that are just called "socket", as well as another
> >>> > 10k+ of the form
> >>> > "/dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-<int>_1_<int>".
> >>> > The 2 seem related based on how closely the counts match.
> >>> >
> >>> > We are in the middle of a rolling upgrade from CDH5.3.8 to CDH5.7.0 (we
> >>> > handled the namenode upgrade separately). The 5.3.8 nodes *do not*
> >>> > experience this issue. The 5.7.0 nodes *do*. We are holding off upgrading
> >>> > more regionservers until we can figure this out. I'm not sure if any
> >>> > intermediate versions between the two have the issue.
> >>> >
> >>> > We traced the root cause to a hadoop job running against a basic table:
> >>> >
> >>> > 'my-table-1', {TABLE_ATTRIBUTES => {MAX_FILESIZE => '107374182400',
> >>> > MEMSTORE_FLUSHSIZE => '67108864'}, {NAME => '0', VERSIONS => '50',
> >>> > BLOOMFILTER => 'NONE', COMPRESSION => 'LZO', METADATA =>
> >>> > {'COMPRESSION_COMPACT' => 'LZO', 'ENCODE_ON_DISK' => 'true'}}
> >>> >
> >>> > This is very similar to all of our other tables (we have many).
> >>>
> >>>
> >>> You are doing MR against some of these also? They have different schemas?
> >>> No leaks here?
> >>>
> >>>
> >>>
> >>> > However, its regions are getting up there in size, 40+ GB per region,
> >>> > compressed. This has not been an issue for us previously.
> >>> >
> >>> > The hadoop job is a simple TableMapper job with no special parameters,
> >>> > though we haven't updated our client yet to the latest (we will do that
> >>> > once we finish the server side). The hadoop job runs on a separate hadoop
> >>> > cluster, remotely accessing the HBase cluster. It does not do any other
> >>> > reads or writes outside of the TableMapper scans.
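> >>> > (For illustration only, the mapper is the usual TableMapper shape,
> >>> > roughly like the sketch below; the class name is a placeholder:)
> >>> >
> >>> >   import org.apache.hadoop.hbase.client.Result;
> >>> >   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> >>> >   import org.apache.hadoop.hbase.mapreduce.TableMapper;
> >>> >   import org.apache.hadoop.io.NullWritable;
> >>> >
> >>> >   public class MyMapper extends TableMapper<NullWritable, NullWritable> {
> >>> >     @Override
> >>> >     protected void map(ImmutableBytesWritable row, Result value, Context context) {
> >>> >       // per-row work only; reads come through the HBase RPC, not HDFS directly
> >>> >     }
> >>> >   }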
> >>> >
> >>> > Moving the regions off of an affected server, or killing the hadoop job,
> >>> > causes the file descriptors to gradually go back down to normal.
> >>> >
> >>> >
> >>> > Any ideas?
> >>> >
> >>> >
> >>> Is it just the FD cache running 'normally'? 10k seems like a lot though.
> >>> 256 seems to be the default in hdfs but maybe it is different in CM or in
> >>> hbase?
> >>>
> >>> What is your dfs.client.read.shortcircuit.streams.cache.size set to?
> >>> St.Ack
> >>>
> >>>
> >>>
> >>> > Thanks,
> >>> >
> >>> > Bryan
> >>> >
> >>>
> >>
>
