hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kannan Muthukkaruppan <Kan...@facebook.com>
Subject RE: scanner lease expired/region server shutdown
Date Tue, 26 Jan 2010 20:58:06 GMT
Dhruba: yes, the "Too many open files" exception is getting reported by the DN process. The
same node is also running an HBase region server.

And yes, I confirmed that the xcievers setting is 2048.

Regards,
Kannan
-----Original Message-----
From: Dhruba Borthakur [mailto:dhruba@gmail.com]
Sent: Tuesday, January 26, 2010 10:10 AM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

This exception is from the DataNode, right? This means that the datanode
process has 32K files open simultaneously, how can that be? For each block
read/write the datanode has two open files, one for the data and one for the
.meta where the crc gets stored.

On the other hand, the datanode is configured via dfs.datanode.max.xcievers
to support 2048 read/write request simultanously, right?

thanks,
dhruba


On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
<Kannan@facebook.com>wrote:

>
> Looking further up in the logs (about 20 minutes prior in the logs when
> errors first started happening), I noticed the following.
>
> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
> sufficient?
>
> 2010-01-25 11:10:21,774 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=50020):Data\
> XceiveServer: java.io.IOException: Too many open files
>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>        at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>        at
> sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-01-25 11:10:21,566 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=50020):Got \
> exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
> java.io.FileNotFoundException:
> /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta
> (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>        at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
> ________________________________________
> From: Kannan Muthukkaruppan [Kannan@facebook.com]
> Sent: Tuesday, January 26, 2010 7:01 AM
> To: hbase-dev@hadoop.apache.org
> Subject: RE: scanner lease expired/region server shutdown
>
> 1. Yes, it is a 5 node, setup.
>
> 1 Name Node/4 Data Nodes. Of the 4 DN, one is running the HBase Master, and
> the other three are running region servers. ZK is on all the same 5 nodes.
> Should ideally have separated this out. The nodes are 16GB, 4 disk machines.
>
> 2. I examined the HDFS datanode log on the same machine around that time
> the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.129.68.212:50010,
> storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075,
> ipcPort=\
> 50020):Got exception while serving blk_5691809099673541164_10475 to /
> 10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010remote=/
> 10.129.68.212:477\
> 29]
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
> [For future runs, will try to turn on Java GC logging as well as per your
> suggestion.]
>
> 3.  You wrote <<< There is no line after the above?  A region with a
> startkey of 0031841132?  >>>
>
> I just cut-paste a section of the PE log and the scan of the '.META.' from
> shell. But there were many such regions for which the PE client reported
> errors of the form:
>
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
> because: Connection refused
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0015411453,99999999999999
> 10/01/25 12:31:37 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
>
> and scan of '.META.' revealed that those regions, such as the
> 'TestTable,0015411453,99999999999999', had splitA & splitB portions in META.
>
> <<< Then its a dropped edit of .META.  The parent region -- the one being
> split got updates which added the splitA and splitB and its offlining but
> the new region didn't get inserted?  The crash happened just at that
> time?>>>
>
> Didn't fully understand the above.
>
> <<< It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.>>>
>
> Will do. Should I just send it as an attachment to the list? Or is their a
> recommended way of doing this?
>
> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [
> stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Kannan@facebook.com> wrote:
> > I was doing some brute force testing - running one instance of
> PerformanceEvaluation (PE) for writes, and another instance for randomReads.
> >
>
> Whats the cluster your hitting like?  That 5-node thingy?  Whats the
> hardware profile.
>
>
> > One of the region servers went down after a while. [This is on 0.20.2].
> The region server log had things like:
> >
> > 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog;
> editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging uncommenting stuff in the
> hbase-env.sh.   Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  Its good for finding the long
> pauses.  We should find the logs around the long GC pause.  Its
> probably a failed promotion that brought on the stop-the-world GC.  Or
> your HDFS was struggling?
>
> I was going to say that the PE puts the worse kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, cache seems to be working pretty well.
>
> > 2010-01-25 11:33:39,416 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=1815.4441MB (1903631104), Free=581.23096MB (609464832),\
> >  Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689,
> Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit
> Ratio=85.40924191474\
> > 915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog;
> editcount=27879
> > 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog;
> editcount=27880
> > 2010-01-25 11:33:39,417 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -5220035164735014996 lease expired
> >
> > This was followed by:
> > 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
> <mailto:sun.nio.ch.SelectionKeyImpl@788ab708>
> > java.io.IOException: TIMED OUT
>
> This is the zk timeout.
>
> >
> > And also errors like:
> > 2010-01-25 11:33:39,692 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> > org.apache.hadoop.hbase.UnknownScannerException: Name:
> -5220035164735014996
> >
>
> These will happen after the above.  The regionserver is on its way
> down.  Probably emptied the list of outstanding regions.
>
>
> > Subsequently, I restarted the region server and just started the PE in
> sequentialWrite mode. It seems to be off to a really slow start. For several
> regions, it is printing the following:
> >
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> > 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> > [..and so on..]
> > 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached
> location for .META.,,1 is 10.129.68.212:60020
> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000
> because: region offline: TestTable,0031841132,1264121653303
> > 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed
> .META.,,1 for tableName=.META. from cache because of
> TestTable,0032000000,99999999999999
> >
> > When I scan the '.META.', I noticed that most of the regions for which I
> get the above error were in the middle of (?) a split. For example:
> >
> > TestTable,0031841132,126412 column=info:regioninfo,
> timestamp=1264470099796, value=REGION => {NAME => 'TestTa
> >  1653303                     ble,0031841132,1264121653303', STARTKEY =>
> '0031841132', ENDKEY => '0032057774',
> >                             ENCODED => 644500792, OFFLINE => true, SPLIT
> => true, TABLE => {{NAME => 'TestTab
> >                             le', FAMILIES => [{NAME => 'info', VERSIONS
> => '3', COMPRESSION => 'NONE', TTL =>
> >                              '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]
> >                             }}
> >  TestTable,0031841132,126412 column=info:server, timestamp=1264121655148,
> value=10.129.68.214:60020
> >  1653303
> >  TestTable,0031841132,126412 column=info:serverstartcode,
> timestamp=1264121655148, value=1264109117245
> >  1653303
> >  TestTable,0031841132,126412 column=info:splitA, timestamp=1264470099796,
> value=\000\n0031978284\000\000\000\0
> >  1653303
> 01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
> >
> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
> >
> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
> >
> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
> >
> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
> >
> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
> >
> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
> >                             5\337\274#
> >  TestTable,0031841132,126412 column=info:splitB, timestamp=1264470099796,
> value=\000\n0032057774\000\000\000\0
> >  1653303
> 01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
> >
> 000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
> >
> 05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
> >
> 00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
> >
> 00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
> >
> 000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
> >
> EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
> >                             75\312\262
> >
> > Any thoughts/ideas on what might be going on? Appreciate any input in
> helping understand what might be going on.
> >
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then its a dropped edit of .META.  The parent region --
> the one being split got updates which added the splitA and splitB and
> its offlining but the new region didn't get inserted?  The crash
> happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> Above kinda thing is what the master rewrite is about moving state
> transitions up to zk so atomic over cluster moving regions through
> transitions rather than as here, a multi-row update might not all go
> through as things currently work.
>
> St.Ack
>
> > Regards,
> > Kannan
> >
> >
> >
>



--
Connect to me at http://www.facebook.com/dhruba

Mime
View raw message