hbase-user mailing list archives

From 茅旭峰 <m9s...@gmail.com>
Subject Re: One of the regionserver aborted, then the master shut down itself
Date Wed, 16 Mar 2011 15:42:56 GMT
Thanks Ted!

===
Once a region is offline, it is removed from regions
===
By 'offline' here do you mean unassigned, i.e. the region has already been
split into smaller regions?
I think we have too many regions because we're using large cells, and a
region is normally hundreds of megabytes in size. BTW, is there a property
that sets the size of a region? Do you think a larger region size would
help in our scenario? If AssignmentManager.regions holds all the online
regions, the amount of data it retains grows as
(number of online regions) x (number of online regions) / (number of
region servers), right?
So to cut the size of regions, we can either increase the region size or
add more region servers, right?
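
I did find 'hbase.hregion.max.filesize' in hbase-default.xml (256MB by
default, if I read it correctly). I guess it can also be raised per table,
something like this (an untested sketch; the table and family names are
just placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor("mytable");   // placeholder
    desc.addFamily(new HColumnDescriptor("cf"));               // placeholder
    // Split regions at ~1GB instead of the 256MB default, which should
    // cut the region count (and the master's bookkeeping) roughly 4x.
    desc.setMaxFileSize(1024L * 1024 * 1024);
    new HBaseAdmin(conf).createTable(desc);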

Just out of curiosity, why do we keep the per-region load in each
HServerLoad referenced from AssignmentManager.regions? I guess it keeps
changing dynamically anyway.
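
To make the question concrete, this is the shape I think the dump shows
(my own sketch from reading the heap dump, not the actual HBase code):

    import java.util.TreeMap;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.HServerInfo;

    // ~7600 entries; each value looks like a separate HServerInfo copy,
    // and each copy drags in an HServerLoad with one RegionLoad per region
    // hosted on that server (~1900 here), hence the ~c*M*M/N growth.
    TreeMap<HRegionInfo, HServerInfo> regions =
        new TreeMap<HRegionInfo, HServerInfo>();
    // If all entries for the same server shared one HServerInfo instance,
    // the retained size would be closer to ~c*M.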

Thanks and regards,

Mao Xu-Feng

On Wed, Mar 16, 2011 at 11:03 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Thanks for your analysis.
> Once a region is offline, it is removed from regions
>
> BTW your cluster needs more machines. 7600 regions over 4 nodes places too
> much load on the servers.
>
> On Wed, Mar 16, 2011 at 4:28 AM, 茅旭峰 <m9suns@gmail.com> wrote:
>
> > Regarding AssignmentManager, it looks like it only holds regions in
> > transition.
> > We can see lots of region splits and unassignments in the master log. I
> > guess it was due to our large cells and the endless insertion. Does this
> > make sense?
> > I have not dug into the code, but I do believe it removes the regions
> > from AssignmentManager.regions once the transition completes, right?
> >
> > Mao Xu-Feng
> >
> > On Wed, Mar 16, 2011 at 7:09 PM, 茅旭峰 <m9suns@gmail.com> wrote:
> >
> > > Hi J-D,
> > >
> > > Thanks for your reply.
> > >
> > > You said,
> > > ==
> > >
> > > Just as an example, every value that
> > > you insert first has to be copied from the socket before it can be
> > > inserted into the MemStore.  If you are using a big write buffer, that
> > > means that every insert currently in flight in a region server takes
> > > double that amount of space.
> > > ==
> > >
> > > How can I control the size of the write buffer? I found a property
> > > 'hbase.client.write.buffer' in hbase-default.xml; do you mean this one?
> > > We use the RESTful API to put our cells; hopefully that does not make
> > > any difference.
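> > >
> > > If that is the one, I guess either of these would shrink it (just my
> > > reading of the 0.90 client API, untested; the table name is made up):
> > >
> > >     import org.apache.hadoop.conf.Configuration;
> > >     import org.apache.hadoop.hbase.HBaseConfiguration;
> > >     import org.apache.hadoop.hbase.client.HTable;
> > >
> > >     Configuration conf = HBaseConfiguration.create();
> > >     // Default is 2MB; with ~4MB cells, every put in flight roughly
> > >     // doubles its footprint on the region server, as you describe.
> > >     conf.setLong("hbase.client.write.buffer", 1024L * 1024);
> > >     HTable table = new HTable(conf, "mytable");
> > >     // or per table instance:
> > >     table.setWriteBufferSize(1024L * 1024);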
> > >
> > > As for the memory usage of the master, I did a further investigation
> > > today.
> > > What I was doing was putting cells continuously, as before. As I said
> > > yesterday, the Java heap kept increasing accordingly, and eventually an
> > > OOME happened, as I expected. I set -Xmx to 1GB to speed up the OOME.
> > >
> > > Then I used the Eclipse Memory Analyzer to analyze the hprof file. It
> > > shows that most of the Java heap is occupied by an instance of class
> > > AssignmentManager.
> > >
> > > (For ease of reading, you may want to copy the result below into
> > > whatever editor you like; at least that works for me.)
> > >
> > > Class Name | Shallow Heap | Retained Heap
> > > ---------------------------------------------------------------------------------------
> > > org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f01050d4c98 | 112 | 974,967,592
> > > |- <class> class org.apache.hadoop.hbase.master.AssignmentManager @ 0x7f013c21ebd0 | 8 | 8
> > > |- master org.apache.hadoop.hbase.master.HMaster @ 0x7f01050521e0 master-cloud135:60000 Busy Monitor, Thread | 328 | 3,000
> > > |- regionsInTransition java.util.concurrent.ConcurrentSkipListMap @ 0x7f01050c1000 | 88 | 296
> > > |- watcher org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher @ 0x7f01051cce68 | 136 | 1,720
> > > |- timeoutMonitor org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor @ 0x7f01052505a8 cloud135:60000.timeoutMonitor Thread | 208 | 592
> > > |- zkTable org.apache.hadoop.hbase.zookeeper.ZKTable @ 0x7f01052c0318 | 32 | 400
> > > |- catalogTracker org.apache.hadoop.hbase.catalog.CatalogTracker @ 0x7f01052c5fd0 | 72 | 376
> > > |- serverManager org.apache.hadoop.hbase.master.ServerManager @ 0x7f01052f0138 | 80 | 932,000
> > > |- regionPlans java.util.TreeMap @ 0x7f01052f01d8 | 80 | 104
> > > |- servers java.util.TreeMap @ 0x7f01052f0228 | 80 | 75,128
> > > |- regions java.util.TreeMap @ 0x7f01052f0278 | 80 | 950,435,488
> > > |  |- <class> class java.util.TreeMap @ 0x7f013be45c30 System Class | 16 | 16
> > > |  |- root java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> > > |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > > |  |  |- left java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> > > |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > > |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01053d34f0 | 64 | 270,674,784
> > > |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > > |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01053c7568 | 64 | 162,321,936
> > > |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> > > |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f01054cbbe8 | 64 | 107,828,656
> > > |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010f6866c0 | 72 | 154,328
> > > |  |  |  |  |  |- <class> class org.apache.hadoop.hbase.HServerInfo @ 0x7f013c61e3e0 | 8 | 8
> > > |  |  |  |  |  |- load org.apache.hadoop.hbase.HServerLoad @ 0x7f010540a548 | 40 | 153,776
> > > |  |  |  |  |  |- serverName java.lang.String @ 0x7f010540a9a8 cloud138,60020,1300161207678 | 40 | 120
> > > |  |  |  |  |  |- hostname java.lang.String @ 0x7f010540ab60 cloud138 | 40 | 80
> > > |  |  |  |  |  |- serverAddress org.apache.hadoop.hbase.HServerAddress @ 0x7f01054c3020 | 32 | 280
> > > |  |  |  |  |  '- Total: 5 entries
> > > |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010f77bd68 | 88 | 3,200
> > > |  |  |  |  '- Total: 6 entries
> > > |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f010542b790 | 64 | 950,435,408
> > > |  |  |  |- left java.util.TreeMap$Entry @ 0x7f0105432b70 | 64 | 307,135,480
> > > |  |  |  |  |- <class> class java.util.TreeMap$Entry @ 0x7f013bef1e08 System Class | 0 | 0
> > > |  |  |  |  |- parent java.util.TreeMap$Entry @ 0x7f01053d34b0 | 64 | 579,650,616
> > > |  |  |  |  |- left java.util.TreeMap$Entry @ 0x7f01054512f8 | 64 | 139,023,720
> > > |  |  |  |  |- right java.util.TreeMap$Entry @ 0x7f0105681960 | 64 | 167,467,512
> > > |  |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f0112027ca8 | 88 | 3,200
> > > |  |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01123a1188 | 72 | 184,040
> > > |  |  |  |  '- Total: 6 entries
> > > |  |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010804cdc0 | 88 | 3,200
> > > |  |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f01080e00b0 | 72 | 220,672
> > > |  |  |  '- Total: 6 entries
> > > |  |  |- right java.util.TreeMap$Entry @ 0x7f0105426ff0 | 64 | 366,632,232
> > > |  |  |- value org.apache.hadoop.hbase.HServerInfo @ 0x7f010a1689e8 | 72 | 192,552
> > > |  |  |- key org.apache.hadoop.hbase.HRegionInfo @ 0x7f010ae01598 | 88 | 3,200
> > > |  |  '- Total: 5 entries
> > > |  '- Total: 2 entries
> > > |- executorService org.apache.hadoop.hbase.executor.ExecutorService @ 0x7f010531ede0 | 40 | 5,792
> > > '- Total: 12 entries
> > > ---------------------------------------------------------------------------------------
> > >
> > > We have over 7600 regions. It looks like AssignmentManager.regions
> > > keeps a <HRegionInfo,HServerInfo> pair for each region, and moreover,
> > > even though we have only four region servers in our environment, each
> > > <HRegionInfo,HServerInfo> pair has its own instance of HServerInfo,
> > > at hundreds of thousands of bytes per instance. It looks like most of
> > > the memory of an HServerInfo goes to the RegionLoad entries it holds
> > > for every region on its server. The space requirement is then roughly
> > > c x M x M / N, where M is the number of regions and N the number of
> > > region servers. I'm not sure whether my analysis is correct, but if
> > > it is, we should take this issue into account when doing capacity
> > > planning for the master, right?
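> > >
> > > A back-of-envelope check against the dump numbers (my own rough
> > > arithmetic, so treat it as an estimate):
> > >
> > >     7600 regions / 4 region servers   ~  1900 RegionLoads per HServerLoad
> > >     154,328 B per HServerInfo / 1900  ~  80 bytes per RegionLoad
> > >     950,435,488 B retained / 7600     ~  125 KB per regions map entry
> > >
> > > which is consistent with one full HServerLoad copy per region entry.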
> > >
> > > Thanks again for your patience.
> > >
> > > Mao Xu-Feng
> > >
> > >
> > > On Wed, Mar 16, 2011 at 1:41 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> > >
> > >> Inline.
> > >>
> > >> J-D
> > >>
> > >> On Tue, Mar 15, 2011 at 8:32 AM, 茅旭峰 <m9suns@gmail.com> wrote:
> > >> > Thanks J-D for your reply.
> > >> >
> > >> > It looks like HBASE-3617 will be included in 0.92; when will 0.92
> > >> > be released?
> > >>
> > >> It should be included in the bug fix release 0.90.2, which isn't
> > >> scheduled at the moment. Historically, HBase has never had a tight
> > >> schedule; releases are made whenever a committer feels there are
> > >> enough fixed jiras and gathers enough votes.
> > >>
> > >> >
> > >> > Yes, you're right, we launched tens of threads, putting values of
> > >> > 4MB on average, endlessly.
> > >> > Is the region server meant to die because of an OOME? I thought it
> > >> > was the region servers' responsibility to flush memstores into HDFS,
> > >> > so the limit when inserting endlessly should be the size of HDFS,
> > >> > rather than Java heap memory (we set a 4GB Java heap for the region
> > >> > servers).
> > >>
> > >> Yes, the RS does control the MemStores. What it doesn't control very
> > >> well is all the queries that are in flight, plus the heap required to
> > >> do compactions, plus the data copied when flushing, plus all the other
> > >> small tidbits all over the place. Just as an example, every value that
> > >> you insert first has to be copied from the socket before it can be
> > >> inserted into the MemStore.  If you are using a big write buffer, that
> > >> means that every insert currently in flight in a region server takes
> > >> double that amount of space.
> > >>
> > >> Garbage collection also isn't done as soon as the objects aren't used,
> > >> that wouldn't make sense given how it works, so there's space occupied
> > >> by dead objects.
> > >>
> > >> The jira tracking the handling of OOMEs in HBase is
> > >> https://issues.apache.org/jira/browse/HBASE-2506
> > >>
> > >> >
> > >> > Today, we cleaned up HDFS and re-ran the stress tests, i.e.
> > >> > inserting endlessly.
> > >> > With Java memory monitoring tools like jconsole, we can see that
> > >> > the Java heap of the master also keeps increasing; another OOME is
> > >> > expected now, though it hasn't happened so far.
> > >> > Is the master meant to die in this regard?
> > >>
> > >> I think your monitoring is a bit naive, memory isn't cleaned as soon
> > >> as it's unused, that's not how the garbage collector works. Your OOME
> > >> in the master happens after a region server died because it's trying
> > >> to load too much data into memory.
> > >>
> > >> >
> > >> > Our keys are SHA1 hashed, which should spread them uniformly. But
> > >> > on the web page (master:60010) we can see that most requests are
> > >> > handled by only one region server, and in the master log there are
> > >> > lots of region splits; eventually the regions are spread uniformly
> > >> > among the region servers. Is this workflow correct?
> > >>
> > >> That's how it works. There's always one region in the beginning and
> > >> then it's split organically. You can create your tables pre-split
> > >> with this HBaseAdmin method:
> > >>
> > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
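> > >>
> > >> For example, with SHA1-hashed keys something like this would pre-split
> > >> a table into 16 evenly spaced regions (an untested sketch; the table
> > >> and family names are made up):
> > >>
> > >>     import org.apache.hadoop.hbase.HBaseConfiguration;
> > >>     import org.apache.hadoop.hbase.HColumnDescriptor;
> > >>     import org.apache.hadoop.hbase.HTableDescriptor;
> > >>     import org.apache.hadoop.hbase.client.HBaseAdmin;
> > >>
> > >>     HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
> > >>     HTableDescriptor desc = new HTableDescriptor("mytable");
> > >>     desc.addFamily(new HColumnDescriptor("cf"));
> > >>     // SHA1 keys are uniform over the key space, so evenly spaced
> > >>     // first-byte split points give every region server work from
> > >>     // the very first insert instead of hammering a single region.
> > >>     int numRegions = 16;
> > >>     byte[][] splits = new byte[numRegions - 1][];
> > >>     for (int i = 1; i < numRegions; i++) {
> > >>       splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
> > >>     }
> > >>     admin.createTable(desc, splits);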
> > >>
> > >> Or instead of trying to force your data into HBase, you could use the
> > >> bulk loader: http://hbase.apache.org/bulk-loads.html
> > >>
> > >> >
> > >> > Thanks again for your time, J-D.
> > >> >
> > >> > Mao Xu-Feng
> > >> >
> > >>
> > >
> > >
> >
>
