hadoop-common-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
Date Mon, 14 Jan 2008 06:25:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558515#action_12558515 ]

Billy Pearson commented on HADOOP-2587:
---------------------------------------

The latest patch works well.

From what I am seeing now, once a split is needed it waits until the next memcache flush
is done and checks whether a compaction is running on that region. If one is, it waits for the
next memcache flush to see whether the compaction is done, and so on.
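Roughly, the deferral described above behaves like the small state machine below. This is only a single-threaded Python model under my own assumptions, not the HBase code from the patch: `SplitDeferral`, `request_split`, `start_compaction`, `finish_compaction` and `on_memcache_flush` are hypothetical names used for illustration.

```python
class SplitDeferral:
    """Hypothetical model of the patched behaviour: a pending split is only
    attempted at a memcache flush, and is deferred again if a compaction is
    running on the region at that moment."""

    def __init__(self):
        self.compacting = False     # a compaction is running on the region
        self.split_pending = False  # the region has grown past the split size

    def request_split(self):
        self.split_pending = True

    def start_compaction(self):
        # Nothing in this model stops a new compaction from starting.
        self.compacting = True

    def finish_compaction(self):
        self.compacting = False

    def on_memcache_flush(self):
        """Return True if the split runs at this flush, False otherwise."""
        if not self.split_pending or self.compacting:
            return False  # no split needed, or defer until the next flush
        self.split_pending = False
        return True


if __name__ == "__main__":
    region = SplitDeferral()
    region.request_split()
    region.start_compaction()
    print(region.on_memcache_flush())  # False: compaction running, split deferred
    region.finish_compaction()
    region.start_compaction()          # heavy load: next compaction starts at once
    print(region.on_memcache_flush())  # False again: the split keeps waiting
    region.finish_compaction()
    print(region.on_memcache_flush())  # True: this flush lands between compactions
```

The second `False` is the case that matters under heavy updates: if a new compaction always starts before the next flush arrives, the split never gets its turn.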

The only problem I see with this is the following.

When a single region takes many minutes to finish compacting and is under heavy updates, the
next compaction will start right after the first one is done. Unless you are lucky enough to
finish a compaction and a memcache flush at the same time, before the next compaction starts,
the split will just keep being delayed. That is fine with me at this point: the only time
something like that would happen is if there is only one region on a region server being
updated heavily, and it needs to compact over and over again without the updates slowing down.

So right now the only improvement I can think of is for the splitter to block a new compaction
from starting while it waits for the next memcache flush to get that lock. But if you cannot
do that, it is no problem for now: the split works as is and will happen once the heavy update
load on the region has slowed and the compaction has finished.
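The improvement suggested above (a waiting split blocking new compactions) could look something like the model below. Again this is a hypothetical single-threaded Python sketch with made-up names (`SplitFirstScheduler`, `try_start_compaction`, `on_memcache_flush`), not a patch against HRegion; in a real region server the check and the state change would have to happen under a lock shared by the split and compaction threads.

```python
class SplitFirstScheduler:
    """Hypothetical model: once a split is pending, no new compaction may
    start on the region, so the split runs at the first flush after the
    current compaction ends."""

    def __init__(self):
        self.compacting = False     # a compaction is running on the region
        self.split_pending = False  # the region has grown past the split size

    def request_split(self):
        self.split_pending = True

    def try_start_compaction(self):
        """Start a compaction unless one is running or a split is waiting."""
        if self.compacting or self.split_pending:
            return False
        self.compacting = True
        return True

    def finish_compaction(self):
        self.compacting = False

    def on_memcache_flush(self):
        """Run the pending split if no compaction is in progress."""
        if self.split_pending and not self.compacting:
            self.split_pending = False
            return True
        return False


if __name__ == "__main__":
    region = SplitFirstScheduler()
    print(region.try_start_compaction())  # True: first compaction starts
    region.request_split()
    print(region.on_memcache_flush())     # False: still compacting
    region.finish_compaction()
    print(region.try_start_compaction())  # False: blocked by the pending split
    print(region.on_memcache_flush())     # True: split runs at this flush
    print(region.try_start_compaction())  # True: compactions resume afterwards
```

The trade-off is that the pending split delays the next compaction by up to one flush interval, instead of compactions being able to delay the split indefinitely.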

Again, this bug only happens when the region server has no other region to compact and the
region it has is under heavy update load.

> Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2587
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2587
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: hadoop subversion 611087
>            Reporter: Billy Pearson
>            Assignee: Jim Kellerman
>             Fix For: 0.16.0
>
>         Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt, patch.txt
>
>
> The excerpt below is from one of my region servers' logs; the full log is attached.
> What is happening is that there is one region on this region server and it is under heavy insert load, so compactions run back to back: as one finishes, a new one starts. The problem starts when it is time to split the region.
> A compaction starts just milliseconds before the split starts, blocking the split, but the split closes the region before the compaction is finished. This leaves the region offline until the compaction is done. Once the compaction is done the split finishes and everything returns to normal, but a region that is offline for 10-15 mins is a big problem for production.
> The solution would be not to let the split thread issue the line below while a compaction on that region is in progress.
> 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
> The only time I have seen this bug is when there is only one region on a region server, because if there is more than one, the next compaction runs on the other region(s) after the first one is done compacting, and the split can do what it needs to on the first region without getting blocked.
> {code}
> 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec
> 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed.
> 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction
> 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488
> 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size
> 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m
> 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m
> 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
> ...
> lots of NotServingRegionException's
> ...
> 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec
> ...
> 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
> 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec
> 2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find...
> 2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: <>, encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
> 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info
> 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master
> 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

