hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2295) [hbase] Still multi-assigning regions
Date Wed, 28 Nov 2007 22:24:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546440
] 

Jim Kellerman commented on HADOOP-2295:
---------------------------------------

When the master assigned a region twice to the same server, what happened was that on the
second assignment, the region server returned two 'region is opened' messages in the same
message to the master (code bug in HRegionServer). The first message was processed properly
and the master queued a task to update the meta. The master then looped to process the second
message, and now the region was no longer in the 'unassigned regions' list so the master told
the region server to close it. In the mean time, the update of the meta completes, clients
start trying to access the region but the region server has closed the region so the client
gets a NotServingRegion exception.

The master was doing the 'right thing' by reassigning the region because the region server
took longer than 'hbase.hbasemaster.maxregionopen' (30 seconds) to open the region.

Fixing the aforementioned bug will prevent this from occurring, but if the master should assign
the same region to two different region servers, this could easily result in corrupting the
region beyond repair as there is a lot of file i/o associated with bringing a region on-line.
To avoid this, we propose that a region server acknowledge that it is opening the region,
and when the master receives the ack, it extends the region reassignment timeout. If the region
server subsequently fails to open the region, the master can reassign the region after the
additional grace period and we can be pretty certain that there will be no interference between
multiple region servers opening the same region.

> [hbase] Still multi-assigning regions
> -------------------------------------
>
>                 Key: HADOOP-2295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2295
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: Jim Kellerman
>
> Start key is img216/1725/22236401lvh6.gif.  We doubly assign.  In between assignments,
master rules 'should not have opened region' and sends a shutdown region to remote server
-- which it obeys. Meantime, master thinks regions are deployed.
> Here is master log:
> {code}
> ...
> 2007-11-27 04:26:21,818 INFO  hbase.HMaster - region postlog,img216/1725/22236401lvh6.gif,1196164441230
split. New regions are: postlog,img216/1725/22236401lvh6.gif,1196166373235, postlog,img216/3961/aasaxonlivene6.jpg,1196166373241
> 2007-11-27 04:26:23,261 INFO  hbase.HMaster - assigning region postlog,img216/1725/22236401lvh6.gif,1196166373235
to server XX.XX.XX.33:60020
> 2007-11-27 04:26:27,462 DEBUG hbase.HMaster - HMaster.metaScanner scanner: -400573094926830852
regioninfo: {regionname: postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: <img216/1725/22236401lvh6.gif>,
offline: true, split: true, tableDesc: {name: postlog, families: {cookie:={name: cookie, max
versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none},
ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647,
bloom filter: none}}}}, server: XX.XX.XX.25:60020, startCode: 1196155489036
> 2007-11-27 04:26:27,464 DEBUG hbase.HMaster - HMaster.metaScanner scanner: -400573094926830852
regioninfo: {regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: <img216/1725/22236401lvh6.gif>,
tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions:
1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server:
, startCode: -1
> 2007-11-27 04:26:27,464 DEBUG hbase.HMaster - Checking postlog,img216/1725/22236401lvh6.gif,1196166373235
is assigned
> 2007-11-27 04:26:27,671 DEBUG hbase.HMaster - Checked postlog,img216/1725/22236401lvh6.gif,1196164441230
for references: splitA: true, splitB: true
> 2007-11-27 04:26:53,328 INFO  hbase.HMaster - assigning region postlog,img216/1725/22236401lvh6.gif,1196166373235
to server XX.XX.XX.33:60020
> 2007-11-27 04:27:05,364 INFO  hbase.HMaster - XX.XX.XX.33:60020 serving postlog,img216/1725/22236401lvh6.gif,1196166373235
> 2007-11-27 04:27:05,364 DEBUG hbase.HMaster - region server XX.XX.XX.33:60020 should
not have opened region postlog,img216/1725/22236401lvh6.gif,1196166373235
> 2007-11-27 04:27:05,367 INFO  hbase.HMaster - regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235,
startKey: <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: {cookie:={name:
cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom
filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length:
2147483647, bloom filter: none}}} open on XX.XX.XX.33:60020
> 2007-11-27 04:27:05,368 INFO  hbase.HMaster - updating row postlog,img216/1725/22236401lvh6.gif,1196166373235
in table .META.,,1 with startcode 1196155493271 and server XX.XX.XX.33:60020
> 2007-11-27 04:27:26,380 DEBUG hbase.HMaster - HMaster.metaScanner scanner: -4876467680083821318
regioninfo: {regionname: postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: <img216/1725/22236401lvh6.gif>,
offline: true, split: true, tableDesc: {name: postlog, families: {cookie:={name: cookie, max
versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none},
ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647,
bloom filter: none}}}}, server: XX.XX.XX.25:60020, startCode: 1196155489036
> 2007-11-27 04:27:26,381 DEBUG hbase.HMaster - HMaster.metaScanner scanner: -4876467680083821318
regioninfo: {regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: <img216/1725/22236401lvh6.gif>,
tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions:
1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server:
XX.XX.XX.33:60020, startCode: 1196155493271
> 2007-11-27 04:27:26,381 DEBUG hbase.HMaster - Checking postlog,img216/1725/22236401lvh6.gif,1196166373235
is assigned
> 2007-11-27 04:27:26,511 DEBUG hbase.HMaster - Checked postlog,img216/1725/22236401lvh6.gif,1196164441230
for references: splitA: true, splitB: true
> 2007-11-27 04:28:26,444 DEBUG hbase.HMaster - HMaster.metaScanner scanner: -7699450087241029374
regioninfo: {regionname: postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: <img216/1725/22236401lvh6.gif>,
offline: true, split: true, tableDesc: {name: postlog, families: {cookie:={name: cookie, max
versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none},
ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647,
bloom filter: none}}}}, server: XX.XX.XX.25:60020, startCode: 1196155489036
> 2007-11-27 04:28:26,446 DEBUG hbase.HMaster - HMaster.metaScanner scanner: -7699450087241029374
regioninfo: {regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: <img216/1725/22236401lvh6.gif>,
tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions:
1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server:
XX.XX.XX.33:60020, startCode: 1196155493271
> 2007-11-27 04:28:26,446 DEBUG hbase.HMaster - Checking postlog,img216/1725/22236401lvh6.gif,1196166373235
is assigned
> 2007-11-27 04:28:26,590 DEBUG hbase.HMaster - Checked postlog,img216/1725/22236401lvh6.gif,1196164441230
for references: splitA: true, splitB: true
> 2007-11-27 04:29:26,459 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 5846860608957952540
regioninfo: {regionname: postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: <img216/1401lvh6.gif>,
offline: true, split: true, tableDesc: {name: po: 1, compression: NONE, in memory: false,
max length: 21474836s: 1, compression: NONE, in memory: false, max length: 21474836er: none}}}},
server: XX.XX.XX.25:60020, startCode: 11961554890ase.HMaster - HMaster.metaScanner scanner:
5846860608957952540 /1725/22236401lvh6.gif,1196166373235, startKey: <img216/1725/22okie:={name:
cookie, max versions: 1, compression: NONE, in mem NONE, in memory: false, max length: 2147483647,
bloom filter: artCode: 1196155493271
> 2007-11-27 04:29:26,461 DEBUG hbase.HMas235 is assigned
> 2007-11-27 04:29:26,622 DEBUG hbase.HMaster - Cces: splitA: true, splitB: true
> ...
> {code}
> Here is regionserver log on node XX.XX.XX.33:
> {code}
> ...
> 2007-11-27 04:26:29,318 INFO  hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235,
startKey: <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: {cookie:={name:
cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom
filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length:
2147483647, bloom filter: none}}}
> 2007-11-27 04:26:29,325 DEBUG hbase.HStore - starting 51396220/cookie (no reconstruction
log)
> 2007-11-27 04:26:30,420 DEBUG hbase.HRegionServer - flushing region postlog,img100/941/brandyfn6.gif,1196158363481
> 2007-11-27 04:26:30,420 DEBUG hbase.HRegion - Not flushing cache for region postlog,img100/941/brandyfn6.gif,1196158363481:
snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:26:31,752 DEBUG hbase.HRegionServer - flushing region postlog,img149/2019/liyanaum6.gif,1196164996321
> 2007-11-27 04:26:31,753 DEBUG hbase.HRegion - Not flushing cache for region postlog,img149/2019/liyanaum6.gif,1196164996321:
snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:26:31,753 DEBUG hbase.HRegionServer - flushing region postlog,img148/6048/p9290339kb0.jpg,1196162613173
> 2007-11-27 04:26:31,753 DEBUG hbase.HRegion - Not flushing cache for region postlog,img148/6048/p9290339kb0.jpg,1196162613173:
snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:26:31,754 DEBUG hbase.HRegionServer - flushing region postlog,img132/5512/timeline8bx.png,1196159744555
> 2007-11-27 04:26:31,754 DEBUG hbase.HRegion - Not flushing cache for region postlog,img132/5512/timeline8bx.png,1196159744555:
snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:26:37,915 DEBUG hbase.HRegionServer - flushing region postlog,img149/3724/whosherne4.jpg,1196164996322
> 2007-11-27 04:26:37,915 DEBUG hbase.HRegion - Not flushing cache for region postlog,img149/3724/whosherne4.jpg,1196164996322:
snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:26:57,059 DEBUG hbase.HRegionServer - flushing region postlog,img148/8339/361de8578d356flip0hd.png,1196165445439
> 2007-11-27 04:26:57,060 DEBUG hbase.HRegion - Not flushing cache for region postlog,img148/8339/361de8578d356flip0hd.png,1196165445439:
snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:27:03,604 DEBUG hbase.HStore - maximum sequence id for hstore 51396220/cookie
is 30625781
> 2007-11-27 04:27:03,761 DEBUG hbase.HStore - starting 51396220/ip (no reconstruction
log)
> 2007-11-27 04:27:03,859 DEBUG hbase.HStore - maximum sequence id for hstore 51396220/ip
is 30625781
> 2007-11-27 04:27:03,965 DEBUG hbase.HRegion - Next sequence id for region postlog,img216/1725/22236401lvh6.gif,1196166373235
is 30625782
> 2007-11-27 04:27:03,969 INFO  hbase.HRegion - region postlog,img216/1725/22236401lvh6.gif,1196166373235
available
> 2007-11-27 04:27:03,969 INFO  hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235,
startKey: <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: {cookie:={name:
cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom
filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length:
2147483647, bloom filter: none}}}
> 2007-11-27 04:27:04,578 INFO  hbase.HRegionServer - MSG_REGION_CLOSE_WITHOUT_REPORT :
regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: <img216/1725/22236401lvh6.gif>,
tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions:
1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
> 2007-11-27 04:27:04,578 DEBUG hbase.HRegion - Not flushing cache for region postlog,img216/1725/22236401lvh6.gif,1196166373235:
snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:27:04,579 DEBUG hbase.HStore - closed 51396220/ip
> 2007-11-27 04:27:04,580 DEBUG hbase.HStore - closed 51396220/cookie
> 2007-11-27 04:27:04,580 INFO  hbase.HRegion - closed postlog,img216/1725/22236401lvh6.gif,1196166373235
> ..
> {code}
> It looks like the assigned region hasn't been cleared from the unassigned regions list
when the report comes in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message