hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-549) Don't CLOSE region if message is not from server that opened it or is opening it
Date Sat, 05 Apr 2008 19:51:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586042#action_12586042 ]

stack commented on HBASE-549:
-----------------------------

Messages currently have a Type and a 'subject', where the subject is usually an HRI (HRegionInfo);
splits reported to the master, IIRC, carry the daughter regions.

To fix this issue and others like it that may occur, what is missing is a Message 'source'.  It
would also be sweet if Messages could carry an optional payload.  I'm thinking that when an HRS
sends a CLOSE, it could bundle the Exception that prompted the closing.  That way, one could read
the master log and get a sense of the whole cluster.

So a suggestion, made without having dug in to check whether it is overkill, would be to
change HMsg to something like the interface below, in pseudo-code:

{code}
interface Message {
    ServerAddress getSource();
    // Are all of our Messages always about a Region?
    HRI getSubject();
    byte getType();
    Text getOptionalPayload();
}
{code}

The split message could subclass the above.
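
As a rough illustration of how a message 'source' would help with this issue, here is a minimal Java sketch of a master that ignores a CLOSE unless it comes from the server most recently assigned the region.  ServerId, CloseFilter, assign and shouldProcessClose are hypothetical names made up for this example, not existing HBase classes, and the start codes used with them are placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical server identity: address plus the start code of that server process. */
final class ServerId {
    final String address;
    final long startCode;

    ServerId(String address, long startCode) {
        this.address = address;
        this.startCode = startCode;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ServerId)) {
            return false;
        }
        ServerId other = (ServerId) o;
        return address.equals(other.address) && startCode == other.startCode;
    }

    @Override
    public int hashCode() {
        return Objects.hash(address, startCode);
    }
}

/** Hypothetical master-side bookkeeping; an assumption for illustration, not the HMaster code. */
final class CloseFilter {
    // Region name -> server most recently told to open that region.
    private final Map<String, ServerId> pendingOpen = new HashMap<>();

    void assign(String regionName, ServerId server) {
        pendingOpen.put(regionName, server);
    }

    /** Returns true only if the CLOSE comes from the server currently assigned the region. */
    boolean shouldProcessClose(String regionName, ServerId source) {
        ServerId current = pendingOpen.get(regionName);
        return source.equals(current);
    }
}
```

With this in place, a region assigned first to XX.XX.XX.220 and then to XX.XX.XX.221 would have a late CLOSE from .220 dropped rather than triggering a reassignment.  It does require the master to remember which server each unassigned region was handed to, which is the bookkeeping the issue says is currently missing.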

Do we need an ID too?  Should IDs be monotonically increasing so message processing can be
done in order?
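
One way the ID question could play out, sketched in Java under the assumption that each source numbers its own messages and the master simply drops anything that is not newer than the last ID it processed from that source.  SequenceTracker and accept are hypothetical names; nothing like this is claimed to exist in HMsg today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-source sequence check; names are illustrative only. */
final class SequenceTracker {
    // Source server (as a string) -> highest message ID processed so far.
    private final Map<String, Long> lastSeen = new HashMap<>();

    /** Accepts a message only if its ID is newer than anything seen from that source. */
    boolean accept(String source, long messageId) {
        Long previous = lastSeen.get(source);
        if (previous != null && messageId <= previous) {
            return false; // stale or duplicate message
        }
        lastSeen.put(source, messageId);
        return true;
    }
}
```

This only orders messages per source; ordering across servers would need something more, such as IDs handed out by the master.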

> Don't CLOSE region if message is not from server that opened it or is opening it
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-549
>                 URL: https://issues.apache.org/jira/browse/HBASE-549
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.16.0, 0.2.0, 0.1.1, 0.1.0
>            Reporter: stack
>             Fix For: 0.2.0
>
>
> We assign a region to a server.  It takes too long to open (HBASE-505), so the region gets
> assigned to another server.  Meanwhile, the original host returns a MSG_REPORT_CLOSE (because the
> other region opening messes it up, moving files on disk out from under it).  We queue a shutdown,
> which marks the region as needing reassignment.  The second server then reports that it
> successfully opened the region.  The master tells it that it should not have opened it.  Churn ensues.
> The fix is to ignore the CLOSE if its reported server/startcode does not match that of the
> server currently trying to open the region.  The fix is not easy because we currently don't keep
> a list of server info in unassigned regions.
> Here's master log snippet showing problem:
> {code}
> ...
> 2008-03-25 19:16:43,711 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.220:60020
> 2008-03-25 19:16:46,725 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.220:60020
> 2008-03-25 19:18:06,411 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:18:06,811 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:19:46,841 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.221:60020
> 2008-03-25 19:19:49,849 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.221:60020
> 2008-03-25 19:19:56,883 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.220:60020
> 2008-03-25 19:19:56,883 INFO org.apache.hadoop.hbase.HMaster: XX.XX.XX.220:60020 no longer serving regionname: enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, startKey: <iLStZ0yTnfVUziYcNVVxWV==>, endKey: <jLB27Q4hKls4tSvp64rJfF==>, encodedName: 1857033608, tableDesc: {name: enwiki_080103, families: {alternate_title:={name: alternate_title, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, alternate_url:={name: alternate_url, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, anchor:={name: anchor, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, misc:={name: misc, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, page:={name: page, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, redirect:={name: redirect, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
> 2008-03-25 19:19:56,885 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessRegionClose of enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, true, false
> 2008-03-25 19:19:56,885 INFO org.apache.hadoop.hbase.HMaster: region closed: enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:19:56,887 INFO org.apache.hadoop.hbase.HMaster: reassign region: enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:19:57,288 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.189:60020
> 2008-03-25 19:20:00,296 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.189:60020
> 2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.221:60020
> 2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: region server XX.XX.XX.221:60020 should not have opened region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:23:51,707 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:23:51,834 DEBUG org.apache.hadoop.hbase.HMaster: shutdown scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:23:53,947 INFO org.apache.hadoop.hbase.HMaster: assigning region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server XX.XX.XX.97:60020
> ...
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

