hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2599) BaseScanner says "Current assignment of X is not valid" over and over for same region
Date Thu, 27 May 2010 23:21:39 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2599:
-------------------------

    Attachment: 2599-trunk.txt

Version for trunk that includes Todd's suggested changes.  Will apply soon.

> BaseScanner says "Current assignment of X is not valid" over and over for same region
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2599
>                 URL: https://issues.apache.org/jira/browse/HBASE-2599
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 2599-0.20.txt, 2599-trunk.txt
>
>
> From IRC today
> {code}
> 12:41 < cmorgan> hey guys. I'm having a recent  issue with a single node cluster
running 0.20.4. After stopping for a backup I now get region assignment churn. Seems master
keeps thinking that region
>                  assignment is not valid even when it is. Following is a log snippet:
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] DEBUG ter.RegionServerOperationQueue
 - Processing todo: PendingOpenOperation from localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] INFO  e.master.RegionServerOperation
 - net_troove_coin_account_AccountCredentials,,1234913258116 open on 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] INFO  e.master.RegionServerOperation
 - Updated row net_troove_coin_account_AccountCredentials,,1234913258116 in region .META.,,1
with
>                  startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] DEBUG ter.RegionServerOperationQueue
 - Processing todo: PendingOpenOperation from localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] INFO  e.master.RegionServerOperation
 - net_troove_application_request_TemporaryRequest,,1234913268355 open on 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [        HMaster] INFO  e.master.RegionServerOperation
 - Updated row net_troove_application_request_TemporaryRequest,,1234913268355 in region .META.,,1
with
>                  startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [ger.metaScanner] DEBUG adoop.hbase.master.BaseScanner
 - Current assignment of net_troove_coin_account_AccountEntry,,1271448856984 is not valid;
>                  serverAddress=127.0.0.1:7802, startCode=1274425405680 unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443248 [ger.metaScanner] DEBUG adoop.hbase.master.BaseScanner
 - Current assignment of net_troove_coin_account_AccountEntry-Base_EntryDay_DESCENDING,,1273266418876
>                  is not valid;  serverAddress=127.0.0.1:7802, startCode=1274425405680
unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443251 [ger.metaScanner] DEBUG adoop.hbase.master.BaseScanner
 - Current assignment of net_troove_coin_bank_BankStatement,,1266433980935 is not valid;
>                  serverAddress=127.0.0.1:7802, startCode=1274425405680 unknown.
> 12:58 < cmorgan> stack: I'd been running with 0.20.4 for a week or so starting/stopping
every night. Now this happens...
> 14:11 < cmorgan> stack: some more info: On our mini production server the regionserver
is getting "My address is localhost.:7802" (notice the dot after localhost). But the master
is also sometimes
>                  referring to it as 127.0.0.1. I just used the same data and config on
my laptop, and its binding to my external LAN ip ("My address is 10.0.1.4:7802"). Under this
setup hbase comes up
>                  stable (no region assignment churn).
> {code}
> Looking at this, I think the issue is that when we register a server we use the getServerName of an HServerInfo provided by the regionserver (though we are on the master side), but BaseScanner uses a getServerName that is made by doing a DNS lookup on the IP that it finds in the server column of .META.  My sense is that it is possible for the regionserver hostname and what the master finds when it does a lookup against DNS to disagree, fatally.
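
To make the suspected mismatch concrete, here is a minimal sketch in plain Java. It does not use the HBase classes: `serverName`, `serversByName` and `isAssignmentValid` are hypothetical stand-ins, and the "host,port,startcode" key shape is only inferred from the log snippet above. The host string the scanner ends up with is passed in directly rather than resolved through DNS, so the example stays deterministic.

```java
import java.util.HashMap;
import java.util.Map;

public class ServerNameMismatch {
    // Hypothetical helper: builds a "host,port,startcode" key, the shape
    // seen in the log ("localhost.,7802,1274425405680").
    static String serverName(String host, int port, long startCode) {
        return host + "," + port + "," + startCode;
    }

    // Hypothetical stand-in for the master's table of registered servers,
    // keyed by the name built from what the regionserver reported.
    static final Map<String, String> serversByName = new HashMap<>();

    // Hypothetical stand-in for the scanner's check: rebuild the name from
    // the host it derived on its own and look it up in the registry.
    static boolean isAssignmentValid(String hostSeenByScanner, int port, long startCode) {
        return serversByName.containsKey(serverName(hostSeenByScanner, port, startCode));
    }

    public static void main(String[] args) {
        long startCode = 1274425405680L;

        // The regionserver registers itself under the hostname it believes it has.
        serversByName.put(serverName("localhost.", 7802, startCode), "regionserver-1");

        // The scanner derives a different host string for the same server,
        // so the lookup misses and the assignment looks invalid.
        System.out.println(isAssignmentValid("127.0.0.1", 7802, startCode));  // prints false

        // With the same host string on both sides the lookup succeeds.
        System.out.println(isAssignmentValid("localhost.", 7802, startCode)); // prints true
    }
}
```

Port and start code match in both lookups; only the host string differs, which is enough to make the assignment look unknown on every scan.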
> This issue seems to have come up repeatedly over the last few weeks.  It was reported at least once more on a standalone instance and also on krispykola's 15-node EC2 cluster (he went back to 0.20.3 and then it went away?).  It made for what looked like double-assignment in his case (our attempt at caching DNS names may be amiss; I think that is the main diff between 0.20.3 and 0.20.4 in this area).
> My thought is to purge DNS from the HServerInfo passed by the RS to the Master on startup and heartbeating and to use IPs only (and even then, the IP that the master tells the RS to use: its remote address as seen by the master).  We might have to do this fix for 0.20.5 since it seems to happen more in 0.20.4.
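
The IP-only idea can be sketched the same way. This is an illustration of the proposal under stated assumptions, not the attached patch: `IpOnlyRegistry`, `register` and `isAssignmentValid` are hypothetical names, and the remote address the master observes is passed in as a plain string instead of being read from a live connection.

```java
import java.util.HashMap;
import java.util.Map;

public class IpOnlyRegistry {
    // Start code of each registered server, keyed by "ip:port" only.
    private final Map<String, Long> startCodeByAddress = new HashMap<>();

    static String key(String ip, int port) {
        return ip + ":" + port;
    }

    // The hostname reported by the regionserver is deliberately ignored;
    // only the remote IP as seen by the master is used for the key.
    void register(String reportedHost, String remoteIpSeenByMaster, int port, long startCode) {
        startCodeByAddress.put(key(remoteIpSeenByMaster, port), startCode);
    }

    // The scanner side uses the IP stored in the catalog row directly,
    // with no DNS lookup in between.
    boolean isAssignmentValid(String ipFromMeta, int port, long startCode) {
        Long known = startCodeByAddress.get(key(ipFromMeta, port));
        return known != null && known == startCode;
    }

    public static void main(String[] args) {
        IpOnlyRegistry registry = new IpOnlyRegistry();
        registry.register("localhost.", "127.0.0.1", 7802, 1274425405680L);
        System.out.println(registry.isAssignmentValid("127.0.0.1", 7802, 1274425405680L)); // prints true
    }
}
```

Because both sides build the key from the same IP, whatever name the regionserver or DNS reports no longer affects the lookup.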
> I'm looking into this.  Opinions welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

