hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13709) Updates to meta table server columns may be eclipsed
Date Tue, 19 May 2015 01:30:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549622#comment-14549622
] 

Enis Soztutar commented on HBASE-13709:
---------------------------------------

Here are the details from a test failure: 
We create a table with 8 replicas, and end of test, we verify that all the replicas are assigned.
For that we do a meta scan, which does not see replica id 5 (notice that the server value
is null): 
{code}
2015-05-15 09:03:06,982|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server, timestamp=1431680376535, value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 09:03:06,984|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0001, timestamp=1431680377792, value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 09:03:06,985|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0002, timestamp=1431680376517, value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 09:03:06,987|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0003, timestamp=1431680377792, value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 09:03:06,989|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0004, timestamp=1431680377259, value=ip-172-31-41-6.ec2.internal:16020
2015-05-15 09:03:06,990|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0005, timestamp=1431680375996, value=
2015-05-15 09:03:06,992|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0006, timestamp=1431680376986, value=ip-172-31-41-11.ec2.internal:16020
2015-05-15 09:03:06,994|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0007, timestamp=1431680377359, value=ip-172-31-41-7.ec2.internal:16020
{code}

Looking at the master and region server logs, it seems that the region replica with name {{tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c}}
is indeed assigned, and it actually did write to meta: 
{code}
2015-05-15 08:59:35,473 INFO  [PriorityRpcServer.handler=10,queue=0,port=16020] regionserver.RSRpcServices:
Open tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,494 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] hfile.CacheConfig:
blockCache=LruBlockCache{blockCount=2, currentSize=459272, freeSize=420648856, maxSize=421108128,
heapSi
2015-05-15 08:59:35,495 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] compactions.CompactionConfiguration:
size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,499 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] hfile.CacheConfig:
blockCache=LruBlockCache{blockCount=2, currentSize=459272, freeSize=420648856, maxSize=421108128,
heapSi
2015-05-15 08:59:35,500 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] compactions.CompactionConfiguration:
size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] hfile.CacheConfig:
blockCache=LruBlockCache{blockCount=2, currentSize=459272, freeSize=420648856, maxSize=421108128,
heapSi
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] compactions.CompactionConfiguration:
size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,507 INFO  [RS_OPEN_REGION-ip-172-31-41-8:16020-2] regionserver.HRegion:
Onlined 68af88e9274633ba67b926b142b7ff4c; next sequenceid=2
2015-05-15 08:59:35,513 INFO  [PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] regionserver.HRegionServer:
Post open deploy tasks for tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,522 INFO  [PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor:
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with server=ip-17
{code}

As you can see from the meta scan results, the cell: 
{code}
column=info:server_0005, timestamp=1431680375996, value=
{code}
has timestamp of 1431680375996, which is {{Fri, 15 May 2015 08:59:35 GMT}} and the ms is 996,
however the meta update from the RS uses it's local timestamp because of HBASE-11536. As you
can see from the log:
{code}
2015-05-15 08:59:35,522 INFO  [PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor:
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with server=ip-17
{code}
has a timestamp of {{2015-05-15 08:59:35,522}} which is less than the master's TS when it
created the meta row with empty location. 


> Updates to meta table server columns may be eclipsed
> ----------------------------------------------------
>
>                 Key: HBASE-13709
>                 URL: https://issues.apache.org/jira/browse/HBASE-13709
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
>
> HBASE-11536 fixes a case where on a very rare occasion, the meta updates may be processed
out of order. The fix is to use the RS's timestamp for the server column in meta update, but
that actually opens up a vulnerability for clock skew (see the discussions in the jira). 
> For the region replicas case, we can reproduce a problem where the server name field
is eclipsed by the masters earlier update because the RS is lagging behind. However, this
is not specific to replicas, but occurs more frequently with it. 
> One option that was discussed was to send the master's ts with open region RPC and use
it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message