hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Feng Honghua (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10595) HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed
Date Sun, 02 Mar 2014 09:50:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917343#comment-13917343
] 

Feng Honghua commented on HBASE-10595:
--------------------------------------

Thanks [~v.himanshu] and [~enis] for review and comments!
bq.This is a call to NN, and this patch would invoke it for every get request.
Actually before this patch every get request still need a call to NN, The call to NN is to
determine if the request can be satisfied by the cache (getTableInfoModtime will ask NN for
the last modified time of the table descriptor file...), so you can view the check of existence
of table dir is a kind of shortcut.
{code}
    if (cachedtdm != null) {
      // Check mod time has not changed (this is trip to NN).
      if (getTableInfoModtime(tablename) <= cachedtdm.getModtime()) {
        cachehits++;
        return cachedtdm.getTableDescriptor();
      }
    }
{code}

bq.If a user deletes a table dir, wouldn't there be other (and more severe) consistencies
such as meta being hosed, etc. What is the use case where a user is deleting the table dir
behind the curtain ?
User deleting a table dir should not be an expected / normal scenario, should be deemed as
a special case to result in the inconsistency of this patch. HBCK should be used to remove
the items of this table in meta table for overall consistency. This is an issue out of scope
of this patch.
This patch in more sensce is to fix the bug that listTables() and getTableDescriptor() use
different criterion when table dir isn't existent:
# listTables() : deems a table is nonexistent and doesn't return according item if the table
dir isn't existent, without checking if there is a corresponding item in table descriptor
cache
# getTableDescriptor() : check the last modified time of the table descriptor file under table
dir, if the file isn't existent or it's last modified time isn't newer than the one of table
descriptor cache, deems as a cache hit and just return the one from cache

bq.I meant NameNode, not HMaster.
Thanks you two clarifying on this, I now realize you meant the access to NN to checking existence
of the table dir :-)

bq.That is an inconsistency. But completely making the cache useless is not the way to solve
it I think.
Glad you agree it's an inconsistency:-)
Why I think completely making the cache useless is acceptable:
# In essence all 'cache' item should be coherent to it backing store (think about all other
kinds of cache:-)), the difference among various cache strategies(such as no-write / write-through
/ write-back...)  are when and how to keep cache to be coherent with its backing store, but
not whether it should be coherent to its backing store, and whenever it isn't coherent with
backing store, it's deemed as 'invalid' and should not be used to serve request. In this scenario,
the table descriptor file is the backing store of the table descriptor cache.
# listTables() currently already use the semantic of above 'whenever cache isn't coherent
with backing store, it's deemed as 'invalid' and should not be used to serve request' : it
doesn't use cache if the table dir doesn't exist.
# Whether an active master failover-switch happens is transparent to the client, what client
cares about is its requests are served consistently. But if we still use cache for getTableDescriptor()
even its backing store(table dir and table descriptor file) changes(being removed is a special
kind of change, right?), inconsistency can happen for two consecutive getTableDescriptor()
calls : the previous active master uses its cache for the first getTableDescriptor, then it
fails, another master takes the active master role, it finds no table dir(hence no table descriptor
file), so it returns null for the second getTableDescriptor...

bq. That is the job of HBCK. There is no guarantees expected from the master if the user deletes
dirs under it.
Partially agree:-). But it sounds much better if we could keep consistency from client perspective
even under such corruption as table dir is removed from outside, right?

Lastly, what about we don't check the existence of the table dir, and just return null if
the table descriptor file doesn't exist(it's now treated a special case of "modified time
not newer than cache", IMHO it's incorrect)? This way we still align with the cache semantic
"invalidate cache if it's not coherent with its backing store" but can save an access to NN
if the table dir does exist.

> HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even
after the table dir in hdfs is removed
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10595
>                 URL: https://issues.apache.org/jira/browse/HBASE-10595
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master, util
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch, HBASE-10595-trunk_v3.patch,
HBASE-10595-trunk_v4.patch
>
>
> When a table dir (in hdfs) is removed(by outside), HMaster will still return the cached
TableDescriptor to client for getTableDescriptor request.
> On the contrary, HBaseAdmin.listTables() is handled correctly in current implementation,
for a table whose table dir in hdfs is removed by outside, getTableDescriptor can still retrieve
back a valid (old) table descriptor, while listTables says it doesn't exist, this is inconsistent
> The reason for this bug is because HMaster (via FSTableDescriptors) doesn't check if
the table dir exists for getTableDescriptor() request, (while it lists all existing table
dirs(not firstly respects cache) and returns accordingly for listTables() request)
> When a table is deleted via deleteTable, the cache will be cleared after the table dir
and tableInfo file is removed, listTables/getTableDescriptor inconsistency should be transient(though
still exists, when table dir is removed while cache is not cleared) and harder to expose



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message