hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7507) Make memstore flush be able to retry after exception
Date Sat, 23 Feb 2013 01:30:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584930#comment-13584930 ]

Enis Soztutar commented on HBASE-7507:
--------------------------------------

bq. This is an important one for riding over ha nn topology changes (as per Chunhui). Was
seen on a cluster today.
As I reported in HBASE-7385, we've also seen this in NN HA tests.

bq. IMHO, this particular fix is only important if we have fixed all other write attempts
for HDFS.
We have seen another edge case, where the NN dies just before returning the RPC response for
a create-file call and the next retry from the DFS client then fails with a "file already
exists" exception. I think I've logged it somewhere. Regardless, I think fixing the memstore
flush is important, since a failed flush causes the RS to abort.
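
To make the create-file edge case concrete, here is a minimal sketch that uses the local file system (java.nio.file) as a stand-in for HDFS. The class name CreateRetrySketch, the method createTolerant and the flag previousAttemptAmbiguous are hypothetical and are not part of the DFS client or HBase. It only illustrates one possible policy: if an earlier attempt may have succeeded without the caller seeing the response, treat "already exists" on the retry as a probable success instead of a hard failure.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; not DFS client or HBase code. */
public class CreateRetrySketch {

    /**
     * Tries to create the file at path.
     *
     * @param previousAttemptAmbiguous true if an earlier create for the same path
     *        may have succeeded without the caller seeing the response
     * @return true if this call created the file, false if the file already
     *         existed and the earlier attempt was ambiguous
     * @throws IOException if creation fails, including "already exists" when no
     *         earlier attempt could explain the existing file
     */
    public static boolean createTolerant(Path path, boolean previousAttemptAmbiguous)
            throws IOException {
        try {
            Files.createFile(path);
            return true;
        } catch (FileAlreadyExistsException e) {
            if (previousAttemptAmbiguous) {
                // Assumption: the file was most likely created by our own lost attempt.
                return false;
            }
            throw e;
        }
    }
}
```

This policy is only reasonable when the file name is unique to the writer, so that an existing file cannot belong to someone else; that is an assumption of the sketch, not something stated in this thread.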

Should we commit it, and if tests start failing, fix them later?
                
> Make memstore flush be able to retry after exception
> ----------------------------------------------------
>
>                 Key: HBASE-7507
>                 URL: https://issues.apache.org/jira/browse/HBASE-7507
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7507-94.patch, 7507-trunk v1.patch, 7507-trunk v2.patch, 7507-trunkv3.patch
>
>
> We abort the regionserver if a memstore flush throws an exception.
> I think we could retry to make the regionserver more stable, because the file system may be
> unavailable for a transient period, e.g. while switching namenodes in a NameNode HA environment.
> {code}
> HRegion#internalFlushcache() {
>   ...
>   try {
>     ...
>   } catch (Throwable t) {
>     // Any failure during the flush is wrapped in a DroppedSnapshotException.
>     DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
>         Bytes.toStringBinary(getRegionName()));
>     dse.initCause(t);
>     throw dse;
>   }
>   ...
> }
> MemStoreFlusher#flushRegion() {
>   ...
>   try {
>     region.flushcache();
>     ...
>   } catch (DroppedSnapshotException ex) {
>     // The regionserver aborts here instead of retrying the flush.
>     server.abort("Replay of HLog required. Forcing server shutdown", ex);
>   }
>   ...
> }
> {code}
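
The retry idea in the description above can be sketched as a bounded retry loop around the flush call. This is a minimal, self-contained illustration and not the approach taken in the attached patches: FlushRetrySketch, FlushAttempt, flushWithRetry, maxAttempts and pauseMillis are all hypothetical names, and IOException stands in for whatever the real flush throws. It also assumes that a failed attempt leaves the memstore snapshot intact so that the same flush can safely be attempted again.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names are HBase APIs. */
public class FlushRetrySketch {

    /** Stand-in for one flush attempt that may fail transiently. */
    @FunctionalInterface
    public interface FlushAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times, sleeping pauseMillis after each
     * failure except the last.
     *
     * @return the number of attempts used when one succeeds
     * @throws IOException the last failure, if every attempt fails; a caller
     *         could fall back to aborting at that point
     */
    public static int flushWithRetry(FlushAttempt attempt, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return i;
            } catch (IOException e) {
                last = e;
                if (i < maxAttempts) {
                    // Give the file system time to recover, e.g. during a namenode switch.
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

Bounding the number of attempts keeps the existing abort path as the last resort, so a persistent file system failure still ends in the HLog replay described in the abort message.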

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
