Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 11 Sep 2012 09:02:08 +1100 (NCT)
From: "stack (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <2006919541.60388.1347314528290.JavaMail.jiratomcat@arcas>
In-Reply-To: <1161232743.58026.1347289087629.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HBASE-6752) On region server failure, serve
 writes and timeranged reads during the log split
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452496#comment-13452496 ] 

stack commented on HBASE-6752:
------------------------------

Makes sense.  Sounds great.  How we know what regionserver to give a log split too when the log has edits for all regions that were on a regionserver.  You thinking we could give all regions on the crashed regionserver to a particular regionserver?
                
> On region server failure, serve writes and timeranged reads during the log split
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-6752
>                 URL: https://issues.apache.org/jira/browse/HBASE-6752
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Priority: Minor
>
> Opening for write on failure would mean:
> - Assign the region to a new regionserver. It marks the region as recovering
>   -- specific exception returned to the client when we cannot server.
>   -- allow them to know where they stand. The exception can include some time information (failure stated on: ...)
>   -- allow them to go immediately on the right regionserver, instead of retrying or calling the region holding meta to get the new address
>      => save network calls, lower the load on meta.
> - Do the split as today. Priority is given to region server holding the new regions
>   -- help to share the load balancing code: the split is done by region server considered as available for new regions
>   -- help locality (the recovered edits are available on the region server) => lower the network usage
> - When the split is finished, we're done as of today
> - while the split is progressing, the region server can
>  -- serve writes
>    --- that's useful for all application that need to write but not read immediately:
>    --- whatever logs events to analyze them later
>    --- opentsdb is a perfect example.   
>  -- serve reads if they have a compatible time range. For heavily used tables, it could be an help, because:
>    --- we can expect to have a few minutes of data only (as it's loaded)
>    --- the heaviest queries, often accepts a few -or more- minutes delay. 
> Some "What if":
> 1) the split fails
> => Retry until it works. As today. Just that we serves writes. We need to know (as today) that the region has not recovered if we fail again.
> 2) the regionserver fails during the split
> => As 1 and as of today/
> 3) the regionserver fails after the split but before the state change to fully available.
> => New assign. More logs to split (the ones already dones and the new ones).
> 4) the assignment fails
> => Retry until it works. As today.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira