hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect
Date Thu, 18 Oct 2012 21:10:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479361#comment-13479361
] 

stack commented on HBASE-5974:
------------------------------

Like Lars and Todd above I almost said why RegionScannerHolder?  Is it because if you add
this class, you can do the seqid management in the regionserver and CPs do not have to be
concerned with its management?  That is fair enough and this is pretty elegant soln to the
issue.

Your worry is that a CP might return its own RegionScanner implementation.  Can you not add
to RegionScanner to methods, getNextSeq, and incrementNextSeq?  Then the regionserver would
have the means for keeping up the seqid in whatever the implementation rather than introduce
this wrapper class?  RegionScanner is marked as private.  For the trunk patch, we are requiring
a restart and allowing that CPs may need modification to make the leap from 0.94 to 0.96.
 Does that make it easier on you modifying RegionScanner to support nextSeq?

Yeah, I think that rather than:

{code}
+  optional uint64 callSeq = 6;
{code}

it should be called nextSeq or nextCallSeq (the latter might be better because nextSeq might
make you think the next sequence id rather than the sequence id for the 'next' call).  All
references should be changed if you change this I"d say Anoop.

This test, TestClientScannerRPCTimeout, looks good.  Does it have to be in a separate class?
 Do we not have a scanner test already that has a running minicluster?  I can understand though
if you want to keep this separate because its hard to integrate into a current test suite.
 If you are going to have a class for this single test and are going to go to the bother of
spinning up a whole minicluster, I'd say do a bit more testing.  Are there other scenarios
you can manufacture?  Can you add assert that you actually slept?

Good stuff Anoop.  Lets get this important patch in.




                
> Scanner retry behavior with RPC timeout on next() seems incorrect
> -----------------------------------------------------------------
>
>                 Key: HBASE-5974
>                 URL: https://issues.apache.org/jira/browse/HBASE-5974
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, regionserver
>    Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
>            Reporter: Todd Lipcon
>            Assignee: Anoop Sam John
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 5974_94-V4.patch, 5974_trunk.patch, 5974_trunk-V2.patch, HBASE-5974_0.94.patch,
HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch
>
>
> I'm seeing the following behavior:
> - set RPC timeout to a short value
> - call next() for some batch of rows, big enough so the client times out before the result
is returned
> - the HConnectionManager stuff will retry the next() call to the same server. At this
point, one of two things can happen: 1) the previous next() call will still be processing,
in which case you get a LeaseException, because it was removed from the map during the processing,
or 2) the next() call will succeed but skip the prior batch of rows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message