hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18061) [C++] Fix retry logic in multi-get calls
Date Fri, 14 Jul 2017 19:39:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087897#comment-16087897

Enis Soztutar commented on HBASE-18061:

Thanks Sudeep for the patch. 

I've run the unit tests which pass. I was also be to run simple-client doing multi-gets against
500K rows with batch sizes 1,10,100,1000,10000 and 100000.  

A batch size of 500K results in  
 what():  LifoSemMPMCQueue full, can't add item
but it is fine for now. 

Why are we doing try lock here? What happens when we cannot acquire the lock? 
-        std::lock_guard<std::mutex> lock(multi_mutex_);
-        for (uint64_t num = 0; num < completed_responses.size(); ++num) {
+        std::unique_lock<std::mutex> lock(multi_mutex_, std::try_to_lock);
We are doing this in multiple places actually, and we never check the results of whether the
lock was acquired or not. Seems something to fix before commit.  

remove these from the patch: 
+//#include <gmock/gmock.h>
+//using ::testing::Return;
+//using ::testing::_;

I was double checking the logic again against the java implementation. It seems that we are
missing the function: 
  private List<ThrowableWithExtraContext> removeErrors(Action action) {
    synchronized (action2Errors) {
      return action2Errors.remove(action);
which is called from failOne() etc, and removes the error from the actions2Errors map. However,
logically, I think it is fine to not have this removal logic. Just making a note here in case
we run into it later. 
- Mainly a code cleanness issue, but you should move this logic: 
auto itr = search->second->actions_by_region().find(region_loc->region_name());
              if (itr != search->second->actions_by_region().end()) {
                search->second->AddActionsByRegion(region_loc, action);
              } else {
                search->second = std::make_shared<ServerRequest>(region_loc);
                search->second->AddActionsByRegion(region_loc, action);
to be inside ServerRequest::AddActionsByRegion(). The logic for "add the action to the list
of actions by region by finding the region" belongs in this method, and should not be exposed
outside normally. 

I think there is no multi-region test with failure conditions. We can look at adding that
in a follow up issue if needed. 

In {{::Send()}}, this logic: 
if (!ExceptionUtil::ShouldRetry(ew)) {
              std::vector<std::shared_ptr<Action>> failed_actions;
              for (auto &value : action_by_server.second->actions_by_region()) {
                for (const auto &failed_action : value.second->actions()) {
              FailAll(failed_actions, tries);
            } else {
              OnError(action_by_server.second->actions_by_region(), tries, ew,
can be replaced by a simple OnError() call, no? OnError() already checks whether exception
should retry and calls FailAll() with the same vector of actions. Can you please double check.

Other than these patch looks pretty good. 

> [C++] Fix retry logic in multi-get calls
> ----------------------------------------
>                 Key: HBASE-18061
>                 URL: https://issues.apache.org/jira/browse/HBASE-18061
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Sudeep Sunthankar
>             Fix For: HBASE-14850
>         Attachments: HBASE-18061.HBASE-14850.v1.patch, HBASE-18061.HBASE-14850.v3.patch,
HBASE-18061.HBASE-14850.v5.patch, HBASE-18061.HBASE-14850.v6.patch, HBASE-18061.HBASE-14850.v7.patch,
> HBASE-17576 adds multi-gets. There are a couple of todos to fix in the retry logic, and
some unit testing to be done for the multi-gets. We'll do these in this issue. 

This message was sent by Atlassian JIRA

View raw message