hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6284) Introduce HRegion#doMiniBatchDelete()
Date Mon, 02 Jul 2012 05:11:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404892#comment-13404892 ]

Anoop Sam John commented on HBASE-6284:
---------------------------------------

bq.Consider maintaining two variables: one for puts, one for deletions.

You mean there should be two time variables, updated at different places within the loop
according to the type of Mutation?
Something like below:
{code}
if (isPutMutation) {
  // Check the families in the put. If bad, skip this one.
  checkFamilies(familyMap.keySet());
  checkTimestamps(mutation.getFamilyMap(), now);
  // update the put net time
} else {
  prepareDelete((Delete) mutation);
  // update delete net time
}
{code}
Similarly
{code}
if (mutation instanceof Put) {
  updateKVTimestamps(familyMaps[i].values(), byteNow);
  noOfPuts++;
} else {
  prepareDeleteTimestamps(familyMaps[i], byteNow);
  noOfDeletes++;
}
{code}
Further down in this method we would also need to differentiate the type of mutation where
we apply the KVs to the memstore and write to the WAL. Also, the WAL sync is a single
operation, and that one sync may cover some Puts and some Deletes. How do we get the exact
numbers there? I feel the code will become more complex. Do you have any suggestions?
I think I understood your comment correctly. Please correct me if I am wrong.
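
To make the question about the shared WAL sync concrete, here is a minimal sketch of one possible bookkeeping scheme. It is not from the patch: the class MiniBatchTimings and its methods are hypothetical, and splitting the shared sync time in proportion to the number of Puts and Deletes is only an assumption about what "exact numbers" could mean, not something HBase is known to do.

```java
/** Hypothetical helper, not part of HBase: tracks time per mutation type. */
final class MiniBatchTimings {
    private long putNanos;
    private long deleteNanos;
    private int puts;
    private int deletes;

    /** Records time spent on work that belongs to a single Put. */
    void addPut(long nanos) {
        putNanos += nanos;
        puts++;
    }

    /** Records time spent on work that belongs to a single Delete. */
    void addDelete(long nanos) {
        deleteNanos += nanos;
        deletes++;
    }

    /**
     * Splits time that cannot be attributed to one type (for example a single
     * sync covering the whole batch) in proportion to the counts seen so far.
     * Assumption: call this only after all mutations in the batch were counted.
     */
    void addShared(long nanos) {
        int total = puts + deletes;
        if (total == 0) {
            return; // nothing to attribute the time to
        }
        long putShare = nanos * puts / total;
        putNanos += putShare;
        deleteNanos += nanos - putShare; // remainder goes to deletes
    }

    long putNanos() { return putNanos; }

    long deleteNanos() { return deleteNanos; }

    public static void main(String[] args) {
        MiniBatchTimings t = new MiniBatchTimings();
        t.addPut(300);
        t.addPut(300);
        t.addPut(300);
        t.addDelete(100);
        t.addShared(400); // one sync shared by 3 Puts and 1 Delete
        System.out.println("put=" + t.putNanos() + " delete=" + t.deleteNanos());
    }
}
```

The proportional split keeps the two totals summing to the real elapsed time, but it is an estimate rather than an exact per-type measurement, which is the complexity concern raised above.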
                
> Introduce HRegion#doMiniBatchDelete()
> -------------------------------------
>
>                 Key: HBASE-6284
>                 URL: https://issues.apache.org/jira/browse/HBASE-6284
>             Project: HBase
>          Issue Type: Bug
>          Components: performance, regionserver
>            Reporter: Zhihong Ted Yu
>            Assignee: Anoop Sam John
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBASE-6284_Trunk-V2.patch, HBASE-6284_Trunk.patch
>
>
> From Anoop under thread 'Can there be a doMiniBatchDelete in HRegion':
> HTable#delete(List<Delete>) groups the Deletes for the same RS and makes only one
> network call. But within the RS, there will be N delete calls on the region, one
> by one. This includes N HLog writes and syncs. If these can also be grouped, can
> we get better performance for multi-row deletes?
> I have written the new miniBatchDelete() and made HTable#delete(List<Delete>)
> call this new batch delete.
> I have only tested initially on a one-node cluster. Even there I am getting a
> performance boost, which is very promising.
> Only one CF and qualifier.
> 10K total rows deleted with batches of 100 deletes. Only deletes happening on the table,
> from one thread.
> With the new way the net time taken is reduced by more than 1/10
> Will test on a 4-node cluster also. I think it will be worth doing this change.
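
The saving described in the issue comes from the number of log syncs. The toy model below only counts syncs for the two strategies, using the 10K rows and batch size of 100 from the test above. It is a sketch under stated assumptions: DeleteBatchingSketch, ToyLog and their methods are hypothetical names, not HBase APIs, and no real memstore or HLog work is modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: counts log syncs. None of these names are HBase APIs. */
final class DeleteBatchingSketch {

    /** Hypothetical stand-in for a write-ahead log that just counts calls. */
    static final class ToyLog {
        int appends;
        int syncs;

        void append(String row) { appends++; }

        void sync() { syncs++; }
    }

    /** One append and one sync per delete: N deletes cost N syncs. */
    static void deleteOneByOne(ToyLog log, List<String> rows) {
        for (String row : rows) {
            log.append(row);
            log.sync();
        }
    }

    /** All appends for the batch first, then a single sync for the whole batch. */
    static void deleteMiniBatch(ToyLog log, List<String> rows) {
        for (String row : rows) {
            log.append(row);
        }
        log.sync();
    }

    /** Splits totalRows into batches and returns the sync count for one strategy. */
    static int countSyncs(int totalRows, int batchSize, boolean grouped) {
        ToyLog log = new ToyLog();
        for (int start = 0; start < totalRows; start += batchSize) {
            List<String> batch = new ArrayList<>();
            int end = Math.min(start + batchSize, totalRows);
            for (int i = start; i < end; i++) {
                batch.add("row-" + i);
            }
            if (grouped) {
                deleteMiniBatch(log, batch);
            } else {
                deleteOneByOne(log, batch);
            }
        }
        return log.syncs;
    }

    public static void main(String[] args) {
        System.out.println("one by one: " + countSyncs(10_000, 100, false) + " syncs");
        System.out.println("mini-batch: " + countSyncs(10_000, 100, true) + " syncs");
    }
}
```

In this model the one-by-one path performs 10,000 syncs and the grouped path performs 100. How that translates into wall-clock time on a real region server depends on factors the model leaves out.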

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
