hbase-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync
Date Sun, 05 Sep 2010 00:19:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906312#action_12906312 ]

dhruba borthakur commented on HBASE-2957:
-----------------------------------------

I am seeing a typical HBase use case in which not all rows of a table
are equally hot: a few rows are orders of magnitude hotter than most
other rows.

Each put/get operation in HBase involves the following steps:
{code}
 put operation                           get operation
 --------------------------------------------------------------
1. acquire the rowlock
2. append to hlog
3. update memstore                 read from memstore
4. release rowlock
{code}
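To make the cost of this path concrete, here is a minimal Java sketch of the put column above for a single hot row whose only cell is a counter. `LockedRowBaseline`, `increment` and the in-memory `hlog` list are hypothetical stand-ins, not HBase classes; the sketch only shows that the row lock is held across the append, so N increments of one row produce N appends, one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the current put path for one hot row.
// Names are illustrative and are not HBase's HRegion/HLog API.
class LockedRowBaseline {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<Long> hlog = new ArrayList<>(); // stand-in for the HLog
    private volatile long memstoreValue = 0;           // stand-in for the memstore cell

    // The row lock is held across the log append, so concurrent increments of
    // this row run one at a time and every increment costs one append.
    long increment(long delta) {
        rowLock.lock();                 // 1. acquire the rowlock
        try {
            long next = memstoreValue + delta;
            hlog.add(next);             // 2. append to hlog (a replicated HDFS write in reality)
            memstoreValue = next;       // 3. update memstore
            return next;
        } finally {
            rowLock.unlock();           // 4. release rowlock
        }
    }

    // The get path simply reads the memstore.
    long get() {
        return memstoreValue;
    }

    int appendCount() {
        rowLock.lock();
        try {
            return hlog.size();
        } finally {
            rowLock.unlock();
        }
    }
}
```

Ten increments leave the counter at 10 and the log with 10 entries; with many handler threads, the lock forces the same one-append-per-put pattern.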

For example, if the application workload consists only of increment operations on *one* record,
the entire workload is serialized and the throughput depends purely on the speed of the
hlog append. The number of hlog.append calls is exactly the same as the number of put calls.
This can be slow, especially because each append requires writing to three datanodes in HDFS.
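As a rough bound (a simplification that ignores memstore and RPC costs): if each append takes $t_{\text{append}}$ and the row lock is held for at least that long, a single row cannot accept more than one put per append. If instead one append could cover a batch of $b$ puts, the same bound scales by $b$. The 5 ms figure below is purely hypothetical, chosen for illustration.

```latex
X_{\text{row}} \le \frac{1}{t_{\text{append}}}
\qquad
t_{\text{append}} = 5\,\text{ms} \;\Rightarrow\; X_{\text{row}} \le 200\ \text{puts/s}

X_{\text{row}}^{\text{batched}} \le \frac{b}{t_{\text{append}}}
```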

We can make the workload much faster, while keeping the same data-consistency
guarantees, if we can batch appends. For each record, suppose the memstore contains
one version of the record that has been committed to the hlog and another version
that has been updated in memory but not yet committed to the hlog. Let's call these
two versions "memstore.committed" and "memstore.inflight", respectively.

{code}
 put operation                                        get operation
 ----------------------------------------------------------------------------------
1. acquire the rowlock
2. update memstore.inflight                   read memstore.committed
3. release rowlock
4. append to hlog
5. memstore.committed = memstore.inflight

{code}

The key to the above protocol is that the rowlock is released as soon
as the memstore is updated. This means that multiple put() calls on
the same record can proceed concurrently, and their edits can be batched
into fewer calls to hlog.append.
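A minimal Java sketch of the two-version idea, under stated assumptions: one record holding a counter, a single log path (`syncToLog`, made `synchronized` here so there is one appender at a time), and an in-memory list standing in for the HLog. All names are hypothetical. One detail the sketch makes explicit: the `memstore.committed = memstore.inflight` step should publish the snapshot that was actually appended, not whatever `memstore.inflight` holds when the append returns, because puts that arrived during the append are not yet in the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the memstore.inflight / memstore.committed protocol
// for one counter record; names are illustrative, not HBase's API.
class TwoVersionRecord {
    private final ReentrantLock rowLock = new ReentrantLock();
    private long inflight = 0;            // memstore.inflight: applied in memory, not yet logged
    private volatile long committed = 0;  // memstore.committed: value known to be in the hlog
    private final List<Long> hlog = new ArrayList<>(); // stand-in for the HLog

    // Put path: the row lock covers only the in-memory update.
    void put(long delta) {
        rowLock.lock();
        try {
            inflight += delta;
        } finally {
            rowLock.unlock();
        }
    }

    // Log path: one append covers every put made before the snapshot. Only the
    // snapshot is published, since later puts are not yet in the log.
    // synchronized keeps a single appender, so committed never moves backwards.
    synchronized void syncToLog() {
        long snapshot;
        rowLock.lock();
        try {
            snapshot = inflight;
        } finally {
            rowLock.unlock();
        }
        hlog.add(snapshot);   // append to hlog
        committed = snapshot; // memstore.committed = logged snapshot
    }

    // Readers only ever see committed data.
    long get() {
        return committed;
    }

    synchronized int appendCount() {
        return hlog.size();
    }
}
```

Three puts followed by two syncs produce two log appends instead of three, and a reader does not observe a put's effect until a sync covering it has completed. In a real server each put would also wait for such a sync before acknowledging its client; that wait is omitted here.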

Do people think that this is feasible and beneficial? If so, I can delve deeper into the design
and implementation of this performance improvement.

> Release row lock when waiting for wal-sync
> ------------------------------------------
>
>                 Key: HBASE-2957
>                 URL: https://issues.apache.org/jira/browse/HBASE-2957
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, wal
>    Affects Versions: 0.20.0
>            Reporter: Prakash Khemani
>
> Is there a reason to hold on to the row lock while waiting for the WAL sync to be completed
> by the logSyncer thread?
> I think data consistency will be guaranteed even if the following happens: (a) the row
> lock is held while the row is updated in memory; (b) the row lock is released after queuing
> the KV record for WAL syncing; (c) the log-sync system guarantees that the log records for
> any given row are synced in order; (d) the HBase client only receives a success notification
> after the sync completes (no change from the current state).
> I think this should be a huge win. For my use case, and I am sure for others, the handler
> thread spends the bulk of its row-lock critical-section time waiting for the sync to complete.
> Even if the log-sync system cannot guarantee the orderly completion of sync records,
> the "Don't hold row lock while waiting for sync" option should be available to HBase clients
> on a per-request basis.
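
Steps (a)-(d) map naturally onto a future-based handoff. The sketch below is a hypothetical illustration in plain Java, not HBase's log-syncer code: a single-threaded executor stands in for the logSyncer thread, so queued records sync in submission order (c); the row lock covers only the in-memory update and the enqueue (a, b); and the caller is acknowledged through a `CompletableFuture` that completes only after the sync (d). `DeferredAckRow` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of steps (a)-(d); names are illustrative, not HBase's API.
class DeferredAckRow {
    private final ReentrantLock rowLock = new ReentrantLock();
    // One thread runs sync tasks in submission order, which provides (c).
    private final ExecutorService logSyncer = Executors.newSingleThreadExecutor();
    // Stand-in for the synced WAL; only ever touched on the logSyncer thread.
    private final List<String> syncedLog = new ArrayList<>();
    private volatile String memstoreValue = null;

    CompletableFuture<Void> put(String value) {
        CompletableFuture<Void> ack;
        rowLock.lock();
        try {
            memstoreValue = value; // (a) update the row in memory under the lock
            // (b) queue the record while still holding the lock, so the log order
            // for this row matches the order of its memstore updates
            ack = CompletableFuture.runAsync(() -> syncedLog.add(value), logSyncer);
        } finally {
            rowLock.unlock(); // (b) released before the sync has completed
        }
        return ack; // (d) the caller reports success only after this completes
    }

    // Note: may return a value whose sync is still pending.
    String get() {
        return memstoreValue;
    }

    // Copies the synced log on the syncer thread, after all previously queued syncs.
    List<String> syncedLogSnapshot() {
        return CompletableFuture.supplyAsync(() -> new ArrayList<>(syncedLog), logSyncer).join();
    }

    void close() {
        logSyncer.shutdown();
    }
}
```

Enqueuing while still holding the row lock is the design choice that makes (c) sufficient: records for a row enter the FIFO in the same order their memstore updates were applied.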

-- 
This message is automatically generated by JIRA.