hbase-dev mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2283) row level atomicity
Date Thu, 04 Mar 2010 00:00:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840957#action_12840957 ]

Kannan Muthukkaruppan commented on HBASE-2283:

Follow-up discussion from hbase-dev:

JD wrote: <<< Indeed. The syncWal was taken back up in HRS as a way to optimize batch
Puts but the fact it's called after all the MemStore operations is indeed a problem. I think
we need to fix both (#1) and (#2) by ensuring we do only a single append for whatever we have
to put and then syncWAL once before processing the MemStore. But, the other problem here is
that the row locks have to be taken out on all rows before everything else in the case of
a Put[] else we aren't atomic. And then I think some checks are ran under HRegion that we
would need to run before everything else.>>>

Ryan wrote: <<< Do we really need a single actual DFS atomic write operation?  If
we had some kind of end-of-row marker, would that help instead?>>>

Yes, a marker or length-prefixed approach would suffice to recognize and ignore incomplete
transactions during recovery.
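
To make the length-prefixed idea concrete, here is a minimal sketch. It is not HBase's actual HLog format: the FramedLog class, the string-encoded edits, and the [edit count][edits...] layout are all assumptions made for illustration. Each transaction is written with its edit count up front, so recovery can tell when the log ends in the middle of a transaction and drop that partial tail.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical log framing for illustration; not the real HLog layout.
class FramedLog {
    /** Writes one transaction as: [edit count][edit 1]...[edit n]. */
    static void appendTransaction(DataOutputStream out, List<String> edits)
            throws IOException {
        out.writeInt(edits.size());
        for (String edit : edits) {
            out.writeUTF(edit);
        }
    }

    /** Returns only the transactions whose full set of edits is present. */
    static List<List<String>> recover(byte[] log) throws IOException {
        List<List<String>> complete = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        while (true) {
            List<String> txn = new ArrayList<>();
            try {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    txn.add(in.readUTF());
                }
            } catch (EOFException e) {
                // The log ended at a boundary or mid-transaction; in both
                // cases the partially read transaction is discarded.
                return complete;
            }
            complete.add(txn);
        }
    }
}
```

An end-of-row marker would work the same way in reverse: recovery would buffer edits and apply them only once the marker is seen.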

Ryan wrote: <<< But as you said, what happens if hlog append fails?  The obvious
thing would be to remove the additions from the memstore.  But how to accomplish this easily?>>>

Wouldn't moving all the memstore updates to happen after the sync suffice?
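
A rough sketch of that ordering, under stated assumptions: PutOrderingSketch, its Wal interface, and the list-backed memstore are hypothetical stand-ins, not the real HRegion, HLog, or MemStore classes. The point is only that when the memstore is touched last, a failed sync leaves nothing to undo.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the region, HLog and MemStore.
class PutOrderingSketch {
    interface Wal {
        void append(List<String> edits) throws IOException;
        void sync() throws IOException;
    }

    final Wal wal;
    final List<String> memstore = new ArrayList<>();

    PutOrderingSketch(Wal wal) {
        this.wal = wal;
    }

    /** Applies edits to the memstore only after append and sync succeed. */
    void put(List<String> editsForRow) throws IOException {
        wal.append(editsForRow);      // one append covering all column families
        wal.sync();                   // throws if the write to DFS fails
        memstore.addAll(editsForRow); // reached only once the sync has returned
    }
}
```

If sync() throws, put() propagates the error to the caller and the memstore still holds only edits whose sync succeeded, so readers never see the failed Put.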

> row level atomicity 
> --------------------
>                 Key: HBASE-2283
>                 URL: https://issues.apache.org/jira/browse/HBASE-2283
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Priority: Blocker
> The flow during an HRegionServer.put() seems to be the following. [For now, let's just
> consider a single-row Put containing edits to multiple column families/columns.]
> HRegionServer.put() does:
>     HRegion.put();
>     syncWal();  /* the HDFS sync call; this assumes we have HDFS-200 */
> HRegion.put() does:
>     for each column family {
>         HLog.append(all edits to the column family);
>         write all edits to Memstore;
>     }
> HLog.append() does:
>     for each edit in a single column family {
>         doWrite();
>     }
> doWrite() does:
>     this.writer.append();
> There seem to be two related issues here that could result in inconsistencies.
> Issue #1: A put() does a bunch of HLog.append() calls. These in turn do a bunch of "write"
> calls on the underlying DFS stream. If we crash after having written out only some of the
> appends to DFS, recovery will run and apply a partial transaction to the memstore.
> Issue #2: The updates to the memstore should happen after the sync rather than before.
> Otherwise, there is the danger that the write to DFS (sync) fails for some reason and we
> return an error to the client, but we have already applied the edits to the memstore. So
> subsequent reads will serve uncommitted data.

