hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
Date Mon, 13 Jan 2014 16:02:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869634#comment-13869634
] 

Hadoop QA commented on HBASE-10329:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12622617/HBASE-10329-trunk_v0.patch
  against trunk revision .
  ATTACHMENT ID: 12622617

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified
tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 1.1 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

    {color:red}-1 site{color}.  The patch appears to cause mvn site goal to fail.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//console

This message is automatically generated.

> Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer
encounters null writer and its writes aren't synced by other Asyncer
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10329
>                 URL: https://issues.apache.org/jira/browse/HBASE-10329
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: HBASE-10329-trunk_v0.patch
>
>
> Last month, after I introduced multiple AsyncSyncer threads to improve throughput
when there are few client write threads, [~stack] hit an NPE during testing: the writer
was null in AsyncSyncer when it tried to sync. We had run the tests in a cluster many times
to verify the throughput improvement and never encountered such an NPE, so it really confused me.
([~stack] fixed this by adding 'if (writer != null)' to protect the sync operation.)
> Since then I have wondered from time to time why the writer can be null in AsyncSyncer and
whether it is safe to fix it by just adding a null check before the sync, as [~stack]
did. After some digging, I found the case where AsyncSyncer can encounter a null writer.
It is as follows:
> 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100
> 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200
> 3. t3: rollWriter starts; it grabs the updateLock to prevent further client writes
from entering pendingWrites, and then waits for all items (<= 200) in pendingWrites to be
appended and finally synced to hdfs
> 4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200 (it also syncs the writes with txid <= 100 as
a whole)
> 5. t5: rollWriter now can close writer, set writer=null...
> 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter
sets writer to the newly rolled Writer
> We can see:
> 1. the null writer is possible only when there are multiple AsyncSyncer threads; that's
why we never encountered it before introducing multiple AsyncSyncer threads.
> 2. rollWriter can set writer=null only after all items in pendingWrites have been synced to
hdfs. AsyncWriter is in the critical path of that task and there is only a single AsyncWriter
thread, so AsyncWriter can't encounter a null writer; that's why we never see a null writer
in AsyncWriter though it also uses writer. This is the same reason a null writer never
occurs when there is a single AsyncSyncer thread.
> And we should handle writer == null in AsyncSyncer differently depending on the case:
> 1. if txidToSync <= syncedTillHere, all writes this AsyncSyncer cares about
have already been synced by another AsyncSyncer, so we can safely skip the sync (as [~stack] does
here);
> 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync
to avoid data loss: otherwise the user gets a successful write response but can't read the writes
back afterwards, which from the user's perspective is data loss. (According
to the above analysis, this case should not occur, but we should still add this defensive handling
to prevent data loss if it ever does occur, for example because of a bug introduced later.)
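
The two null-writer cases above could be sketched roughly as follows. This is only an illustration under assumptions: NullWriterPolicy, Action and onNullWriter are hypothetical names invented here, not code from the attached patch or from HBase; only txidToSync and syncedTillHere come from the description.

```java
// Hypothetical sketch; the class, enum and method names are illustrative
// and are not taken from the HBASE-10329 patch.
class NullWriterPolicy {
    /** What an AsyncSyncer does when it finds writer == null. */
    enum Action { SKIP_SYNC, FAIL_WRITES }

    /**
     * @param txidToSync     highest txid this syncer was asked to sync
     * @param syncedTillHere highest txid already synced by any syncer
     */
    static Action onNullWriter(long txidToSync, long syncedTillHere) {
        if (txidToSync <= syncedTillHere) {
            // Case 1: another AsyncSyncer already synced these writes.
            return Action.SKIP_SYNC;
        }
        // Case 2: unsynced writes remain and there is no writer, so fail
        // them instead of acknowledging writes that were never synced.
        return Action.FAIL_WRITES;
    }

    public static void main(String[] args) {
        System.out.println(onNullWriter(100, 200)); // case 1
        System.out.println(onNullWriter(300, 200)); // case 2
    }
}
```

In the timeline above, AsyncSyncer 1 has txidToSync==100 while syncedTillHere==200, so it falls under case 1.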
> Also fix the bug where isSyncing is not reset to false when writer.sync encounters
an IOException: AsyncSyncer swallows the exception by failing all writes with txid<=txidToSync,
after which the thread is ready to do later syncs. Its isSyncing therefore needs to be reset
to false in the IOException handling block; otherwise it can't be selected by AsyncWriter
to do sync.
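
A minimal sketch of the isSyncing reset, under stated assumptions: SyncerSketch, its nested Writer interface and the lastFailure field are hypothetical stand-ins, not HBase classes. The description asks for the reset in the IOException handling block; this sketch uses a finally block instead, a variant that resets the flag on both the success and the failure path.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; SyncerSketch and Writer are illustrative stand-ins,
// not the AsyncSyncer or writer types from HBase.
class SyncerSketch {
    interface Writer {
        void sync() throws IOException;
    }

    final AtomicBoolean isSyncing = new AtomicBoolean(false);
    // Stands in for "fail all writes with txid <= txidToSync".
    IOException lastFailure;

    void syncOnce(Writer writer) {
        isSyncing.set(true);
        try {
            writer.sync();
        } catch (IOException e) {
            // The exception is swallowed here; the affected writes are failed.
            lastFailure = e;
        } finally {
            // Without this reset the syncer would look busy forever and
            // could never be selected for a later sync.
            isSyncing.set(false);
        }
    }
}
```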



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
