hbase-issues mailing list archives

From "Xing Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6195) Increment data will lost when the memstore flushed
Date Mon, 11 Jun 2012 03:13:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292625#comment-13292625 ]

Xing Shi commented on HBASE-6195:
---------------------------------

Here is the data.
I deleted the row first. Each run then uses 2000 threads to increment the same row, each
incrementing 1000, and reads the row's value after all threads are done. I ran this 11 times:

for i in $(seq 0 10)
do
    /home/shubao.sx/hadoop-0.20.2-cdh3u3/bin/hadoop \
        --config /home/shubao.sx/0.90-hadoop-config \
        jar /home/shubao.sx/inc-no-delete/inc.jar \
        com.taobao.hbase.MultiThreadsIncrement \
        --threadNum 2000 --inc 1000 \
        > "/home/shubao.sx/inc-no-delete/inc.$i.log"
done

and the results:

inc.0.log : return 199838
inc.1.log : return 399729
inc.2.log : return 599579
inc.3.log : return 799441
inc.4.log : return 999305
inc.5.log : return 1199173
inc.6.log : return 1399037
inc.7.log : return 1598939
inc.8.log : return 1798804
inc.9.log : return 1998708
inc.10.log : return 2198637
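
The values above are cumulative, so the amount each run actually added is the difference between consecutive totals. A small sketch in plain Python (not part of the test tool) that derives those per-run amounts from the reported numbers. Since every run performs the same workload, each run should add the same amount; a per-run amount that varies is consistent with increments being lost.

```python
# Cumulative values reported in inc.0.log .. inc.10.log above.
totals = [
    199838, 399729, 599579, 799441, 999305, 1199173,
    1399037, 1598939, 1798804, 1998708, 2198637,
]

# The row was deleted before the first run, so run 0 starts from zero;
# every later run contributes the difference from the previous total.
deltas = [totals[0]] + [after - before for before, after in zip(totals, totals[1:])]

for i, delta in enumerate(deltas):
    print(f"inc.{i}.log : added {delta}")

# Spread between the smallest and largest per-run amount.
print("min added:", min(deltas), "max added:", max(deltas))
```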

I set the following HLog parameters:
  <property>
    <name>hbase.regionserver.logroll.multiplier</name>
    <value>0.005</value>
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>3</value>
  </property>

so memstore flushes occur often.
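
To give a rough sense of scale for these settings: as far as I understand the 0.90 HLog, a log roll is requested once the current log file reaches block size x hbase.regionserver.logroll.multiplier, and once more than hbase.regionserver.maxlogs log files exist the region server requests memstore flushes so that old logs can be archived. The sketch below is plain Python arithmetic, not HBase code. The 64 MB block size is an assumption; the actual block size used in this test is not stated in this message.

```python
# Assumption: 64 MB block size. The real value for this cluster is not given.
ASSUMED_BLOCK_SIZE_BYTES = 64 * 1024 * 1024

LOGROLL_MULTIPLIER = 0.005  # hbase.regionserver.logroll.multiplier from above
MAX_LOGS = 3                # hbase.regionserver.maxlogs from above


def log_roll_size(block_size_bytes: int, multiplier: float) -> int:
    """Approximate bytes written to one log file before a roll is requested."""
    return int(block_size_bytes * multiplier)


roll_size = log_roll_size(ASSUMED_BLOCK_SIZE_BYTES, LOGROLL_MULTIPLIER)

# Rough upper bound on WAL bytes that fit in MAX_LOGS files of that size.
wal_bytes_before_flush_pressure = roll_size * MAX_LOGS

print("roll size (bytes):", roll_size)
print("bytes across", MAX_LOGS, "logs:", wal_bytes_before_flush_pressure)
```

Under that assumption a log rolls after roughly a third of a megabyte, so a sustained increment workload reaches the log limit quickly.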
                
> Increment data will lost when the memstore flushed
> --------------------------------------------------
>
>                 Key: HBASE-6195
>                 URL: https://issues.apache.org/jira/browse/HBASE-6195
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Xing Shi
>
> There are two problems in increment() now:
>
> First:
> The timestamp (the variable now) in HRegion's increment() is generated before the row lock
> is acquired. So when multiple threads increment the same row, a thread that generated its
> timestamp earlier may get the lock later. Because increment stores only one version, the
> result is still correct up to this point.
> When the region is flushing, however, an increment reads the KV from both the snapshot and
> the memstore, takes whichever has the larger timestamp, and writes the result back to the
> memstore. If the snapshot's timestamp is larger than the memstore's, the increment reads
> the old data and increments that, which is wrong.
>
> Second:
> There is also a risk in increment(): it writes to the memstore first and then to the HLog,
> so if the HLog write fails, clients will still read the incremented value.
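
The first problem in the quoted description can be illustrated with a toy model. This is a minimal sketch in Python, not HBase code: the ToyStore class, its fields, and the timestamps 100/101/102 are all hypothetical and only mimic the behaviour described above (one KV in the memstore, one in the snapshot, and a read that prefers the larger timestamp). The interleaving is replayed sequentially so the outcome is deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KV:
    ts: int      # timestamp the writer generated before taking the row lock
    value: int   # counter value stored in this KV


class ToyStore:
    """Hypothetical single-row store with a memstore and a flush snapshot."""

    def __init__(self) -> None:
        self.memstore: Optional[KV] = None
        self.snapshot: Optional[KV] = None

    def start_flush(self) -> None:
        # Flushing moves the memstore contents into the snapshot.
        self.snapshot, self.memstore = self.memstore, None

    def increment(self, amount: int, now: int) -> int:
        # Read from both places and trust the KV with the larger timestamp.
        candidates = [kv for kv in (self.memstore, self.snapshot) if kv is not None]
        current = max(candidates, key=lambda kv: kv.ts).value if candidates else 0
        # Write back to the memstore using the caller's pre-lock timestamp.
        self.memstore = KV(now, current + amount)
        return self.memstore.value


store = ToyStore()
now_a = 100  # thread A generates its timestamp first ...
now_b = 101  # ... but thread B, with a later timestamp, gets the lock first

store.increment(1, now_b)            # memstore = (ts 101, value 1)
store.start_flush()                  # snapshot = (ts 101, value 1)
after_a = store.increment(1, now_a)  # memstore = (ts 100, value 2), still correct
after_c = store.increment(1, 102)    # prefers snapshot (ts 101): stale value 1

print("after A:", after_a)
print("after C:", after_c, "(three increments were applied)")
```

In this model the third increment returns 2 rather than 3, because the snapshot's larger timestamp hides the newer memstore value and thread A's increment is overwritten.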

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
