hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5930) Periodically flush the Memstore?
Date Mon, 28 Jan 2013 07:29:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564097#comment-13564097 ]

Lars Hofhansl commented on HBASE-5930:
--------------------------------------

Hmm... This is a bit more difficult than I thought.

I think what we want to limit is the maximum time an unflushed edit can remain in the
memstore. Otherwise one could trickle in one edit every hour and end up with very old data in the memstore.
(Doing that could potentially also be cheaper, since we would not need to retrieve the current time
on each edit, just on the first one after a flush.)

If that is true, then what we want to track is not the time of the newest edit, but the time
of the oldest unflushed edit, and we flush if that gets too old.
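A minimal Java sketch of that tracking, assuming a per-memstore object that is told about every edit and every completed flush. The class `OldestEditTracker`, its methods, the `-1` sentinel and the injectable clock are hypothetical illustrations, not HBase's API. Only edits arriving while nothing is recorded read the clock, which is where the potential saving comes from; the sketch also ignores edits that race with a flush.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch, not HBase's API: remembers when the oldest edit that
// has not yet been flushed arrived in the memstore.
class OldestEditTracker {
    // Sentinel meaning "the memstore holds no unflushed edits".
    static final long NO_UNFLUSHED_EDITS = -1L;

    private final LongSupplier clock;  // injectable time source, e.g. System::currentTimeMillis
    private final AtomicLong oldestUnflushedEditTime = new AtomicLong(NO_UNFLUSHED_EDITS);

    OldestEditTracker(LongSupplier clock) {
        this.clock = clock;
    }

    // Called on every edit. The clock is read only while no unflushed edit is
    // recorded, i.e. for the first edit(s) after a flush; later edits skip it.
    void recordEdit() {
        if (oldestUnflushedEditTime.get() == NO_UNFLUSHED_EDITS) {
            // If several writers race here, the first successful CAS wins and
            // the others leave its timestamp in place.
            oldestUnflushedEditTime.compareAndSet(NO_UNFLUSHED_EDITS, clock.getAsLong());
        }
    }

    // True once the oldest unflushed edit has waited at least maxAgeMillis.
    boolean shouldFlush(long maxAgeMillis) {
        long oldest = oldestUnflushedEditTime.get();
        return oldest != NO_UNFLUSHED_EDITS && clock.getAsLong() - oldest >= maxAgeMillis;
    }

    // Called after a flush completes. Simplification: assumes the flush
    // included every edit that had arrived so far.
    void flushed() {
        oldestUnflushedEditTime.set(NO_UNFLUSHED_EDITS);
    }
}
```

With a one-hour limit (an arbitrary example value), a single edit followed by another 30 minutes later does not trigger a flush until one hour after the first edit, because the second edit does not move the recorded timestamp.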
To avoid flushing all memstores at the same time, we want to offset the memstores'
flush times.
We can do it the way you have it
(but it seems more natural to me to do that at the place where we detect that the memstore needs
to be flushed; for this to work, the chore needs to wake up more frequently than the flush
interval).
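One way to apply the offset at the point of detection, sketched in Java under the assumption that a periodic chore calls the check for each region and that the oldest unflushed edit time is tracked elsewhere (`-1` meaning none). `PeriodicFlushChecker`, `flushDelayMillis` and the jitter bound are hypothetical names chosen for illustration, not HBase code.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: invoked by a periodic chore for each region. When the
// oldest unflushed edit is too old, it returns a random delay so that regions
// which crossed the threshold together do not all flush at the same moment.
class PeriodicFlushChecker {
    private final long flushIntervalMillis; // max allowed age of an unflushed edit
    private final long maxJitterMillis;     // exclusive upper bound of the random delay

    PeriodicFlushChecker(long flushIntervalMillis, long maxJitterMillis) {
        if (flushIntervalMillis <= 0 || maxJitterMillis <= 0) {
            throw new IllegalArgumentException("interval and jitter must be positive");
        }
        this.flushIntervalMillis = flushIntervalMillis;
        this.maxJitterMillis = maxJitterMillis;
    }

    // oldestEditTime < 0 means there are no unflushed edits.
    // Returns -1 if no flush is due yet, otherwise a delay in [0, maxJitterMillis).
    long flushDelayMillis(long oldestEditTime, long now) {
        if (oldestEditTime < 0 || now - oldestEditTime < flushIntervalMillis) {
            return -1L;
        }
        return ThreadLocalRandom.current().nextLong(maxJitterMillis);
    }
}
```

With chore period P, flush interval I and jitter bound J, an edit can stay unflushed for up to roughly I + P + J: it becomes too old, waits up to P for the next chore run, then up to J for its randomized flush. That is why the chore period should be small relative to the flush interval.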

Btw., the flush interval you have is 10 mins, not 1h.

                
> Periodically flush the Memstore?
> --------------------------------
>
>                 Key: HBASE-5930
>                 URL: https://issues.apache.org/jira/browse/HBASE-5930
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Devaraj Das
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5930-1.patch, 5930-2.1.patch, 5930-wip.patch
>
>
> A colleague of mine ran into an interesting issue.
> He inserted some data with the WAL disabled, which happened to fit in the aggregate Memstores memory.
> Two weeks later he had a problem with the HDFS cluster, which caused the region servers to abort. He found that his data was lost. Looking at the log we found that the Memstores were not flushed at all during these two weeks.
> Should we have an option to flush memstores periodically? There are obvious downsides to this, like many small storefiles, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
