hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
Date Mon, 17 Nov 2014 20:27:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215113#comment-14215113

stack commented on HBASE-10201:

bq. We need to change protobuf definition

We could add extra fields in pb and write to two places for the life of an hbase version to
support rolling upgrade.

I hope you do not mind me surfacing here questions asked off-list; it's best to keep the
discussion up here so others can participate too.

You described off-list how distributed log replay opens a region, puts the highest
*sequenceid* found up in zk, and then uses this to figure out which edits to replay. You also
described how regionServerReport includes the last flush id of each region we carry, and that the
master keeps this around so that on log replay we can skip edits already flushed. You then ask:

bq. I think I need to change all these places to use a map which stored familyName->maxSeqId
instead of a single SeqId. Am I right?

The sequenceid is *region-scoped*: i.e. we keep a running sequenceid per region. For the above
to work, we'd need to change the sequenceid scope from region to column family. Since our
memstore is per column family, and since the memstore now uses the region sequenceid as its
MVCC, this might be a good direction to go in, but it is not what we have.

You cannot have discontinuities in the progress of the flush sequenceid. If there are
four column families, edits can go into any of the four in any order, so flushing one
family does not mean every edit with a lower sequenceid has been persisted.

You could do something like what [~gaurav.menghani] suggests above (see https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203):
rather than report, on successful flush, the highest sequenceid of all of a region's
memstores involved in the flush, when you flush only a column family you'd have to
report one less than the oldest outstanding edit still alive up in a column family.
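
To make the "one less than the oldest outstanding edit" idea concrete, here is a minimal sketch. It is not HBase code: the class, method and parameter names are hypothetical, and the handling of the "nothing left in memory" case (report the highest sequenceid handed out so far) is an assumption for illustration.

```java
import java.util.Map;

// Hypothetical sketch, not an HBase API.
final class FlushSeqIdSketch {
    /**
     * @param oldestUnflushedSeqIdByFamily for each column family that still holds
     *        edits in its memstore, the sequenceid of its oldest unflushed edit
     * @param highestSeqIdInRegion highest sequenceid handed out in the region so far
     * @return a sequenceid such that every edit at or below it has been flushed
     */
    static long safeFlushedSeqId(Map<String, Long> oldestUnflushedSeqIdByFamily,
                                 long highestSeqIdInRegion) {
        if (oldestUnflushedSeqIdByFamily.isEmpty()) {
            // Assumption: with nothing outstanding, everything assigned so far is persisted.
            return highestSeqIdInRegion;
        }
        long oldest = Long.MAX_VALUE;
        for (long seqId : oldestUnflushedSeqIdByFamily.values()) {
            oldest = Math.min(oldest, seqId);
        }
        // One less than the oldest edit still in memory in any family.
        return oldest - 1;
    }
}
```

With families holding oldest unflushed edits at sequenceids 42 and 17, this reports 16 regardless of how high the just-flushed family got, so the reported id never skips over an unflushed edit.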

What if you did something much less involved; when there is pressure to flush, flush the stores
with the oldest edits until you've freed enough memory?
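
A rough sketch of that selection policy, under stated assumptions: `StoreInfo`, `pickStoresToFlush` and `bytesToFree` are hypothetical names invented for illustration, and the real flush path would have to obtain these numbers from the stores themselves.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not an HBase API.
final class OldestFirstFlushSketch {
    static final class StoreInfo {
        final String family;
        final long oldestEditSeqId; // sequenceid of the oldest edit in this store's memstore
        final long memstoreBytes;   // memory that flushing this store would free

        StoreInfo(String family, long oldestEditSeqId, long memstoreBytes) {
            this.family = family;
            this.oldestEditSeqId = oldestEditSeqId;
            this.memstoreBytes = memstoreBytes;
        }
    }

    /** Picks stores oldest-edit-first until the requested amount of memory would be freed. */
    static List<String> pickStoresToFlush(List<StoreInfo> stores, long bytesToFree) {
        List<StoreInfo> sorted = new ArrayList<>(stores);
        sorted.sort(Comparator.comparingLong((StoreInfo s) -> s.oldestEditSeqId));
        List<String> picked = new ArrayList<>();
        long freed = 0;
        for (StoreInfo s : sorted) {
            if (freed >= bytesToFree) {
                break; // enough memory freed; leave the remaining stores alone
            }
            picked.add(s.family);
            freed += s.memstoreBytes;
        }
        return picked;
    }
}
```

Note that a store with an old edit but little data is still picked first, which is where the small storefiles mentioned in the downsides come from.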

Upsides are that you'd clear out old edits from memory and we might let go of WALs a little
faster.  Also, you might not flush all of the content in a region -- because flushing just
a few stores might be enough to get you back under the threshold -- so we might make fewer
small storefiles?

Downsides are we'd make some small storefiles (e.g. for those stores that have a few old edits
in them and little else) and we'd do the flush in series rather than in parallel.  Because of
sequenceid accounting, we might replay more edits than we have to.

> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.9, 0.99.2
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch,
HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch,
HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch,
> Currently the flush decision is made using the aggregate size of all column families.
> When large and small column families co-exist, this causes many small flushes of the smaller
> CF. We need to make per-CF flush decisions.

This message was sent by Atlassian JIRA
