hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3325) Optimize log splitter to not output obsolete edits
Date Fri, 10 Dec 2010 02:13:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970050#action_12970050 ]

Kannan Muthukkaruppan commented on HBASE-3325:
----------------------------------------------

It would be good to keep track of the last flushed seq id on a per-CF basis rather than a
per-region basis, given that at some point we want to do per-CF flushes.
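
A minimal sketch of what per-CF tracking could look like, in Java. The class name
PerFamilyFlushTracker and its methods are hypothetical, not existing HBase APIs, and the
sketch assumes that an edit whose seq id is less than or equal to the last flushed seq id
of its column family is already persisted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: tracks the last flushed seq id per region and column family. */
public class PerFamilyFlushTracker {
    // region id -> (column family -> last flushed seq id)
    private final Map<String, Map<String, Long>> lastFlushed = new HashMap<>();

    /** Records a flush of one column family; an older seq id never overwrites a newer one. */
    public void recordFlush(String regionId, String family, long seqId) {
        lastFlushed.computeIfAbsent(regionId, r -> new HashMap<>())
                   .merge(family, seqId, Long::max);
    }

    /**
     * Returns true only if this family is known to have flushed at or beyond seqId.
     * Unknown regions or families return false, so their edits are kept.
     */
    public boolean isFlushed(String regionId, String family, long seqId) {
        Map<String, Long> perFamily = lastFlushed.get(regionId);
        if (perFamily == null) {
            return false;
        }
        Long flushed = perFamily.get(family);
        return flushed != null && seqId <= flushed;
    }
}
```

With a per-region map, one value has to cover every CF in the region, so it would need to be
the minimum across CFs once flushes happen per CF. Keeping one entry per CF avoids that
coarsening.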

> Optimize log splitter to not output obsolete edits
> --------------------------------------------------
>
>                 Key: HBASE-3325
>                 URL: https://issues.apache.org/jira/browse/HBASE-3325
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>
> Currently when the master splits logs, it outputs all edits it finds, even those that
> have already been obsoleted by flushes. At replay time on the RS we discard the edits that
> have already been flushed.
> We could do a pretty simple optimization here - basically the RS should replicate a map
> "region id -> last flushed seq id" into ZooKeeper (this can be asynchronous by some seconds
> without any problems). Then when doing log splitting, if we have this map available, we can
> discard any edits found in the logs that were already flushed, and thus output a much smaller
> amount of data.
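
The filtering step described above could be sketched roughly as follows, in Java. This is an
illustration under stated assumptions, not the actual log splitter: the class
ObsoleteEditFilter, the Edit type, and dropFlushedEdits are hypothetical names, the map is
passed in as a plain Map rather than read from ZooKeeper, and an edit is treated as obsolete
when its seq id is less than or equal to the last flushed seq id of its region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of discarding already-flushed edits during log splitting. */
public class ObsoleteEditFilter {

    /** Hypothetical stand-in for a log entry: the region it belongs to and its seq id. */
    static final class Edit {
        final String regionId;
        final long seqId;

        Edit(String regionId, long seqId) {
            this.regionId = regionId;
            this.seqId = seqId;
        }
    }

    /**
     * Returns the edits that still need to be written to the split output.
     * Edits for regions missing from the map are always kept.
     */
    static List<Edit> dropFlushedEdits(List<Edit> edits, Map<String, Long> lastFlushedSeqId) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : edits) {
            Long flushed = lastFlushedSeqId.get(e.regionId);
            // Keep the edit unless the map shows it was covered by a flush.
            if (flushed == null || e.seqId > flushed) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Assuming seq ids only increase, a map entry that lags a few seconds behind holds a value that
is too low, never too high. The filter then keeps some edits it could have dropped, and those
are still discarded at replay time on the RS, which is presumably why the asynchronous update
is harmless.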

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

