hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3099) optimization for log splitting (theory/suggestion)
Date Mon, 11 Oct 2010 18:23:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919933#action_12919933 ]

Kannan Muthukkaruppan commented on HBASE-3099:


> optimization for log splitting (theory/suggestion)
> --------------------------------------------------
>                 Key: HBASE-3099
>                 URL: https://issues.apache.org/jira/browse/HBASE-3099
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
> Right now log splitting is slower than we'd like.  The slow pace of log splitting is
> one of the reasons why we have to keep a short, bounded limit on the number of outstanding
> log files.  It would be nice to raise that limit, to allow perhaps hundreds of logs.  That
> would increase efficiency because we would not be force-flushing regions at non-ideal sizes.
> But more data means more to process.  However, not all of the log entries for a regionserver
> are actually useful, because some regions got flushed before the oldest log was trimmed.
> So during log recovery, if we read the most recent sequenceid for each region, we could skip
> the already-flushed entries during log splitting (in the master) and avoid writing them to
> the per-region recovery logs.  This would cut part of the IO, and if our
> serialization/deserialization code was clever we might be able to avoid deserializing much.
> It's not clear how effective or worthwhile this might be.
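
The skip rule suggested in the description above could be sketched roughly as follows. This is a minimal illustration only, not HBase code: the class LogSplitSketch, the LogEntry type, and the lastFlushedSeqId map are hypothetical names introduced here, and the sketch assumes that every entry whose sequence id is at or below its region's most recent flushed sequence id is already persisted and can be dropped while splitting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from HBase itself.
final class LogSplitSketch {

    /** Simplified stand-in for one write-ahead-log entry. */
    static final class LogEntry {
        final String region;
        final long sequenceId;

        LogEntry(String region, long sequenceId) {
            this.region = region;
            this.sequenceId = sequenceId;
        }
    }

    /**
     * Groups log entries by region, skipping entries that the region has
     * already flushed (assumption: sequenceId <= last flushed id means the
     * edit is already on disk in the region's store files).
     */
    static Map<String, List<LogEntry>> split(List<LogEntry> log,
                                             Map<String, Long> lastFlushedSeqId) {
        Map<String, List<LogEntry>> perRegion = new HashMap<>();
        for (LogEntry entry : log) {
            Long flushed = lastFlushedSeqId.get(entry.region);
            if (flushed != null && entry.sequenceId <= flushed) {
                continue; // already flushed: do not write to the recovery log
            }
            perRegion.computeIfAbsent(entry.region, k -> new ArrayList<>()).add(entry);
        }
        return perRegion;
    }
}
```

Regions with no known flushed sequence id keep all of their entries, which is the conservative choice. The sketch still deserializes every entry; avoiding that, as the description hopes, would need the region name and sequence id to be readable without decoding the full entry.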

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
