hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-3099) optimization for log splitting (theory/suggestion)
Date Wed, 16 Jul 2014 23:11:07 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3099.

    Resolution: Not a Problem

Probably superseded by distributed log splitting

> optimization for log splitting (theory/suggestion)
> --------------------------------------------------
>                 Key: HBASE-3099
>                 URL: https://issues.apache.org/jira/browse/HBASE-3099
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
> Right now log splitting is slower than we'd like. The slow pace of log
> splitting is one of the reasons why we have to keep a short, bounded limit
> on the number of outstanding log files. It would be nice to raise that
> limit, to allow perhaps hundreds of logs. That would increase efficiency
> because we would not be force-flushing regions at non-ideal sizes.
> But more data means more to process. Except that not all of the logs for a
> regionserver are actually useful, because some regions were flushed before
> the oldest log was trimmed. So during log recovery, if we read each region's
> most recent flushed sequenceid, we could skip (during log splitting in the
> master) the entries already covered by a flush and avoid writing them to the
> per-region recovery log. That would reduce the IO somewhat, and if our
> serialization/deserialization code were clever we might be able to avoid
> deserializing much.
> It's not clear how effective or worthwhile this might be.
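
For readers of the archive, the skip-by-sequenceid idea in the description can be sketched roughly as below. This is a minimal illustration under stated assumptions, not HBase code: the class LogSplitSketch, the Entry type, and the lastFlushedSeqId map are hypothetical names invented for this example, and a real splitter would read entries from WAL files rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogSplitSketch {

    /** Hypothetical, simplified stand-in for one write-ahead-log entry. */
    public static final class Entry {
        public final String region;
        public final long sequenceId;
        public final String payload;

        public Entry(String region, long sequenceId, String payload) {
            this.region = region;
            this.sequenceId = sequenceId;
            this.payload = payload;
        }
    }

    /**
     * Groups log entries by region, skipping any entry whose sequence id is
     * at or below the last flushed sequence id recorded for its region.
     * Regions with no recorded flush keep all of their entries.
     *
     * @param log              entries in log order (assumed input shape)
     * @param lastFlushedSeqId region name -> highest sequence id already
     *                         persisted by a flush (assumed to be available)
     */
    public static Map<String, List<Entry>> split(
            List<Entry> log, Map<String, Long> lastFlushedSeqId) {
        Map<String, List<Entry>> perRegion = new LinkedHashMap<>();
        for (Entry e : log) {
            Long flushed = lastFlushedSeqId.get(e.region);
            if (flushed != null && e.sequenceId <= flushed) {
                // Already durable in a flushed file; no need to replay it.
                continue;
            }
            perRegion.computeIfAbsent(e.region, r -> new ArrayList<>()).add(e);
        }
        return perRegion;
    }
}
```

Calling LogSplitSketch.split(entries, flushedIds) returns only the edits that would still need replaying per region. The sketch still constructs every entry in full, so it does not show the second half of the suggestion (avoiding deserialization); that would require reading the region and sequence id from an entry header before decoding the rest.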

This message was sent by Atlassian JIRA
