hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11900) Optimization for incremental load reducer
Date Fri, 05 Sep 2014 03:39:25 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-11900:
    Component/s: mapreduce

> Optimization for incremental load reducer
> -----------------------------------------
>                 Key: HBASE-11900
>                 URL: https://issues.apache.org/jira/browse/HBASE-11900
>             Project: HBase
>          Issue Type: Improvement
>          Components: HFile, mapreduce
>    Affects Versions: 0.98.6
>            Reporter: Yi Deng
>            Priority: Minor
> In the current implementation, the reducer configured by
> HFileOutputFormat.configureIncrementalLoad is keyed by row, so the reducer has to
> sort each row's KeyValues in memory before writing them to disk. For rows with a
> huge number of columns/versions, this can cause an OOM.
> A better way is:
> Use the KeyValue as the key; the value can be a NullWritable. The partitioner
> partitions the KeyValue only by its row part. Set a sort comparator that sorts
> KeyValues with KeyValue.COMPARATOR.
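The proposal above can be sketched without any Hadoop or HBase dependency. This is an illustration only, not HBase code: `Cell`, `CELL_ORDER`, `partitionFor`, and `shuffle` are hypothetical names, `Cell` is a simplified stand-in for KeyValue, the ordering (row, then qualifier, then newest timestamp first) is an assumed simplification of KeyValue.COMPARATOR, and the in-memory `sort` call merely stands in for the framework's shuffle sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RowPartitionSketch {

    /** Hypothetical, simplified stand-in for an HBase KeyValue. */
    static final class Cell {
        final String row;
        final String qualifier;
        final long timestamp;

        Cell(String row, String qualifier, long timestamp) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier + "@" + timestamp;
        }
    }

    /** Assumed full-key ordering: row, then qualifier, then newest timestamp first. */
    static final Comparator<Cell> CELL_ORDER = new Comparator<Cell>() {
        @Override
        public int compare(Cell a, Cell b) {
            int c = a.row.compareTo(b.row);
            if (c != 0) return c;
            c = a.qualifier.compareTo(b.qualifier);
            if (c != 0) return c;
            return Long.compare(b.timestamp, a.timestamp);
        }
    };

    /**
     * Chooses a partition from the row part only, so every cell of a row
     * lands in the same partition. splitPoints must be sorted ascending;
     * partition i holds rows in [splitPoints[i-1], splitPoints[i]).
     */
    static int partitionFor(String row, String[] splitPoints) {
        int idx = Arrays.binarySearch(splitPoints, row);
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    /**
     * Simulates the shuffle: route each cell by row, then order each
     * partition by the full key. In a real job the framework performs
     * this sort, so the reducer can write cells as they arrive.
     */
    static List<List<Cell>> shuffle(List<Cell> mapOutput, String[] splitPoints) {
        List<List<Cell>> partitions = new ArrayList<>();
        for (int i = 0; i <= splitPoints.length; i++) {
            partitions.add(new ArrayList<Cell>());
        }
        for (Cell cell : mapOutput) {
            partitions.get(partitionFor(cell.row, splitPoints)).add(cell);
        }
        for (List<Cell> partition : partitions) {
            partition.sort(CELL_ORDER);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Cell> mapOutput = Arrays.asList(
            new Cell("z", "q1", 1L),
            new Cell("a", "q2", 5L),
            new Cell("a", "q1", 3L),
            new Cell("a", "q1", 9L));
        List<List<Cell>> partitions = shuffle(mapOutput, new String[] {"m"});
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Because the partition depends only on the row while the sort uses the whole key, each reducer would receive its cells already in write order and would not need to buffer a wide row. In an actual MapReduce job, the equivalent wiring would presumably go through `Job.setPartitionerClass` and `Job.setSortComparatorClass`.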

This message was sent by Atlassian JIRA