hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-7743) KeyValueSortReducer and PutSortReducers buffer entire value-groups in memory
Date Sat, 02 Feb 2013 01:11:11 GMT

     [ https://issues.apache.org/jira/browse/HBASE-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Nick Dimiduk updated HBASE-7743:

    Component/s: Performance
> KeyValueSortReducer and PutSortReducers buffer entire value-groups in memory
> ----------------------------------------------------------------------------
>                 Key: HBASE-7743
>                 URL: https://issues.apache.org/jira/browse/HBASE-7743
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, Performance
>            Reporter: Nick Dimiduk
> The mapreduce package provides two Reducer implementations, KeyValueSortReducer and
> PutSortReducer, which are used by Import, ImportTsv, and WALPlayer in conjunction with
> HFileOutputFormat. Both implementations use a TreeSet to sort the values matching a key,
> so the entire value-group for a key is buffered in memory. These reducers will OOM when
> rows are large.
>
> A better solution would be to implement a secondary sort of the values. That way Hadoop
> sorts the records, spilling to disk when necessary.
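
To illustrate the secondary-sort idea from the description, here is a minimal sketch in plain Java. It is not the HBase or Hadoop implementation and uses none of their classes: CompositeKey, SORT_ORDER, sameGroup, and reduce are hypothetical names, and the in-memory list sort only stands in for the framework's shuffle, which is the part that can spill to disk. The point it shows is that once the sort field is moved into the key, the reducer can stream through already-ordered records and never needs a per-row TreeSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: plain Java, no Hadoop or HBase classes involved.
class SecondarySortSketch {

    // Hypothetical composite key: the row plus the field the values should be ordered by.
    static final class CompositeKey {
        final String row;
        final String qualifier;

        CompositeKey(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // Sort order applied during the (simulated) shuffle: row first, then qualifier.
    static final Comparator<CompositeKey> SORT_ORDER =
        Comparator.comparing((CompositeKey k) -> k.row)
                  .thenComparing((CompositeKey k) -> k.qualifier);

    // Grouping rule: keys that share a row belong to the same reduce call.
    static boolean sameGroup(CompositeKey a, CompositeKey b) {
        return a.row.equals(b.row);
    }

    // Streams over sorted records and holds only the previous key,
    // instead of collecting every value of a row into a TreeSet.
    static List<String> reduce(List<CompositeKey> mapOutput) {
        List<CompositeKey> shuffled = new ArrayList<>(mapOutput);
        shuffled.sort(SORT_ORDER); // assumption: stands in for the framework's sort

        List<String> out = new ArrayList<>();
        CompositeKey prev = null;
        for (CompositeKey k : shuffled) {
            if (prev == null || !sameGroup(prev, k)) {
                out.add("row " + k.row); // a new group starts here
            }
            out.add("  " + k.qualifier);
            prev = k;
        }
        return out;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
            new CompositeKey("r2", "b"),
            new CompositeKey("r1", "c"),
            new CompositeKey("r2", "a"),
            new CompositeKey("r1", "a"));
        for (String line : reduce(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

In a real MapReduce job the two rules would presumably be supplied to the framework as a sort comparator and a grouping comparator (Hadoop's Job exposes setSortComparatorClass and setGroupingComparatorClass for this), with a partitioner that hashes on the row alone so all records for a row reach the same reducer.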

