hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7743) Replace *SortReducers with Hadoop Secondary Sort
Date Wed, 16 Oct 2013 18:51:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797120#comment-13797120 ]

Nick Dimiduk commented on HBASE-7743:

bq. Having reducers group all cells per row will not handle very large rows very well.

That's precisely the reason I stumbled into this in the first place :)

> Replace *SortReducers with Hadoop Secondary Sort
> ------------------------------------------------
>                 Key: HBASE-7743
>                 URL: https://issues.apache.org/jira/browse/HBASE-7743
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, Performance
>            Reporter: Nick Dimiduk
> The mapreduce package provides two Reducer implementations, KeyValueSortReducer and PutSortReducer,
> which are used by Import, ImportTsv, and WALPlayer in conjunction with HFileOutputFormat.
> Both implementations use a TreeSet to sort the values matching a key in memory, so these
> reducers will run out of memory (OOM) when rows are large.
> A better solution would be to implement a secondary sort of the values. That way Hadoop
> sorts the records, spilling to disk when necessary.
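
The secondary-sort idea in the description above can be sketched in plain Java. This is only an illustration under stated assumptions, not the patch for this issue: `SecondarySortSketch`, `CellKey`, `SORT_ORDER`, `GROUP_BY_ROW`, `reduce`, and `run` are hypothetical names, no Hadoop or HBase classes are used, and the in-memory `List.sort` merely stands in for the framework's shuffle sort (which is the part that can spill to disk). In a real job the two comparators would presumably be registered with `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`, together with a partitioner that partitions on the row only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SecondarySortSketch {

    /** Hypothetical composite key: the row plus the cell's column qualifier. */
    static final class CellKey {
        final String row;
        final String qualifier;

        CellKey(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /** Sort comparator: order by row first, then by qualifier within the row. */
    static final Comparator<CellKey> SORT_ORDER =
        Comparator.comparing((CellKey k) -> k.row)
                  .thenComparing((CellKey k) -> k.qualifier);

    /** Grouping comparator: keys with the same row belong to one reduce group. */
    static final Comparator<CellKey> GROUP_BY_ROW =
        Comparator.comparing((CellKey k) -> k.row);

    /**
     * Stand-in for the reducer. It walks the already-sorted keys once and
     * remembers only the previous key, instead of buffering a whole row in
     * a TreeSet before emitting it.
     */
    static List<String> reduce(List<CellKey> sortedKeys) {
        List<String> out = new ArrayList<>();
        CellKey previous = null;
        for (CellKey key : sortedKeys) {
            // A change in the grouping comparator marks the start of a new row.
            if (previous == null || GROUP_BY_ROW.compare(previous, key) != 0) {
                out.add("row " + key.row);
            }
            out.add("  " + key.qualifier);
            previous = key;
        }
        return out;
    }

    /** Stand-in for the shuffle: sort on the composite key, then reduce. */
    static List<String> run(List<CellKey> mapOutput) {
        List<CellKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        return reduce(sorted);
    }

    public static void main(String[] args) {
        List<CellKey> mapOutput = new ArrayList<>();
        mapOutput.add(new CellKey("r2", "b"));
        mapOutput.add(new CellKey("r1", "z"));
        mapOutput.add(new CellKey("r2", "a"));
        mapOutput.add(new CellKey("r1", "c"));
        for (String line : run(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

The point of the design is that the reducer never needs to hold more than one cell at a time: the ordering within a row is already guaranteed by the sort on the composite key, so row size no longer bounds reducer memory.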

This message was sent by Atlassian JIRA