hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10017) HRegionPartitioner, rows directed to last partition are wrongly mapped.
Date Thu, 05 Dec 2013 01:57:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839681#comment-13839681 ]

Enis Soztutar commented on HBASE-10017:

bq. I have reproduced data loss during bulk load. This happens under the same conditions as
the initial bug: 16 regions per table, though I think it's not the only case. Again, the
partitioner wrongly maps the last region's data, and the resulting region HFile contains keys
that should not appear there.

This partitioner is not intended to be used for bulk load; that is already stated in the
javadoc. TotalOrderPartitioner should be used instead. If the regions change,
LoadIncrementalHFiles checks the boundaries (although I am not sure whether it handles
multiple splits of the same range, or merges).

Other than that, the changes seem ok. However, I think we should get the region boundaries
at the start, and treat the range as immutable for the lifetime of the partitioner. Although
the table regions might undergo changes underneath, we can at least guarantee a consistent
mapping for key ranges. We can do a table.getStartKeys() and do a binary search for the key
range, considering the special region boundaries (empty start and stop rows).
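
To make the suggestion concrete, here is a rough sketch of the lookup, not the actual patch.
StartKeyLookup, regionIndex() and partition() are hypothetical names, not HBase API; the
start keys are assumed to be a sorted snapshot such as the one table.getStartKeys() returns,
with the first entry being the empty start row. Folding regions onto fewer reducers with a
plain modulo is also just an illustrative choice.

```java
/**
 * Hypothetical helper (not part of HBase): maps a row key to a region index
 * using a snapshot of region start keys taken once, at construction time.
 */
final class StartKeyLookup {
    // Sorted ascending; startKeys[0] is assumed to be the empty start row.
    private final byte[][] startKeys;

    StartKeyLookup(byte[][] sortedStartKeys) {
        if (sortedStartKeys.length == 0) {
            throw new IllegalArgumentException("need at least one region");
        }
        // Deep copy so later region changes cannot alter the mapping.
        this.startKeys = new byte[sortedStartKeys.length][];
        for (int i = 0; i < sortedStartKeys.length; i++) {
            this.startKeys[i] = sortedStartKeys[i].clone();
        }
    }

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Binary search for the last start key that is <= row. */
    int regionIndex(byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        int found = 0; // the empty start row sorts before every key
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(startKeys[mid], row) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Rows past the last start key stay in the last region,
        // which covers the empty stop row.
        return found;
    }

    /** Assumption: fold extra regions onto reducers with a simple modulo. */
    int partition(byte[] row, int numPartitions) {
        return regionIndex(row) % numPartitions;
    }
}
```

With as many reducers as regions this gives a 1:1 mapping, including for the last region, and
the mapping stays the same for the lifetime of the partitioner even if the table splits.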

> HRegionPartitioner, rows directed to last partition are wrongly mapped.
> -----------------------------------------------------------------------
>                 Key: HBASE-10017
>                 URL: https://issues.apache.org/jira/browse/HBASE-10017
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.94.6
>            Reporter: Roman Nikitchenko
>            Priority: Critical
>         Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch, patchSiteOutput.txt
> Inside the HRegionPartitioner class there is a getPartition() method which should map the
> first numPartitions regions to the appropriate partitions 1:1. But based on a condition the
> last region is hashed, which could lead to the last reducer not having any data. This is
> considered a serious issue.
> I reproduced this only starting from 16 regions per table. The original defect was found
> in 0.94.6, but at least today's trunk and the 0.91 branch head have the same
> HRegionPartitioner code in this part, which means the same issue.
