hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs
Date Sun, 08 Jun 2014 21:41:04 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-5140:

       Resolution: Won't Fix
    Fix Version/s:     (was: 0.90.4)
           Status: Resolved  (was: Patch Available)

Stale issue. Reopen if still relevant.

> TableInputFormat subclass to allow N number of splits per region during MR jobs
> -------------------------------------------------------------------------------
>                 Key: HBASE-5140
>                 URL: https://issues.apache.org/jira/browse/HBASE-5140
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Josh Wymer
>            Priority: Trivial
>              Labels: mapreduce, split
>         Attachments: Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch,
Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch.1, Added_functionality_to_split_n_times_per_region_on_mapreduce_jobs.patch
>   Original Estimate: 72h
>  Remaining Estimate: 72h
> With regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I am working
on a patch for the TableInputFormat class that overrides getSplits to generate N splits per
region and/or N splits per job. The idea is to convert the startKey and endKey of each region
from byte[] to BigDecimal, take the difference, divide by N, convert back to byte[], and
generate splits on the resulting values. Assuming your keys are uniformly distributed, this
should generate splits with nearly the same number of rows each.
Any suggestions on this issue are welcome.
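
For illustration only, the key arithmetic described above might look roughly like the
sketch below. It is not taken from the attached patches. It uses BigInteger rather than
BigDecimal, since the keys are treated as unsigned big-endian integers, and it assumes
that both keys are non-empty and that shorter keys can be right-padded with 0x00 bytes.
The names RegionKeySplitter, splitPoints, pad, and toFixedWidth are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: computes evenly spaced boundary keys
// between a region's start and end key.
final class RegionKeySplitter {

    /** Right-pads a key with 0x00 bytes so that both keys have the same width. */
    static byte[] pad(byte[] key, int width) {
        return Arrays.copyOf(key, width);
    }

    /** Converts a non-negative value to a big-endian byte[] of exactly {@code width} bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte, or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    /**
     * Returns n + 1 boundary keys: startKey, n - 1 interior points, endKey.
     * Consecutive pairs delimit the n sub-ranges. Assumption: both keys are
     * non-empty and startKey sorts before endKey. Interior points can repeat
     * when the key range is narrower than n.
     */
    static List<byte[]> splitPoints(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int width = Math.max(startKey.length, endKey.length);
        BigInteger start = new BigInteger(1, pad(startKey, width)); // signum 1 = unsigned
        BigInteger end = new BigInteger(1, pad(endKey, width));
        if (start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger range = end.subtract(start);
        BigInteger parts = BigInteger.valueOf(n);

        List<byte[]> points = new ArrayList<>();
        points.add(startKey); // keep the original region boundary unchanged
        for (int i = 1; i < n; i++) {
            // start + range * i / n, multiplying first to limit rounding drift
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(parts);
            points.add(toFixedWidth(start.add(offset), width));
        }
        points.add(endKey);
        return points;
    }
}
```

Calling RegionKeySplitter.splitPoints(start, end, 4) on a region's boundary keys would
yield five keys delimiting four sub-ranges. The first and last regions of a table have
empty boundary keys, which this sketch does not handle; a real getSplits override would
need to choose concrete bounds for them.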

This message was sent by Atlassian JIRA
