drill-issues mailing list archives

From "Aditya Kishore (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-1346) Use HBase table size information to improve scan parallelization
Date Fri, 29 Aug 2014 19:27:53 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Kishore updated DRILL-1346:

    Attachment: 0001-DRILL-1346-Use-HBase-table-size-information-to-impro.patch

Patch also available at https://reviews.apache.org/r/25190/ for review.

> Use HBase table size information to improve scan parallelization
> ----------------------------------------------------------------
>                 Key: DRILL-1346
>                 URL: https://issues.apache.org/jira/browse/DRILL-1346
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - HBase
>    Affects Versions: 0.5.0
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Critical
>         Attachments: 0001-DRILL-1346-Use-HBase-table-size-information-to-impro.patch
> Currently we use a pseudo-estimated value to calculate the scan size, which does not take
> the actual size of the data into account.
> HBase, through {{o.a.h.h.client.HBaseAdmin.getClusterStatus()}}, provides a way to retrieve
> the actual data size of each region. We can use this to approximate the size of a scan and
> use it to improve scan parallelization.
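
A rough sketch of the idea in the description, not the attached patch: once per-region sizes are known, they can be summed to approximate the scan size and used to pick a parallelization width. The class name, method names, and the target-size heuristic below are all hypothetical. The region-size map stands in for values that would come from the cluster status (in HBase releases of this era, probably via ClusterStatus, ServerLoad, and RegionLoad.getStorefileSizeMB(), though that call path is an assumption here), so the example runs without an HBase dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not taken from the DRILL-1346 patch.
final class ScanSizeSketch {

    // Sum the per-region sizes (in MB) to approximate the total scan size.
    static long totalSizeMb(Map<String, Long> regionSizesMb) {
        long total = 0;
        for (long sizeMb : regionSizesMb.values()) {
            total += sizeMb;
        }
        return total;
    }

    // Assumed heuristic: one scan fragment per targetMbPerFragment of data,
    // never fewer than one and never more than the number of regions.
    static int suggestedWidth(Map<String, Long> regionSizesMb, long targetMbPerFragment) {
        if (targetMbPerFragment <= 0) {
            throw new IllegalArgumentException("targetMbPerFragment must be positive");
        }
        if (regionSizesMb.isEmpty()) {
            return 1;
        }
        long total = totalSizeMb(regionSizesMb);
        // Ceiling division without floating point.
        long byData = (total + targetMbPerFragment - 1) / targetMbPerFragment;
        return (int) Math.max(1L, Math.min(byData, (long) regionSizesMb.size()));
    }

    public static void main(String[] args) {
        // Made-up region names and sizes standing in for real cluster status data.
        Map<String, Long> regionSizesMb = new LinkedHashMap<>();
        regionSizesMb.put("region-a", 512L);
        regionSizesMb.put("region-b", 64L);
        regionSizesMb.put("region-c", 24L);

        System.out.println("approximate scan size (MB): " + totalSizeMb(regionSizesMb));
        System.out.println("suggested width: " + suggestedWidth(regionSizesMb, 256L));
    }
}
```

Capping the width at the region count reflects the assumption that a region is the smallest unit of scan work; a planner that splits regions further would drop that cap.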

This message was sent by Atlassian JIRA