crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-267) Fix several HFileUtils#scanHFiles related problems
Date Thu, 19 Sep 2013 02:16:52 GMT


Josh Wills commented on CRUNCH-267:

Looks good Chao-- +1.
> Fix several HFileUtils#scanHFiles related problems
> --------------------------------------------------
>                 Key: CRUNCH-267
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Chao Shi
>         Attachments: crunch-267.patch
> This patch fixes several problems with HFileUtils#scanHFiles that were discovered on
> our production cluster.
> 1. The usage of "" is wrong.
> A return value of -1 indicates that all KVs in the HFile are greater than the given key,
> so we should continue to scan. I replaced it with seekAtOrAfter, which is copied from the
> HBase code, and added a few tests (testScanFiles_startRow{IsTooSmall, IsTooLarge,
> DoesNotExist}) to cover this.
> 2. The default implementation of HFileSource#getSize does not correctly estimate the
> size of the input if the input HFiles are in a sub-directory (e.g. input/family/hfile).
> 3. There are some tricky cases involving Delete/DeleteColumn. I added some test cases and
> fixed the related code. (Hopefully my test cases cover this.)
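
To illustrate the seek behaviour described in item 1, here is a minimal, self-contained sketch. It does not use the HBase API: ToyScanner, firstAtOrAfter and the String keys are hypothetical stand-ins, and the -1/0/1 return convention of the toy seekTo is an assumption modelled on the description above. The point is only that a -1 result means "start from the first key", not "nothing to scan".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SeekSketch {
    /** Toy stand-in for a scanner over sorted keys (hypothetical, not the HBase API). */
    static final class ToyScanner {
        private final List<String> keys; // must be sorted ascending
        private int pos = -1;

        ToyScanner(List<String> sortedKeys) { this.keys = sortedKeys; }

        /**
         * Assumed convention: -1 if the key sorts before the first key,
         * 0 on an exact match, 1 if positioned at the last key smaller than the given key.
         */
        int seekTo(String key) {
            if (keys.isEmpty() || key.compareTo(keys.get(0)) < 0) {
                return -1;
            }
            int idx = Collections.binarySearch(keys, key);
            if (idx >= 0) {
                pos = idx;
                return 0;
            }
            pos = -idx - 2; // insertion point minus one
            return 1;
        }

        boolean seekToFirst() {
            if (keys.isEmpty()) return false;
            pos = 0;
            return true;
        }

        boolean next() {
            if (pos + 1 >= keys.size()) return false;
            pos++;
            return true;
        }

        String current() { return keys.get(pos); }
    }

    /** Positions the scanner at the first key >= the given key; false if there is none. */
    static boolean seekAtOrAfter(ToyScanner s, String key) {
        int result = s.seekTo(key);
        if (result < 0) {
            return s.seekToFirst(); // key is before everything: scan from the start
        } else if (result > 0) {
            return s.next();        // sitting on a smaller key: the next one, if any, is "after"
        }
        return true;                // exact match
    }

    /** Convenience wrapper: the first key >= startRow, or null if there is none. */
    static String firstAtOrAfter(List<String> sortedKeys, String startRow) {
        ToyScanner s = new ToyScanner(sortedKeys);
        return seekAtOrAfter(s, startRow) ? s.current() : null;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("b", "d", "f");
        System.out.println(firstAtOrAfter(keys, "a")); // start row too small
        System.out.println(firstAtOrAfter(keys, "c")); // start row does not exist
        System.out.println(firstAtOrAfter(keys, "z")); // start row too large
    }
}
```

The three calls in main loosely mirror the IsTooSmall, DoesNotExist and IsTooLarge cases named in the test list.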

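The size problem in item 2 can be sketched in the same spirit. This example uses java.nio on the local filesystem as a stand-in for the Hadoop FileSystem API, and the class and method names are hypothetical. It is not the code from the patch. It only shows why a listing limited to the top level misses files laid out as input/family/hfile, while a recursive walk counts them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class RecursiveSizeSketch {
    /** Sums only the regular files directly under root (misses sub-directories). */
    static long topLevelSize(Path root) throws IOException {
        try (Stream<Path> paths = Files.list(root)) {
            return sumSizes(paths);
        }
    }

    /** Sums all regular files under root, at any depth. */
    static long recursiveSize(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return sumSizes(paths);
        }
    }

    private static long sumSizes(Stream<Path> paths) {
        return paths.filter(Files::isRegularFile).mapToLong(p -> {
            try {
                return Files.size(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).sum();
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway layout: <tmp>/family/hfile with 5 bytes of content.
        Path input = Files.createTempDirectory("input");
        Path family = Files.createDirectories(input.resolve("family"));
        Files.write(family.resolve("hfile"), new byte[] {1, 2, 3, 4, 5});

        System.out.println("top-level estimate: " + topLevelSize(input));
        System.out.println("recursive estimate: " + recursiveSize(input));
    }
}
```
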