crunch-dev mailing list archives

From "Ben Roling (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-614) HFileUtils.writeToHFilesForIncrementalLoad slowed dramatically by copying KeyValue byte array
Date Fri, 29 Jul 2016 03:59:20 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Roling updated CRUNCH-614:
------------------------------
    Attachment: CRUNCH-614-1.patch

Attaching the fairly obvious patch to use the KeyValue(byte[], int, int) constructor in the
KeyValueComparator.compare() methods.

The integration tests pass and the change has the expected effect of dramatically speeding
up the job that originally caused me to look into this issue.  The task that was taking hours
before completes in under a minute.
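
For illustration, here is a minimal sketch of the difference between wrapping the
comparator's input bytes and copying them first. It is not the attached patch.
ByteSlice, wrap(), copyOf() and SliceComparatorSketch are hypothetical stand-ins for
KeyValue and KeyValueComparator so that the example runs without HBase on the classpath;
only the wrap-versus-copy pattern is taken from this issue.

```java
import java.util.Arrays;

// Hypothetical stand-in for an HBase KeyValue: a view over a region of a byte array.
final class ByteSlice {
    final byte[] bytes;
    final int offset;
    final int length;

    ByteSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Wraps the caller's array without copying
    // (the pattern of the KeyValue(byte[], int, int) constructor).
    static ByteSlice wrap(byte[] bytes, int offset, int length) {
        return new ByteSlice(bytes, offset, length);
    }

    // Copies the region into a fresh array first
    // (the pattern the issue attributes to KeyValue.create()).
    static ByteSlice copyOf(byte[] bytes, int offset, int length) {
        byte[] copy = Arrays.copyOfRange(bytes, offset, offset + length);
        return new ByteSlice(copy, 0, length);
    }

    // Unsigned lexicographic comparison of the two regions.
    static int compare(ByteSlice a, ByteSlice b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a.bytes[a.offset + i] & 0xff;
            int y = b.bytes[b.offset + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}

// Hypothetical comparator in the style of a raw-bytes compare(b1, s1, l1, b2, s2, l2).
final class SliceComparatorSketch {
    // No allocation proportional to the record size: both sides are views.
    static int compareWrapping(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return ByteSlice.compare(ByteSlice.wrap(b1, s1, l1), ByteSlice.wrap(b2, s2, l2));
    }

    // Two array copies per comparison; a sort calls this many times per record.
    static int compareCopying(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return ByteSlice.compare(ByteSlice.copyOf(b1, s1, l1), ByteSlice.copyOf(b2, s2, l2));
    }
}

final class WrapVsCopyDemo {
    public static void main(String[] args) {
        // Two 3-byte records stored in one shared buffer, as a sort buffer might hold them.
        byte[] buf = {9, 1, 2, 3, 9, 1, 2, 4};
        int wrapped = SliceComparatorSketch.compareWrapping(buf, 1, 3, buf, 5, 3);
        int copied = SliceComparatorSketch.compareCopying(buf, 1, 3, buf, 5, 3);
        System.out.println("same ordering: " + (Integer.signum(wrapped) == Integer.signum(copied)));
    }
}
```

Both variants order the records identically. The copying variant only adds allocation
and copy work on every call, which adds up because a sort invokes compare() many times.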

I'm not sure whether I should also have changed the implementation of HBaseTypes.bytesToKeyValue(),
or any of the other places that use that method.

On the mailing list [~joshwills] said the following:
{quote}
...it looks like we were consolidating some common patterns in the code that had different
use cases (i.e., defensive copies on reads vs. not doing that on sorts.)...
{quote}

If defensive copies on reads are desired or required then perhaps the other code shouldn't
be touched.  The other uses of bytesToKeyValue() are in the PTypes defined by HBaseTypes.cells()
and HBaseTypes.keyValues().
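
As a general illustration of why a defensive copy can matter on reads, the sketch below
shows what happens when a caller keeps a reference to a buffer that is later reused.
Whether the Crunch read paths actually reuse their buffers is an assumption here, not
something confirmed in this issue, and ReusedBufferSketch and its methods are
hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReusedBufferSketch {
    // Keeps a reference to the caller's buffer (no copy).
    static byte[] retainView(byte[] buffer) {
        return buffer;
    }

    // Keeps a private copy of the requested region.
    static byte[] retainCopy(byte[] buffer, int offset, int length) {
        return Arrays.copyOfRange(buffer, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] buffer = "row1".getBytes(StandardCharsets.US_ASCII);
        byte[] view = retainView(buffer);
        byte[] copy = retainCopy(buffer, 0, buffer.length);

        // Simulate a reader overwriting its buffer with the next record.
        buffer[3] = (byte) '2';

        System.out.println(new String(view, StandardCharsets.US_ASCII)); // prints "row2"
        System.out.println(new String(copy, StandardCharsets.US_ASCII)); // prints "row1"
    }
}
```

In a comparator the wrapped objects are discarded as soon as compare() returns, so
there is nothing to protect with a copy. A value handed back to user code from a read
may be held for longer.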

Josh (or others) - do you have any more feedback?

> HFileUtils.writeToHFilesForIncrementalLoad slowed dramatically by copying KeyValue byte array
> ---------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-614
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-614
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.14.0
>            Reporter: Ben Roling
>         Attachments: CRUNCH-614-1.patch
>
>
> I raised this issue on the mailing list:
> http://mail-archives.apache.org/mod_mbox/crunch-user/201607.mbox/%3CCANBdsh01qaQRCNdQdtqytP%2BWAhT_NVGHyQAdDS8H%2BPPMfi9bkw%40mail.gmail.com%3E
> HFileUtils was changed in such a way that it makes a copy of the KeyValue byte array
> in the compare() method of the KeyValueComparator.  The change was made with the following
> commit:
> https://github.com/apache/crunch/commit/a959ee6c7fc400d1f455b0742641c54de1dec0bf#diff-bc76ce0b41704c9c4efbfa1aab53588d
> The change causes HFileUtils.writeToHFilesForIncrementalLoad to be dramatically slower
> in at least some cases.
> The code changed from using the KeyValue(byte[], int, int) constructor to using KeyValue.create().
> KeyValue.create() does a byte array copy.  The fix is likely as simple as changing the code
> back to using the KeyValue constructor.
> I will do some testing and attach a PR for the fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
