crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-475) Compilation problem caused by KeyValue -> Cell conversion
Date Wed, 14 Jan 2015 14:35:34 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276957#comment-14276957 ]

Micah Whitacre commented on CRUNCH-475:
---------------------------------------

* Hadoop dependency goes from 2.2 -> 2.4, but should we just go to 2.5? (Not sure of passivity
between the versions.)
* Can remove hbase.midfix because it looks to be used only for compat dependencies.
* HFileTargetIT/HFileUtils use multi imports, which, as Gabriel pointed out in another issue,
our non-existent code conventions discourage.
* for the use cases where we are trying to pull the value out of the Cell should we use CellUtil.cloneValue()
instead of using the value array, length, and offset? (e.g. HFileTargetIT)
* Should add a comment about why, in HBaseTypes.keyValueToBytes, we still attempt the conversion
if an IOException is thrown.
* HFileTargetIT creates Cells using KeyValue, but I think you should be able to use CellUtil.createCell(...)
instead of just depending on the KeyValue class.
* Implementation choice: in HFileInputFormat/HFileUtils you are doing a custom byte comparison of
the row key/column family.  There is a method on CellComparator for doing that (though the annotations
claim it is private).
* HFileUtils.EXTRACT_ROW_FN can make use of CellUtil.cloneRow(...) vs the Arrays.copyOfRange(...)
* can clean up this code a bit:
{code}
+    PCollection<Cell> kvs = puts.parallelDo("ConvertPutToKeyValue", new DoFn<Put, Cell>() {
       @Override
-      public void process(Put input, Emitter<KeyValue> emitter) {
+      public void process(Put input, Emitter<Cell> emitter) {
         for (List<KeyValue> keyValues : input.getFamilyMap().values()) {
           for (KeyValue keyValue : keyValues) {
             emitter.emit(keyValue);
           }
         }
       }
{code}
** Should rename the parallel do to say Cell vs KeyValue.
** Can do input.getFamilyCellMap() to get iterable of Cells vs KeyValues.
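
On the CellUtil.cloneValue()/cloneRow() points above, a minimal stdlib-only sketch of why a clone-style helper is easier to get right than hand-rolled array/offset/length handling. BackedValue and this cloneValue are hypothetical stand-ins written for illustration; they are not HBase's Cell or CellUtil, and only borrow the "value lives inside a larger backing array" shape.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CellCopySketch {
    // Hypothetical stand-in for a value stored inside a larger shared buffer.
    // NOT HBase's Cell interface; it only mirrors the array/offset/length shape.
    static final class BackedValue {
        final byte[] array;
        final int offset;
        final int length;

        BackedValue(byte[] array, int offset, int length) {
            this.array = array;
            this.offset = offset;
            this.length = length;
        }
    }

    // Clone-style helper (an assumption, written in the spirit of a cloneValue utility):
    // returns a fresh array holding only the value bytes.
    // Note that Arrays.copyOfRange takes an exclusive end index, not a length.
    static byte[] cloneValue(BackedValue v) {
        return Arrays.copyOfRange(v.array, v.offset, v.offset + v.length);
    }

    public static void main(String[] args) {
        // One backing buffer holding row, family and value back to back.
        byte[] buffer = "rowkey|family|hello".getBytes(StandardCharsets.UTF_8);
        BackedValue value = new BackedValue(buffer, 14, 5);

        byte[] copy = cloneValue(value);

        // Mutating the shared buffer afterwards does not affect the copy.
        buffer[14] = 'J';
        System.out.println(new String(copy, StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

Keeping the offset arithmetic in one helper means call sites (tests such as HFileTargetIT, or a row-extraction function) never repeat the `offset + length` computation themselves.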


> Compilation problem caused by KeyValue -> Cell conversion
> ---------------------------------------------------------
>
>                 Key: CRUNCH-475
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-475
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Lee Dongjin
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: CRUNCH-475.patch, CRUNCH-475.patch
>
>
> From HBase 0.99, using the KeyValue class for HBase I/O is deprecated, and in many APIs it
> was replaced with the Cell interface[^1][^2][^3]. This change causes a compilation error with HBase
> 0.99, which is the first HBase version that supports Hadoop 2 only.
> Because this change will be permanent from HBase 1.0 onward, it should be fixed.
> [^1]: https://issues.apache.org/jira/browse/HBASE-11805
> [^2]: https://issues.apache.org/jira/browse/HBASE-9359
> [^3]: https://issues.apache.org/jira/browse/HBASE-10526



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
