hbase-issues mailing list archives

From "Gary Helmling (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-15773) CellCounter improvements
Date Fri, 06 May 2016 18:27:13 GMT

     [ https://issues.apache.org/jira/browse/HBASE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-15773:
----------------------------------
      Resolution: Fixed
    Release Note: 
The CellCounter MapReduce job now supports additional configuration options on the Scan instance
it creates, using the property names defined in org.apache.hadoop.hbase.mapreduce.TableInputFormat.
For a full list of the options, run ./hbase org.apache.hadoop.hbase.mapreduce.CellCounter
with no arguments.

CellCounter also no longer creates job counters for per-rowkey and per-rowkey/qualifier cell
counts.  For most tables, these counters caused the job to fail by exceeding MapReduce job
counter limits.
          Status: Resolved  (was: Patch Available)
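
A minimal sketch, not part of the patch, of passing TableInputFormat scan properties to CellCounter on the command line. It assumes CellCounter is launched through Hadoop's ToolRunner, so generic `-D` options land in the job Configuration, and that the TableInputFormat constants `SCAN_BATCHSIZE` and `SCAN_CACHEDROWS` resolve to `hbase.mapreduce.scan.batchsize` and `hbase.mapreduce.scan.cachedrows`. The table name `mytable` and output directory `/tmp/cellcounter-out` are hypothetical placeholders; running CellCounter with no arguments prints the authoritative option list. A batch size caps the number of cells returned per Result, which is what keeps wide rows manageable when all versions are fetched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CellCounterScanArgs {
    // Assumed values of TableInputFormat.SCAN_BATCHSIZE and TableInputFormat.SCAN_CACHEDROWS.
    static final String SCAN_BATCHSIZE = "hbase.mapreduce.scan.batchsize";
    static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";

    /** Turns scan properties into Hadoop generic "-Dkey=value" options, keeping the map's order. */
    static List<String> toGenericOptions(Map<String, String> scanProps) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : scanProps.entrySet()) {
            args.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] argv) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(SCAN_BATCHSIZE, "100");  // at most 100 cells per Result, even for wide rows
        props.put(SCAN_CACHEDROWS, "500"); // rows fetched per scanner RPC
        List<String> args = toGenericOptions(props);
        // Generic options must precede the tool's own arguments; both positional values are placeholders.
        args.add("mytable");
        args.add("/tmp/cellcounter-out");
        System.out.println("./hbase org.apache.hadoop.hbase.mapreduce.CellCounter " + String.join(" ", args));
    }
}
```

Building the options from an ordered map keeps the generated command line deterministic, which makes it easy to log or diff between runs.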

Committed to branch-1.3+.  Thanks for reviews, [~enis] and [~mantonov].

> CellCounter improvements
> ------------------------
>
>                 Key: HBASE-15773
>                 URL: https://issues.apache.org/jira/browse/HBASE-15773
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>             Fix For: 1.3.0
>
>         Attachments: HBASE-15773.001.patch, HBASE-15773.002.patch
>
>
> Looking at the CellCounter MapReduce job, it seems like it can be improved in a few areas:
> * It does not currently support setting scan batching.  This is important when we're
>   fetching all versions for columns.  Actually, it would be nice to support all of the scan
>   configuration currently provided in TableInputFormat.
> * Generating job counters containing row keys and column qualifiers is guaranteed to
>   blow up on anything but the smallest tables.  This is not usable and doesn't make any sense
>   when the same counts are in the job output.  The row- and qualifier-specific counters should
>   be dropped.
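
To see why per-row counters cannot scale, here is a small counting sketch. It assumes a counter limit of 120, the Hadoop 2.x default for `mapreduce.job.counters.max` (clusters may configure a different value), and it uses hypothetical counter names: one per row key and one per row key/column pair, as the issue describes. The table is synthetic; only the number of distinct counters matters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CounterExplosionSketch {
    // Assumption: the Hadoop 2.x default for "mapreduce.job.counters.max"; a cluster may override it.
    static final int ASSUMED_COUNTER_LIMIT = 120;

    /**
     * Returns how many distinct counters a job would need if it created one counter per row key
     * and one per row key/column pair. Each cell is {row, family, qualifier}; the counter names
     * built here are hypothetical, only their number matters.
     */
    static int perRowCounterCount(List<String[]> cells) {
        Set<String> names = new HashSet<>();
        for (String[] cell : cells) {
            names.add("row:" + cell[0]);
            names.add("rowcol:" + cell[0] + "/" + cell[1] + ":" + cell[2]);
        }
        return names.size();
    }

    /** Builds a hypothetical table of `rows` rows, each with `quals` qualifiers in family "f". */
    static List<String[]> syntheticCells(int rows, int quals) {
        List<String[]> cells = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int q = 0; q < quals; q++) {
                cells.add(new String[] {"row" + r, "f", "q" + q});
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // 100 row counters + 200 row/column counters = 300 distinct counters.
        int needed = perRowCounterCount(syntheticCells(100, 2));
        System.out.println(needed + " counters needed, limit " + ASSUMED_COUNTER_LIMIT
                + (needed > ASSUMED_COUNTER_LIMIT ? ": job would fail" : ": fits"));
    }
}
```

Even this 100-row, two-column table needs 300 counters, well past the assumed limit, while the same counts fit naturally in the job's regular output files.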



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
