accumulo-user mailing list archives

From: Arshak Navruzyan
Subject: Re: unique list of columns
Date: Tue, 08 Apr 2014 23:24:39 GMT
I am trying to print the histogram with that command, but I get the usage
message instead. The --dump option works fine. I'm on Accumulo 1.5.0.

bin/accumulo $PACKAGE.PrintInfo --histogram

Usage: org.apache.accumulo.core.file.rfile.PrintInfo [options] <file> { <file> ... }


    -d, --dump

       dump the key/value pairs

       Default: false

    -h, -?, --help, -help

       Default: false


       print a histogram of the key-value sizes

       Default: false

Unknown option: --histogram

On Sat, Feb 22, 2014 at 8:47 AM, Mike Drob wrote:

> There's not a single good way that I am aware of, but there are a couple of
> ways that will get you close.
> First, you can use the SortedKeyIterator to truncate values and
> potentially save yourself a lot of data transfer.
> Second, each RFile header block will track the columns contained, up to
> 1000 (possibly configurable). Check out PrintInfo[1].
> Mike
> [1]:
> On Sat, Feb 22, 2014 at 11:25 AM, Arshak Navruzyan wrote:
>> I don't know the inner workings of RFiles well enough, but I was wondering
>> if there is a faster way to get a unique list of columns in Accumulo (short
>> of doing a full MapReduce job). Is there some way to skip past all the
>> values and just get to the next column?
>> Thanks
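
The "skip ahead to the next column" idea from the original question can be illustrated with a small plain-Python model. This is only a sketch and does not use the Accumulo API: it assumes keys are tuples sorted by (row, column, version), and the function name unique_columns is hypothetical. A binary search stands in for a seek, so the remaining versions of the current column are jumped over rather than read.

```python
from bisect import bisect_right


def unique_columns(sorted_keys):
    """Collect distinct columns from keys sorted by (row, column, version).

    Assumption: this models a sorted key space with a plain list; it is
    not Accumulo code. Each "seek" is a binary search that lands on the
    first key after every version of the current (row, column).
    """
    columns = set()
    seeks = 0
    i = 0
    while i < len(sorted_keys):
        row, column, _version = sorted_keys[i]
        columns.add(column)
        # Jump past all remaining entries for this (row, column).
        i = bisect_right(sorted_keys, (row, column, float("inf")), lo=i)
        seeks += 1
    return sorted(columns), seeks


if __name__ == "__main__":
    keys = sorted([
        ("r1", "a", 1), ("r1", "a", 2), ("r1", "b", 1),
        ("r2", "a", 1), ("r2", "c", 1), ("r2", "c", 2), ("r2", "c", 3),
    ])
    print(unique_columns(keys))
```

Because columns sort underneath rows, the same column still has to be visited once per row that contains it; the saving comes only from skipping the repeated entries within each (row, column) pair.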