accumulo-user mailing list archives

From Arshak Navruzyan <arsh...@gmail.com>
Subject Re: unique list of columns
Date Wed, 09 Apr 2014 00:34:16 GMT
Apparently does, thanks!



On Tue, Apr 8, 2014 at 5:21 PM, Billie Rinaldi <billie.rinaldi@gmail.com> wrote:

> Does this imply that the histogram option works in 1.5.0 as long as you
> spell it "historgram"?
>
>
> On Tue, Apr 8, 2014 at 4:54 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>> Arshak,
>>
>> Looks like that was a bug against 1.5.0 and fixed in 1.5.1.
>>
>> https://issues.apache.org/jira/browse/ACCUMULO-1571
>>
>>
>> On 4/8/14, 7:24 PM, Arshak Navruzyan wrote:
>>
>>> I am trying to print out the histogram with that command but get the
>>> usage message instead. The --dump option is working fine. I'm on
>>> Accumulo 1.5.0.
>>>
>>> PACKAGE=org.apache.accumulo.core.file.rfile
>>> bin/accumulo $PACKAGE.PrintInfo --histogram
>>> /accumulo/tables/53/t-0003371/A0003jbg.rf
>>>
>>> Usage: org.apache.accumulo.core.file.rfile.PrintInfo [options] <file> { <file> ... }
>>>
>>>    Options:
>>>
>>>      -d, --dump
>>>
>>>         dump the key/value pairs
>>>
>>>         Default: false
>>>
>>>      -h, -?, --help, -help
>>>
>>>         Default: false
>>>
>>>          --historgram
>>>
>>>         print a histogram of the key-value sizes
>>>
>>>         Default: false
>>>
>>>
>>> Unknown option: --histogram
>>>
>>>
>>>
>>> On Sat, Feb 22, 2014 at 8:47 AM, Mike Drob <madrob@cloudera.com> wrote:
>>>
>>>     There's not a single good way that I am aware of, but there are a
>>>     couple ways that will get you close.
>>>
>>>     First, you can use the SortedKeyIterator to truncate values and
>>>     potentially save yourself a lot of data transfer.
>>>     Second, each RFile header block will track the columns contained, up
>>>     to 1000 (possibly configurable). Check out PrintInfo[1].
>>>
>>>     Mike
>>>
>>>     [1]:
>>>     https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java
>>>
>>>
>>>     On Sat, Feb 22, 2014 at 11:25 AM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>>>
>>>         I don't know the inner workings of RFiles well enough, but I was
>>>         wondering if there is a faster way to get a unique list of
>>>         columns in Accumulo (short of doing a full MapReduce). Is there
>>>         some way to skip past all the values and just get to the next
>>>         column?
>>>
>>>         Thanks
>>>
>>>
>>>
>>>
>
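
For anyone landing on this thread later, the workaround confirmed above can be sketched as a small shell snippet. It only restates what the thread says: 1.5.0 accepts the misspelled --historgram, and ACCUMULO-1571 fixed the spelling in 1.5.1. The histogram_flag helper and the ACCUMULO_VERSION variable are hypothetical names introduced for illustration, and treating every version other than 1.5.0 as accepting --histogram is an assumption, not something verified here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Accumulo): choose the spelling of the
# histogram option for a given Accumulo version string.
# Per the thread, 1.5.0 only recognizes the misspelled "--historgram";
# other versions are ASSUMED to accept "--histogram".
histogram_flag() {
  case "$1" in
    1.5.0) printf '%s\n' "--historgram" ;;
    *)     printf '%s\n' "--histogram" ;;
  esac
}

PACKAGE=org.apache.accumulo.core.file.rfile
RFILE=/accumulo/tables/53/t-0003371/A0003jbg.rf   # RFile path taken from the thread
ACCUMULO_VERSION=${ACCUMULO_VERSION:-1.5.0}       # assumed variable, defaults to 1.5.0

# Print the command instead of running it, so it can be checked first.
echo bin/accumulo "$PACKAGE.PrintInfo" "$(histogram_flag "$ACCUMULO_VERSION")" "$RFILE"
```

Once the printed line looks right, it can be run from the Accumulo install directory by dropping the leading echo.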
