accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: unique list of columns
Date Sun, 23 Feb 2014 03:12:59 GMT
I can't help but wonder if maybe the problem you're trying to solve
could be done in a different way (like, when your RFiles are
generated). What kinds of things are you trying to do with the
enumeration of columns? Because, if you're trying to do something like
show these in a drop-down box in a web interface or something, these
could potentially be quite extensive... too big for even one machine
to handle, in the general case. Except in very specific use cases, I
can't imagine enumerating every column would be very useful. Perhaps
yours is such a use case, but I wonder...

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Sat, Feb 22, 2014 at 3:32 PM, Arshak Navruzyan <arshakn@gmail.com> wrote:
> Mike,
>
> Thanks; this sounds promising.
>
> Arshak
>
> On Feb 22, 2014 11:48 AM, "Mike Drob" <madrob@cloudera.com> wrote:
>>
>> There's not a single good way that I am aware of, but there are a couple
>> of ways that will get you close.
>>
>> First, you can use the SortedKeyIterator to truncate values and
>> potentially save yourself a lot of data transfer.
>> Second, each RFile header block will track the columns contained, up to
>> 1000 (possibly configurable). Check out PrintInfo[1].
>>
>> Mike
>>
>> [1]:
>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java
>>
>>
>> On Sat, Feb 22, 2014 at 11:25 AM, Arshak Navruzyan <arshakn@gmail.com>
>> wrote:
>>>
>>> I don't know the inner workings of RFiles well enough, but I was wondering
>>> if there is a faster way to get a unique list of columns in Accumulo (short
>>> of doing a full mapreduce).  Is there some way to skip ahead all the volumes
>>> and just get to the next column?
>>>
>>> Thanks
>>
>>
>
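
The "skip ahead to the next column" idea from the original question can be sketched without a cluster. The following is a minimal, hypothetical stand-in, not Accumulo code: a `TreeSet<String>` plays the role of the sorted key space, keys are encoded as row, family, qualifier and version joined by a NUL separator (an assumption made only for this sketch), and `ceiling()` plays the role of a seek. In Accumulo itself, the analogous seek target would probably be built with `Key.followingKey(PartialKey.ROW_COLFAM_COLQUAL)`. Because keys sort by row first, the skip only jumps over repeated versions of a column within one row; every row still has to be visited, which is in line with the concern above that a full enumeration can be expensive.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: a sorted set of strings stands in for a sorted key space.
final class UniqueColumnsSketch {
    // Separator between key parts; assumes no part contains \u0000 or \u0001.
    private static final char SEP = '\u0000';

    // Encode (row, family, qualifier, version) so that string order is
    // row, then family, then qualifier, then version (non-negative).
    public static String key(String row, String family, String qualifier, long version) {
        return row + SEP + family + SEP + qualifier + SEP + String.format("%019d", version);
    }

    // Collect distinct "family:qualifier" names, seeking past every
    // remaining version of a column once it has been seen in a row.
    public static SortedSet<String> uniqueColumns(NavigableSet<String> sortedKeys) {
        SortedSet<String> columns = new TreeSet<>();
        String current = sortedKeys.isEmpty() ? null : sortedKeys.first();
        while (current != null) {
            // Drop the trailing version to get row + family + qualifier.
            String rowAndColumn = current.substring(0, current.lastIndexOf(SEP));
            // Drop the leading row to get family + qualifier.
            String column = rowAndColumn.substring(rowAndColumn.indexOf(SEP) + 1);
            columns.add(column.replace(SEP, ':'));
            // "Seek": first key that sorts after all versions of this row and column.
            current = sortedKeys.ceiling(rowAndColumn + '\u0001');
        }
        return columns;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>();
        keys.add(key("row1", "attr", "color", 1L));
        keys.add(key("row1", "attr", "color", 2L));
        keys.add(key("row1", "attr", "size", 1L));
        keys.add(key("row2", "attr", "color", 1L));
        keys.add(key("row2", "meta", "owner", 1L));
        System.out.println(uniqueColumns(keys));
    }
}
```

The sketch does one seek per distinct column per row, so it helps most when columns carry many versions or large values; it does not avoid touching every row.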
