accumulo-dev mailing list archives

From Keith Turner <>
Subject Re: feedback on Typo
Date Tue, 14 Aug 2012 17:29:34 GMT
On Mon, Aug 13, 2012 at 6:03 PM, Josh Elser <> wrote:
> Even with something as simple as a pair, things can start getting difficult.
> I suppose it really revolves around the level of support you want to provide
> at scan time, e.g. "find all pairs where the second is 'x'?".

I implemented support for Pair and Triple.  Getting the tuples to sort
correctly lexicographically is tricky, which is why a library like
Typo is nice.  Below is a link to an example that uses Pair to store
an edge in the row of the Accumulo key.  The example scans over all
Pairs where the first is X.  This can be done efficiently by
leveraging the way Pair sorts.  Finding all pairs where the second is
X would require a full table scan.  One way to avoid this is to insert
the edge twice, as Pair(X,Y) and Pair(Y,X); then you can find what
you are looking for w/o a full table scan.  I think this is what you
mentioned below.
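
A minimal sketch of the double-insert idea follows. It is not Typo's
actual encoding or API: the class name, the 0x00 separator, and the
TreeSet standing in for an Accumulo table are all assumptions made for
illustration, and it assumes neither string contains a NUL character.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a table whose rows are encoded pairs.
public class PairScanSketch {
    // Accumulo compares rows as unsigned bytes; mimic that ordering here.
    static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    // Assumed separator: sorts before every other byte, so pairs order
    // by first element, then by second.
    static final char SEP = '\0';

    static byte[] encode(String first, String second) {
        return (first + SEP + second).getBytes(StandardCharsets.UTF_8);
    }

    static String[] decode(byte[] row) {
        String s = new String(row, StandardCharsets.UTF_8);
        int sep = s.indexOf(SEP);
        return new String[] { s.substring(0, sep), s.substring(sep + 1) };
    }

    // Store the edge in both directions so either endpoint can be looked
    // up with a range scan instead of a full table scan.
    static void insertEdge(TreeSet<byte[]> table, String x, String y) {
        table.add(encode(x, y));
        table.add(encode(y, x));
    }

    // Range scan over every row whose first element is exactly x.
    static List<String> neighbors(TreeSet<byte[]> table, String x) {
        byte[] start = (x + SEP).getBytes(StandardCharsets.UTF_8);
        byte[] end = (x + (char) (SEP + 1)).getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (byte[] row : table.subSet(start, true, end, false)) {
            out.add(decode(row)[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<byte[]> table = new TreeSet<>(BYTE_ORDER);
        insertEdge(table, "a", "b");
        insertEdge(table, "a", "c");
        insertEdge(table, "ab", "c");
        System.out.println(neighbors(table, "a")); // [b, c]
        System.out.println(neighbors(table, "c")); // [a, ab]
    }
}
```

The cost of this approach is that every edge is stored twice; in
exchange, both lookups touch only the rows that share the prefix.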

> Spending a few minutes thinking about it, an index could be a separate table
> but wouldn't necessarily have to be. It depends on the complexity of the
> structure you're trying to index. Using the Pair example again, you could
> reserve a column (family) to place index records in which simply inverts the
> Pair in the colqual.

Right, so you could use Typo to do this but it would not do it for you.

> On 08/13/2012 11:06 AM, Keith Turner wrote:
>> On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser<>  wrote:
>>> Neat idea, Keith.
>>> Have you thought about how to support more complex types? Specifically,
>>> arrays, hashes and the nesting of those? Any thoughts about indexing for
>>> those complex types?
>> Yeah I was thinking that would be nice.  I see a lot of users putting
>> multiple types into the row and/or columns.  Could have something like
>> TupleEncoder<List<A>>.   TupleEncoder would need to encode its elements
>> such that it sorts correctly.  However, this may be cumbersome to use
>> if you want to use different types.  For example I want a row composed
>> of a Long and String.  I was thinking of having the following types to
>> handle this case.
>> class Pair<A,B>  extends LexEncoder{
>>     Pair(LexEncoder<A>  enc1, LexEncoder<B>  enc2);
>>     A getFirst(){}
>>     B getSecond(){}
>> }
>> class Triple<A,B,C>{//follows same pattern as Pair}
>> class Quadruple<A,B,C,D>{//follows same pattern as Pair}
>> This would allow a user to write code like the following that makes it
>> easy to work with a row composed of a Long and String.
>> Pair<Long, String>  pair;
>> long l = pair.getFirst();
>> String s = pair.getSecond();
>> I am still thinking the tuple concept through.
>> I was not considering indexing.  I assume you mean creating an index
>> in another table?
>>> Initial thoughts are that it would make the most sense to place Typo at the
>>> contrib level (or something equivalent). The reason being: Typo doesn't
>>> change the underlying functionality of Accumulo; it only provides a layer on
>>> top of it that makes life easier for developers.
>> I think putting it in contrib makes sense.
>>> On 08/10/2012 07:07 PM, Keith Turner wrote:
>>>> I put together a simple abstraction layer for Accumulo that makes it
>>>> easier to read and write Java objects to Accumulo key and value
>>>> fields.  The data written to Accumulo sorts correctly
>>>> lexicographically.
>>>> I put the code on github and would like some feedback on the design
>>>> and whether it should be included with Accumulo.
>>>> It's still a little rough and I need to add encoders for all of the
>>>> primitive types.
>>>> Keith
