accumulo-user mailing list archives

From Adam Fuchs
Subject Re: accumulo for a bi-map?
Date Wed, 17 Jul 2013 19:03:35 GMT

You might also want to check out D4M and the table organization that it
uses in Accumulo. D4M stores matrices and their transposes, which is
essentially the same concept as a bidirectional map or a bidirected graph.
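
A minimal sketch of that idea, not D4M's actual API: it assumes a main table
plus a transpose table, with the matched key used as the column qualifier and
the score as the value, so columns never need numbered names like 'm0'. The
SortedTable and MatchStore classes are hypothetical in-memory stand-ins for
Accumulo tables, written in Python only to illustrate the layout.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Hypothetical in-memory stand-in for a sorted (row, qualifier) -> value table."""

    def __init__(self):
        self._keys = []    # sorted list of (row, qualifier) tuples
        self._values = {}  # (row, qualifier) -> value

    def put(self, row, qualifier, value):
        key = (row, qualifier)
        if key not in self._values:
            insort(self._keys, key)  # keep keys sorted, as a sorted store would
        self._values[key] = value

    def scan_row(self, row):
        """Return {qualifier: value} for one row via a range scan."""
        start = bisect_left(self._keys, (row, ""))
        result = {}
        for key in self._keys[start:]:
            if key[0] != row:
                break
            result[key[1]] = self._values[key]
        return result


class MatchStore:
    """Stores each match once in a table and once in its transpose."""

    def __init__(self):
        self.table = SortedTable()      # row = first key, qualifier = second key
        self.transpose = SortedTable()  # row = second key, qualifier = first key

    def add_match(self, a, b, score):
        # One write per table; the qualifier is the other key, so nothing
        # has to track how many matches a row already holds.
        self.table.put(a, b, score)
        self.transpose.put(b, a, score)

    def matches_for(self, key):
        """All matches for a key, whichever side of the pair it was stored on."""
        merged = dict(self.transpose.scan_row(key))
        merged.update(self.table.scan_row(key))
        return merged


store = MatchStore()
store.add_match("abcd", "ijkl", 90)
store.add_match("efgh", "abcd", 88)
store.add_match("efgh", "ijkl", 88)
print(store.matches_for("abcd"))
```

Looking up any key takes one row scan per table. The cost is that every match
is written twice, which is the space trade-off raised in the original question.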


On Tue, Jul 16, 2013 at 5:28 PM, Marc Reichman wrote:

> We are using Accumulo as a mechanism to store feature data (binary byte[])
> for some simple keys which are used for a search algorithm. We currently
> search by iterating over the feature space using AccumuloRowInputFormat.
> Results come out of a reducer into HDFS, currently in a SequenceFile.
> A customer has asked if we can store our results somewhere in our Hadoop
> infrastructure, and also perform nightly searches of everything vs
> everything to keep match results up to date.
> To me, storing the results in alternate column families (separate from the
> features) would be a way to store the matches alongside the key rows:
> (key: abcd, features: {...}, matches: { 'm0': efgh-88%, 'm1': ijkl-90%, ...,
> 'mN': etc })
> (key: ijkl, features: {...}, matches: { 'm0': efgh-88%, 'm1': abcd-90%, ...,
> 'mN': etc })
> Match scores are equal between two items regardless of perspective, so
> if a->b is 90%, then b->a is also 90%.
> Is there a way to simply add columns to an existing family without having
> to name them or keep track of how many there are? Am I better off making a
> column family for each match key and then storing the score and other
> fields in columns? Or making one column per match under a single family,
> with the key as the name and the score as the value?
> Ideally I would have some form of bidirectional map so I could look at any
> key and find all the results as other keys, and find any results to get
> other matches.
> One approach is to simply add both sides of the relationship every time
> anything matches anything else, which seems a bit wasteful, space-wise.
> Curious if any pre-existing ideas are out there. Currently on Hadoop
> 1.0.3/Accumulo 1.4.1, not set in (hard) concrete.
> Thanks,
> Marc
