accumulo-user mailing list archives

From: David Medinets
Subject: Re: accumulo for a bi-map?
Date: Tue, 16 Jul 2013 22:55:48 GMT
Is disk space a consideration?
On Jul 16, 2013 2:28 PM, "Marc Reichman" wrote:

> We are using Accumulo as a mechanism to store feature data (binary byte[])
> for some simple keys which are used for a search algorithm. We currently
> search by iterating over the feature space using AccumuloRowInputFormat.
> Results come out of a reducer into HDFS, currently in a SequenceFile.
> A customer has asked if we can store our results somewhere in our Hadoop
> infrastructure, and also perform nightly searches of everything vs
> everything to keep match results up to date.
> To me, storing the results in separate column families (apart from the
> features) would be a way to keep the matches alongside the key rows:
> (key: abcd, features: {...}, matches: { 'm0': efgh-88%, 'm1': ijkl-90%, ...,
> 'mN': etc })
> (key: ijkl, features: {...}, matches: { 'm0': efgh-88%, 'm1': abcd-90%, ...,
> 'mN': etc })
> Match scores are symmetric between two items: if a->b is 90%, then b->a
> is also 90%.
> Is there a way to simply add columns to an existing family without having
> to name them or keep track of how many there are? Am I better off making a
> column family for each match key and then storing the score and other
> fields in columns? Or making one column per match under a single family,
> with the matched key as the column name and the score as the value?
> Ideally I would have some form of bidirectional map so I could look up any
> key and get all of its matches as other keys, and then look up any of
> those matches in turn to get their matches.
> One approach is to simply add both sides of the relationship every time
> anything matches anything else, which seems a bit wasteful, space-wise.
> Curious if any pre-existing ideas are out there. Currently on Hadoop
> 1.0.3/Accumulo 1.4.1, not set in (hard) concrete.
> Thanks,
> Marc
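
For illustration, the "one column per match under one family" layout from the quoted message, combined with writing both sides of each relationship, can be sketched roughly as below. This is only a toy model under stated assumptions: `MatchTable`, `put`, `add_match` and `matches_for` are hypothetical names, the table is an in-memory sorted list standing in for a sorted key-value store, and none of it is the Accumulo client API. Using the matched key as the column qualifier means there is no m0..mN counter to keep track of.

```python
from bisect import bisect_left, insort


class MatchTable:
    """Toy stand-in for a sorted key-value table (NOT the Accumulo API)."""

    def __init__(self):
        # Keys are (row, family, qualifier) tuples kept in sorted order.
        self._keys = []
        self._values = {}

    def put(self, row, family, qualifier, value):
        key = (row, family, qualifier)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def add_match(self, a, b, score):
        # Scores are symmetric, so write both directions; this doubles the
        # stored entries but lets either key be looked up by row.
        self.put(a, "matches", b, score)
        self.put(b, "matches", a, score)

    def matches_for(self, row):
        # Range scan over the "matches" family of a single row.
        start = bisect_left(self._keys, (row, "matches", ""))
        result = {}
        for key in self._keys[start:]:
            if key[0] != row or key[1] != "matches":
                break
            result[key[2]] = self._values[key]
        return result


table = MatchTable()
table.add_match("abcd", "ijkl", 0.90)
table.add_match("abcd", "efgh", 0.88)
table.add_match("ijkl", "efgh", 0.88)
print(table.matches_for("abcd"))  # {'efgh': 0.88, 'ijkl': 0.9}
```

The cost of this layout is the one already raised in the thread: every match is stored twice, which is where the disk space question comes in.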
