accumulo-user mailing list archives

From Jeremy Kepner <>
Subject Re: accumulo for a bi-map?
Date Thu, 18 Jul 2013 17:32:25 GMT
Here is a link to the IEEE HPEC paper we wrote up on our schema work:

On Wed, Jul 17, 2013 at 03:03:35PM -0400, Adam Fuchs wrote:
>    Marc,
>    You might also want to check out D4M and the table organization that it
>    uses in Accumulo. D4M stores matrixes and their transforms, which is
>    essentially the same concept as a bidirectional map or a bidirected
>    graph: [1]
>    Cheers,
>    Adam
>    On Tue, Jul 16, 2013 at 5:28 PM, Marc Reichman
>    <[2]> wrote:
>      We are using accumulo as a mechanism to store feature data (binary
>      byte[]) for some simple keys which are used for a search algorithm. We
>      currently search by iterating over the feature space using
>      AccumuloRowInputFormat. Results come out of a reducer into HDFS,
>      currently in a SequenceFile.
>      A customer has asked if we can store our results somewhere in our Hadoop
>      infrastructure, and also perform nightly searches of everything vs
>      everything to keep match results up to date.
>      To me, the storage of the results in alternate column families (from the
>      features) would be a way to store the matches alongside the key
>      rows:
>      (key: abcd, features: {...}, matches: { 'm0': efgh-88%, 'm1': ijkl-90%, ...,
>      'mN': etc })
>      (key: ijkl, features: {...}, matches: { 'm0': efgh-88%, 'm1': abcd-90%, ...,
>      'mN': etc })
>      Match scores are equal between two items regardless of perspective, so
>      if a->b is 90%, then b->a is also 90%.
>      Is there a way to simply add columns to an existing family without
>      having to name them or keep track of how many there are? Am I better off
>      making a column family for each match key and then storing the score and
>      other fields in columns? Or making one column per match under a single
>      family, with the key as the name and the score as the value?
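On the question of adding columns without naming or counting them: one option is to use the matched key itself as the column qualifier, with the score as the value, so a row's match family simply holds whatever qualifiers were written. Below is a minimal sketch, assuming a plain Python dict as an in-memory stand-in for the table. It is not the Accumulo client API; `put`, `scan_row`, and the family names are hypothetical and used only for illustration.

```python
# In-memory stand-in for a sorted key/value table (NOT the Accumulo API).
# Each entry is keyed by (row, column family, column qualifier).
table = {}

def put(row, family, qualifier, value):
    """Write one cell; writing the same key again overwrites the value."""
    table[(row, family, qualifier)] = value

def scan_row(row, family):
    """Return (qualifier, value) pairs for one row and family, in key order."""
    return [(key[2], table[key]) for key in sorted(table)
            if key[0] == row and key[1] == family]

# Feature bytes and matches live in separate families of the same row.
put("abcd", "features", "f0", b"\x00\x01\x02")
# The qualifier is the matched key, so no 'm0'..'mN' counter is needed.
put("abcd", "matches", "ijkl", 90)
put("abcd", "matches", "efgh", 88)

print(scan_row("abcd", "matches"))
```

Because the qualifier carries the matched key, re-running a nightly search overwrites an existing score for the same pair instead of appending a new numbered column.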
>      Ideally I would have some form of bidirectional map so I could look at
>      any key and find all of its results as other keys, and then look up any
>      of those results to get its other matches.
>      One approach is to simply add both sides of the relationship every time
>      anything matches anything else, which seems a bit wasteful, space-wise.
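The "add both sides" approach described above can be sketched as follows. This is a minimal illustration using a nested Python dict in place of a real table, with hypothetical helper names `record_match` and `matches_for`. It assumes, as stated in the message, that the score is symmetric.

```python
from collections import defaultdict

# row key -> {matched key -> score}; stands in for a 'matches' column family.
matches = defaultdict(dict)

def record_match(a, b, score):
    """Store a symmetric match under both rows so either key can be looked up."""
    matches[a][b] = score
    matches[b][a] = score

def matches_for(key):
    """Return all matches for a key, ordered by matched key; empty if none."""
    return dict(sorted(matches.get(key, {}).items()))

record_match("abcd", "ijkl", 90)
record_match("abcd", "efgh", 88)

print(matches_for("abcd"))
print(matches_for("ijkl"))
```

The trade-off is the one noted in the message: every match is stored twice, in exchange for finding all matches of any key with a single row lookup.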
>      Curious if any pre-existing ideas are out there. Currently on hadoop
>      1.0.3/accumulo 1.4.1, not set in (hard) concrete.
>      Thanks,
>      Marc
> References
>    Visible links
>    1.
>    2.
