accumulo-user mailing list archives

From Benson Margulies
Subject Writing an iterator that calculates on compaction
Date Fri, 02 Mar 2012 20:59:34 GMT

I am trying to get organized to get my feet wet in using the ability
of Accumulo to compute near the data. I beg your pardon in advance for
the following exercise in laying out what I have in mind and asking
for some pointers -- particularly to examples on the 1.4 branch of
code that I could warp to achieve my nefarious purposes.

So, start with this data model:

  ROWID   CF          CQ            V
  itemid  'context'   dimension     value
  itemid  something   else          entirely...

In short, for an 'item', there's a sparse feature vector associated
with it (identified by cf='context'), and some other things.

Meanwhile, in another table we have:

  clusterid  'items'  itemid1       -blank-
  clusterid  'items'  itemid2       -blank-

In other words, a cluster is a grouping of the items from the first
table, identified by their rowids.

My initial test of my ability to find my way around a brightly lit
room with a flashlight is to calculate the centroids of these
clusters, and store them as an additional CF:

    CF='centroid' CQ=dimension V=value
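
As a rough illustration of the centroid calculation described above, here
is a minimal in-memory sketch in Python. It is not Accumulo iterator code.
It assumes that values parse as floats, that a dimension missing from an
item's sparse vector counts as 0, and that the centroid is the plain
arithmetic mean over the cluster's members. The names `items`, `clusters`,
and `centroid` are hypothetical stand-ins for the two tables.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two tables.
# items: itemid -> {dimension: value}, i.e. the cf='context' entries.
items = {
    "item1": {"x": 1.0, "y": 2.0},
    "item2": {"x": 3.0},
}
# clusters: clusterid -> list of itemids, i.e. the cf='items' qualifiers.
clusters = {"c1": ["item1", "item2"]}


def centroid(member_ids, items):
    """Mean of sparse vectors; a missing dimension counts as 0 (assumption)."""
    sums = defaultdict(float)
    for item_id in member_ids:
        for dim, value in items[item_id].items():
            sums[dim] += value
    n = len(member_ids)
    # An empty cluster yields an empty centroid, since sums is empty too.
    return {dim: total / n for dim, total in sums.items()}


# Each (dimension, value) pair here would become one
# CF='centroid' CQ=dimension V=value entry on the cluster's row.
centroids = {cid: centroid(members, items) for cid, members in clusters.items()}
```

Only the dimensions that appear in at least one member end up in the
result, so the centroid stays sparse like the item vectors.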

And then my second test is to calculate the distance from each item to
the centroid of its cluster, and store that. Finally, I want to
peruse items in descending order of their distance-from-centroid.
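
The distance step and the final ordering can be sketched the same way, in
plain Python rather than Accumulo code. The message does not say which
distance measure is intended, so Euclidean distance is an assumption here,
as is treating a missing dimension as 0. The names `distance`, `center`,
and `ranked` are hypothetical.

```python
import math


def distance(vec, center):
    """Euclidean distance between two sparse vectors (assumed metric)."""
    dims = set(vec) | set(center)
    return math.sqrt(
        sum((vec.get(d, 0.0) - center.get(d, 0.0)) ** 2 for d in dims)
    )


# Hypothetical sample data: three items and their cluster's centroid.
items = {
    "item1": {"x": 0.0},
    "item2": {"x": 4.0, "y": 3.0},
    "item3": {"x": 1.0},
}
center = {"x": 1.0}

# (distance, itemid) pairs, largest distance first.
ranked = sorted(
    ((distance(vec, center), item_id) for item_id, vec in items.items()),
    reverse=True,
)
```

Accumulo returns entries in sorted key order, so getting this ordering
straight from a scan would presumably mean putting the distance into the
key in some sort-friendly encoding. That is a guess at one approach, not
something the message specifies.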

