Folks,

I am trying to get organized to get my feet wet in using the ability of Accumulo to compute near the data. I beg your pardon in advance for the following exercise in laying out what I have in mind and asking for some pointers -- particularly to examples on the 1.4 branch of code that I could warp to achieve my nefarious purposes.

So, start with this data model:

  ROWID   CF          CQ            V
  itemid  'context'   dimension     value
  itemid  something   else          entirely...

In short, for an 'item', there's a sparse feature vector associated with it (identified by cf='context'), and some other things.

Meanwhile, in another table we have:

  ROWID      CF       CQ            V
  clusterid  'items'  itemid1       -blank-
  clusterid  'items'  itemid2       -blank-

In other words, a cluster is a grouping of the items from the first table, identified by their rowids.

My initial test of my ability to find my way around a brightly lit room with a flashlight is to calculate the centroids of these clusters, and store them as an additional CF in the cluster's row:

  CF='centroid'  CQ=dimension  V=value

And then my second test is to calculate the distance from each item to the centroid of its cluster, and store that.

Finally, I want to peruse items in descending order of their distance-from-centroid values.

TIA
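
For concreteness, the arithmetic described above (a per-cluster centroid, each item's distance to it, and a descending sort) can be sketched in plain Java. This is only a sketch under stated assumptions: in-memory maps stand in for the two tables, Euclidean distance is assumed because no metric is named above, a dimension missing from a sparse vector is treated as 0, and the class and method names (CentroidSketch, centroid, distance, distancesToCentroid, descendingByDistance) are hypothetical. It deliberately uses no Accumulo API; it shows only the computation that would eventually be moved closer to the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; maps stand in for rows of the two tables.
class CentroidSketch {

    // Mean of sparse vectors (dimension -> value); a missing dimension counts as 0.
    static Map<String, Double> centroid(Collection<Map<String, Double>> vectors) {
        Map<String, Double> sums = new HashMap<>();
        for (Map<String, Double> v : vectors) {
            for (Map.Entry<String, Double> e : v.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        int n = vectors.size();
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            result.put(e.getKey(), e.getValue() / n);
        }
        return result;
    }

    // Euclidean distance (an assumption) over the union of both vectors' dimensions.
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        Set<String> dims = new HashSet<>(a.keySet());
        dims.addAll(b.keySet());
        double sumSq = 0.0;
        for (String d : dims) {
            double diff = a.getOrDefault(d, 0.0) - b.getOrDefault(d, 0.0);
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq);
    }

    // items: itemid -> feature vector; members: the itemids listed under one clusterid.
    static Map<String, Double> distancesToCentroid(
            Map<String, Map<String, Double>> items, Collection<String> members) {
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (String id : members) {
            vectors.add(items.get(id));
        }
        Map<String, Double> c = centroid(vectors);
        Map<String, Double> distances = new HashMap<>();
        for (String id : members) {
            distances.put(id, distance(items.get(id), c));
        }
        return distances;
    }

    // Item ids ordered from farthest to nearest.
    static List<String> descendingByDistance(Map<String, Double> distances) {
        List<String> ids = new ArrayList<>(distances.keySet());
        ids.sort((x, y) -> Double.compare(distances.get(y), distances.get(x)));
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> items = new HashMap<>();
        items.put("item1", Collections.singletonMap("x", 1.0));
        items.put("item2", Collections.singletonMap("x", 5.0));
        items.put("item3", Collections.singletonMap("y", 3.0));
        List<String> members = Arrays.asList("item1", "item2", "item3");

        Map<String, Double> distances = distancesToCentroid(items, members);
        // The centroid here is {x=2.0, y=1.0}, so item2 is farthest and item1 nearest.
        System.out.println(descendingByDistance(distances)); // prints [item2, item3, item1]
    }
}
```

In a real deployment the sum-and-count step and the final ordering would presumably live server-side or in a table keyed by distance rather than in client memory; the sketch only pins down what is being computed.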