(1) I abandoned any attempts at DBScan and implemented another density
algorithm itself (can't say which, subject to patent restrictions). The
reason being, i couldn't immediately figure how to parallelize it
efficiently (aside from data structure discussions), the base algorithm is
inherently iterative.
(2) Samsara provides Rbase level algebra, not general data structure
support. IMO it would not pay to adjust it to do that any more than trying
to fit R base to do it. I did implement spatial structures standardized on
using Samsara in terms of carrying out computations (mostly inmemory), but
those were still data structures in their on right.
(3) like i said before, experience tells me that using collection of 2d
tensors (esp. sparse tensors) is fine instead of trying to introduce nd
tensor. The fundamental difference here is mostly with sparse operations
where nd sparse tensor could intelligently execute those. But this is not
supported properly pretty much by any library i know, so all the difference
in most cases that say they support it directly is just putting a tensor
api over collection of tensors. Practicality of having dense nd tensors is
also a bit questioned, since it immediately increases requirements for
processing memory quantity of a single tensor instance, whereas collection
of tensors could be represented as retaining streaming properties. etc.
etc. Overall it sounds to me like a solution in a search of a problem
(given my admittedly very limited experience as a practitioner in math).
On Wed, Jul 5, 2017 at 12:09 AM, Aditya <adityasarma007@gmail.com> wrote:
> ***Important** **Do read** *
> Hello everyone,
>
> Trevor and I have been discussing as to how to effectively represent an
> RTree in Mahout. Turns out there is a method to represent a Binary Search
> Tree (BST) in the form of an ancestor matrix. This
> <http://www.geeksforgeeks.org/constructancestormatrix
> fromagivenbinarytree/>
> and this <http://www.geeksforgeeks.org/constructtreefromancestor
> matrix/>
> show the conversion logic from a tree representation to a matrix
> representation and vice versa.
>
> In a distributed scenario, I know of the following design
> <https://docs.google.com/document/d/1SfMIt8hYENwlm328rSGAMJcci6JM4
> PyXLoNlVF0Yno8/edit?usp=sharing>
> which is fairly efficient and intuitive to understand.
>
> Now the point for debate is this:
> **Please read the design provided in the link above before proceeding**
> The RTree will always be a local entity, no where in the algorithm is
> there a need / necessity to have a distributed RTree kind of scenario. On
> the other hand, the data points as well as the union find data structure
> need to be stored in a DRM like data structure and they very well can be
> represented in the form of a matrix. (A union find data structure basically
> can be implemented using a vector)
>
> 1. Why not build an RTree module in the form of a normal tree with two
> children and a keyvalue pair? (I'm not sure if this is allowed in Mahout,
> so veterans please chip in). Since an Rtree will always be an incore
> entity.
>
> 2. If 1. is not done, then the method followed for the matrix
> representation of a BST should be followed. Only, the elements and
> conditions will be changed. On an abstract sense, Matrix representation of
> an RTree and matrix representation of a Binary Search Tree are analogous.
> But in this case, the construction and search costs for the Matrix version
> of an RTree will be costlier.
>
>
> *PS: Shannon, Dmitry, Andrew P, Andrew M and Trevor, it'd be great if you
> could offer your insights.*
>
> Thanks,
> Aditya
>
>
> On Sat, Jun 24, 2017 at 3:41 AM, Trevor Grant <trevor.d.grant@gmail.com>
> wrote:
>
> > What if you had Arrays of Matrices, or Arrays of Arrays of Matrices?
> (e.g.
> > 3d and 4d tensors)?
> >
> > I implemented these for the MLPs (still WIP)
> >
> > https://github.com/apache/mahout/pull/323/files#diff
> > cd8a7c5e2cf42b91b5aa47c96daf19c0R25
> >
> > But those functions were specifically to overcome the challenges you
> > describe wrt neural network weight sets.
> >
> > If you could use those as is that would be awesome, if not maybe you'll
> > find inspiration there.
> >
> >
> >
> > On Thu, Jun 22, 2017 at 6:43 PM, Aditya <adityasarma007@gmail.com>
> wrote:
> >
> > > Hello everyone,
> > >
> > > I've been working for the past few weeks on coding an incore DBSCAN
> > > algorithm.
> > >
> > > A more efficient version with an O(n*log(n)) complexity does exist but
> it
> > > uses the RTree data structure to index the data. I have a few
> > > concerns/questions and I'm hoping you would be able to help me out.
> > >
> > > 1. Based on my knowledge, using an indexing data structure like an
> RTree
> > > or a KdTree is the only way to reduce the complexity of the dbscan
> > > algorithm. If there's any other method that you are familiar with,
> please
> > > let me know.
> > >
> > > 2. From what I've read in the book Apache Mahout: Beyond MapReduce
> > written
> > > by Andrew and Dmitry, I don't see how I can directly exploit the
> existing
> > > data structures and operations to get the functionality of an RTree.
> > >
> > > 3. On the off chance that an RTree module has to built in Mahout,
> > because
> > > it is not possible to easily use existing operations I need some
> insights
> > > as to how to go about it. I learned that everything in Mahout at the
> end
> > > should be serializable to a vector. I can't fathom how to do that
> > > intuitively in the case of an RTree
> > >
> > > There are a couple of other concerns that need to be discussed but
> these
> > > are vital at the moment.
> > >
> > > PS: The research paper that proposed the DBSCAN algorithm can be found
> > here
> > > <https://www.aaai.org/Papers/KDD/1996/KDD96037.pdf>.
> > >
> > > Thanks!
> > >
> > > Aditya
> > >
> >
>
