Hi Ted,
I am really interested in learning more about the symmetry in power
law/fractal graphs in general and how matrix math (SVD, etc.) allows us to
detect/exploit it. Would you have any recommendations for books or papers
that I could read that explore this question? I remember taking a standard
Linear Algebra course way back in college, but it never discussed its links
to graphs/symmetry...
Thanks and best regards,
Dev
On Wed, Apr 16, 2008 at 2:33 AM, Ted Dunning <tdunning@veoh.com> wrote:
>
> Power law algorithms are ideal for this kind of parallelized problem.
>
> The basic idea is that hub and authority style algorithms are intimately
> related to eigenvector or singular value decompositions (depending on
> whether the links are symmetrical). This also means that there is a close
> relationship to the asymptotic behavior of random walks on the graph.
>
> If you represent the linkage in the web by a matrix with columns
> representing source pages and rows representing target pages, with a 1
> wherever the source page has a link pointing to the target page, then
> multiplying that matrix by a vector with a single nonzero element equal to
> 1 (representing a single page) gives you a vector with 1 in the positions
> corresponding to the pages the original page linked to. If you multiply
> again, you get all the pages that you can get to in two steps from the
> original page.
>
> Mathematically, if we call the original vector x and the linkage matrix A,
> the pages that x links to are just Ax. The pages that are two steps from x
> are A(Ax) = A^2 x.
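>
> To make this concrete, here is a small sketch in Python with NumPy (the
> four-page link graph is made up purely for illustration):

```python
import numpy as np

# Toy link graph on pages 0..3: A[i, j] = 1 when page j links to page i
# (columns = source page, rows = target page, as described above).
A = np.array([
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
])

x = np.array([1, 0, 0, 0])       # a single page: page 0

one_step = A @ x                 # pages that page 0 links to
two_steps = A @ (A @ x)          # pages exactly two steps from page 0

print(np.nonzero(one_step)[0])   # -> [1 2]
print(np.nonzero(two_steps)[0])  # -> [2 3]
```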
>
> The eigenvector decomposition of A is just a way of writing A as a product
> of three matrices:
>
> A = U S U'
>
> U' is the transpose of U, and U has the special property that U'U = I (it
> is called orthonormal because of this).
>
> S is a diagonal matrix.
>
> There is lots of deep mathematical machinery and beautiful symmetry
> available here, but for now we can just take this as given.
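>
> A quick numerical check of the decomposition (using a small symmetric
> matrix made up for the example; np.linalg.eigh handles the symmetric
> case):

```python
import numpy as np

# A symmetric linkage matrix (undirected links), so A = U S U' holds exactly.
A = np.array([
    [0., 1., 1.],
    [1., 0., 0.],
    [1., 0., 0.],
])

s, U = np.linalg.eigh(A)   # eigenvalues s (the diagonal of S), orthonormal U
S = np.diag(s)

assert np.allclose(U @ S @ U.T, A)        # A = U S U'
assert np.allclose(U.T @ U, np.eye(3))    # U'U = I (orthonormal)
```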
>
> The pages n steps from x are
>
> x_n = A^n x = (U S U')^n x = (U S U')^(n-2) (U S U') (U S U') x
>     = (U S U')^(n-2) (U S (U'U) S U') x = (U S U')^(n-2) (U S^2 U') x
>     = ... = U S^n U' x
>
> This is really cool because S^n can be computed by just taking each
> diagonal element and raising it to a power.
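>
> Checking that claim numerically (same made-up symmetric matrix as before):

```python
import numpy as np

A = np.array([
    [0., 1., 1.],
    [1., 0., 0.],
    [1., 0., 0.],
])
s, U = np.linalg.eigh(A)

n = 5
An_direct = np.linalg.matrix_power(A, n)   # n-1 repeated matrix multiplies
An_eig = U @ np.diag(s ** n) @ U.T         # just exponentiate the diagonal

assert np.allclose(An_direct, An_eig)      # A^n = U S^n U'
```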
>
> Eigenvector decompositions have other, really deep connections. For
> instance, if you take the elements of S (call the ith one s_i) then
>
> sum_i s_i^n
>
> is the trace of A^n, which counts the closed paths (walks that return to
> their starting page) that are n steps long.
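>
> A small check of the walk-counting identity (trace of A^n equals the sum
> of the eigenvalues raised to the n; the graph is the same toy example):

```python
import numpy as np

A = np.array([
    [0., 1., 1.],
    [1., 0., 0.],
    [1., 0., 0.],
])
s, _ = np.linalg.eigh(A)

n = 4
# Diagonal entry (i, i) of A^n counts the n-step walks from page i back to i.
closed_walks = np.trace(np.linalg.matrix_power(A, n))

assert np.isclose(np.sum(s ** n), closed_walks)   # sum_i s_i^n = tr(A^n)
```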
>
> Connected (or nearly connected) clusters of pages can also be derived from
> the eigenvector decomposition. This is the basis of so-called spectral
> clustering. For some very impressive examples of spectral clustering see
>
> http://citeseer.ist.psu.edu/ng01spectral.html
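>
> As a minimal sketch of the spectral idea (this uses the plain graph
> Laplacian and a sign split of the Fiedler vector; the paper above uses a
> normalized variant plus k-means, and the graph here is invented for
> illustration):

```python
import numpy as np

# Two triangle cliques, nodes 0-2 and 3-5, joined by one weak link (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A        # graph Laplacian: D - A
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue

clusters = (fiedler > 0).astype(int)  # sign of entries splits the graph
print(clusters)                       # the two triangles land in two groups
```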
>
> So eigenvectors are cool. But how can we compute them?
>
> First, note that if A^n = U S^n U' and some of the s_i are bigger than
> others, the big ones will quickly dominate the others. That is, pretty
> quickly, A^n \approx u_1 s_1^n u_1' (where u_1 is the column of U paired
> with the largest s_i). This means that we can compute an approximation of
> u_1 by just doing A^n x where x is some random vector. Moreover, we can
> compute u_2 by starting with a different random vector and iterating the
> same way, but with an additional step where we forbid the result from
> going towards u_1. With just a few additional wrinkles, this gives us what
> is called the Lanczos algorithm. Golub and van Loan's excellent book
> Matrix Computations gives a lot of information on these algorithms.
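>
> The iterate-and-forbid idea can be sketched as follows (this is plain
> power iteration with deflation, not the full Lanczos algorithm; the
> matrix and iteration count are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([
    [2., 1., 0.],
    [1., 3., 1.],
    [0., 1., 2.],
])  # symmetric; eigenvalues are 4, 2, 1

def power_iteration(A, forbid=None, iters=200):
    """Approximate a dominant eigenvector; optionally stay orthogonal to one."""
    x = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        if forbid is not None:
            x = x - forbid * (forbid @ x)   # forbid drift toward known u_1
        x = A @ x                           # repeated multiplication by A
        x = x / np.linalg.norm(x)
    return x

u1 = power_iteration(A)                     # dominant eigenvector
u2 = power_iteration(A, forbid=u1)          # next one, via deflation

s, U = np.linalg.eigh(A)                    # exact answer for comparison
assert np.isclose(abs(u1 @ U[:, -1]), 1.0, atol=1e-6)
assert np.isclose(abs(u2 @ U[:, -2]), 1.0, atol=1e-6)
```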
>
> The cool thing here is that our random vector can represent a single page
> and we can approximate the final result by following links. Following
> links is just a (human-readable) way of saying sparse matrix
> multiplication. If we do this multiplication against lots of different
> random starting points, we can quickly build parallel algorithms to
> compute things like PageRank.
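>
> A serial sketch of the PageRank-style iteration (the link graph and the
> 0.85 damping factor are illustrative choices, and every page here has at
> least one outlink, so dangling nodes are ignored):

```python
import numpy as np

# page -> pages it links to (a made-up four-page web)
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4

# Column-stochastic matrix: column j spreads page j's rank over its outlinks.
M = np.zeros((n, n))
for src, targets in links.items():
    for t in targets:
        M[t, src] = 1.0 / len(targets)

d = 0.85                              # damping factor
rank = np.full(n, 1.0 / n)            # start from a uniform vector
for _ in range(100):
    rank = d * (M @ rank) + (1 - d) / n   # one sparse multiply per step

print(rank.round(3))                  # page 2, with three in-links, ranks top
```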
>
> I hope this helps. I know it is a big byte to take at one time.
>
>
> On 4/15/08 9:31 AM, "Chaman Singh Verma" <csv610@yahoo.com> wrote:
>
> > Hello,
> >
> > After googling for many days, I couldn't get an answer from any of the
> > published reports on the ranking algorithm used by Google. Since Google
> > uses GFS for fault-tolerance purposes, what communication libraries
> > might they be using to solve such a large matrix? I presume that
> > standard Message Passing (MPI) may not be suitable for this purpose
> > (fault tolerance issues).
> >
> > Suppose we implement the ranking algorithm on top of Hadoop: what would
> > be the best way/best distributed algorithm/library etc.?
> >
> > With regards.
> >
> > Chaman Singh Verma
> > Poona, India
> >
>
>
