lucene-dev mailing list archives

From "Clemens Marschner" <c...@lanlab.de>
Subject Re: Compressing Links / Was: Re: Web Crawler
Date Wed, 24 Apr 2002 22:43:09 GMT
> see this: http://www.almaden.ibm.com/cs/k53/www9.final/
>
> "In CS2, each URL is stored in 10 bytes. In CS1, each link requires 8
> bytes to store as both an in-link and out-link; in CS2, an average of only
> 3.4 bytes are used. Second, CS2 provides additional functionality in the
> form of a host database. For example, in CS2, it is easy to get all the
> in-links for a given node, or just the in-links from remote hosts."
>

Hm, this is very promising. I wonder if they can do it without a perfect
hash function.
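One way to get dense URL IDs without a perfect hash is to sort the URLs and use each URL's position as its ID, looking it up by binary search. Small per-link sizes like the quoted 3.4 bytes can then come from gap-encoding each node's sorted out-links with variable-length integers. The following is a minimal Python sketch of that combination under stated assumptions. It is not the CS2 design, whose actual encoding may differ. `UrlTable`, `encode_varint`, `decode_varints`, `compress_links`, and `decompress_links` are hypothetical names.

```python
from bisect import bisect_left


class UrlTable:
    """Hypothetical URL <-> dense-ID map built without a perfect hash.

    IDs are positions in a lexicographically sorted, de-duplicated URL array.
    Lookups use binary search (O(log n)). Sorting also groups URLs from the
    same host, so intra-host links tend to get numerically close IDs.
    """

    def __init__(self, urls):
        self._urls = sorted(set(urls))

    def id_of(self, url):
        i = bisect_left(self._urls, url)
        if i < len(self._urls) and self._urls[i] == url:
            return i
        return None  # URL is not in the table

    def url_of(self, uid):
        return self._urls[uid]


def encode_varint(n):
    """LEB128-style varint: 7 data bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)  # final byte
            return bytes(out)


def decode_varints(data):
    """Decode a concatenation of varints back into a list of ints."""
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def compress_links(target_ids):
    """Gap-encode a node's out-links: sort, de-duplicate, store differences as varints."""
    out, prev = bytearray(), 0
    for t in sorted(set(target_ids)):
        out += encode_varint(t - prev)
        prev = t
    return bytes(out)


def decompress_links(data):
    """Rebuild the sorted target IDs by summing the decoded gaps."""
    targets, prev = [], 0
    for gap in decode_varints(data):
        prev += gap
        targets.append(prev)
    return targets
```

For example, out-links to IDs 1000000, 1000003, 1000010 and 1000200 are stored as the values 1000000, 3, 7 and 190. These take 3 + 1 + 1 + 2 = 7 bytes, compared with 16 bytes as fixed 4-byte integers. The gains depend on link locality, which is why the sorted-URL numbering matters.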

Thanks for the link. I came across it some time ago but had forgotten
that part. The authors are right: "The study of the web as a graph is not
only fascinating in its own right, but also yields valuable insight into web
algorithms for crawling, searching and community discovery, and the
sociological phenomena which characterize its evolution."

Clemens



