lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Open Relevance Project?
Date Wed, 13 May 2009 18:48:43 GMT
Crawling a reference dataset is essentially a one-time bandwidth cost.

Also, it is possible to download, say, Wikipedia in a single go.  Likewise,
I think there are various web crawls available for research purposes.
See http://webascorpus.org/ for one example.  These would be single
downloads.

I don't entirely see the point of redoing the spidering.

On Wed, May 13, 2009 at 10:56 AM, Grant Ingersoll <gsingers@apache.org> wrote:

> Good point, although you never know.  We also will have some bandwidth reqs
> for crawling.
>
>


-- 
Ted Dunning, CTO
DeepDyve
