lucene-general mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Open Relevance Project?
Date Mon, 18 May 2009 02:18:23 GMT

Not sure if this was mentioned before, but I was going to point out http://index.isc.org/
(see http://ioiblog.wordpress.com/2008/11/07/kicking-off-the-ioi-blog/ ). The server doesn't
seem to be listening, though, so try this instead: http://ioiblog.wordpress.com/2009/02/

Perhaps we can get data from Dennis and Jeremie?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Ted Dunning <ted.dunning@gmail.com>
> To: general@lucene.apache.org
> Sent: Wednesday, May 13, 2009 2:48:43 PM
> Subject: Re: Open Relevance Project?
> 
> Crawling a reference dataset requires essentially one-time bandwidth.
> 
> Also, it is possible to download, say, wikipedia in a single go.  Likewise
> there are various web-crawls that are available for research purposes (I
> think).  See http://webascorpus.org/ for one example.  These would be single
> downloads.
> 
> I don't entirely see the point of redoing the spidering.
> 
> On Wed, May 13, 2009 at 10:56 AM, Grant Ingersoll wrote:
> 
> > Good point, although you never know.  We also will have some bandwidth reqs
> > for crawling.
> >
> >
> 
> 
> -- 
> Ted Dunning, CTO
> DeepDyve

