lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dror Matalon <d...@zapatec.com>
Subject Re: The best way forward
Date Tue, 04 Nov 2003 18:17:19 GMT
On Tue, Nov 04, 2003 at 01:16:30PM +0100, petite_abeille wrote:
> 
> On Nov 04, 2003, at 13:04, Otis Gospodnetic wrote:
> 
> >>Eventually i am going to try to implement something similar to google
> >>groups, indexing lots of NNTP traffic. Has anyone done this before
> >>with lucune?
> >
> >Not that I know, but people have used Lucene to index their email,
> >which is somewhat similar.

Kind of similar in that a news item and an email have very similar
structure. 

On the other hand trying to replicate google news is a major project.
You need to run a news server which last time I looked wasn't a small
task, and then you need to have enough storage, enough bandwidth and
enough disk space to hold all of this.

Do you know how many daily articles there are in a daily feed? What is
the average size of a news item? Probably similar to email.

> 
> Very similar indeed :)
> 
> Perhaps you should take a look at ZOE:
> 
> http://zoe.nu/
> 
> It uses Lucene quiet extensively to index emails type of things.
> 
> NNTP support could be a stone throw away as you would only need to 
> plugin the appropriate JavaMail's Store to handle NNTP specifics.
> 
> On the other hand, I doubt that anyone has tried to index anything on 
> the scale of Google's data set... NNTP or not :)
> 
> Cheers,
> 
> PA.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message