lucene-java-user mailing list archives

From "Jochen Frey" <jochen_f...@yahoo.com>
Subject FW: Indexing Speed: Documents vs. Sentences
Date Fri, 19 Dec 2003 17:05:51 GMT

Stephane,

	The actual indexing is less glamorous than it sounds. When you index
1TB across 10 machines, you end up with 100GB on each machine. We do not
merge the indexes either, since we get better speed for both indexing and
querying when we keep the indexes smaller and distributed across different
machines. Still, at some point I'll probably sit down, merge them all
together, and play with the result, 'cause it's cool :-) I'll keep you
posted when that happens.
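A minimal sketch of one way to spread pages over 10 machines so that each one holds about a tenth of the data (1TB / 10 = 100GB). The message does not say how the pages were actually split; hashing the page URL, and the `ShardRouter` class and `shardFor` method, are hypothetical illustrations, not the setup used here.

```java
import java.util.Arrays;

public class ShardRouter {
    // Hypothetical routing rule (an assumption, not from the thread):
    // hash the page URL to pick one of numShards machines.
    static int shardFor(String url, int numShards) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(url.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 10;
        int[] docsPerShard = new int[numShards];
        // Illustrative corpus of 100,000 made-up URLs (not data from this thread).
        for (int i = 0; i < 100_000; i++) {
            docsPerShard[shardFor("http://example.com/page/" + i, numShards)]++;
        }
        // Print how many documents landed on each machine.
        System.out.println(Arrays.toString(docsPerShard));
    }
}
```

Because the same URL always maps to the same machine, each machine can build and update its own index independently; how even the split turns out depends on the hash function and the URLs.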
 
	The test set I am playing with is 40GB, and I have just posted a
benchmark.
 
	Best,
		Jochen

> -----Original Message-----
> From: Stephane Vaucher [mailto:vauchers@cirano.qc.ca]
> Sent: Thursday, December 18, 2003 9:01 AM
> To: Lucene Users List; jochen_frey@yahoo.com
> Subject: RE: Indexing Speed: Documents vs. Sentences
> 
> Jochen,
> 
> If you have a bit of time, could you post some metrics? (For an example,
> see http://jakarta.apache.org/lucene/docs/benchmarks.html.) I haven't
> heard of anyone indexing 1TB yet. I'm sure everyone is interested in any
> problems you might be facing, and we could probably give you some ideas.
> Oddly enough, I sometimes wish I had a dataset larger than a few million
> docs to experiment with.
> 
> cheers,
> sv
> 
> On Thu, 18 Dec 2003, Jochen Frey wrote:
> 
> > Hi,
> >
> > 	Yes, that is correct: I am dealing with a few hundred GB (close to
> > 1TB). I am, however, distributing the data across several machines and
> > then merging the results from all the machines (until I find a better
> > and faster solution).
> >
> > 	Cheers!
> >
> > > -----Original Message-----
> > > From: Victor Hadianto [mailto:vichad@hadianto.net]
> > > Sent: Wednesday, December 17, 2003 10:50 PM
> > > To: Lucene Users List
> > > Subject: Re: Indexing Speed: Documents vs. Sentences
> > >
> > > > Hi,
> > > >
> > > > I am using Lucene to index a large number of web pages (a few 100GB)
> and
> > > the
> > > > indexing speed is great.
> > >
> > > Jochen, a few hundred GB? Is this correct?
> > >
> > > /victor
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
> >
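A minimal sketch, in plain Java with no Lucene dependency, of the "merge the results from all the machines" step described in the thread: each machine returns its own hits sorted by score, and a size-k min-heap keeps the global top k. The `ResultMerger` and `Hit` classes are hypothetical illustrations, not the code used in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ResultMerger {
    // Hypothetical hit: the machine that returned it, its doc id on that machine, and its score.
    static final class Hit {
        final int machine;
        final int docId;
        final float score;

        Hit(int machine, int docId, float score) {
            this.machine = machine;
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return "machine " + machine + ", doc " + docId + ", score " + score;
        }
    }

    // Merge the hit lists returned by each machine into one global top-k list.
    static List<Hit> mergeTopK(List<List<Hit>> perMachine, int k) {
        if (k <= 0) {
            return Collections.emptyList();
        }
        // Min-heap ordered by score: the weakest hit kept so far sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perMachine) {
            for (Hit h : hits) {
                if (heap.size() < k) {
                    heap.add(h);
                } else if (h.score > heap.peek().score) {
                    heap.poll(); // evict the weakest hit to make room
                    heap.add(h);
                }
            }
        }
        // Copy the heap into a list and sort it best-first.
        List<Hit> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return top;
    }

    public static void main(String[] args) {
        // Made-up results from three machines, for illustration only.
        List<List<Hit>> perMachine = Arrays.asList(
                Arrays.asList(new Hit(0, 3, 0.9f), new Hit(0, 7, 0.4f)),
                Arrays.asList(new Hit(1, 2, 0.8f), new Hit(1, 5, 0.1f)),
                Arrays.asList(new Hit(2, 1, 0.95f)));
        for (Hit h : mergeTopK(perMachine, 3)) {
            System.out.println(h);
        }
    }
}
```

One caveat: scores from separately built indexes are not strictly comparable, because each index computes its own term statistics (such as document frequencies), so merging on raw scores is an approximation.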



