From: "Jochen Frey"
Subject: FW: Indexing Speed: Documents vs. Sentences
Date: Fri, 19 Dec 2003 09:05:51 -0800
Message-ID: <000c01c3c652$5b78fee0$8012060a@MMcompaq01>
List-Id: "Lucene Users List"

Stephane,

The indexing is actually less glamorous than it sounds. When you index 1TB
across 10 machines, you end up with 100GB on each machine. We do not merge
the indexes either, since we get better speed on indexing as well as
querying when we keep the indexes smaller and distributed across different
machines.
(But somehow I think that I'll sit down and merge all of them together and
play with it when I get a chance ... 'cause it's cool :-) I'll keep you
posted when it happens).

My test set that I am playing with is 40GB, and I just posted a benchmark.

Best,
Jochen

> -----Original Message-----
> From: Stephane Vaucher [mailto:vauchers@cirano.qc.ca]
> Sent: Thursday, December 18, 2003 9:01 AM
> To: Lucene Users List; jochen_frey@yahoo.com
> Subject: RE: Indexing Speed: Documents vs. Sentences
>
> Jochen,
>
> If you have a bit of time, could you post some metrics (as an example,
> you can look at http://jakarta.apache.org/lucene/docs/benchmarks.html)? I
> haven't heard of anyone indexing 1TB yet. I'm sure everyone is interested
> in problems you could be facing and we could probably give you some ideas.
> I know (oddly enough) I sometimes wish I had a dataset greater than a few M
> docs to experiment with.
>
> cheers,
> sv
>
> On Thu, 18 Dec 2003, Jochen Frey wrote:
>
> > Hi,
> >
> > Yes, this is correct, I am dealing with a few 100GB (close to 1TB).
> > I am, however, distributing the data across several machines and then
> > merging the results from all the machines together (until I find a
> > better & faster solution).
> >
> > Cheers!
> >
> > > -----Original Message-----
> > > From: Victor Hadianto [mailto:vichad@hadianto.net]
> > > Sent: Wednesday, December 17, 2003 10:50 PM
> > > To: Lucene Users List
> > > Subject: Re: Indexing Speed: Documents vs. Sentences
> > >
> > > > Hi,
> > > >
> > > > I am using Lucene to index a large number of web pages (a few 100GB)
> > > > and the indexing speed is great.
> > >
> > > Jochen .. a few 100 GB? Is this correct?
> > >
> > > /victor
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
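
For readers following the thread: the setup described above (separate
indexes kept on several machines, with the results from all machines merged
at query time) could be sketched roughly as below. This is only an
illustration under stated assumptions, not code from the thread and not
Lucene API. The names ShardMerge, Hit and mergeTopK are hypothetical, and
the sketch assumes each machine returns its own hits together with a
relevance score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: merges hit lists that were
// produced independently by several machines, each with its own index.
final class ShardMerge {

    /** One hit as reported by a single machine (hypothetical type). */
    static final class Hit {
        final String shard;  // which machine produced the hit
        final String docId;  // document identifier local to that machine
        final float score;   // relevance score reported by that machine

        Hit(String shard, String docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns the k highest-scoring hits across all machines, best first. */
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // drop the lowest-scoring hit seen so far
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        // Made-up hits from two machines, for illustration only.
        List<Hit> m1 = Arrays.asList(new Hit("m1", "a", 0.9f), new Hit("m1", "b", 0.2f));
        List<Hit> m2 = Arrays.asList(new Hit("m2", "c", 0.5f));
        for (Hit h : mergeTopK(Arrays.asList(m1, m2), 2)) {
            System.out.println(h.shard + "/" + h.docId + " " + h.score);
        }
    }
}
```

One caveat with this kind of merge: scores computed against separately
built indexes are not necessarily comparable, since term statistics can
differ from index to index, so a real setup may need to account for that.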