From: "John Takacs"
To: "Lucene Users List"
Subject: RE: commercial websites powered by Lucene?
Date: Tue, 24 Jun 2003 18:52:12 +0900

Hi Nader,

This thread is by far one of the best, and most practical.
It will only be topped when someone provides benchmarks for a DMOZ.org-type directory of 3 million plus URLs. I would love to, but the whole JavaCC thing is a show stopper.

Questions:

I noticed that search is a little slow. What has been your experience? Perhaps it was a bandwidth issue, but I'm living in a country with the greatest internet connectivity and penetration in the world (South Korea), so I don't think that is an issue on my end.

You have 500,000 resumes. Based on the steps you took to get to 500,000, do you think your current setup will scale to millions, say 3 million or so?

What is your hardware like? CPU/RAM?

Warm regards, and thanks for sharing. If I can ever get past the Lucene/JavaCC installation failure, I'll share my benchmarks on the above directory scenario.

John

-----Original Message-----
From: Nader S. Henein [mailto:nsh@bayt.net]
Sent: Tuesday, June 24, 2003 5:30 PM
To: 'Lucene Users List'
Subject: RE: commercial websites powered by Lucene?

I handle updates and inserts the same way: first I delete the document from the index, and then I insert it (better safe than sorry). I batch my updates/inserts every twenty minutes. I would do it at smaller intervals, but since I have to sync the XML files created from the DB to three machines (I maintain three separate Lucene indices on my three separate web servers), it takes a little longer. You have to batch your changes because updating the index takes time, as opposed to deletes, which I batch every two minutes. You won't have a problem updating the index and searching at the same time, because Lucene updates the index on a separate set of files and then, when it's done, it overwrites the old version. I've had to provide for backups and for things like server crashes mid-indexing, but I was using Oracle Intermedia before and Lucene BLOWS IT AWAY.
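The delete-then-insert handling and the two batching intervals described in the message above can be sketched roughly as follows. This is a toy in-memory model in Python, not Lucene code: the BatchedIndex class, its method names, and the sample documents are all made up for illustration, and a real setup would presumably call the two flush methods from a scheduler at the intervals mentioned (two minutes for deletes, twenty for updates/inserts).

```python
from dataclasses import dataclass, field


@dataclass
class BatchedIndex:
    """Toy in-memory stand-in for a search index (hypothetical, not Lucene's API)."""
    docs: dict = field(default_factory=dict)             # doc_id -> text, visible to searches
    pending_upserts: dict = field(default_factory=dict)  # queued inserts/updates
    pending_deletes: set = field(default_factory=set)    # queued deletes

    def queue_upsert(self, doc_id, text):
        # A later upsert supersedes an earlier queued delete for the same id.
        self.pending_deletes.discard(doc_id)
        self.pending_upserts[doc_id] = text

    def queue_delete(self, doc_id):
        # A delete cancels any upsert still waiting in the queue.
        self.pending_upserts.pop(doc_id, None)
        self.pending_deletes.add(doc_id)

    def flush_deletes(self):
        """Cheap pass, meant to run often. Returns how many documents were removed."""
        removed = 0
        for doc_id in self.pending_deletes:
            if self.docs.pop(doc_id, None) is not None:
                removed += 1
        self.pending_deletes.clear()
        return removed

    def flush_upserts(self):
        """Costlier pass, meant to run less often. Returns how many documents were written."""
        written = len(self.pending_upserts)
        for doc_id, text in self.pending_upserts.items():
            self.docs.pop(doc_id, None)  # delete first, even if the document is absent
            self.docs[doc_id] = text     # then insert the fresh copy
        self.pending_upserts.clear()
        return written

    def search(self, term):
        # Naive substring match; searches only ever see flushed state.
        return sorted(i for i, t in self.docs.items() if term.lower() in t.lower())


# Made-up sample documents, for illustration only.
index = BatchedIndex()
index.queue_upsert("cv-1", "Java developer")
index.queue_upsert("cv-2", "Oracle DBA")
index.flush_upserts()

index.queue_upsert("cv-1", "Senior Java developer")  # an update is a delete plus an insert
index.queue_delete("cv-2")
index.flush_deletes()
index.flush_upserts()
```

Treating every change as delete-then-insert means the same code path covers both new and existing documents, which is the "better safe than sorry" point made above. The sketch does not model the file-level swap of old and new index versions.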
-----Original Message-----
From: news [mailto:news@main.gmane.org] On Behalf Of Chris Miller
Sent: Tuesday, June 24, 2003 12:06 PM
To: lucene-user@jakarta.apache.org
Subject: Re: commercial websites powered by Lucene?

Hi Nader,

I was wondering if you'd mind me asking you a couple of questions about your implementation? The main thing I'm interested in is how you handle updates to Lucene's index. I'd imagine you have a fairly high turnover of CVs and jobs, so index updates must place a reasonable load on the CPU/disk. Do you keep CVs and jobs in the same index or two different ones? And what is the process you use to update the index(es) - do you batch-process updates or do you handle them in real-time as changes are made?

Any insight you can offer would be much appreciated as I'm about to implement something similar and am a little unsure of the best approach to take. We need to be able to handle indexing about 60,000 documents/day, while allowing (many) searches to continue operating alongside.

Thanks!
Chris

"Nader S. Henein" wrote in message news:001401c32b38$32aa2440$d501a8c0@naderit...
> We use Lucene http://www.bayt.com , we're basically an on-line
> Recruitment site and up until now we've got around 500 000 CVs and
> documents indexed with results that stump Oracle Intermedia.
>
> Nader Henein
> Senior Web Dev
>
> Bayt.com
>
> -----Original Message-----
> From: John_Chun@platts.com [mailto:John_Chun@platts.com]
> Sent: Wednesday, June 04, 2003 6:09 PM
> To: lucene-user@jakarta.apache.org
> Subject: commercial websites powered by Lucene?
>
> Hello All,
>
> I've been trying to find examples of large commercial websites that
> use Lucene to power their search. Having such examples would make
> Lucene an easy sell to management.
>
> Does anyone know of any good examples? The bigger the better, and the
> more the better.
>
> TIA,
> -John
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org