lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nader S. Henein" <...@bayt.net>
Subject RE: commercial websites powered by Lucene?
Date Tue, 24 Jun 2003 10:32:27 GMT
The search is a little sluggish because our initial architecture was
based on TCL, not java, so until we complete the full java overhaul,
every time I perform a search the AOL Webserver (tcl) has to call the
servlet in Resin (where lucene is)  and then perform the search, then
this is the killer , I have to parse all the results from a Java
Collection into a TCL List, the most intense search with thousands of
results takes less than a second, it's all the things I have to do
around it that take time.

Nader

-----Original Message-----
From: John Takacs [mailto:takacsj@highway61.com] 
Sent: Tuesday, June 24, 2003 1:52 PM
To: Lucene Users List
Subject: RE: commercial websites powered by Lucene?


Hi Nader,

This thread is by far one of the best, and most practical.  It will only
be topped when someone provides benchmarks for a DMOZ.org type directory
of 3 million plus urls.  I would love to, but the whole JavaCC thing is
a show stopper.

Questions:

I noticed that search is a little slow.  What has been your experience?
Perhaps it was a bandwidth issue, but I'm living in a country with the
greatest internet connectivity and penetration in the world (South
Korea), so I don't think that is an issue on my end.

You have 500,000 resumes.  Based on the steps you took to get to
500,000, do you think your current setup will scale to millions, like
say, 3 million or so?

What is your hardware like?  CPU/RAM?

Warm regards, and thanks for sharing.  If I can ever get passed the
Lucene/JavaCC installation failure, I'll share my benchmarks on the
above directory scenario.

John



-----Original Message-----
From: Nader S. Henein [mailto:nsh@bayt.net]
Sent: Tuesday, June 24, 2003 5:30 PM
To: 'Lucene Users List'
Subject: RE: commercial websites powered by Lucene?


 I handle updates or inserts the same way first I delete the document
from the index and then I insert it (better safe than sorry), I batch my
updates/inserts every twenty minutes, I would do it in smaller intervals
but since I have to sync the XML files created from the DB to three
machines (I maintain three separate Lucene indices on my three separate
web-servers) it takes a little longer. You have to batch your changes
because Updating the index takes time as opposed to deleted which I
batch every two minutes. You won't have a problem updating the index and
searching at the same time because lucene updates the index on a
separate set of files and then when It's done it overwrites the old
version. I've had to provide for Backups, and things like server crashes
mid-indexing, but I was using Oracle Intermedia before and Lucene BLOWS
IT AWAY.

-----Original Message-----
From: news [mailto:news@main.gmane.org] On Behalf Of Chris Miller
Sent: Tuesday, June 24, 2003 12:06 PM
To: lucene-user@jakarta.apache.org
Subject: Re: commercial websites powered by Lucene?


Hi Nader,

I was wondering if you'd mind me asking you a couple of questions about
your implementation?

The main thing I'm interested in is how you handle updates to Lucene's
index. I'd imagine you have a fairly high turnover of CVs and jobs, so
index updates must place a reasonable load on the CPU/disk. Do you keep
CVs and jobs in the same index or two different ones? And what is the
process you use to update the index(es) - do you batch-process updates
or do you handle them in real-time as changes are made?

Any insight you can offer would be much appreciated as I'm about to
implement something similar and am a little unsure of the best approach
to take. We need to be able to handle indexing about 60,000
documents/day, while allowing (many) searches to continue operating
alongside.

Thanks!
Chris

"Nader S. Henein" <nsh@bayt.net> wrote in message
news:001401c32b38$32aa2440$d501a8c0@naderit...
> We use Lucene http://www.bayt.com , we're basically an on-line 
> Recruitment site and up until now we've got around 500 000 CVs and 
> documents indexed with results that stump Oracle Intermedia.
>
> Nader Henein
> Senior Web Dev
>
> Bayt.com
>
> -----Original Message-----
> From: John_Chun@platts.com [mailto:John_Chun@platts.com]
> Sent: Wednesday, June 04, 2003 6:09 PM
> To: lucene-user@jakarta.apache.org
> Subject: commercial websites powered by Lucene?
>
>
>
> Hello All,
>
> I've been trying to find examples of large commercial websites that 
> use Lucene to power their search.  Having such examples would make 
> Lucene an easy sell to management
>
> Does anyone know of any good examples?  The bigger the better, and the

> more the better.
>
> TIA,
> -John
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message