lucene-java-user mailing list archives

From David Spencer <>
Subject Re: Making a case for Lucene
Date Wed, 30 Jun 2004 17:29:41 GMT
Alex McManus wrote:

> Hi,
> we are at the initial design stages of a public-facing web-based search
> application for a U.S. Federal Agency. We have proposed a clustered Lucene
> architecture as the best technical solution, as we feel their current system
> (based on Oracle) won't give the best performance, and introduces a lot of
> unnecessary complexity and expense (as the system is read-only). We also
> feel that the Lucene design will be very flexible and easier to maintain and
> administer.
> Government agencies are notoriously conservative when it comes to decisions
> about technology, especially when open-source is involved. Perhaps
> surprisingly, their response has been encouraging. However, they want
> further re-assurance that other big-name organizations have successfully
> used Lucene for large datasets.
> First some background: we will be searching a number of repositories, the
> largest of which includes about 600,000 documents, and might reach 10
> million over the next 10 years. The documents are probably comparable to web
> pages in terms of average size, and would be indexed under about 10
> different fields. Our plan is to partition the indexes and distribute them
> over a number of modern Intel/Linux servers.
> From what I've picked up on the mailing lists, this seems well within the
> capabilities of Lucene. I've looked at the Powered-by Lucene pages, but
> there are two problems: (i) there are no details on the size of the datasets
> being searched; (ii) I don't think our customer would recognize any of these
> organizations.
> In Otis' OnJava article, he lists "FedEx, Overture, Mayo Clinic, Hewlett
> Packard, New Scientist magazine, Epiphany, and others using, or at least
> evaluating, Lucene". This is more like it(!), but I want to be honest and
> open with our customer, and the "or at least evaluating" comment is not
> concrete enough, and there is no idea of scale.
> The best example that I've been able to find is the Yahoo research lab - as
> I understand it, this is a Nutch (i.e. Lucene) implementation that's
> providing impressive performance over a 100 million document repository.
> I would be very grateful if anyone could pass on some basic details of
> successful large-scale Lucene projects, and even more so if they involve a
> "big name" or government agency. If you are happy to pass this information
> on, but would prefer to keep it off the public mailing list, then please
> email me directly - I will respect confidentiality.
> I think that this problem of re-assuring customers/managers is a common one,
> so I would be happy to collate any responses to this as a new Wiki entry.
> Hopefully one day (with their permission) we will be able to add our
> customer to the Powered-by Lucene page too.

Are you interested in the opposing view? Lucene is essentially a 
"library", not an application. You have to create a whole app around it 
including admin interfaces, training classes, etc. Something like Verity 
has the whole infrastructure - front end, admin i/f, training classes, 
consultants, machine sizing guides. The cost of a commercial solution 
seems easier to scope: how much does Verity (or whoever) charge? With 
a Lucene/OSS solution there's development to be done, plus all the 
stuff that makes a complete, comprehensive app...

Of course if cost is a big issue then trying some Lucene++ solution 
makes sense. And don't get me wrong...I'm a big fan of Lucene, and when 
I was CTO at Lumos Technologies, we were cost sensitive, and I took care 
of making sure we used Lucene on the intranet and our public sites and 
were very happy w/ it. I'm just trying to make sure the, um, the, or 
one, "business viewpoint" is stated: commercial solutions already 
exist, are complete, and work, and there's no development to do, 
so there's less risk. Where I am now, I brought it up with the VP of MIS, 
but it just didn't make sense for the way that department was run, so 
we're starting to use Verity.

- Dave

> Thanks in advance,
> Alex McManus (
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

