lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Toke Eskildsen>
Subject Re: a proof that every word is indexing properly
Date Thu, 02 Dec 2010 08:23:53 GMT
On Thu, 2010-12-02 at 03:54 +0100, David Linde wrote:
> Has anyone figured out a way to logically prove that lucene indexes ever
> word properly?

The "Precision and recall in lucene"-thread seems relevant here.

> Our company has done alot of research into lucene, all of our IT department
> is really impressed and excited about lucene *except* one of the older
> search/indexing experts.
> Who doesn't want to move to a new search engine, is there anyway to
> logically prove, that lucene indexes every word properly?

That is a straw man argument. As the precision-thread shows, it is
extremely hard to define what "properly" means in relation to a
non-trivial retrieval system like Lucene.

If your grouch is an old-school database man, he might equal "properly"
to "Every word that exists in the source should be indexed so that a
search for that word will return all documents that contains it and no
other documents (phew)". As David implies, this is a bad test: It
satisfies the guy but does not a proper search system make.

But I'm just guessing here and it sounds like you're doing the same,
asking for ideas of proving Lucene functionality. Maybe you could turn
it around? Ask the guy what it would take for him to accept Lucene or
any other option for that matter. When you have that, you can discuss
whether his requirements are valid or not.

> One idea we considered is attempting to rebuild the source from the index,
> but it seems like doing that would take a huge effort.

It is also not possible in general. Writing specific code, you could
just cheat massively and store everything, giving you instant and 100%
correct rebuild.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message