lucene-java-user mailing list archives

From "Terry Steichen" <te...@net-frame.com>
Subject Re: Wierd Search Behavior
Date Thu, 01 Apr 2004 12:51:52 GMT
I did some more checking and uncovered what appears to be a serious Lucene
problem (either that, or my merge code, below, is wrong).  I'd appreciate any
help in figuring out what's going on.  Here are the facts as I see them:

1) I put together a large number of canned queries (some rather complex) for
routine testing purposes.
2) I created a new compound file index and tested the queries.  All worked
fine.
3) I then indexed some new documents and merged the new index with the
original index.
4) I then tried the queries again.  Each time I did this, about 1-3% of the
queries no longer worked - the actual number appears to vary with each
merge.
5) The specific queries that fail change with each merge. Ones that failed
after the previous merge almost always appear to work again with the next
merge (which produces a new batch of failures).
6) In all cases I've so far examined, the offending part of the affected
queries is a single quoted phrase (even though there may be several such
phrases in the query) - remove it, and the (now modified) query works fine.
7) I tried the same thing using the original multi-file index format, with
the same results.
8) About a week and a half ago, I migrated from 1.3final to the latest CVS
head.
9) I've only just started checking this, so I don't know how long this
behavior has been going on.  The small percentage of errors and (apparent)
randomness of which query is affected make it hard to detect.
10) I have about 32 fields per document, most of which are tokenized,
indexed and stored.
11) My merge code (for the multi-file index format) is this:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

class MergeIndices {
  public static void main(String[] args) {

    // args[0]: relative path to main index
    // args[1]: relative path to new index (to be merged with main)

    try {
      // Open the existing main index (third argument: create == false)
      IndexWriter writer =
          new IndexWriter(args[0], new StandardAnalyzer(), false);
      // writer.setUseCompoundFile(true); // used for compound format
      FSDirectory dir = FSDirectory.getDirectory(args[1], false);
      FSDirectory[] dirs = new FSDirectory[1];
      dirs[0] = dir;
      writer.addIndexes(dirs);
      writer.optimize();
      writer.close();
    } catch (Exception e) {
      System.out.println(" caught a " + e.getClass() +
          "\n with message: " + e.getMessage());
    }
  }
}
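
Because the failing queries change from merge to merge (items 4 and 5), it may
help to record each canned query's hit count before and after a merge and
compare the two. The following is only a sketch in plain Java. It assumes the
hit counts have already been collected into maps keyed by query string (for
example from Hits.length()); the class name, method name and the counts in
main are hypothetical and are not part of Lucene or of the real test data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: compares per-query hit counts
// recorded before and after a merge.
class QueryRegressionCheck {

    // Returns the queries that matched at least one document before the
    // merge but match nothing (or were not recorded) afterwards.
    static List<String> findRegressions(Map<String, Integer> before,
                                        Map<String, Integer> after) {
        List<String> regressions = new ArrayList<>();
        for (Map.Entry<String, Integer> e : before.entrySet()) {
            int hitsBefore = e.getValue();
            int hitsAfter = after.getOrDefault(e.getKey(), 0);
            if (hitsBefore > 0 && hitsAfter == 0) {
                regressions.add(e.getKey());
            }
        }
        return regressions;
    }

    public static void main(String[] args) {
        // Illustrative, made-up counts; real values would come from
        // running the canned queries against the index.
        Map<String, Integer> before = new LinkedHashMap<>();
        before.put("subhead:(missile defense)", 12);
        before.put("subhead:(\"missile defense\")", 7);

        Map<String, Integer> after = new LinkedHashMap<>();
        after.put("subhead:(missile defense)", 15);
        after.put("subhead:(\"missile defense\")", 0);

        for (String q : findRegressions(before, after)) {
            System.out.println("no longer matches: " + q);
        }
    }
}
```

Keeping the list of regressions from each merge would show whether the same
phrase queries recur or whether the set really does change every time.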



----- Original Message -----
From: "Terry Steichen" <terry@net-frame.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, March 31, 2004 11:47 AM
Subject: Re: Wierd Search Behavior


> No, they're typos in the e-mail.  In the application, all the colons are
> properly placed.  (Guess I was/am so frustrated I can't write right any
> more).
>
> Terry
>
> ----- Original Message -----
> From: "Erik Hatcher" <erik@ehatchersolutions.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, March 31, 2004 9:55 AM
> Subject: Re: Wierd Search Behavior
>
>
> > On Mar 31, 2004, at 9:49 AM, Terry Steichen wrote:
> > > I'm experiencing some very puzzling search behavior.  I am using the
> > > CVS head I pulled about a week ago.  I use the StandardAnalyzer and
> > > QueryParser.  I have a collection of XML documents indexed.  One field
> > > is "subhead", and here's what I find with different queries:
> > > subhead:(missile defense)    - works fine
> > > subhead("missile" "defense") - works fine
> > > subhead("missile defense") - fails
> > > subhead(missile defense "missile defense") - fails
> > > subhead(missile defense "missile dork") - works fine
> > > subhead(missile defense "missile defens") - works fine (note
> > > misspelling)
> >
> > I presume the missing colons on all but the first example is just a
> > typo in your e-mail?  If not, might that be the problem?
> >
> > Erik
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >