lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Wierd Search Behavior
Date Thu, 01 Apr 2004 17:01:53 GMT
Terry,

Can you please try to develop a reproducible test case?  Otherwise it's 
impossible to verify and debug this.

For something like this it would suffice to provide:

   1. The initial index, which satisfies the test queries;

   2. The new index you add;

   3. Your merge and test code, as a single class that illustrates the 
problem.

The smaller the indexes the better: not only will it be easier to 
transfer them, but debugging will be faster.
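
As a rough illustration of what such a single test class might check, here is a minimal sketch in plain Java. It is not Lucene code: the class name QueryRegressionCheck, the methods snapshot and regressions, and the stand-in hit counters in main are all hypothetical, and the real version would run each query against the index before and after the merge. It also assumes that "no longer worked" means a query that had hits before the merge returns none afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical harness: records hit counts per query and reports
// queries that matched before a merge but return nothing afterwards.
class QueryRegressionCheck {

    // Runs every query through the supplied hit counter, keeping query order.
    public static Map<String, Integer> snapshot(List<String> queries,
                                                ToIntFunction<String> hitCounter) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String query : queries) {
            counts.put(query, hitCounter.applyAsInt(query));
        }
        return counts;
    }

    // Assumption: a "failure" is a query with hits before and zero hits after.
    public static List<String> regressions(Map<String, Integer> before,
                                           Map<String, Integer> after) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : before.entrySet()) {
            int oldHits = entry.getValue();
            int newHits = after.getOrDefault(entry.getKey(), 0);
            if (oldHits > 0 && newHits == 0) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList(
            "subhead:(missile defense)",
            "subhead:(\"missile defense\")");

        // Stand-in hit counters; replace these with real searches
        // against the index before and after the merge.
        Map<String, Integer> before = snapshot(queries, q -> 5);
        Map<String, Integer> after =
            snapshot(queries, q -> q.contains("\"") ? 0 : 7);

        for (String query : regressions(before, after)) {
            System.out.println("REGRESSED: " + query);
        }
    }
}
```

With the stand-in counters above, only the quoted-phrase query is reported. Keeping the snapshot step separate from the comparison means the same query list can be replayed after every merge, which helps when the failing queries change from one merge to the next.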

Also, please file a bug to track this at:

   http://issues.apache.org/bugzilla/enter_bug.cgi?product=Lucene

Doug

Terry Steichen wrote:
> I did some more checking and uncovered what appears to be a serious Lucene
> problem. (Either that or my merge code - below - is wrong.)  Appreciate any
> help in figuring out what's wrong.  Here are the facts as I see them:
> 
> 1) I put together a large number of canned queries (some rather complex) for
> routine testing purposes.
> 2) I created a new compound file index and tested the queries.  All worked
> fine.
> 3) I then indexed some new documents and merged the new index with the
> original index.
> 4) I then tried the queries again.  Each time I did this, about 1-3% of the
> queries no longer worked - the actual number appears to vary with each
> merge.
> 5) The specific queries that fail change with each merge. Ones that failed
> after the previous merge almost always appear to work again with the next
> merge (which produces a new batch of failures).
> 6) In all cases I've so far examined, the offending part of the affected
> queries is a single quoted phrase (even though there may be several such
> phrases in the query) - remove it, and the (now modified) query works fine.
> 7) I tried the same thing using the original multi-file index format, with
> the same results.
> 8) About a week and a half ago, I migrated from 1.3final to the latest CVS
> head.
> 9) I've only just started checking this, so I don't know how long this
> behavior has been going on.  The small percentage of errors and (apparent)
> randomness of which query is affected make it hard to detect.
> 10) I have about 32 fields per document, most of which are tokenized,
> indexed and stored.
> 11) My merge code (for the multi-file index format) is this:
> 
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.store.FSDirectory;
> 
> class MergeIndices {
>   public static void main(String[] args) {
>     // args[0]: relative path to main index
>     // args[1]: relative path to new index (to be merged with main)
>     try {
>       // Open the existing main index (create == false).
>       IndexWriter writer = new IndexWriter(args[0], new StandardAnalyzer(), false);
>       // writer.setUseCompoundFile(true); // used for compound format
>       // Open the new index and merge it into the main index.
>       FSDirectory dir = FSDirectory.getDirectory(args[1], false);
>       FSDirectory[] dirs = new FSDirectory[1];
>       dirs[0] = dir;
>       writer.addIndexes(dirs);
>       writer.optimize();
>       writer.close();
>     } catch (Exception e) {
>       System.out.println(" caught a " + e.getClass() +
>         "\n with message: " + e.getMessage());
>     }
>   }
> }
> 
> 
> 
> ----- Original Message -----
> From: "Terry Steichen" <terry@net-frame.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, March 31, 2004 11:47 AM
> Subject: Re: Wierd Search Behavior
> 
> 
> 
>>No, they're typos in the e-mail.  In the application, all the colons are
>>properly placed.  (Guess I was/am so frustrated I can't write right any
>>more).
>>
>>Terry
>>
>>----- Original Message -----
>>From: "Erik Hatcher" <erik@ehatchersolutions.com>
>>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>Sent: Wednesday, March 31, 2004 9:55 AM
>>Subject: Re: Wierd Search Behavior
>>
>>
>>
>>>On Mar 31, 2004, at 9:49 AM, Terry Steichen wrote:
>>>
>>>>I'm experiencing some very puzzling search behavior.  I am using the
>>>>CVS head I pulled about a week ago.  I use the StandardAnalyzer and
>>>>QueryParser.  I have a collection of XML documents indexed.  One field
>>>>is "subhead", and here's what I find with different queries:
>>>>subhead:(missile defense)    - works fine
>>>>subhead("missile" "defense") - works fine
>>>>subhead("missile defense") - fails
>>>>subhead(missile defense "missile defense") - fails
>>>>subhead(missile defense "missile dork") - works fine
>>>>subhead(missile defense "missile defens") - works fine (note
>>>>misspelling)
>>>
>>>I presume the missing colons on all but the first example are just
>>>typos in your e-mail?  If not, might that be the problem?
>>>
>>>Erik
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>

