Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Message-ID: <49FE92BA8203D511B3000001027039C2E3CBEF@5800niilxch02.halo.com>
From: Kristian Rickert <Kristian.Rickert@halo.com>
To: "'lucene-dev@jakarta.apache.org'" <lucene-dev@jakarta.apache.org>
Subject: RAMDirectory bug?
Date: Thu, 6 Dec 2001 19:59:19 -0600 
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"

I'm getting the classic "ArrayIndexOutOfBoundsException" when performing a
search in my RAMDirectory.

I'm going to explain this one in full detail, so forgive me if this wastes
your time.  I really do think this is a resurfaced bug, because when my
RAMDirectory has fewer than ~2000 documents, searching works flawlessly.

So here goes:


I used release candidate 2 and the nightly build, and both yield the same
result.

I am having the same problem with the RAMDirectory search as mentioned in
message # ... here is a self-contained example that will cause the same
error.. I'll comment along the way so I'm not one of those asses who says,
"here's my code.. fix it!"

* First of all, I am using this application to index objects that just
contain simple values.  I already use an RB tree to retrieve the information,
which is plenty quick for what I need.
* So, I am adding these fields as indexed only, not stored.
* Furthermore, I do not need to index the primary key, only store it, so my
"VendorDocument" file goes as follows:
  public static Document Document(FourgenVendorInfo fvi) {
    // make a new, empty document
    Document doc = new Document();

    doc.add(Field.UnStored("Address1", "" + fvi.getAddress1()));
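    // NOTE: this indexes getAddress1() a second time; a separate accessor for
    // the second address line (not shown in this message) was presumably intended.
    doc.add(Field.UnStored("Address2", "" + fvi.getAddress1()));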
    doc.add(Field.UnStored("BusinessName", "" + fvi.getBus_name()));
    doc.add(Field.UnStored("City", "" + fvi.getCity()));
    doc.add(Field.UnStored("State", "" + fvi.getState()));
    doc.add(Field.UnStored("CountryCode", "" +  fvi.getCountry_code()));    
    doc.add(Field.UnStored("Fax", "" + fvi.getFax_phone()));
    doc.add(Field.UnStored("Asi", "" + fvi.getHal_asi_no()));
    doc.add(Field.UnStored("Phone", "" + fvi.getPhone()));
    doc.add(Field.UnIndexed("VendorCode", "" + fvi.getVend_code()));
    doc.add(Field.UnStored("Zip", "" + fvi.getZip()));
    doc.add(Field.UnStored("PlusFour", "" + AddressParsers.parsePlusFour(fvi.getZip())));
    return doc;
  }
* I have about 8000 of these documents to add to the index.  


First, I create RAMStorage:
        RAMDirectory RAMStorage = new RAMDirectory();
        //RAMStorage.createFile("Vendors");
        IndexWriter indexer = null;
        try {

Create the IndexWriter with the RAMStorage, using my vendor analyzer, which
is a simplified form of SimpleAnalyzer (it doesn't ignore digits; its code
differs from SimpleAnalyzer's by literally 3 letters):
            indexer = new IndexWriter(RAMStorage, new VendorAnalyzer(), true);
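
To make the digit handling concrete, here is a rough sketch in plain Java. It
is not the actual VendorAnalyzer and uses no Lucene classes; the class name
LetterOrDigitSplitter is made up for illustration. It assumes the only
difference from SimpleAnalyzer (which breaks text at non-letters and
lower-cases it) is that digits are kept as part of a token:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the tokenizing rule described above; it is not
// the real VendorAnalyzer and does not depend on Lucene.
class LetterOrDigitSplitter {

    // Splits text into lower-cased tokens made of letters and digits.
    // Any other character ends the current token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [55401, 1234, main, st, mn]
        System.out.println(tokenize("55401-1234 Main St, MN"));
    }
}
```

With a letters-only rule, the zip code above would produce no tokens at all,
which is why fields such as Zip, Phone and Fax need the digit-preserving
variant.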
            

I add the documents to the indexer, optimize it and close it.
            if (fviAllVendors != null) {
                for (int i = 0; i < fviAllVendors.length; i++) {
                    Document currentDoc = VendorDocument.Document(fviAllVendors[i]);
                    indexer.addDocument(currentDoc);
                    //System.out.println(currentDoc.toString());
                }
            }
            indexer.optimize();
            indexer.close();

Finally, I perform the search with the line "+State:mn":
	Query query = QueryParser.parse(line, "contents", analyzer);
	System.out.println("Searching for: " + query.toString("contents"));
	Hits hits = searcher.search(query);


It is at this point where I get the array index out of bounds exception.

Other facts to especially note:
* This error only happens when there is a successful hit in the search (this
makes sense if you view the stack trace)
* I have noticed that when I have an index size of ~2000, I never get the
thing to break.  Thus, I might just break this up into multiple RAM
directories as a hack fix, although I suspect it could be the data I'm
providing.
* Wildcard queries work fine with the parser.
* According to the stack trace, the error happens in a readInternal()
call within the RAMInputStream.
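
The "multiple RAM directories" hack mentioned above would start with slicing
the vendor array. A minimal sketch of just that slicing step, in plain Java:
the class name VendorChunker and the chunk size of 1500 are arbitrary choices
for illustration (anything under the ~2000 mark), and indexing each slice into
its own RAMDirectory and combining the per-directory hits are left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; only the partitioning is shown, no Lucene calls.
class VendorChunker {

    // Splits items into consecutive slices of at most maxPerChunk elements.
    static <T> List<T[]> chunk(T[] items, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<T[]> chunks = new ArrayList<>();
        for (int start = 0; start < items.length; start += maxPerChunk) {
            int end = Math.min(items.length, start + maxPerChunk);
            chunks.add(Arrays.copyOfRange(items, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in data: 8000 placeholder "vendors".
        Integer[] vendors = new Integer[8000];
        for (int i = 0; i < vendors.length; i++) {
            vendors[i] = i;
        }
        List<Integer[]> chunks = chunk(vendors, 1500);
        // prints "6 chunks, last has 500 documents"
        System.out.println(chunks.size() + " chunks, last has "
                + chunks.get(chunks.size() - 1).length + " documents");
    }
}
```

This only works around the symptom; if the failure depends on the data rather
than the index size, a small slice could still trigger it.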

Oh yeah, my environment:
* Same error on NT 4.0 and Sun OS 7.
* 2 GB memory with a 100MB heap - nothing really taking up memory space


I have worked on a search engine on my own and would be willing to contribute
if I find the problem.  For now, I may just switch to a file system index
instead.  But this will probably be slower than if I optimized the hell out
of Oracle and had that database do the trick for me.

I hope this will help.  Below is a listing of the main file I've been using
to test.  Also, you'll see a copy of the stack trace.


import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.log4j.Category;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;

/**
 *
 * @author  FOOBAR
 * @version 
 */
public class VendorSearchIndexTest {

    
    public static VendorInfo [] fviAllVendors;

    /**
    * @param args the command line arguments
    */
    public static void main (String args[]) {
        VendorInfo [] fviAllVendors = DB_FourgenVendor.retrieveFourgenVendors();

        //create a RAM directory
        RAMDirectory RAMStorage = new RAMDirectory();
        //I ran this with and without the line below
        RAMStorage.createFile("Vendors");

        IndexWriter indexer = null;
        try {
            indexer = new IndexWriter(RAMStorage, new VendorAnalyzer(), true);
            

            if (fviAllVendors != null) {
                for (int i = 0; i < fviAllVendors.length; i++) {
                    Document currentDoc = VendorDocument.Document(fviAllVendors[i]);
                    indexer.addDocument(currentDoc);
                    //System.out.println(currentDoc.toString());
                }
            }
            indexer.optimize();
            indexer.close();
            Searcher searcher = new IndexSearcher(RAMStorage);
            Analyzer analyzer = new VendorAnalyzer();
      

      BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
      while (true) {
	System.out.print("Query: ");
	String line = in.readLine();

	// readLine() returns null at end of input; length() is never -1
	if (line == null || line.length() == 0)
	  break;
        //WildcardQuery query = new WildcardQuery(new Term("+City", "LI*"));
	Query query = QueryParser.parse(line, "contents", analyzer);
	System.out.println("Searching for: " + query.toString("contents"));

	Hits hits = searcher.search(query);
	System.out.println(hits.length() + " total matching documents");

	final int HITS_PER_PAGE = 10;
	for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
	  int end = Math.min(hits.length(), start + HITS_PER_PAGE);
	  for (int i = start; i < end; i++)
	    System.out.println(i + ". " + hits.doc(i).get("VendorCode"));
	  if (hits.length() > end) {
	    System.out.print("more (y/n) ? ");
	    line = in.readLine();
	    if (line.length() == 0 || line.charAt(0) == 'n')
	      break;
	  }
	}
      }
      searcher.close();
  
            
        } catch (Exception e) {
            System.out.println("Exception.. what the?: " + e.toString());
        }
    }
}
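
A side note on the interactive query loop in the listing:
BufferedReader.readLine() returns null at end of input and an empty string
for a blank line, and String.length() is never negative. A minimal sketch of
a loop that stops on either condition, using a StringReader in place of
System.in so it can run unattended (the class name QueryLoopSketch is made up
for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing only the loop-termination logic.
class QueryLoopSketch {

    // Collects query lines until end of input (null) or a blank line.
    static List<String> readQueries(BufferedReader in) {
        List<String> queries = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null && line.length() > 0) {
                queries.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return queries;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
                new StringReader("+State:mn\n+City:li*\n\nignored\n"));
        // prints [+State:mn, +City:li*]
        System.out.println(readQueries(in));
    }
}
```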
