Message-ID: <49FE92BA8203D511B3000001027039C2E3CBEF@5800niilxch02.halo.com>
From: Kristian Rickert
To: "'lucene-dev@jakarta.apache.org'"
Subject: RAMDirectory bug?
Date: Thu, 6 Dec 2001 19:59:19 -0600
List-Id: "Lucene Developers List"

I'm getting the classic "ArrayIndexOutOfBounds" bug when performing a search in my RAMDirectory. I'm going to explain this one in full detail, so forgive me if this is a waste of your time. I really do think this is a resurfaced bug, because when my RAMDirectory has fewer than ~2000 documents, searching works flawlessly.

So here goes: I used release candidate 2 and the nightly build, and both yield the same result. I am having the same problem with the RAMDirectory search as mentioned in message # ... Here is a self-contained example that causes the same error. I'll comment along the way so I'm not one of those asses who says, "here's my code.. fix it!"

* First of all, I am using this application to index objects that just contain simple values. I already use an RB tree to retrieve the information, which is plenty quick for what I need.
* So I add these fields as UnStored: indexed only, not stored.
* Furthermore, I do not need to index the primary key, only store it, so my "VendorDocument" file goes as follows:

    public static Document Document(FourgenVendorInfo fvi) {
        // make a new, empty document
        Document doc = new Document();
        doc.add(Field.UnStored("Address1", "" + fvi.getAddress1()));
        // NOTE: this reads getAddress1() again; probably a copy-paste slip
        doc.add(Field.UnStored("Address2", "" + fvi.getAddress1()));
        doc.add(Field.UnStored("BusinessName", "" + fvi.getBus_name()));
        doc.add(Field.UnStored("City", "" + fvi.getCity()));
        doc.add(Field.UnStored("State", "" + fvi.getState()));
        doc.add(Field.UnStored("CountryCode", "" + fvi.getCountry_code()));
        doc.add(Field.UnStored("Fax", "" + fvi.getFax_phone()));
        doc.add(Field.UnStored("Asi", "" + fvi.getHal_asi_no()));
        doc.add(Field.UnStored("Phone", "" + fvi.getPhone()));
        // the primary key is stored but not indexed
        doc.add(Field.UnIndexed("VendorCode", "" + fvi.getVend_code()));
        doc.add(Field.UnStored("Zip", "" + fvi.getZip()));
        doc.add(Field.UnStored("PlusFour", "" + AddressParsers.parsePlusFour(fvi.getZip())));
        return doc;
    }

* I have about 8000 of these documents to add to the index. First, I create RAMStorage:

    RAMDirectory RAMStorage = new RAMDirectory();
    //RAMStorage.createFile("Vendors");
    IndexWriter indexer = null;
    try {
        // create the IndexWriter with the RAMStorage, using my vendor analyzer,
        // which is a simplified form of SimpleAnalyzer (it doesn't ignore digits;
        // literally 3 letters of code different from the SimpleAnalyzer)
        indexer = new IndexWriter(RAMStorage, new VendorAnalyzer(), true);

I add the documents to the indexer, optimize it and close it.

        if (fviAllVendors != null) {
            for (int i = 0; i < fviAllVendors.length; i++) {
                Document currentDoc = VendorDocument.Document(fviAllVendors[i]);
                indexer.addDocument(currentDoc);
                //System.out.println(currentDoc.toString());
            }
        }
        indexer.optimize();
        indexer.close();

Finally, I perform the search with the line "+State:mn":

        Query query = QueryParser.parse(line, "contents", analyzer);
        System.out.println("Searching for: " + query.toString("contents"));
        Hits hits = searcher.search(query);

It is at this point that I get the array index out of bounds exception.

Other facts to especially note:

* This error only happens when there is a successful hit in the search (this makes sense if you view the stack trace).
* I have noticed that when I have an index size of ~2000, I never get the thing to break. Thus, I might just break this up into multiple RAM directories as a hack fix, although I suspect it could be the data I'm providing.
* Wildcard queries work fine with the parser.
* According to the stack trace, the error happens in a readInternal() call within the RAMInputStream.

Oh yeah, my environment:

* Same error on NT 4.0 and Sun OS 7.
* 2 GB memory with a 100MB heap; nothing is really taking up memory space.

I worked on a search engine on my own and will be willing to contribute if I find out the problem. For now, I may just switch to a file system search instead. But this will probably be slower than if I optimized the hell out of Oracle and had that database do the trick for me.

I hope this will help. Below is a listing of the MAIN file I've been using to test. Also, you'll see a copy of the stack trace.

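Before the listing, a note on the "multiple RAM directories" workaround mentioned in the bullets above: it amounts to splitting the ~8000 documents into batches of at most ~2000 and indexing each batch on its own. What follows is only a minimal sketch of the splitting step in plain Java, with no Lucene calls. `BatchSplitter` and `split` are hypothetical names, and 2000 is just the rough threshold observed above, not a known Lucene limit. Whether batching actually avoids the exception is untested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Lucene): splits an array into
// consecutive batches of at most batchSize elements.
class BatchSplitter {

    static <T> List<List<T>> split(T[] items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> all = Arrays.asList(items);
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.length; start += batchSize) {
            int end = Math.min(items.length, start + batchSize);
            // copy the slice so each batch is an independent list
            batches.add(new ArrayList<>(all.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // stand-in data: 8000 integers instead of vendor records
        Integer[] vendors = new Integer[8000];
        for (int i = 0; i < vendors.length; i++) {
            vendors[i] = i;
        }
        List<List<Integer>> batches = split(vendors, 2000);
        System.out.println(batches.size() + " batches");
    }
}
```

Each batch would then get its own RAMDirectory and IndexWriter, built the same way as in the indexing loop above. Searching across the resulting directories is deliberately left out of this sketch.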
import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.log4j.Category;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;

/**
 *
 * @author FOOBAR
 * @version
 */
public class VendorSearchIndexTest {

    public static VendorInfo [] fviAllVendors;

    /**
     * @param args the command line arguments
     */
    public static void main (String args[]) {
        VendorInfo [] fviAllVendors = DB_FourgenVendor.retrieveFourgenVendors();

        //create a RAM directory
        RAMDirectory RAMStorage = new RAMDirectory();
        //I ran this with and without the line below
        RAMStorage.createFile("Vendors");
        IndexWriter indexer = null;
        try {
            indexer = new IndexWriter(RAMStorage, new VendorAnalyzer(), true);
            if (fviAllVendors != null) {
                for (int i = 0; i < fviAllVendors.length; i++) {
                    Document currentDoc = VendorDocument.Document(fviAllVendors[i]);
                    indexer.addDocument(currentDoc);
                    //System.out.println(currentDoc.toString());
                }
            }
            indexer.optimize();
            indexer.close();

            Searcher searcher = new IndexSearcher(RAMStorage);
            Analyzer analyzer = new VendorAnalyzer();
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            while (true) {
                System.out.print("Query: ");
                String line = in.readLine();
                // stop at end of input or on an empty line
                // (the original test, line.length() == -1, could never be true)
                if (line == null || line.length() == 0)
                    break;
                //WildcardQuery query = new WildcardQuery(new Term("+City", "LI*"));
                Query query = QueryParser.parse(line, "contents", analyzer);
                System.out.println("Searching for: " + query.toString("contents"));
                Hits hits = searcher.search(query);
                System.out.println(hits.length() + " total matching documents");

                final int HITS_PER_PAGE = 10;
                for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
                    int end = Math.min(hits.length(), start + HITS_PER_PAGE);
                    for (int i = start; i < end; i++)
                        System.out.println(i + ". " + hits.doc(i).get("VendorCode"));
                    if (hits.length() > end) {
                        System.out.print("more (y/n) ? ");
                        line = in.readLine();
                        if (line == null || line.length() == 0 || line.charAt(0) == 'n')
                            break;
                    }
                }
            }
            searcher.close();
        } catch (Exception e) {
            System.out.println("Exception.. what the?: " + e.toString());
        }
    }
}
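
One detail of the interactive loop worth spelling out: `BufferedReader.readLine()` returns `null` at end of input, and `String.length()` is never negative, so a loop exit has to test for `null` (and, if wanted, for an empty line). The following is a small stand-alone sketch of that pattern, not code from the test program above. `QueryLineReader` and `readQueries` are hypothetical names, and a `StringReader` stands in for `System.in`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects query lines until end of input or a blank line.
class QueryLineReader {

    static List<String> readQueries(BufferedReader in) {
        List<String> queries = new ArrayList<>();
        try {
            String line;
            // readLine() returns null once the input is exhausted
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    break; // a blank line also ends the session
                }
                queries.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return queries;
    }

    public static void main(String[] args) {
        // simulated console input: two queries, a blank line, then text that is never read
        BufferedReader in = new BufferedReader(
                new StringReader("+State:mn\n+City:li*\n\nignored\n"));
        System.out.println(readQueries(in));
    }
}
```

With the simulated input, only the two lines before the blank line are collected. In the real program, each collected line would be passed to QueryParser.parse as in the loop above.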