From: Amin Mohammed-Coleman
To: "java-user@lucene.apache.org"
Cc: "java-user@lucene.apache.org"
Subject: Re: Search Test file
Date: Sun, 4 Jan 2009 18:29:32 +0000

Hi

Test case passing now. Thanks for your help. I kind of thought it was probably something I was doing wrong!

Cheers
Amin

On 4 Jan 2009, at 16:59, Grant Ingersoll wrote:

>
> On Jan 4, 2009, at 2:49 AM, Amin Mohammed-Coleman wrote:
>
>> Hi Grant
>>
>> Thank you for looking at the test case. I have updated the
>> IndexWriter to use UNLIMITED for MaxFieldLength. I tried using
>> Integer.MAX_VALUE for
>>
>>>> Also,
>>>> TopDocs topDocs = multiSearcher.search(query,
>>>> BooleanQuery.getMaxClauseCount());
>>>>
>>>> strikes me as really odd. Why are you passing in the max clause
>>>> count as the number of results you want returned?
>>
>>
>
> Just pass in something like "10".
>
>> However I get the following exception:
>>
>> java.lang.NegativeArraySizeException
>>     at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:41)
>>     at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:24)
>>     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:200)
>>     at org.apache.lucene.search.Searcher.search(Searcher.java:136)
>>     at org.apache.lucene.search.Searcher.search(Searcher.java:146)
>>     at com.amin.app.lucene.search.impl.SearchTest.testCanSearchRtfDocument(SearchTest.java:101)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
>>     at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
>>     at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
>>     at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
>>     at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
>>     at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
>>     at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
>>     at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
>>     at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
>>     at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
>>     at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
>>     at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
>>     at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
>>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
>>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
>>
>>
>> I know that this is an issue (not being able to use
>> Integer.MAX_VALUE). I tried using 100 and my test still doesn't
>> pass.
>>
>>
>> Cheers
>> Amin
>>
>>
>> On 4 Jan 2009, at 02:23, Grant Ingersoll wrote:
>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: Grant Ingersoll
>>>> Date: January 3, 2009 8:19:14 PM EST
>>>> To: java-dev@lucene.apache.org
>>>> Subject: Fwd: Search Test file
>>>> Reply-To: java-dev@lucene.apache.org
>>>>
>>>> Hi Amin,
>>>>
>>>> I see a couple of issues with your program below, and one that is
>>>> the cause of the problem of not finding "amin" as a query term.
>>>>
>>>> When you construct your IndexWriter, you are doing:
>>>>> IndexWriter indexWriter = new
>>>>> IndexWriter(getDirectory(),getAnalyzer(),new
>>>>> IndexWriter.MaxFieldLength(2));
>>>>
>>>> The MaxFieldLength parameter specifies the maximum number of
>>>> tokens allowed in a Field. Everything else after that is
>>>> dropped.
>>>> See http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexWriter.html#IndexWriter(org.apache.lucene.store.Directory,%20org.apache.lucene.analysis.Analyzer,%20org.apache.lucene.index.IndexWriter.MaxFieldLength)
>>>> and http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexWriter.MaxFieldLength.html
>>>>
>>>> Also,
>>>> TopDocs topDocs = multiSearcher.search(query,
>>>> BooleanQuery.getMaxClauseCount());
>>>>
>>>> strikes me as really odd. Why are you passing in the max clause
>>>> count as the number of results you want returned?
>>>>
>>>> Cheers,
>>>> Grant
>>>>
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: "aminmc@gmail.com"
>>>>> Date: January 3, 2009 3:24:52 PM EST
>>>>> To: gsingers@apache.org
>>>>> Subject: Search Test file
>>>>>
>>>>> I've shared a document with you called "Search Test file":
>>>>> http://docs.google.com/Doc?id=d77xf5q_0n6hb38fx&invite=cjq79zj
>>>>>
>>>>> It's not an attachment -- it's stored online at Google Docs. To
>>>>> open this document, just click the link above.
>>>>> ---
>>>>>
>>>>> Hi
>>>>>
>>>>> I have uploaded the test file at Google Docs. It is currently a
>>>>> txt file but if you change the extension to .java it should work.
>>>>>
>>>>> package com.amin.app.lucene.search.impl;
>>>>>
>>>>> import static org.junit.Assert.assertEquals;
>>>>> import static org.junit.Assert.assertNotNull;
>>>>> import static org.junit.Assert.assertNotSame;
>>>>> import static org.junit.Assert.assertTrue;
>>>>>
>>>>> import java.io.File;
>>>>> import java.io.FileInputStream;
>>>>> import java.io.FileOutputStream;
>>>>> import java.io.IOException;
>>>>> import java.io.InputStream;
>>>>> import java.io.OutputStream;
>>>>>
>>>>> import javax.swing.text.BadLocationException;
>>>>> import javax.swing.text.DefaultStyledDocument;
>>>>> import javax.swing.text.rtf.RTFEditorKit;
>>>>>
>>>>> import org.apache.commons.lang.StringUtils;
>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>>> import org.apache.lucene.ant.DocumentHandler;
>>>>> import org.apache.lucene.ant.DocumentHandlerException;
>>>>> import org.apache.lucene.document.Document;
>>>>> import org.apache.lucene.document.Field;
>>>>> import org.apache.lucene.index.CorruptIndexException;
>>>>> import org.apache.lucene.index.IndexReader;
>>>>> import org.apache.lucene.index.IndexWriter;
>>>>> import org.apache.lucene.queryParser.MultiFieldQueryParser;
>>>>> import org.apache.lucene.queryParser.QueryParser;
>>>>> import org.apache.lucene.search.BooleanQuery;
>>>>> import org.apache.lucene.search.IndexSearcher;
>>>>> import org.apache.lucene.search.MultiSearcher;
>>>>> import org.apache.lucene.search.Query;
>>>>> import org.apache.lucene.search.ScoreDoc;
>>>>> import org.apache.lucene.search.Searchable;
>>>>> import org.apache.lucene.search.TopDocs;
>>>>> import org.apache.lucene.store.Directory;
>>>>> import org.apache.lucene.store.FSDirectory;
>>>>> import org.junit.After;
>>>>> import org.junit.Before;
>>>>> import org.junit.Test;
>>>>>
>>>>> import com.amin.app.lucene.util.WorkItem.IndexerType;
>>>>>
>>>>> public class SearchTest {
>>>>>
>>>>>     private File rtfFile = null;
>>>>>     private static final String RTF_FILE_NAME =
>>>>>             "rtfDocumentToIndex.rtf";
>>>>>
>>>>>     @Before
>>>>>     public void setUp() throws Exception {
>>>>>         InputStream inputStream =
>>>>>                 this.getClass().getClassLoader().getResourceAsStream(RTF_FILE_NAME);
>>>>>         rtfFile = new File(RTF_FILE_NAME);
>>>>>         convertInputStreamToFile(inputStream, rtfFile);
>>>>>     }
>>>>>
>>>>>     @Test
>>>>>     public void testCanCreateLuceneDocumentForRTFDocument() throws Exception {
>>>>>         JavaBuiltInRTFHandler builtInRTFHandler = new JavaBuiltInRTFHandler();
>>>>>         Document document = builtInRTFHandler.getDocument(rtfFile);
>>>>>         assertNotNull(document);
>>>>>         String value = document.get(FieldNameEnum.BODY.getDescription());
>>>>>         assertNotNull(value);
>>>>>         assertNotSame("", value);
>>>>>         assertTrue(value.contains("Amin Mohammed-Coleman"));
>>>>>         assertTrue(value.contains("This is a test rtf document that will be indexed."));
>>>>>         String path = document.get(FieldNameEnum.PATH.getDescription());
>>>>>         assertNotNull(path);
>>>>>         assertTrue(path.contains(".rtf"));
>>>>>         String fileName = document.get(FieldNameEnum.NAME.getDescription());
>>>>>         assertNotNull(fileName);
>>>>>         assertEquals(RTF_FILE_NAME, fileName);
>>>>>         assertEquals(WorkItem.IndexerType.RTF_INDEXER.name(),
>>>>>                 document.get(FieldNameEnum.TYPE.getDescription()));
>>>>>     }
>>>>>
>>>>>     @Test
>>>>>     public void testCanSearchRtfDocument() throws Exception {
>>>>>         JavaBuiltInRTFHandler builtInRTFHandler = new JavaBuiltInRTFHandler();
>>>>>         Document document = builtInRTFHandler.getDocument(rtfFile);
>>>>>         IndexWriter indexWriter = new IndexWriter(getDirectory(), getAnalyzer(),
>>>>>                 new IndexWriter.MaxFieldLength(2));
>>>>>         try {
>>>>>             indexWriter.addDocument(document);
>>>>>             commitAndCloseWriter(indexWriter);
>>>>>         } catch (CorruptIndexException e) {
>>>>>             throw new IllegalStateException(e);
>>>>>         } catch (IOException e) {
>>>>>             throw new IllegalStateException(e);
>>>>>         }
>>>>>
>>>>>         //I plan to use other searchers later
>>>>>         IndexSearcher indexSearcher = new IndexSearcher(getDirectory());
>>>>>         MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {indexSearcher});
>>>>>         QueryParser queryParser = new MultiFieldQueryParser(
>>>>>                 new String[] {FieldNameEnum.BODY.getDescription()}, new StandardAnalyzer());
>>>>>         Query query = queryParser.parse("amin");
>>>>>         TopDocs topDocs = multiSearcher.search(query, BooleanQuery.getMaxClauseCount());
>>>>>         assertNotNull(topDocs);
>>>>>         assertEquals(1, topDocs.totalHits);
>>>>>         ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>>>>>         for (ScoreDoc scoreDoc : scoreDocs) {
>>>>>             Document documentFromSearch = indexSearcher.doc(scoreDoc.doc);
>>>>>             assertNotNull(documentFromSearch);
>>>>>             String bodyText = documentFromSearch.get(FieldNameEnum.BODY.getDescription());
>>>>>             assertNotNull(bodyText);
>>>>>             assertNotSame("", bodyText);
>>>>>             assertTrue(bodyText.contains("Amin Mohammed-Coleman"));
>>>>>             assertTrue(bodyText.contains("This is a test rtf document that will be indexed."));
>>>>>         }
>>>>>         multiSearcher.close();
>>>>>     }
>>>>>
>>>>>     @After
>>>>>     public void tearDown() throws Exception {
>>>>>         rtfFile.delete();
>>>>>         if (getDirectory().list() != null && getDirectory().list().length > 0) {
>>>>>             IndexReader reader = IndexReader.open(getDirectory());
>>>>>             for (int i = 0; i < reader.maxDoc(); i++) {
>>>>>                 reader.deleteDocument(i);
>>>>>             }
>>>>>             reader.close();
>>>>>         }
>>>>>     }
>>>>>
>>>>>     private void commitAndCloseWriter(IndexWriter indexWriter)
>>>>>             throws CorruptIndexException, IOException {
>>>>>         indexWriter.commit();
>>>>>         indexWriter.close();
>>>>>     }
>>>>>
>>>>>     public Directory getDirectory() throws IOException {
>>>>>         return FSDirectory.getDirectory("/tmp/lucene/rtf");
>>>>>     }
>>>>>
>>>>>     public Analyzer getAnalyzer() {
>>>>>         return new StandardAnalyzer();
>>>>>     }
>>>>>
>>>>>     private static void convertInputStreamToFile(InputStream inputStream, File file) {
>>>>>         try {
>>>>>             OutputStream out = new FileOutputStream(file);
>>>>>             byte buf[] = new byte[1024];
>>>>>             int len;
>>>>>             while ((len = inputStream.read(buf)) > 0)
>>>>>                 out.write(buf, 0, len);
>>>>>             out.close();
>>>>>             inputStream.close();
>>>>>         } catch (IOException e) {
>>>>>             throw new IllegalStateException(e);
>>>>>         }
>>>>>     }
>>>>>
>>>>>     private static class JavaBuiltInRTFHandler implements DocumentHandler {
>>>>>
>>>>>         public Document getDocument(File file) throws DocumentHandlerException {
>>>>>             String bodyText = null;
>>>>>             DefaultStyledDocument styledDoc = new DefaultStyledDocument();
>>>>>             try {
>>>>>                 InputStream inputStream = new FileInputStream(file);
>>>>>                 new RTFEditorKit().read(inputStream, styledDoc, 0);
>>>>>                 bodyText = styledDoc.getText(0, styledDoc.getLength());
>>>>>             } catch (IOException ioex) {
>>>>>                 throw new IllegalStateException(ioex);
>>>>>             } catch (BadLocationException e) {
>>>>>                 throw new IllegalArgumentException(e);
>>>>>             }
>>>>>             //create Document object using body
>>>>>             if (bodyText != null) {
>>>>>                 Document document = new Document();
>>>>>                 String trimmedBodyText = StringUtils.trimToEmpty(bodyText);
>>>>>                 trimmedBodyText = trimmedBodyText.replaceAll("\n", "");
>>>>>                 Field field = new Field(FieldNameEnum.BODY.getDescription(), trimmedBodyText,
>>>>>                         Field.Store.YES, Field.Index.ANALYZED);
>>>>>                 document.add(field);
>>>>>
>>>>>                 String pathToFile = file.getPath();
>>>>>                 Field pathToFileField = new Field(FieldNameEnum.PATH.getDescription(), pathToFile,
>>>>>                         Field.Store.YES, Field.Index.NOT_ANALYZED);
>>>>>                 document.add(pathToFileField);
>>>>>
>>>>>                 String fileName = file.getName();
>>>>>                 Field fileNameField = new Field(FieldNameEnum.NAME.getDescription(), fileName,
>>>>>                         Field.Store.YES, Field.Index.NOT_ANALYZED);
>>>>>                 document.add(fileNameField);
>>>>>
>>>>>                 Field typeField = new Field(FieldNameEnum.TYPE.getDescription(),
>>>>>                         IndexerType.RTF_INDEXER.name(),
>>>>>                         Field.Store.YES, Field.Index.NOT_ANALYZED);
>>>>>                 document.add(typeField);
>>>>>
>>>>>                 String summary = bodyText.substring(0, 10);
>>>>>
>>>>>                 Field summaryField = new Field(FieldNameEnum.SUMMARY.getDescription(), summary,
>>>>>                         Field.Store.YES, Field.Index.NOT_ANALYZED);
>>>>>                 document.add(summaryField);
>>>>>
>>>>>                 return document;
>>>>>             }
>>>>>             return null;
>>>>>         }
>>>>>     }
>>>>>
>>>>>     private static class WorkItem {
>>>>>
>>>>>         public enum WorkItemEvent {
>>>>>             ADD,
>>>>>             UPDATE,
>>>>>             DELETE;
>>>>>         }
>>>>>
>>>>>         public enum IndexerType {
>>>>>             RTF_INDEXER,
>>>>>             PDF_INDEXER,
>>>>>             XML_INDEXER,
>>>>>             PLAIN_TEXT_INDEXER,
>>>>>             MS_WORD_INDEXER,
>>>>>             MS_EXCEL_INDEXER,
>>>>>             MS_POWERPOINT_INDEXER;
>>>>>         }
>>>>>
>>>>>         private final Document workLoad;
>>>>>
>>>>>         private final WorkItemEvent workItemEvent;
>>>>>
>>>>>         private final IndexerType indexerType;
>>>>>
>>>>>         public WorkItem(final Document workLoad, final WorkItemEvent workItemEvent) {
>>>>>             this.workLoad = workLoad;
>>>>>             this.workItemEvent = workItemEvent;
>>>>>             String type = this.workLoad.get("type");
>>>>>             this.indexerType = IndexerType.valueOf(type);
>>>>>         }
>>>>>
>>>>>         public IndexerType getIndexerType() {
>>>>>             return indexerType;
>>>>>         }
>>>>>
>>>>>         public Document getWorkLoad() {
>>>>>             return workLoad;
>>>>>         }
>>>>>
>>>>>         public WorkItemEvent getWorkItemEvent() {
>>>>>             return workItemEvent;
>>>>>         }
>>>>>     }
>>>>>
>>>>>     private enum FieldNameEnum {
>>>>>
>>>>>         AUTHOR("author"),
>>>>>         BODY("body"),
>>>>>         TITLE("title"),
>>>>>         SUBJECT("subject"),
>>>>>         KEYWORDS("keywords"),
>>>>>         PATH("path"),
>>>>>         NAME("name"),
>>>>>         TYPE("type"),
>>>>>         ID("id"),
>>>>>         SUMMARY("summary");
>>>>>
>>>>>         private final String description;
>>>>>
>>>>>         private FieldNameEnum(final String description) {
>>>>>             this.description = description;
>>>>>         }
>>>>>
>>>>>         public String getDescription() {
>>>>>             return this.description;
>>>>>         }
>>>>>     }
>>>>> }
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>>
>>>> Lucene Helpful Hints:
>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>> --------------------------
>>> Grant Ingersoll
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> --------------------------
> Grant Ingersoll
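The NegativeArraySizeException in the stack trace of this thread is consistent with plain int overflow. The top frame is PriorityQueue.initialize, and this sketch assumes (it is not shown in the thread) that Lucene 2.4's queue allocates maxSize + 1 slots for a 1-based heap. Passing Integer.MAX_VALUE as the number of hits would then wrap the size to Integer.MIN_VALUE. The hypothetical allocateHeap helper below reproduces only that arithmetic with the JDK; it is not Lucene code.

```java
public class HeapSizeOverflow {

    // Hypothetical helper mirroring the assumed sizing rule:
    // a 1-based binary heap reserves one extra slot, so it allocates maxSize + 1.
    static Object[] allocateHeap(int maxSize) {
        int heapSize = maxSize + 1; // wraps to Integer.MIN_VALUE when maxSize == Integer.MAX_VALUE
        return new Object[heapSize];
    }

    public static void main(String[] args) {
        // Java int arithmetic wraps around silently instead of failing.
        System.out.println(Integer.MAX_VALUE + 1);   // -2147483648
        System.out.println(allocateHeap(10).length); // 11

        try {
            allocateHeap(Integer.MAX_VALUE);
        } catch (NegativeArraySizeException e) {
            // Same exception type as in the reported trace.
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

Under that assumption, a small request such as 10 keeps the queue tiny and never approaches the overflow boundary.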
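Grant's advice to "just pass in something like 10" follows from what the int argument to search(query, n) means: it is the number of top-scoring hits to collect. It has nothing to do with BooleanQuery's clause limit. A common way to collect the top n is a bounded min-heap that never holds more than n entries. The sketch below shows that general technique with java.util.PriorityQueue. It is not Lucene's HitQueue, and the topN helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {

    /** Returns the n largest scores, highest first, keeping at most n in memory. */
    static List<Float> topN(float[] scores, int n) {
        if (n <= 0) {
            return new ArrayList<>(); // PriorityQueue needs a positive initial capacity
        }
        // Min-heap: the weakest of the current top n sits at the head.
        PriorityQueue<Float> heap = new PriorityQueue<>(n);
        for (float s : scores) {
            if (heap.size() < n) {
                heap.offer(s);
            } else if (s > heap.peek()) {
                heap.poll();   // evict the weakest kept score
                heap.offer(s);
            }
        }
        List<Float> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // best first
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
        System.out.println(topN(scores, 3)); // [0.9, 0.7, 0.5]
    }
}
```

In this sketch the heap is sized from n rather than from the number of matching documents, so n should be the number of results you actually intend to show.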
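Grant's first point is that MaxFieldLength caps how many tokens of a field get indexed, and "everything else after that is dropped." The toy model below shows why a cap of 2 can hide a term that appears later in the text. It makes several assumptions. The tokenizer is a simple lowercase, split-on-non-alphanumerics stand-in, not StandardAnalyzer. The tokenize helper is hypothetical. The body string is invented for illustration, because the real RTF file's contents and word order are not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MaxFieldLengthModel {

    // Toy tokenizer: lowercases and splits on runs of non-letter/non-digit characters.
    // Tokens beyond maxTokens are dropped, modelling a per-field token cap.
    static List<String> tokenize(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue;             // leading separators yield an empty piece
            if (tokens.size() >= maxTokens) break; // everything after the cap is dropped
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical body text: the name appears after the first two tokens.
        String body = "This is a test rtf document that will be indexed. Amin Mohammed-Coleman";
        System.out.println(tokenize(body, 2).contains("amin"));                 // false
        System.out.println(tokenize(body, Integer.MAX_VALUE).contains("amin")); // true
    }
}
```

Lucene 2.4 also offers IndexWriter.MaxFieldLength.UNLIMITED, which is what Amin switched to in the thread.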