lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Fwd: Search Problem
Date Thu, 01 Jan 2009 21:11:00 GMT

>
> Hi
>
> I have created a RTFHandler which takes a RTF file and creates a  
> lucene Document which is indexed.  The RTFHandler looks like  
> something like this:
>
> if (bodyText != null) {
> 			Document document = new Document();
> 			Field field = new Field(MetaDataEnum.BODY.getDescription(),  
> bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
> 			document.add(field);
> 			
> 		
> }
>
> I am using Java Built in RTF text extraction.  When I run my test to  
> verify that the document contains text that I expect this works  
> fine.  I get the following when I print the document:
>
> Document<stored/uncompressed,indexed,tokenized<body:This is a test  
> rtf document that will be indexed.
>
> Amin Mohammed-Coleman> stored/ 
> uncompressed,indexed<path:rtfDocumentToIndex.rtf> stored/ 
> uncompressed,indexed<name:rtfDocumentToIndex.rtf> stored/ 
> uncompressed,indexed<type:RTF_INDEXER> stored/ 
> uncompressed,indexed<summary:This is a >>
>
>
> The problem is when I use the following to search I get no result:
>
> 	MultiSearcher multiSearcher = new MultiSearcher(new Searchable[]  
> {rtfIndexSearcher});
> 			Term t = new Term("body", "Amin");
> 			TermQuery termQuery = new TermQuery(t);
> 			TopDocs topDocs = multiSearcher.search(termQuery, 1);
> 			System.out.println(topDocs.totalHits);
> 			multiSearcher.close();
>
> RftIndexSearcher is configured with the directory that holds rtf  
> documents.  I have used Luke to look at the document and what I am  
> finding in the overview tab is the following for the document:
>
> 1	body	test
> 1	id	1234
> 1	name	rtfDocumentToIndex.rtf
> 1	path	rtfDocumentToIndex.rtf
> 1	summary	This is a
> 1	type	RTF_INDEXER
> 1	body	rtf
>
>
> However on the Document tab I am getting (in the body field):
>
> This is a test rtf document that will be indexed.
>
> Amin Mohammed-Coleman
>
>
> I would expect to get a hit using "Amin" or even "document".  I am  
> not sure whether the
> line:
> TopDocs topDocs = multiSearcher.search(termQuery, 1);
>
> is incorrect as I am not too sure of the meaning of "Finds the top n  
> hits for query." for search (Query query, int n) according to java  
> docs.
>
> I would be grateful if someone may be able to advise on what I may  
> be doing wrong.  I am using Lucene 2.4.0
>
>
> Cheers
> Amin
>
>
>
> 		


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message