Oh man! Sorry about that... lack of sleep due to a new baby in the house...
On 1 Jan 2009, at 20:41, Nils-Helge Garli Hegvik wrote:
> Maybe you should try posting to a Lucene mailing list?
>
> Nils-H
>
> On Thu, Jan 1, 2009 at 9:28 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>> Hi
>>
>> I have created an RTFHandler which takes an RTF file and creates a Lucene
>> Document, which is then indexed. The RTFHandler looks something like this:
>>
>> if (bodyText != null) {
>>     Document document = new Document();
>>     // Store the raw text and also index it through the analyzer (tokenized)
>>     Field field = new Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(),
>>             Field.Store.YES, Field.Index.ANALYZED);
>>     document.add(field);
>> }
>>
>> I am using Java's built-in RTF text extraction. When I run my test to verify
>> that the document contains the text I expect, it works fine. I get the
>> following when I print the document:
>>
>> Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf document that will be indexed.
>>
>> Amin Mohammed-Coleman>
>> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf>
>> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf>
>> stored/uncompressed,indexed<type:RTF_INDEXER>
>> stored/uncompressed,indexed<summary:This is a >>
>>
>>
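For readers unfamiliar with the JDK's RTF support: assuming "Java's built-in RTF text extraction" above means `javax.swing.text.rtf.RTFEditorKit` (the message doesn't name the class), a minimal sketch of pulling plain text out of an RTF stream looks like this. `RtfTextExtractor` and `extractText` are hypothetical names; the returned string plays the role of `bodyText` in the handler.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfTextExtractor {

    // Parses RTF from the stream into a Swing document model and returns its plain text.
    public static String extractText(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument(); // a styled document, which the RTF reader requires
        kit.read(in, doc, 0);                       // insert the parsed content at offset 0
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        String rtf = "{\\rtf1\\ansi This is a test rtf document that will be indexed.\\par Amin Mohammed-Coleman}";
        InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
        // Print the extracted text, i.e. what would be handed to the Lucene Field
        System.out.println(extractText(in).trim());
    }
}
```

Note that this only covers extraction; whatever text comes out is then stored verbatim and tokenized by the analyzer when the field is indexed.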
>> The problem is when I use the following to search I get no result:
>>
>> MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
>> Term t = new Term("body", "Amin");
>> TermQuery termQuery = new TermQuery(t);
>> TopDocs topDocs = multiSearcher.search(termQuery, 1);
>> System.out.println(topDocs.totalHits);
>> multiSearcher.close();
>>
>> rtfIndexSearcher is configured with the directory that holds the RTF
>> documents. I have used Luke to inspect the index, and the Overview tab
>> shows the following for the document:
>>
>> 1 body test
>> 1 id 1234
>> 1 name rtfDocumentToIndex.rtf
>> 1 path rtfDocumentToIndex.rtf
>> 1 summary This is a
>> 1 type RTF_INDEXER
>> 1 body rtf
>>
>>
>> However, on the Document tab I get the following (in the body field):
>>
>> This is a test rtf document that will be indexed.
>>
>> Amin Mohammed-Coleman
>>
>>
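One common cause of this pattern, assuming the index was written with `StandardAnalyzer` or another analyzer that lowercases (the message doesn't say which analyzer was used): an `ANALYZED` field's stored value is kept verbatim, which is what the Document tab shows, while the indexed terms are the analyzer's lowercased tokens, which is what the Overview tab lists. A `TermQuery` is not analyzed, so `new Term("body", "Amin")` looks for the exact term `Amin`. The toy inverted index below is plain Java, not Lucene; `toyAnalyze`, `termQuery`, and `analyzedQuery` are hypothetical stand-ins used only to illustrate the mismatch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class AnalyzedTermDemo {

    // Hypothetical analyzer stand-in: split on runs of non-alphanumerics, lowercase each token.
    public static List<String> toyAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    private final Map<Integer, String> stored = new HashMap<>();        // doc id -> verbatim text
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> doc ids

    public void add(int docId, String body) {
        stored.put(docId, body);                 // stored value: kept exactly as given
        for (String term : toyAnalyze(body)) {   // indexed value: analyzed tokens only
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    public String storedBody(int docId) {
        return stored.get(docId);
    }

    // Like a TermQuery: the term is looked up exactly as given, with no analysis.
    public Set<Integer> termQuery(String term) {
        return postings.getOrDefault(term, new HashSet<>());
    }

    // Like a query built with the indexing analyzer: the query text is analyzed first.
    public Set<Integer> analyzedQuery(String text) {
        Set<Integer> hits = new HashSet<>();
        for (String term : toyAnalyze(text)) {
            hits.addAll(termQuery(term));
        }
        return hits;
    }

    public static void main(String[] args) {
        AnalyzedTermDemo index = new AnalyzedTermDemo();
        index.add(1, "This is a test rtf document that will be indexed.\n\nAmin Mohammed-Coleman");
        System.out.println(index.termQuery("Amin").size());     // 0: no term "Amin" exists
        System.out.println(index.termQuery("amin").size());     // 1
        System.out.println(index.analyzedQuery("Amin").size()); // 1
    }
}
```

In Lucene itself the usual remedies are to lowercase the term before building the `TermQuery`, or to build the query with a `QueryParser` constructed with the same analyzer that was used at indexing time.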
>> I would expect to get a hit using "Amin" or even "document". I am not sure
>> whether the line:
>>
>> TopDocs topDocs = multiSearcher.search(termQuery, 1);
>>
>> is incorrect, as I am not too sure what "Finds the top n hits for query."
>> means for search(Query query, int n) in the Javadocs.
>>
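On the `search(Query query, int n)` question: in Lucene's `TopDocs`, `totalHits` counts every matching document, while `scoreDocs` holds at most `n` of the best-scoring ones. So `search(termQuery, 1)` limits how many hits come back, not how many are counted, and a printed `totalHits` of 0 means nothing matched at all. The stdlib-only sketch below illustrates that top-n contract with a min-heap; `TopNDemo`, `Hit`, `TopHits`, and `topN` are hypothetical names, and this is not Lucene's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopNDemo {

    // A (docId, score) pair, loosely modeled on a scored hit.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Total number of matches plus at most n best hits, highest score first.
    static final class TopHits {
        final int totalHits;
        final List<Hit> best;
        TopHits(int totalHits, List<Hit> best) { this.totalHits = totalHits; this.best = best; }
    }

    // Counts every match but keeps only the n highest-scoring hits in a min-heap.
    static TopHits topN(float[] scores, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        int total = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) continue;  // score 0 here means "did not match"
            total++;
            heap.offer(new Hit(doc, scores[doc]));
            if (heap.size() > n) heap.poll(); // evict the current lowest score
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort((a, b) -> Float.compare(b.score, a.score));
        return new TopHits(total, best);
    }

    public static void main(String[] args) {
        TopHits top = topN(new float[] {0.2f, 0f, 0.9f, 0.5f}, 1);
        System.out.println(top.totalHits);       // 3
        System.out.println(top.best.size());     // 1
        System.out.println(top.best.get(0).doc); // 2
    }
}
```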
>> I would be grateful if someone could advise on what I may be doing wrong.
>> I am using Lucene 2.4.0.
>>
>>
>> Cheers
>> Amin
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@struts.apache.org
> For additional commands, e-mail: user-help@struts.apache.org
>