struts-user mailing list archives

From "Nils-Helge Garli Hegvik" <nil...@gmail.com>
Subject Re: Search Problem
Date Thu, 01 Jan 2009 20:41:03 GMT
Maybe you should try posting to a Lucene mailing list?

Nils-H

On Thu, Jan 1, 2009 at 9:28 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
> Hi
>
> I have created an RTFHandler that takes an RTF file and creates a Lucene
> Document, which is indexed.  The RTFHandler looks something like this:
>
> if (bodyText != null) {
>     Document document = new Document();
>     Field field = new Field(MetaDataEnum.BODY.getDescription(),
>             bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
>     document.add(field);
> }
>
> I am using Java's built-in RTF text extraction.  When I run my test to
> verify that the document contains the text I expect, this works fine.  I
> get the following when I print the document:
>
> Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf
> document that will be indexed.
>
> Amin Mohammed-Coleman>
> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf>
> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf>
> stored/uncompressed,indexed<type:RTF_INDEXER>
> stored/uncompressed,indexed<summary:This is a >>
>
>
> The problem is that when I use the following to search, I get no results:
>
>     MultiSearcher multiSearcher =
>             new MultiSearcher(new Searchable[] {rtfIndexSearcher});
>     Term t = new Term("body", "Amin");
>     TermQuery termQuery = new TermQuery(t);
>     TopDocs topDocs = multiSearcher.search(termQuery, 1);
>     System.out.println(topDocs.totalHits);
>     multiSearcher.close();
>
> rtfIndexSearcher is configured with the directory that holds the RTF
> documents.  I have used Luke to look at the document, and this is what I
> find in the Overview tab for it:
>
> 1       body    test
> 1       id      1234
> 1       name    rtfDocumentToIndex.rtf
> 1       path    rtfDocumentToIndex.rtf
> 1       summary This is a
> 1       type    RTF_INDEXER
> 1       body    rtf
>
>
> However, on the Document tab I get this in the body field:
>
> This is a test rtf document that will be indexed.
>
> Amin Mohammed-Coleman
>
>
> I would expect to get a hit using "Amin" or even "document".  I am not sure
> whether the line
>
>     TopDocs topDocs = multiSearcher.search(termQuery, 1);
>
> is incorrect, as I am not sure what the Javadoc for search(Query query, int n)
> means by "Finds the top n hits for query."
>
> I would be grateful if someone could advise on what I may be doing wrong.
> I am using Lucene 2.4.0.
>
>
> Cheers
> Amin
>
>
>
>
>
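The quoted message does not show how the RTF text is extracted. Assuming "Java's built-in RTF text extraction" means javax.swing.text.rtf.RTFEditorKit (an assumption; the message does not name the class), a minimal sketch of turning an RTF stream into the plain-text bodyText that the handler indexes might look like this. RtfTextExtractor and extractText are hypothetical names, and the sample RTF string is made up to mirror the text printed above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: assumes the JDK's RTFEditorKit is the "built-in" extractor.
public class RtfTextExtractor {

    /** Parses RTF bytes from the stream into a Swing Document and returns its plain text. */
    static String extractText(InputStream rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument(); // a DefaultStyledDocument
        kit.read(rtf, doc, 0);                      // RTF is read as bytes, not chars
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: two paragraphs, roughly like the document in the message.
        String rtf = "{\\rtf1\\ansi This is a test rtf document that will be indexed."
                + "\\par\\par Amin Mohammed-Coleman\\par}";
        String bodyText = extractText(
                new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII)));
        // The handler in the message trims the text before building the body Field.
        System.out.println(bodyText.trim());
    }
}
```

RTFEditorKit only handles fairly simple RTF, so if the extracted text ever looks incomplete it is worth printing it before indexing, as the test in the message already does.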

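A likely explanation for the missing "Amin" hit, assuming the IndexWriter was created with StandardAnalyzer (the message does not say which analyzer was used): a Field.Index.ANALYZED field is run through the analyzer at index time, and StandardAnalyzer lower-cases its tokens, so the index holds "amin". A TermQuery is not analyzed, so it looks for the exact term "Amin" and finds nothing. Passing 1 as n is not the cause: search(query, n) puts at most n hits in TopDocs.scoreDocs, while TopDocs.totalHits counts every matching document. The plain-Java toy below is not Lucene's API (ToyTermIndex, analyze, and Result are hypothetical); it only models those two points.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only, NOT Lucene's API: an ANALYZED field indexed with a lower-casing
// analyzer, queried by an exact, un-analyzed term (the way a TermQuery is).
public class ToyTermIndex {

    /** Shaped like Lucene's TopDocs: a total match count plus at most n hits. */
    static final class Result {
        final int totalHits;
        final List<Integer> topDocIds;

        Result(int totalHits, List<Integer> topDocIds) {
            this.totalHits = totalHits;
            this.topDocIds = topDocIds;
        }
    }

    // Inverted index: term -> ids of the documents that contain it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Hypothetical analyzer: split on non-alphanumerics, then lower-case each token. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Index time: the body goes through the analyzer before it is stored as terms. */
    void addDocument(int docId, String body) {
        for (String term : new LinkedHashSet<>(analyze(body))) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
    }

    /** Query time: exact lookup with no analysis; keeps at most n hits (no scoring here). */
    Result search(String term, int n) {
        List<Integer> hits = postings.getOrDefault(term, new ArrayList<>());
        return new Result(hits.size(),
                new ArrayList<>(hits.subList(0, Math.min(n, hits.size()))));
    }

    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        index.addDocument(1, "This is a test rtf document that will be indexed.\n\nAmin Mohammed-Coleman");
        index.addDocument(2, "A second document, also by Amin");

        System.out.println(index.search("Amin", 1).totalHits);        // 0: the indexed term is "amin"
        System.out.println(index.search("amin", 1).totalHits);        // 2: every match is counted
        System.out.println(index.search("amin", 1).topDocIds.size()); // 1: at most n hits are returned
    }
}
```

If lower-casing is the cause, searching for new Term("body", "amin"), or building the query with QueryParser and the same analyzer used for indexing, should match. This does not obviously explain the miss on "document", which is already lower-case; that may have a separate cause the message does not give enough detail to pin down.
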
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@struts.apache.org
For additional commands, e-mail: user-help@struts.apache.org