lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Lu" <chris...@gmail.com>
Subject Re: Search Problem
Date Fri, 02 Jan 2009 00:55:41 GMT
You need to let us know the analyzer you are using.
-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman <aminmc@gmail.com>wrote:

>
>
>> Hi
>>
>> I have created a RTFHandler which takes a RTF file and creates a lucene
>> Document which is indexed.  The RTFHandler looks like something like this:
>>
>> if (bodyText != null) {
>>                        Document document = new Document();
>>                        Field field = new
>> Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(), Field.Store.YES,
>> Field.Index.ANALYZED);
>>                        document.add(field);
>>
>>
>> }
>>
>> I am using Java Built in RTF text extraction.  When I run my test to
>> verify that the document contains text that I expect this works fine.  I get
>> the following when I print the document:
>>
>> Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf
>> document that will be indexed.
>>
>> Amin Mohammed-Coleman>
>> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf>
>> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf>
>> stored/uncompressed,indexed<type:RTF_INDEXER>
>> stored/uncompressed,indexed<summary:This is a >>
>>
>>
>> The problem is when I use the following to search I get no result:
>>
>>        MultiSearcher multiSearcher = new MultiSearcher(new Searchable[]
>> {rtfIndexSearcher});
>>                        Term t = new Term("body", "Amin");
>>                        TermQuery termQuery = new TermQuery(t);
>>                        TopDocs topDocs = multiSearcher.search(termQuery,
>> 1);
>>                        System.out.println(topDocs.totalHits);
>>                        multiSearcher.close();
>>
>> RftIndexSearcher is configured with the directory that holds rtf
>> documents.  I have used Luke to look at the document and what I am finding
>> in the overview tab is the following for the document:
>>
>> 1       body    test
>> 1       id      1234
>> 1       name    rtfDocumentToIndex.rtf
>> 1       path    rtfDocumentToIndex.rtf
>> 1       summary This is a
>> 1       type    RTF_INDEXER
>> 1       body    rtf
>>
>>
>> However on the Document tab I am getting (in the body field):
>>
>> This is a test rtf document that will be indexed.
>>
>> Amin Mohammed-Coleman
>>
>>
>> I would expect to get a hit using "Amin" or even "document".  I am not
>> sure whether the
>> line:
>> TopDocs topDocs = multiSearcher.search(termQuery, 1);
>>
>> is incorrect as I am not too sure of the meaning of "Finds the top n hits
>> for query." for search (Query query, int n) according to java docs.
>>
>> I would be grateful if someone may be able to advise on what I may be
>> doing wrong.  I am using Lucene 2.4.0
>>
>>
>> Cheers
>> Amin
>>
>>
>>
>>
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message