lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Search Problem
Date Fri, 02 Jan 2009 07:30:00 GMT
Hi

Sorry I was using the StandardAnalyzer in this instance.

Cheers



On 2 Jan 2009, at 00:55, Chris Lu wrote:

> You need to let us know the analyzer you are using.
> --  
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per  
> request) got
> 2.6 Million Euro funding!
>
> On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman <aminmc@gmail.com 
> >wrote:
>
>>
>>
>>> Hi
>>>
>>> I have created a RTFHandler which takes a RTF file and creates a  
>>> lucene
>>> Document which is indexed.  The RTFHandler looks like something  
>>> like this:
>>>
>>> if (bodyText != null) {
>>>                       Document document = new Document();
>>>                       Field field = new
>>> Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(),  
>>> Field.Store.YES,
>>> Field.Index.ANALYZED);
>>>                       document.add(field);
>>>
>>>
>>> }
>>>
>>> I am using Java Built in RTF text extraction.  When I run my test to
>>> verify that the document contains text that I expect this works  
>>> fine.  I get
>>> the following when I print the document:
>>>
>>> Document<stored/uncompressed,indexed,tokenized<body:This is a test  
>>> rtf
>>> document that will be indexed.
>>>
>>> Amin Mohammed-Coleman>
>>> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf>
>>> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf>
>>> stored/uncompressed,indexed<type:RTF_INDEXER>
>>> stored/uncompressed,indexed<summary:This is a >>
>>>
>>>
>>> The problem is when I use the following to search I get no result:
>>>
>>>       MultiSearcher multiSearcher = new MultiSearcher(new  
>>> Searchable[]
>>> {rtfIndexSearcher});
>>>                       Term t = new Term("body", "Amin");
>>>                       TermQuery termQuery = new TermQuery(t);
>>>                       TopDocs topDocs =  
>>> multiSearcher.search(termQuery,
>>> 1);
>>>                       System.out.println(topDocs.totalHits);
>>>                       multiSearcher.close();
>>>
>>> RftIndexSearcher is configured with the directory that holds rtf
>>> documents.  I have used Luke to look at the document and what I am  
>>> finding
>>> in the overview tab is the following for the document:
>>>
>>> 1       body    test
>>> 1       id      1234
>>> 1       name    rtfDocumentToIndex.rtf
>>> 1       path    rtfDocumentToIndex.rtf
>>> 1       summary This is a
>>> 1       type    RTF_INDEXER
>>> 1       body    rtf
>>>
>>>
>>> However on the Document tab I am getting (in the body field):
>>>
>>> This is a test rtf document that will be indexed.
>>>
>>> Amin Mohammed-Coleman
>>>
>>>
>>> I would expect to get a hit using "Amin" or even "document".  I am  
>>> not
>>> sure whether the
>>> line:
>>> TopDocs topDocs = multiSearcher.search(termQuery, 1);
>>>
>>> is incorrect as I am not too sure of the meaning of "Finds the top  
>>> n hits
>>> for query." for search (Query query, int n) according to java docs.
>>>
>>> I would be grateful if someone may be able to advise on what I may  
>>> be
>>> doing wrong.  I am using Lucene 2.4.0
>>>
>>>
>>> Cheers
>>> Amin
>>>
>>>
>>>
>>>
>>>
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message