lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Search Problem
Date Fri, 02 Jan 2009 11:39:51 GMT
Hi

I have tried this and it doesn't work.  I don't understand why using
"amin" instead of "Amin" would work; is the search not case-insensitive?

I tried "test" for the field "body" and this works.  No other terms
work, for example:

"document"
"indexed"

These are tokens that were extracted when creating the Lucene document.


Thanks for your reply.

Cheers

Amin

On 2 Jan 2009, at 10:36, Chris Lu wrote:

> Basically, Lucene stores analyzed tokens and looks up matches based on
> those tokens.
> "Amin" after StandardAnalyzer is "amin", so you need to search with
> new Term("body", "amin") instead of new Term("body", "Amin").
>
> -- 
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per  
> request) got
> 2.6 Million Euro funding!
>
> On Thu, Jan 1, 2009 at 11:30 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>
>> Hi
>>
>> Sorry I was using the StandardAnalyzer in this instance.
>>
>> Cheers
>>
>>
>>
>>
>> On 2 Jan 2009, at 00:55, Chris Lu wrote:
>>
>>> You need to let us know the analyzer you are using.
>>>
>>> -- Chris Lu
>>>
>>> On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>
>>>>> Hi
>>>>>
>>>>> I have created an RTFHandler which takes an RTF file and creates a
>>>>> Lucene Document, which is then indexed.  The RTFHandler looks
>>>>> something like this:
>>>>>
>>>>> if (bodyText != null) {
>>>>>     Document document = new Document();
>>>>>     Field field = new Field(MetaDataEnum.BODY.getDescription(),
>>>>>             bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
>>>>>     document.add(field);
>>>>> }
>>>>>
>>>>> I am using Java's built-in RTF text extraction.  When I run my test
>>>>> to verify that the document contains the text I expect, it works
>>>>> fine.  I get the following when I print the document:
>>>>>
>>>>> Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf
>>>>> document that will be indexed.
>>>>>
>>>>> Amin Mohammed-Coleman>
>>>>> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf>
>>>>> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf>
>>>>> stored/uncompressed,indexed<type:RTF_INDEXER>
>>>>> stored/uncompressed,indexed<summary:This is a >>
>>>>>
>>>>>
>>>>> The problem is that when I use the following to search, I get no results:
>>>>>
>>>>> MultiSearcher multiSearcher =
>>>>>         new MultiSearcher(new Searchable[] {rtfIndexSearcher});
>>>>> Term t = new Term("body", "Amin");
>>>>> TermQuery termQuery = new TermQuery(t);
>>>>> TopDocs topDocs = multiSearcher.search(termQuery, 1);
>>>>> System.out.println(topDocs.totalHits);
>>>>> multiSearcher.close();
>>>>>
>>>>> rtfIndexSearcher is configured with the directory that holds RTF
>>>>> documents.  I have used Luke to look at the document, and the
>>>>> overview tab shows the following for the document:
>>>>>
>>>>> 1       body    test
>>>>> 1       id      1234
>>>>> 1       name    rtfDocumentToIndex.rtf
>>>>> 1       path    rtfDocumentToIndex.rtf
>>>>> 1       summary This is a
>>>>> 1       type    RTF_INDEXER
>>>>> 1       body    rtf
>>>>>
>>>>>
>>>>> However on the Document tab I am getting (in the body field):
>>>>>
>>>>> This is a test rtf document that will be indexed.
>>>>>
>>>>> Amin Mohammed-Coleman
>>>>>
>>>>>
>>>>> I would expect to get a hit using "Amin" or even "document".  I am
>>>>> not sure whether the line
>>>>>
>>>>> TopDocs topDocs = multiSearcher.search(termQuery, 1);
>>>>>
>>>>> is incorrect, as I am not too sure what the Javadoc for
>>>>> search(Query query, int n) means by "Finds the top n hits for query."
>>>>>
>>>>> I would be grateful if someone could advise on what I may be doing
>>>>> wrong.  I am using Lucene 2.4.0.
>>>>>
>>>>>
>>>>> Cheers
>>>>> Amin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
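
Chris's point in the quoted reply, that Lucene matches on the analyzed tokens rather than on the original text, can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: it uses plain Java collections and no Lucene classes, the names AnalyzedTermLookup, analyze, countExact and countAnalyzed are hypothetical, and the "analyzer" here only splits and lowercases (it does not drop stop words as StandardAnalyzer does). It shows why an exact term lookup for "Amin" can miss while "amin" hits; it does not explain why "amin" still returns nothing in the case reported above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: this is NOT Lucene's API. Text is analyzed (split and
// lowercased) at index time, while a term lookup compares the query term
// exactly as given.
class AnalyzedTermLookup {

    // term -> ids of the documents that contain it
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Hypothetical stand-in for an analyzer: split on non-alphanumerics, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{Alnum}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Index a document body: only the analyzed tokens are stored.
    void addDocument(int docId, String body) {
        for (String token : analyze(body)) {
            List<Integer> docs = postings.computeIfAbsent(token, k -> new ArrayList<>());
            if (!docs.contains(docId)) {
                docs.add(docId);
            }
        }
    }

    // Exact lookup, in the spirit of a term query: the term is NOT analyzed first.
    int countExact(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptyList()).size();
    }

    // Lookup that first runs the query text through the same analysis as the index.
    int countAnalyzed(String queryText) {
        List<String> tokens = analyze(queryText);
        return tokens.isEmpty() ? 0 : countExact(tokens.get(0));
    }

    public static void main(String[] args) {
        AnalyzedTermLookup index = new AnalyzedTermLookup();
        index.addDocument(1,
                "This is a test rtf document that will be indexed.\n\nAmin Mohammed-Coleman");
        System.out.println(index.countExact("Amin"));    // prints 0: "Amin" was never stored
        System.out.println(index.countExact("amin"));    // prints 1: matches the stored token
        System.out.println(index.countAnalyzed("Amin")); // prints 1: query analyzed like the index
    }
}
```

In real Lucene code the usual way to get the third behaviour is to build the query through a query parser configured with the same analyzer that was used for indexing, instead of constructing the Term by hand.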

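
On the question about search(Query query, int n) in the original message: as far as I understand the Javadoc, n only caps how many of the best-scoring hits are returned, while totalHits still counts every matching document, so passing 1 should not by itself produce a total of 0. The sketch below models that split with plain Java; TopNSearch, Hit and Result are hypothetical names for illustration and are not Lucene's classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model only, not Lucene's classes: a top-n search reports the total
// number of matches separately from the (at most n) hits it returns.
class TopNSearch {

    static final class Hit {
        final int docId;
        final double score;

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    static final class Result {
        final int totalHits;  // every matching document is counted
        final List<Hit> top;  // only the n best-scoring hits are kept

        Result(int totalHits, List<Hit> top) {
            this.totalHits = totalHits;
            this.top = top;
        }
    }

    // Keep the n highest-scoring matches, but report the size of the full match set.
    static Result search(List<Hit> matches, int n) {
        List<Hit> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Hit> top = new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
        return new Result(matches.size(), top);
    }

    public static void main(String[] args) {
        List<Hit> matches = Arrays.asList(new Hit(7, 0.2), new Hit(3, 0.9), new Hit(5, 0.5));
        Result result = search(matches, 1);
        System.out.println(result.totalHits);        // prints 3
        System.out.println(result.top.size());       // prints 1
        System.out.println(result.top.get(0).docId); // prints 3 (the highest score)
    }
}
```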


