lucene-java-user mailing list archives

From "Rajesh Munavalli" <raje...@dessci.com>
Subject RE: Case-sensitive search
Date Mon, 22 Aug 2005 20:45:35 GMT
If the original document contains the terms in the same case as the
query, that document will be retrieved with a higher score.

For example:
Document 1: "Java Virtual Machine is ...."
Document 2: "java virtual machine is ...."

Index contents of Document 1 (both forms at the same position):
Java java
Virtual virtual
Machine machine
is

Index contents of Document 2:
java
virtual
machine
is

If I understand correctly, the query "Java Virtual machine" (ORed with
its lowercase form) should retrieve both documents, with Document 1
ranked higher than Document 2, because the query is matched twice in
Document 1 but only once in Document 2. For all-lowercase queries we
don't have to OR the queries.
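
To make the example above concrete, here is a toy sketch in plain Java.
It is not the Lucene API: the class CaseSynonymSketch and its methods
are made up for illustration, and counting matching OR clauses is only a
stand-in for Lucene's real scoring, which takes more factors into
account. It assumes whitespace tokenization and exact phrase matching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: plain Java collections, not the Lucene API.
final class CaseSynonymSketch {

    // Index one document: each position holds the original token and its
    // lowercase form (a "synonym" stored at the same position).
    static List<Set<String>> index(String text) {
        List<Set<String>> positions = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            Set<String> forms = new HashSet<>();
            forms.add(token);
            forms.add(token.toLowerCase(Locale.ROOT));
            positions.add(forms);
        }
        return positions;
    }

    // True if the phrase's terms occur at consecutive positions.
    static boolean matchesPhrase(List<Set<String>> doc, String phrase) {
        String[] terms = phrase.trim().split("\\s+");
        for (int start = 0; start + terms.length <= doc.size(); start++) {
            boolean hit = true;
            for (int i = 0; i < terms.length && hit; i++) {
                hit = doc.get(start + i).contains(terms[i]);
            }
            if (hit) {
                return true;
            }
        }
        return false;
    }

    // Number of OR clauses that match: the phrase as typed, plus its
    // lowercase form when that differs from what the user typed.
    static int matchingClauses(List<Set<String>> doc, String userQuery) {
        int count = 0;
        if (matchesPhrase(doc, userQuery)) {
            count++;
        }
        String lower = userQuery.toLowerCase(Locale.ROOT);
        if (!lower.equals(userQuery) && matchesPhrase(doc, lower)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Set<String>> doc1 = index("Java Virtual Machine is");
        List<Set<String>> doc2 = index("java virtual machine is");
        String query = "Java Virtual machine";
        System.out.println("Document 1: " + matchingClauses(doc1, query));
        System.out.println("Document 2: " + matchingClauses(doc2, query));
    }
}
```

In this sketch both clauses match Document 1 and only the lowercase
clause matches Document 2. An all-lowercase query produces a single
clause, so no OR is needed.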

Rajesh Munavalli

> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
> Sent: Monday, August 22, 2005 3:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: Case-sensitive search
> 
> 
> On Aug 22, 2005, at 11:43 AM, Rajesh Munavalli wrote:
> 
> > At query time I was thinking of two queries ORed together: one
> > with the query as the user entered it and the other a
> > case-insensitive query. For example:
> >
> > The user query "Java Virtual machine" would be translated into
> > "Java Virtual machine" OR "java virtual machine".
> >
> > Even though the user mistyped the case ("machine" instead of
> > "Machine"), the query would retrieve documents. I am not sure about
> > the performance though. Erik would be the right person to help us
> > understand performance constraints in doing so.
> 
> Performance issues aside (as they aren't really a factor in 
> this equation), I still don't understand what good ORing 
> queries with various cases does in the scenario you describe. 
>  If you're going to index lower case, and then OR in the 
> lowercase, then there is really no need to even have the 
> case-sensitive parts there in the index or in the query.
> 
>      Erik
> 
> 
> 
> >
> > Rajesh Munavalli
> >
> >
> >> -----Original Message-----
> >> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> >> Sent: Monday, August 22, 2005 10:20 AM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Case-sensitive search
> >>
> >>
> >> On Aug 22, 2005, at 11:10 AM, Rajesh Munavalli wrote:
> >>
> >>> You could also treat the case-sensitive and case-insensitive forms
> >>> as synonyms and index them at the same position. This would be
> >>> helpful in phrase queries.
> >>>
> >>
> >> You wouldn't be able to selectively toggle between
> >> case-sensitive and -insensitive searching this way, though,
> >> so I'm not sure if there is merit in doing both cases in the
> >> same position or not.
> >>
> >>      Erik
> >>
> >>
> >>
> >>>
> >>> Rajesh Munavalli
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> >>>> Sent: Monday, August 22, 2005 10:04 AM
> >>>> To: java-user@lucene.apache.org
> >>>> Subject: Re: Case-sensitive search
> >>>>
> >>>>
> >>>> On Aug 22, 2005, at 10:40 AM, tareque@controldocs.com wrote:
> >>>>
> >>>>
> >>>>> Is there any way to index as case-sensitive and then, while
> >>>>> searching, making the search case-sensitive and case-insensitive
> >>>>> using the same index as needed?
> >>>>>
> >>>>
> >>>> Not really.  Terms in the index are ordered lexicographically,
> >>>> including case.  It certainly would be possible to write customized
> >>>> Query subclasses to do this sort of thing at the expense of
> >>>> performance.
> >>>>
> >>>> The only techniques I'm aware of are to either build separate
> >>>> indexes or index the same information into separate fields of the
> >>>> same documents using different analyzers per field.
> >>>>
> >>>>      Erik
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
> 
> 
> 
> 
