lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Lucene query with long strings
Date Tue, 23 Mar 2010 21:41:34 GMT
> Hi all, I have been playing
> with Lucene for a while now, but I am stuck on a perplexing
> issue.
> 
> I have an index, with a field "Affiliation", some example
> values are:
> 
> - "Stanford University School of Medicine, Palo Alto, CA
> USA", 
> - "Institute of Neurobiology, School of Medicine, Stanford
> University, Palo Alto, CA", 
> - "School of Medicine, Harvard University, Boston MA",
> - "Brigham & Women's, Harvard University School of
> Medicine, Boston, MA" 
> - "Harvard University, Cambridge MA"
> 
> and so on... (the bottom-line being the affiliations are
> written in multiple ways with no apparent consistency)
> 
> I query the index on the affiliation field using, say,
> "School of Medicine, Stanford University, Palo Alto, CA"
> (with QueryParser) to find all Stanford-related documents.
> I get a lot of false positives, presumably because terms such
> as "School of Medicine" also occur in other affiliations.
> (Note: I cannot use a phrase query because of the variability
> in the way affiliations are written.)
> 
> I have tried the following:
> 
> 1. Used a SpanNearQuery built by splitting the search phrase on
> whitespace (here I get no results!)
> 2. Tried boosting (using ^): split on the commas and gave the
> last parts, such as "Palo Alto CA", a much higher boost than
> the initial phrases. Here I still get lots of false
> positives.
> 
> Any suggestions on how to approach this? Is SpanNear the
> way to go? Any other ideas on why I get 0 results? 
> 
> Thanks in advance for helping a newbie.

If I were you, I would start with AND as the default operator, so that 100% of the clauses must match:
queryParser.setDefaultOperator(QueryParser.Operator.AND);  // queryParser: your QueryParser instance

If this query does not return any documents, I would switch to OR as the default operator and
require 80% of the optional clauses to match. If that still returns nothing, I would lower the
percentage step by step, let's say down to 50%. setMinimumNumberShouldMatch() takes an absolute
number of clauses rather than a percentage, so compute it from the clause count:

Query q = queryParser.parse("School of Medicine, Stanford University, Palo Alto, CA");
if (q instanceof BooleanQuery) {
    BooleanQuery bq = (BooleanQuery) q;
    bq.setMinimumNumberShouldMatch(bq.clauses().size() * 80 / 100);  // 80% of the clauses, rounded down
}
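
To see what the threshold does to the example affiliations from the question, here is a rough simulation in plain Java. It does not use Lucene at all: the class MinShouldMatchSketch, its helper methods, the whitespace/punctuation tokenizer, and the 100/80/50 steps are all assumptions made for illustration, and a real analyzer may tokenize differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, Lucene-free simulation of "N% of the optional clauses must match".
public class MinShouldMatchSketch {

    // Example affiliations taken from the question.
    static final List<String> AFFILIATIONS = Arrays.asList(
        "Stanford University School of Medicine, Palo Alto, CA USA",
        "Institute of Neurobiology, School of Medicine, Stanford University, Palo Alto, CA",
        "School of Medicine, Harvard University, Boston MA",
        "Brigham & Women's, Harvard University School of Medicine, Boston, MA",
        "Harvard University, Cambridge MA");

    // Crude stand-in for an analyzer: lowercase, split on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Convert a percentage into an absolute clause count (rounded down, never below 1).
    static int requiredMatches(int clauseCount, int percent) {
        return Math.max(1, clauseCount * percent / 100);
    }

    // Keep the documents that contain at least `percent` of the distinct query terms.
    static List<String> search(List<String> docs, String query, int percent) {
        Set<String> queryTerms = tokenize(query);
        int required = requiredMatches(queryTerms.size(), percent);
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            Set<String> matched = tokenize(doc);
            matched.retainAll(queryTerms);
            if (matched.size() >= required) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Try the strictest threshold first and relax it only while nothing matches.
    static List<String> searchWithFallback(List<String> docs, String query) {
        int[] thresholds = {100, 80, 50}; // step sizes are an assumption
        for (int percent : thresholds) {
            List<String> hits = search(docs, query, percent);
            if (!hits.isEmpty()) {
                return hits;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        String query = "School of Medicine, Stanford University, Palo Alto, CA";
        for (int percent : new int[] {100, 80, 50}) {
            System.out.println(percent + "% -> "
                + search(AFFILIATIONS, query, percent).size() + " hits");
        }
    }
}
```

With this toy tokenizer, the 100% and 80% thresholds keep only the two Stanford affiliations, while 50% also lets in the two Harvard "School of Medicine" entries. That is the trade-off to watch when lowering the percentage. In Lucene itself, the value computed by something like requiredMatches() is what would be passed to setMinimumNumberShouldMatch().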




      
