lucene-java-user mailing list archives

From Aaron Schon <aaron_sc...@yahoo.com>
Subject Lucene query with long strings
Date Tue, 23 Mar 2010 21:08:12 GMT
Hi all, I have been playing with Lucene for a while now, but I am stuck on a perplexing issue.

I have an index with a field "Affiliation". Some example values are:

- "Stanford University School of Medicine, Palo Alto, CA USA"
- "Institute of Neurobiology, School of Medicine, Stanford University, Palo Alto, CA"
- "School of Medicine, Harvard University, Boston MA"
- "Brigham & Women's, Harvard University School of Medicine, Boston, MA"
- "Harvard University, Cambridge MA"

and so on. The bottom line is that the affiliations are written in many different ways with no
apparent consistency.

When I query the Affiliation field with, say, "School of Medicine, Stanford University,
Palo Alto, CA" (using QueryParser) to find all Stanford-related documents, I get a lot of
false positives, presumably because terms such as "School of Medicine" also appear in many
non-Stanford affiliations. (Note: I cannot use a PhraseQuery because of the variability in
the way affiliations are written.)
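To illustrate why shared words could produce false positives, here is a minimal sketch in plain Java. It does not use Lucene: `TermOverlap`, `terms`, and `sharedTerms` are hypothetical names, and the lowercase-and-split step is only a crude stand-in for whatever analyzer the index really uses. It assumes the parsed query ends up as a set of optional (OR) terms, which is QueryParser's default operator.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
class TermOverlap {
    // Crude stand-in for an analyzer: lowercase, then split on anything
    // that is not an ASCII letter or digit.
    static Set<String> terms(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Number of distinct query terms that also occur in the document value.
    // With OR semantics, any value above zero would make the document a hit.
    static int sharedTerms(String query, String doc) {
        Set<String> shared = terms(query);
        shared.retainAll(terms(doc));
        return shared.size();
    }
}
```

Under these assumptions, calling `TermOverlap.sharedTerms` with the Stanford query and the value "School of Medicine, Harvard University, Boston MA" reports four shared terms (school, of, medicine, university), so a Harvard document would still match even though it has nothing to do with Stanford.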

I have tried the following:

1. Used a SpanNearQuery built by splitting the search phrase on whitespace. (Here I get no results!)
2. Tried boosting (using ^) after splitting on the commas, giving the last parts such as
"Palo Alto CA" a much higher boost than the initial phrases. Here I still get lots of
false positives.
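A minimal sketch of what the two attempts might look like at the string level, in plain Java without any Lucene classes. The class and method names are hypothetical, and quoting each comma-separated part and doubling the boost from part to part are assumptions made for illustration, not the actual code behind this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for illustration only; neither method is Lucene API.
class AffiliationAttempts {
    // Attempt 1 (assumed form): split the phrase on whitespace.
    // The tokens keep their original case and any attached punctuation.
    static List<String> whitespaceTokens(String phrase) {
        List<String> out = new ArrayList<>();
        for (String t : phrase.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Attempt 2 (assumed form): split on commas, quote each part, and give
    // later parts a larger boost using the "phrase"^boost query syntax.
    static String boostedQuery(String phrase) {
        StringBuilder sb = new StringBuilder();
        int boost = 1; // arbitrary starting boost, doubled for each later part
        for (String raw : phrase.split(",")) {
            String part = raw.trim();
            if (part.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('"').append(part).append('"').append('^').append(boost);
            boost *= 2;
        }
        return sb.toString();
    }
}
```

Note that the whitespace split leaves tokens such as "Medicine," and "CA" with their punctuation and capitals intact. SpanTermQuery does not analyze its terms, so if the field was indexed with an analyzer that lowercases and strips punctuation, such raw tokens would not match anything. Whether that is what happened here is only a guess.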

Any suggestions on how to approach this? Is SpanNearQuery the way to go? Any ideas on why
I get 0 results with it?

Thanks in advance for helping a newbie.

AS


      
