lucene-solr-user mailing list archives

From Christopher Ball <christopher.b...@metaheuristica.com>
Subject RE: Index an entire Phrase and not it's constituent parts?
Date Sun, 14 Mar 2010 03:45:08 GMT
Thank you for the idea, Mitch, but it just doesn't seem right that I should
have to resort to scoring when what I really need seems so fundamental.

Logically, what I want is a "phrase filter factory" that matches phrases
listed in a file, much as stopwords are. On a match it would index the
phrase as a single token and remove the words of the phrase from the stream
before passing the stream on to the next filter, since the phrases are
embedded in paragraphs that contain other valid index material.

So an analyzer would look something like:

      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PhraseFilterFactory"/>
        <filter class="solr.StopFilterFactory"/>
      </analyzer>

Of course, one riddle this leaves is how to match phrases in a stream that
has already been tokenized, so maybe I also need to write my own tokenizer.
It just seems like this is a problem others would have wanted solved before.

Or maybe I should try solr.KeepWordFilterFactory, if it can deal with
phrases?

I'm stumped =(

-----Original Message-----
From: MitchK [mailto:mitch91@web.de] 
Sent: Saturday, March 13, 2010 8:12 AM
To: solr-user@lucene.apache.org
Subject: RE: Index an entire Phrase and not it's constituent parts?


Christopher,

Maybe the SynonymFilter can help you solve your problem.

Let me try to explain:
If you create an extra field in the index for your use case, you can boost
matches on that field in a special way.

The next step is to create an extra synonym file:
as much as => SpecialPhrase1
in amount of => SpecialPhrase2
... and so on...

If a user queries for something like "as much as I love you", you can boost
matches from the SpecialPhrase field and return results from both the
normal stopword-filtered data and the SpecialPhrase data.
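
As a rough sketch of the setup described above, the extra field could be
declared in schema.xml along these lines. The field type name
"special_phrase", the field name "special_phrases", the source field "text"
and the file name "special_phrases.txt" are all assumptions made up for
illustration; only the factory classes and their attributes are standard
Solr.

```xml
<!-- Hypothetical field type: maps each phrase listed in
     special_phrases.txt (e.g. "as much as => SpecialPhrase1")
     to a single token. All names here are illustrative. -->
<fieldType name="special_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand="false" keeps only the right-hand side of each mapping -->
    <filter class="solr.SynonymFilterFactory" synonyms="special_phrases.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

<!-- Hypothetical extra field, filled from an assumed main field "text" -->
<field name="special_phrases" type="special_phrase" indexed="true" stored="false"/>
<copyField source="text" dest="special_phrases"/>
```

Note that this field would still contain the other words of the text next
to the phrase tokens, so it only helps in combination with the boosting
described above.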

If this fits your needs, please let me know.

Kind regards
- Mitch
-- 
View this message in context:
http://old.nabble.com/Index-an-entire-Phrase-and-not-it%27s-constituent-parts--tp27785521p27887564.html
Sent from the Solr - User mailing list archive at Nabble.com.



