lucene-dev mailing list archives

From Jan Høydahl (JIRA) <>
Subject [jira] Commented: (SOLR-2150) Anti-phrasing feature
Date Mon, 11 Oct 2010 18:55:32 GMT


Jan Høydahl commented on SOLR-2150:

What you describe is also a useful feature. I think of it even more generically, as a place to
configure detection of various patterns and apply some action to the query based on the match,
whether that is fetching a weather forecast from an API, performing a calculation, or rewriting
the query to apply a filter. I think it deserves its own feature request; one could then
decide later, in the design phase, whether the same code base could power parts of both.
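
To make the "detect a pattern, then apply an action" idea concrete, here is a minimal
sketch in Python. It is not part of any Solr API: the rule list, the regular expressions,
the function names, and the shape of the returned dictionary are all hypothetical and
chosen only for illustration.

```python
import re

# Hypothetical actions: each receives the regex match and returns a dict
# describing what should happen to the query.
def calculate(match):
    # Perform a calculation instead of searching.
    return {"answer": int(match.group(1)) + int(match.group(2))}

def rewrite_with_filter(match):
    # Rewrite the query and add a filter (the "fq" key is an assumption here).
    return {"q": match.group("terms"), "fq": "category:" + match.group("cat")}

# Hypothetical rule table: (compiled pattern, action), checked in order.
RULES = [
    (re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*$"), calculate),
    (re.compile(r"^(?P<terms>.+?)\s+in category\s+(?P<cat>\w+)$"), rewrite_with_filter),
]

def apply_rules(query, rules=RULES):
    """Return the result of the first matching rule, else pass the query through."""
    for pattern, action in rules:
        match = pattern.search(query)
        if match:
            return action(match)
    return {"q": query}
```

A rule that calls an external service, such as a weather API, would plug in the same way
as another action function.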

> Anti-phrasing feature
> ---------------------
>                 Key: SOLR-2150
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>            Reporter: Jan Høydahl
> Add an anti-phrasing feature to Solr.
> Definition: Identifying word sequences in queries that do not contribute essentially
to the query's meaning, such as "Where can I find" or "Where is."
> (Source:
> For general-purpose search services, such as web, intranet, or shopping search, some users
will try to write a question to the search engine, such as "how much is an ipod nano". One
straightforward way of limiting the number of zero-hit queries in such environments is to apply
anti-phrasing, which uses a dictionary of common sentence prefixes that should be stripped from
the incoming query before it is sent on to search.
> This can be implemented as a Search Component in Solr. The dictionary can be language
independent. We can encourage users to submit their tested anti-phrasing dictionaries for
various languages, and include those. The dictionary can be a set of simple .txt files, loaded
into memory at startup in an efficient data structure such as a B-tree or finite state automaton
to avoid redundancy and ensure quick matching. The procedure for detecting an anti-phrase
in the incoming query is to first look up the full query phrase; if there is no match, remove a
word from the end and look up again, until there is either a match or nothing left to look up.
Example for the query "Who is Einstein?", where "Who is" is defined as an anti-phrase:
> 1. Look up "Who is Einstein" (no match)
> 2. Look up "Who is" (match), remove this prefix
> 3. Issue the query "Einstein" to search
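
The lookup procedure in the steps above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the proposed Solr implementation: it
uses a plain set of word tuples instead of the B-tree or finite state automaton mentioned
in the description, it tokenizes with a simple regular expression, and the function names
are made up for this example.

```python
import re

def tokenize(text):
    # Assumption: words are runs of letters/digits; punctuation is dropped.
    return re.findall(r"\w+", text)

def load_anti_phrases(lines):
    """Build a set of lowercased word tuples from dictionary lines (one phrase per line)."""
    phrases = set()
    for line in lines:
        words = [w.lower() for w in tokenize(line)]
        if words:
            phrases.add(tuple(words))
    return phrases

def strip_anti_phrase(query, phrases):
    """Strip the longest anti-phrase prefix from the query, if any."""
    words = tokenize(query)
    lowered = [w.lower() for w in words]
    # Look up the full query first, then drop one word from the end each time.
    for end in range(len(lowered), 0, -1):
        if tuple(lowered[:end]) in phrases:
            remainder = words[end:]
            # Assumption: if the whole query is an anti-phrase, keep it as is
            # rather than issuing an empty query.
            return " ".join(remainder) if remainder else query
    # No prefix matched: leave the query untouched.
    return query

phrases = load_anti_phrases(["Who is", "Where is.", "Where can I find", "how much is an"])
```

With this dictionary, strip_anti_phrase("Who is Einstein?", phrases) returns "Einstein".
Because the longest prefix is tried first, a dictionary containing both "how much is" and
"how much is an" would strip the longer of the two.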

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
