lucene-dev mailing list archives

From Jan Høydahl (JIRA) <j...@apache.org>
Subject [jira] Commented: (SOLR-2150) Anti-phrasing feature
Date Mon, 11 Oct 2010 18:55:32 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919942#action_12919942 ]

Jan Høydahl commented on SOLR-2150:
-----------------------------------

What you describe is also a useful feature. I think of it in even more generic terms: a place to
configure detection of various patterns and apply some action to the query based on the match,
whether that is fetching a weather forecast from an API, performing a calculation, or rewriting
the query to apply a filter. I think it deserves its own feature request; later, in the design
phase, one could decide whether the same code base could power parts of both.
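
To make the idea concrete, here is a minimal sketch of such a pattern/action configuration. Everything in it is hypothetical and for illustration only: the `Rule` class, the `rewrite` function, the two example patterns and the `price` field are assumptions, not part of Solr or of this issue. Only the `q` and `fq` parameter names follow Solr's usual request parameters.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hypothetical rule: a query pattern plus the action to run on a match."""
    pattern: re.Pattern
    action: Callable[[re.Match], dict]


def rewrite(query, rules):
    """Return the result of the first rule that matches the whole query.

    If no rule matches, the query is passed through unchanged.
    """
    for rule in rules:
        match = rule.pattern.fullmatch(query)
        if match:
            return rule.action(match)
    return {"q": query}


# Illustrative rules only; a real deployment would load these from configuration.
RULES = [
    # "weather in <city>" -> hand off to some external weather lookup.
    Rule(re.compile(r"weather in (?P<city>.+)", re.IGNORECASE),
         lambda m: {"action": "weather", "city": m.group("city")}),
    # "<terms> under <n>" -> keep the terms, add a filter on an assumed price field.
    Rule(re.compile(r"(?P<rest>.+?) under (?P<max>\d+)", re.IGNORECASE),
         lambda m: {"q": m.group("rest"),
                    "fq": "price:[* TO %s]" % m.group("max")}),
]
```

With these rules, `rewrite("ipod nano under 200", RULES)` yields a query for "ipod nano" plus a price filter, while a query that matches no rule is returned as a plain `q`.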

> Anti-phrasing feature
> ---------------------
>
>                 Key: SOLR-2150
>                 URL: https://issues.apache.org/jira/browse/SOLR-2150
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>            Reporter: Jan Høydahl
>
> Add an anti-phrasing feature to Solr.
> Definition: Identifying word sequences in queries that do not contribute essentially
> to the query's meaning, such as "Where can I find" or "Where is."
> (Source: http://www.google.com/search?q=define%3Aanti+phrasing)
> For general-purpose search services, such as web, intranet, or shopping search, some users
> will try to write a question to the search engine, such as "how much is an ipod nano". One
> straightforward way of limiting the number of zero-hit queries in such environments is to
> apply anti-phrasing, which uses a dictionary of common sentence prefixes that should be
> stripped from the incoming query before it is sent on to search.
> This can be implemented as a Search Component in Solr. The dictionary can be language
> independent. We can encourage users to submit their tested anti-phrasing dictionaries for
> various languages, and include those. The dictionary can be a set of simple .txt files, loaded
> into memory at startup into an efficient data structure, such as a B-tree or finite state
> automaton, to avoid redundancy and ensure quick matching. The procedure for detecting an
> anti-phrase in the incoming query is to first look up the full query phrase; if there is no
> match, remove a word from the end and do another lookup, repeating until either a match is
> found or no words remain. Example for the query "Who is Einstein?", where "Who is" is
> defined as an anti-phrase:
> 1. Lookup "Who is Einstein"
> 2. Lookup "Who is" (match), remove this prefix
> 3. Issue the query "Einstein" to search
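
The lookup procedure quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed Solr implementation: the function names `load_anti_phrases` and `strip_anti_phrase` are hypothetical, matching is assumed to be case-insensitive, punctuation is simply dropped during tokenization, and a plain in-memory set stands in for the B-tree or finite state automaton mentioned in the description.

```python
import re


def load_anti_phrases(lines):
    """Build a set of lowercase word tuples from dictionary lines (one phrase per line)."""
    phrases = set()
    for line in lines:
        words = re.findall(r"\w+", line.lower())
        if words:  # skip blank lines
            phrases.add(tuple(words))
    return phrases


def strip_anti_phrase(query, phrases):
    """Remove the longest anti-phrase prefix from the query, if there is one."""
    words = re.findall(r"\w+", query)  # assumption: punctuation such as "?" is dropped
    lowered = [w.lower() for w in words]
    # Look up the full query first, then remove one word from the end per lookup.
    for end in range(len(words), 0, -1):
        if tuple(lowered[:end]) in phrases:
            return " ".join(words[end:])
    # No anti-phrase found: return the tokenized query unchanged.
    return " ".join(words)


phrases = load_anti_phrases(["who is", "where can i find", "where is"])
print(strip_anti_phrase("Who is Einstein?", phrases))  # prints "Einstein"
```

Note that if the entire query is itself an anti-phrase, this sketch returns an empty string; how that case should be handled is not covered in the description.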

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

