From: Jan Høydahl (JIRA)
To: dev@lucene.apache.org
Date: Mon, 11 Oct 2010 14:55:32 -0400 (EDT)
Subject: [jira] Commented: (SOLR-2150) Anti-phrasing feature
Message-ID: <3396427.81301286823332415.JavaMail.jira@thor>
In-Reply-To: <26497960.75251286800533238.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/SOLR-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919942#action_12919942 ]

Jan Høydahl commented on SOLR-2150:
-----------------------------------

What you describe is also a useful feature. I think of it as even more generic: a place to configure detection of various patterns and apply some action to the query based on the match, whether that is fetching a weather forecast from an API, performing a calculation, or rewriting the query to apply a filter. I think it deserves its own feature request, and one could then decide later, in the design phase, whether the same code base could power parts of both.

> Anti-phrasing feature
> ---------------------
>
>                 Key: SOLR-2150
>                 URL: https://issues.apache.org/jira/browse/SOLR-2150
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>            Reporter: Jan Høydahl
>
> Add an anti-phrasing feature to Solr.
> Definition: Identifying word sequences in queries that do not contribute essentially to the query's meaning, such as "Where can I find" or "Where is."
> (Source: http://www.google.com/search?q=define%3Aanti+phrasing)
> For general purpose search services, such as web, intranet, or shopping search, some users will try to write a question to the search engine, such as "how much is an ipod nano". One straightforward way of limiting the number of zero-hit queries in such environments is to apply anti-phrasing, which uses a dictionary of common sentence prefixes that should be stripped from the incoming query before it is sent on to search.
> This can be implemented as a Search Component in Solr. The dictionary can be language independent. We can encourage users to submit their tested anti-phrasing dictionaries for various languages, and include those. The dictionary can be a set of simple .txt files, loaded in memory at startup in an efficient data structure such as b-tree or finite state automaton to avoid redundancy and ensure quick matching.
> The procedure for detecting an anti-phrase in the incoming query is to first look up the full query phrase; if there is no match, remove a word from the end and do another lookup, repeating until there is either a match or nothing left of the string. Example for the query "Who is Einstein?", where "Who is" is defined as an anti-phrase:
> 1. Lookup "Who is Einstein"
> 2. Lookup "Who is" (match), remove this prefix
> 3. Issue the query "Einstein" to search
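
The lookup procedure described in the issue can be sketched roughly as follows. This is only an illustration in Python, not Solr code and not a proposed implementation: the function names (load_anti_phrases, strip_anti_phrase) are hypothetical, a plain set of token tuples stands in for the b-tree or finite state automaton mentioned in the issue, and "how much is an" is an assumed dictionary entry. Two further assumptions not stated in the issue: matching ignores case and punctuation, and a query that consists only of an anti-phrase is passed through unchanged rather than reduced to an empty query.

```python
import re


def normalize(word):
    # Assumption: matching is case-insensitive and ignores punctuation,
    # so "Einstein?" and "einstein" produce the same lookup key.
    return re.sub(r"[^\w]", "", word.lower())


def load_anti_phrases(lines):
    """Build the in-memory dictionary from lines of a .txt file (one phrase per line)."""
    phrases = set()
    for line in lines:
        key = tuple(k for k in (normalize(w) for w in line.split()) if k)
        if key:  # skip blank lines
            phrases.add(key)
    return phrases


def strip_anti_phrase(query, phrases):
    """Remove the longest matching anti-phrase prefix from the query."""
    words = query.split()
    keys = [normalize(w) for w in words]
    # Look up the full query first, then drop one word from the end per lookup.
    for end in range(len(words), 0, -1):
        if tuple(keys[:end]) in phrases:
            remainder = " ".join(words[end:])
            # Assumption: never send an empty query on to search.
            return remainder if remainder else query
    return query  # no anti-phrase found


# Hypothetical dictionary entries for illustration.
phrases = load_anti_phrases(["Who is", "Where can I find", "how much is an"])
print(strip_anti_phrase("Who is Einstein?", phrases))
print(strip_anti_phrase("how much is an ipod nano", phrases))
```

The remainder is returned as the user typed it, so trailing punctuation such as the "?" is left for the normal query analysis to handle. Because lookups start from the full query and shrink from the end, the longest matching prefix wins when several anti-phrases overlap.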