lucene-dev mailing list archives

From "Siegfried Goeschl (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-4056) Contribution of component to gather the most frequent user search request in real-time
Date Fri, 09 Nov 2012 18:06:12 GMT
Siegfried Goeschl created SOLR-4056:
---------------------------------------

             Summary: Contribution of component to gather the most frequent user search request
in real-time
                 Key: SOLR-4056
                 URL: https://issues.apache.org/jira/browse/SOLR-4056
             Project: Solr
          Issue Type: New Feature
          Components: SearchComponents - other
    Affects Versions: 3.6.1
            Reporter: Siegfried Goeschl
            Priority: Minor
             Fix For: 3.6.2


I'm now finishing a SOLR project for one of my customers (replacing Microsoft FAST server
with SOLR) and got the permission to contribute our improvements.

The most interesting thing is a "FrequentSearchTerm" component which analyzes the
user-supplied search queries in real-time:

 * it keeps track of the last queries per core using a LIFO buffer (so memory consumption
has an upper limit)
 * per query entry we keep track of the number of invocations, the average number of result
documents and the average execution time
 * we allow for custom searches across the frequent search terms using the MVEL expression
language (see http://mvel.codehaus.org)
 ** find all queries which did not yield any results - 'meanHits==0'
 ** find all "iPhone" queries - 'searchTerm.contains("iphone") || searchTerm.contains("i-phone")'
 ** find all long-running "iPhone" queries - '(searchTerm.contains("iphone") || searchTerm.contains("i-phone"))
&& meanTime>50'
 * GUI: we have a JSP page which provides access to the frequent search terms
 * there is also an XML/CSV export we use to display the 50 most frequently used search queries
in real-time
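
To make the bookkeeping described above more concrete, here is a minimal Java sketch. It is not the contributed component's code: the class name FrequentQueryTracker, its methods, and the Stats fields are hypothetical, a plain java.util.function.Predicate stands in for the MVEL expressions, and the execution time is assumed to be in milliseconds. The eviction rule (drop the least recently seen query once the capacity is reached) is only one possible reading of the bounded buffer mentioned in the list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of a bounded per-core query tracker.
class FrequentQueryTracker {

    // Running statistics for one distinct query string.
    static final class Stats {
        final String searchTerm;
        long count;       // number of invocations
        double meanHits;  // average number of result documents
        double meanTime;  // average execution time (assumed milliseconds)

        Stats(String searchTerm) {
            this.searchTerm = searchTerm;
        }

        // Incremental mean update: mean += (x - mean) / n
        void add(long hits, long timeMs) {
            count++;
            meanHits += (hits - meanHits) / count;
            meanTime += (timeMs - meanTime) / count;
        }
    }

    private final Map<String, Stats> entries;

    FrequentQueryTracker(final int capacity) {
        // Access-ordered map: once the capacity is exceeded, the least
        // recently seen query is evicted, which bounds memory consumption.
        this.entries = new LinkedHashMap<String, Stats>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Stats> eldest) {
                return size() > capacity;
            }
        };
    }

    // Record one executed query with its hit count and execution time.
    synchronized void record(String query, long hits, long timeMs) {
        String key = query.toLowerCase(Locale.ROOT);
        entries.computeIfAbsent(key, Stats::new).add(hits, timeMs);
    }

    // Return the matching entries, most frequently used first.
    synchronized List<Stats> find(Predicate<Stats> filter, int limit) {
        List<Stats> result = new ArrayList<>();
        for (Stats s : entries.values()) {
            if (filter.test(s)) {
                result.add(s);
            }
        }
        result.sort(Comparator.comparingLong((Stats s) -> s.count).reversed());
        return new ArrayList<>(result.subList(0, Math.min(limit, result.size())));
    }
}
```

With this sketch, the 'meanHits==0' filter corresponds to find(s -> s.meanHits == 0, 50), and the long-running "iPhone" filter to find(s -> (s.searchTerm.contains("iphone") || s.searchTerm.contains("i-phone")) && s.meanTime > 50, 50).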

We use this component

 * to get input for QA regarding frequently used search terms
 * to find strange queries, e.g. queries returning no or too many results, for instance caused by WordDelimiterFilter
 * to keep our management happy ... :-)


 Not sure if the name "Frequent Search Term Component" is perfectly suitable as it was taken
from FAST - suggestions welcome. Maybe "FrequentSearchQueryComponent" would be more suitable?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

