lucene-dev mailing list archives

From Siegfried Goeschl <sgoes...@gmx.at>
Subject [SOLR] RFC - Contributing a FrequentSearchTerm component ...
Date Fri, 09 Nov 2012 13:37:18 GMT
Hi folks,

I'm now finishing a SOLR project for one of my customers (replacing 
Microsoft FAST server with SOLR) and got the permission to contribute 
our improvements.

The most interesting thing is a "FrequentSearchTerm" component which 
lets us analyze the user-supplied search queries in real-time:

+) it keeps track of the last queries per core using a LIFO buffer (so 
we have an upper limit of memory consumption)

+) per query entry we keep track of the number of invocations, the 
average number of result documents and the average execution time

+) we allow for custom searches across the frequent search terms using 
the MVEL expression language (see http://mvel.codehaus.org)
++) find all queries which did not yield any results - 'meanHits==0'
++) find all "iPhone" queries - 'searchTerm.contains("iphone") || 
searchTerm.contains("i-phone")'
++) find all long-running "iPhone" queries - 
'(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) && 
meanTime>50'

+) GUI: we have a JSP page which gives access to the frequent search terms

+) there is also an XML/CSV export we use to display the 50 most 
frequently used search queries in real-time
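
To make the idea above more concrete, here is a minimal sketch of the 
bookkeeping in plain Java. It is not the actual component code. The class 
and method names (TermStats, FrequentSearchTerms, record, find, top) are 
made up for illustration, the MVEL expressions are replaced by ordinary 
Java predicates, the time unit is assumed to be milliseconds, and the 
eviction policy (drop the least recently seen query once the limit is 
reached) is just one possible reading of the bounded buffer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical per-query statistics; field names mirror the filter expressions. */
final class TermStats {
    final String searchTerm;
    long count;      // number of invocations
    double meanHits; // running average of result documents
    double meanTime; // running average of execution time (assumed to be ms)

    TermStats(String searchTerm) {
        this.searchTerm = searchTerm;
    }

    void record(long hits, long millis) {
        count++;
        // incremental mean, so no per-invocation history is stored
        meanHits += (hits - meanHits) / count;
        meanTime += (millis - meanTime) / count;
    }
}

/** Hypothetical bounded tracker; evicts the least recently seen query. */
final class FrequentSearchTerms {
    private final Map<String, TermStats> entries;

    FrequentSearchTerms(final int capacity) {
        // accessOrder=true keeps the least recently touched entry first
        this.entries = new LinkedHashMap<String, TermStats>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, TermStats> eldest) {
                return size() > capacity; // upper limit on memory consumption
            }
        };
    }

    synchronized void record(String query, long hits, long millis) {
        // assumption: queries are normalized to lower case before counting
        String key = query.trim().toLowerCase(Locale.ROOT);
        TermStats stats = entries.get(key);
        if (stats == null) {
            stats = new TermStats(key);
            entries.put(key, stats);
        }
        stats.record(hits, millis);
    }

    /** Stand-in for an MVEL expression such as 'meanHits==0'. */
    synchronized List<TermStats> find(Predicate<TermStats> filter) {
        List<TermStats> matches = new ArrayList<>();
        for (TermStats stats : entries.values()) {
            if (filter.test(stats)) {
                matches.add(stats);
            }
        }
        return matches;
    }

    /** The n most frequently used queries, e.g. as input for an export. */
    synchronized List<TermStats> top(int n) {
        List<TermStats> all = new ArrayList<>(entries.values());
        all.sort(Comparator.comparingLong((TermStats s) -> s.count).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

With this sketch, the "no results" search would read 
find(s -> s.meanHits == 0), and the long-running "iPhone" search would be 
find(s -> (s.searchTerm.contains("iphone") || 
s.searchTerm.contains("i-phone")) && s.meanTime > 50).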

We use this component

+) to get input for QA regarding frequently used search terms
+) to find strange queries, e.g. queries returning no or too many 
results, e.g. caused by WordDelimiterFilter
+) to keep our management happy ... :-)

So the question is - is the community interested in such a contribution? 
If yes, then I need to spend some time improving the code from 
"industrial quality" to "open source quality" including documentation 
... you know what I mean .... :-)

Thanks in advance,

Siegfried Goeschl

PS: Not sure if the name "Frequent Search Term Component" is perfectly 
suitable as it was taken from FAST - suggestions welcome
