lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: [SOLR] RFC - Contributing a FrequentSearchTerm component ...
Date Fri, 09 Nov 2012 17:44:21 GMT
Absolutely feel free to open up a JIRA and attach a patch for something
like this! You can create an account and edit JIRAs freely.

You don't need to clean it up much before putting up the first patch. It's
often useful to let other eyes take a quick look at it and make comments
before polishing. It's perfectly reasonable to have //TODOs or //nocommit
comments in the code as a flag that "this isn't finished yet", but it's up
to you.

Best
Erick


On Fri, Nov 9, 2012 at 8:37 AM, Siegfried Goeschl <sgoeschl@gmx.at> wrote:

> Hi folks,
>
> I'm now finishing a SOLR project for one of my customers (replacing
> Microsoft FAST server with SOLR) and got the permission to contribute our
> improvements.
>
> The most interesting thing is a "FrequentSearchTerm" component which
> allows analyzing the user-supplied search queries in real time
>
> +) it keeps track of the last queries per core using a LIFO buffer (so we
> have an upper limit of memory consumption)
>
> +) per query entry we keep track of the number of invocations, the average
> number of result documents and the average execution time
>
> +) we allow for custom searches across the frequent search terms using the
> MVEL expression language (see http://mvel.codehaus.org)
> ++) find all queries which did not yield any results - 'meanHits==0'
> ++) find all "iPhone" queries - 'searchTerm.contains("iphone") ||
> searchTerm.contains("i-phone")'
> ++) find all long-running "iPhone" queries -
> '(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) &&
> meanTime>50'
>
> +) GUI : we have a JSP page which provides access to the frequent search
> terms
>
> +) there is also an XML/CSV export we use to display the 50 most
> frequently used search queries in real-time
>
> We use this component
>
> +) to get input for QA regarding frequently used search terms
> +) to find strange queries, e.g. queries returning no or too many results,
> e.g. caused by WordDelimiterFilter
> +) to keep our management happy ... :-)
>
> So the question is - is the community interested in such a contribution?
> If yes, then I need to spend some time to improve the code from "industrial
> quality" to "open source quality" including documentation ... you know what
> I mean .... :-)
>
> Thanks in advance,
>
> Siegfried Goeschl
>
> PS: Not sure if the name "Frequent Search Term Component" is perfectly
> suitable as it was taken from FAST - suggestions welcome
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
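
For readers who want a concrete picture of the component described in the quoted message, here is a minimal Java sketch of the idea: a bounded per-core store of recent queries that tracks invocation count, mean hits and mean execution time, plus a filter method. It is not the contributed code. The class and method names (FrequentSearchTerms, Stats, record, find) are hypothetical, a plain java.util.function.Predicate stands in for the MVEL expressions, and least-recently-used eviction is an assumption standing in for the buffer mentioned in the message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of a bounded search-term statistics store. */
public class FrequentSearchTerms {

    /** Running statistics for one distinct (lower-cased) search term. */
    public static final class Stats {
        private final String searchTerm;
        private long invocations;
        private long totalHits;
        private long totalTimeMs;

        Stats(String searchTerm) {
            this.searchTerm = searchTerm;
        }

        public String searchTerm() { return searchTerm; }
        public long invocations() { return invocations; }
        public double meanHits() { return (double) totalHits / invocations; }
        public double meanTime() { return (double) totalTimeMs / invocations; }
    }

    private final Map<String, Stats> entries;

    public FrequentSearchTerms(final int maxEntries) {
        // Assumption: an access-ordered LinkedHashMap that drops the least
        // recently used term once the cap is exceeded, which bounds memory.
        this.entries = new LinkedHashMap<String, Stats>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Stats> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Records one executed query with its hit count and execution time. */
    public synchronized void record(String query, long hits, long timeMs) {
        String key = query.toLowerCase(Locale.ROOT);
        Stats stats = entries.computeIfAbsent(key, Stats::new);
        stats.invocations++;
        stats.totalHits += hits;
        stats.totalTimeMs += timeMs;
    }

    /** Returns all tracked terms matching the filter (a stand-in for MVEL). */
    public synchronized List<Stats> find(Predicate<Stats> filter) {
        List<Stats> result = new ArrayList<>();
        for (Stats stats : entries.values()) {
            if (filter.test(stats)) {
                result.add(stats);
            }
        }
        return result;
    }
}
```

With this sketch, the 'meanHits==0' example from the message would correspond to terms.find(s -> s.meanHits() == 0), and the long-running "iPhone" example to terms.find(s -> s.searchTerm().contains("iphone") && s.meanTime() > 50).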
