lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Goetzke" <uwe.goet...@healy-hudson.com>
Subject AW: Relevancy Practices
Date Mon, 03 May 2010 13:18:15 GMT
Regarding Part3: 
Data quality
For our search domain (catalog products) we face very often the problem that the search data
is full of acronyms and abbreviations like:
cable,nym-j,pvc,3x2.5mm²
or
dvd-/cd-/usb-carradio,4x50W,divx,bl

We solved this by a combination of normalization for better data quality (or less variations)
and some tolerant sloppy phrase search where the search token needs only to partly match an
indexed token.
We use here a dictionary lookup approach into the indexed tokens of some fields and expand
the users query with a well weighted set of search terms.

It took us some iterations to get this right and fast enough to search in several million
products.
The next step on our list are facets.

Uwe 


-----Urspr√ľngliche Nachricht-----
Von: mbennett.ideaeng@gmail.com [mailto:mbennett.ideaeng@gmail.com] Im Auftrag von Mark Bennett
Gesendet: Donnerstag, 29. April 2010 16:59
An: java-user@lucene.apache.org
Betreff: Re: Relevancy Practices

Hi Grant,

You're welcome to use any of my slides (Dave's got them), with attribution
of course.

BUT....

Have you considered a section something like "why the hell do you think
Relevancy tweaking is gonna save you!?!?"

Basically that, as a corpus grows exponentially, so do results list sizes,
so ALL relevancy tweaks will eventually fail.  And FACETS (or other
navigators) are the answer.  I've got slides on that as well.

Of course relevancy matters.... but it's only ONE of perhaps a three pronged
approach:
1: Organic Relevancy and top query suggetions
2: Results list Navigators, the best the system can support, and
3: Data quality (spidering, METADATA quality, source weighting, etc)

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, Apr 29, 2010 at 7:14 AM, Grant Ingersoll <gsingers@apache.org>wrote:

> I'm putting on a talk at Lucene Eurocon (
> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
> Relevance" and I'm curious as to what people put in practice for testing and
> improving relevance.  I have my own inclinations, but I don't want to muddy
> the water just yet.  So, if you have a few moments, I'd love to hear
> responses to the following questions.
>
> What worked?
> What didn't work?
> What didn't you understand about it?
> What tools did you use?
> What tools did you wish you had either for debugging relevance or "fixing"
> it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide to do relevance
> tuning?  Was that timing planned or not?
>
>
> Thanks,
> Grant
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message