lucene-solr-dev mailing list archives

From "J. Delgado" <jdelg...@lendingclub.com>
Subject Re: Progressive Query Relaxation
Date Tue, 10 Apr 2007 17:38:40 GMT
See my comments below.

2007/4/10, Walter Underwood <wunderwood@netflix.com>:
> On 4/10/07 10:06 AM, "J. Delgado" <jdelgado@lendingclub.com> wrote:
>
> > Progressive relaxation, at least as Oracle has defined it, is a
> > flexible, developer defined series of queries that are efficiently
> > executed in progression and in one trip to the engine, until the
> > minimum number of hits required is satisfied. It is not a self-adapting
> > precision scheme, nor does it try to guess what the best match is.
>
> Correct. Search engines are all about the best match. Why would
> you show anything else?

Agreed, but best match is not ONLY about keywords. Here is where the
system developer can provide extra intelligence by doing query
re-writing.

>
> This is an RDBMS flavored approach, not an approach that considers
> natural language text.

Why do you say this? The rank is still provided by the search engine
BASED ON THE QUERY submitted and it does consider natural language
text. It's just leaving the order of execution in the hands of the
developer who knows better what the system should return for some
specific cases.

> Sets of matches, not a ranked list. It fails
> as soon as one of the sets gets too big, like when someone searches
> for "laserjet" at HP.com. That happens a lot.

Nope... we are talking about the same thing: a ranked list. All the
other cool stuff (automatic query expansion, hit list
clustering/faceted search, etc.) has solved the "laserjet" problem you
mentioned above.

>
> It assumes that all keywords are the same, something that Gerry
> Salton figured out was false thirty years ago. That is why we
> use tf.idf instead of sets of matches.

I'm totally with you. Oracle Text uses TF.IDF as well :-)

>
> I see a lot of design without any talk about what problem they are
> solving. What queries don't work? How do we make those better?
> Let's work from real logs and real data. Oracle's hack doesn't
> solve any problem I've seen in real query logs.
>

I think you have something personal against Oracle... Hey, I have no
interest in defending Oracle, but this is no hack. It has its place for
certain applications. I'm not in favor of using Oracle Text; all I
asked was whether this feature was available in Solr/Lucene, because I
think it would be useful.

> I'm doing e-commerce search, and our current engine does pretty
> much what Oracle is offering. The results are not good, and we
> are replacing it with Solr and DisMax. My off-line relevance testing
> shows a big improvement.

Yep. One thing we agree on (that the results of Netflix's current engine
are not good). In any case, I think moving to Solr and DisMax is a great
idea and should improve relevance. I also think that in some cases having
control of the queries that are expanded and executing them
progressively is the right way to go. For example, Nutch implements a
pretty sophisticated query rewrite in hopes of improving the relevance
ranking for its users. I think the results can be computed more
efficiently if the whole query does not need to be evaluated, but just
enough of it to return the required number of results.
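
To make the idea concrete, here is a minimal sketch in Python. It assumes
a toy in-memory list of documents rather than Lucene, Solr, or Oracle
Text, and the names `progressive_search`, `phrase`, `all_terms`, and
`any_term` are hypothetical, chosen only for illustration. A real engine
would also rank the hits within each step; this sketch just keeps them in
step order.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
Query = Callable[[Doc], bool]


def progressive_search(docs: List[Doc], queries: List[Query], min_hits: int) -> List[Doc]:
    """Run queries in the given order (strictest first) and stop as soon
    as at least min_hits distinct documents have been collected."""
    hits: List[Doc] = []
    seen = set()
    for query in queries:
        for doc in docs:
            # Skip documents already returned by a stricter step.
            if doc["id"] not in seen and query(doc):
                seen.add(doc["id"])
                hits.append(doc)
        if len(hits) >= min_hits:
            break  # later, looser queries are never evaluated
    return hits


# Toy data and a developer-defined relaxation sequence (assumptions).
docs = [
    {"id": 1, "title": "hp laserjet printer"},
    {"id": 2, "title": "laserjet toner"},
    {"id": 3, "title": "hp inkjet printer"},
]
terms = ("hp", "laserjet")

phrase = lambda d: "hp laserjet" in d["title"]                      # exact phrase
all_terms = lambda d: all(t in d["title"].split() for t in terms)   # AND
any_term = lambda d: any(t in d["title"].split() for t in terms)    # OR

strict_only = progressive_search(docs, [phrase, all_terms, any_term], min_hits=1)
relaxed = progressive_search(docs, [phrase, all_terms, any_term], min_hits=2)
```

With `min_hits=1` the phrase query alone is enough and the looser queries
never run; with `min_hits=2` the sequence falls through to the OR query,
and the phrase match still comes first in the list.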

Joaquin Delgado, PhD

>
> wunder
> --
> Search Guru, Netflix
>
>
>
