lucene-java-user mailing list archives

From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: MoreLikeThis Query generator - Re: code for "more like this" query "expansion" - was - Re: setMaxClauseCount ??
Date Wed, 18 Feb 2004 08:30:35 GMT
Doug Cutting wrote:

> David Spencer wrote:
>
>> Code rewritten, automagically chooses lots of defaults, lets you
>> override the defs thru the static vars at the bottom or the
>> non-static vars also at the bottom.
>
>
> Has anyone used this?  Was it useful?

I've put it up on my "demo" site (rfc::search) in which I have a humble 
index of approx 3500 RFCs.

This is the site:

http://www.hostmon.com/rfc/index.jsp

A typical search takes you here:

http://www.hostmon.com/rfc/search.jsp?s=LDAP+Security&x=33&y=9

Clicking on a match then takes you to a page showing the RFC, like this
one, which is where things start to get interesting:

http://www.hostmon.com/rfc/get.jsp?id=1823&s=LDAP%20Security

There are 3 links of interest now at the top/middle of the page in the 
brownish background.

[a] "show similar" - forms a query from *all* words in the doc - no 
heuristics wrt idf(), etc.

[b] "more like this" - uses the MoreLikeThis code I wrote with the 
default settings.

[c] "interesting words" - uses code from MoreLikeThis to give a table of
all interesting words in the current "source" doc, ordered by score.
Remember that score is idf*tf, as per Doug's mail (and as per my
hopefully correct understanding of these things). This page is of course
more of a debugging tool than something a normal user would see. One
possible area of improvement that jumped out at me after reviewing this
table is stemming: for example, allowing more words in the generated
query when 2 words have the same stem.
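A minimal sketch of the idf*tf ranking behind a table like [c], in Java to match MoreLikeThis. It assumes tf is the raw count of a word in the source doc and that idf follows Lucene's classic DefaultSimilarity formula, ln(numDocs / (docFreq + 1)) + 1. The class name, the naive tokenizer and the document frequencies in main() are hypothetical; this is not the MoreLikeThis code itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rank the words of one "source" doc by tf * idf. */
public class InterestingWords {

    /** Classic Lucene DefaultSimilarity idf: ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Naive tokenizer (assumption): lowercase, split on non-word chars, count. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) tf.merge(w, 1, Integer::sum);
        }
        return tf;
    }

    /** Score every word as tf * idf and return the words best first. */
    static List<Map.Entry<String, Double>> rank(Map<String, Integer> tf,
                                                Map<String, Integer> docFreq,
                                                int numDocs) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // A word missing from docFreq is treated as df = 0 (rarest possible).
            int df = docFreq.getOrDefault(e.getKey(), 0);
            scored.add(Map.entry(e.getKey(), e.getValue() * idf(df, numDocs)));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return scored;
    }

    public static void main(String[] args) {
        // Hypothetical doc frequencies for an index of ~3500 RFCs.
        Map<String, Integer> df =
            Map.of("the", 3400, "ldap", 40, "security", 600, "model", 900);
        Map<String, Integer> tf =
            termFreqs("The LDAP security model: LDAP the LDAP security");
        for (Map.Entry<String, Double> e : rank(tf, df, 3500)) {
            System.out.printf("%-10s %.3f%n", e.getKey(), e.getValue());
        }
    }
}
```

With these made-up numbers "ldap" ranks first and "the" last: a word that is frequent in the source doc but rare in the index wins, while a word that appears in nearly every RFC is pushed to the bottom even though it occurs twice.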

Note - [a] uses no code from [b] or [c]. It is just there for comparison.
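To make the [a] vs [b] contrast concrete, here is a rough Java sketch of the two ways of forming the query text: [a] ORs together every distinct word, while a MoreLikeThis-style query keeps only the few highest-scoring words that occur often enough in the source doc. The class name, the minTermFreq/maxTerms parameters and the scores in main() are hypothetical illustrations, not the actual MoreLikeThis fields or defaults; the output is a space-separated term list, which Lucene's QueryParser treats as an OR query under its default operator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch contrasting "show similar" with a top-N "more like this" query. */
public class QueryForms {

    /** [a]-style: every distinct word, no idf-based filtering (sorted for stable output). */
    static String allWordsQuery(List<String> words) {
        return String.join(" ", new TreeSet<>(words));
    }

    /**
     * [b]-style (assumed): keep words with tf >= minTermFreq, order them by a
     * precomputed tf * idf score, and keep at most maxTerms of them.
     */
    static String topTermsQuery(Map<String, Integer> tf, Map<String, Double> score,
                                int minTermFreq, int maxTerms) {
        List<String> kept = new ArrayList<>();
        for (String w : tf.keySet()) {
            if (tf.get(w) >= minTermFreq) kept.add(w);
        }
        // Highest score first.
        kept.sort((x, y) -> Double.compare(score.get(y), score.get(x)));
        return String.join(" ", kept.subList(0, Math.min(maxTerms, kept.size())));
    }

    public static void main(String[] args) {
        List<String> words = List.of("the", "ldap", "security", "model",
                                     "ldap", "the", "ldap", "security");
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String w : words) tf.merge(w, 1, Integer::sum);
        // Illustrative tf * idf scores, not measured on any real index.
        Map<String, Double> score =
            Map.of("the", 2.06, "ldap", 16.34, "security", 5.52, "model", 2.36);
        System.out.println(allWordsQuery(words));           // every distinct word
        System.out.println(topTermsQuery(tf, score, 2, 2)); // only the strongest few
    }
}
```

On a real RFC the [a] query can run to thousands of clauses, most of them common words, which is also why it tends to run into the BooleanQuery clause limit that started this thread.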

> Should we add it to the sandbox?

I'd appreciate it if someone could proofread MoreLikeThis.like(Reader) and
mlt(Reader).

At a glance it seems to return reasonable results on my site.

-- Dave

>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
