lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: sorting by per doc hit count
Date Sat, 16 Dec 2006 21:20:29 GMT
Well, if you're not interested in doing much in the way of complex queries,
you could use TermDocs/TermEnum (particularly look at TermDocs) to count the
number of times a term appears in each document. I think you'll be surprised
at how quickly you can get this info.
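
To make the idea concrete, here is a minimal sketch of ordering documents by how often one term occurs in each. It does not touch a real index: the nested map and the names HitCountSketch and docsByFrequency are hypothetical stand-ins. Against an actual index of this era, the (doc, freq) pairs would come from IndexReader.termDocs(Term), stepping through the result with TermDocs.next(), doc() and freq().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index: term -> (docId -> term frequency).
// With a real index, each (docId, freq) pair would be read from a TermDocs enumeration.
class HitCountSketch {

    /** Returns the ids of documents containing `term`, most occurrences first. */
    static List<Integer> docsByFrequency(Map<String, Map<Integer, Integer>> postings, String term) {
        Map<Integer, Integer> perDoc = postings.get(term);
        if (perDoc == null) {
            return Collections.emptyList(); // term does not occur anywhere
        }
        List<Integer> docs = new ArrayList<>(perDoc.keySet());
        // Descending frequency; ties fall back to ascending doc id so the order is deterministic.
        docs.sort((a, b) -> {
            int byFreq = Integer.compare(perDoc.get(b), perDoc.get(a));
            return byFreq != 0 ? byFreq : Integer.compare(a, b);
        });
        return docs;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new HashMap<>();
        counts.put(0, 2); // doc 0 contains the term twice
        counts.put(1, 5); // doc 1 contains it five times
        counts.put(2, 1); // doc 2 contains it once
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        postings.put("lucene", counts);
        System.out.println(docsByFrequency(postings, "lucene"));
    }
}
```

Since nothing here involves idf or length normalization, the ordering reflects raw occurrence counts only, which is what the question asks for.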

Making your own scorer seems like a reasonable approach too: you could just
return the term frequency (see TermDocs). I admit I've been staying away
from messing with the relevancy scorers, so I'm not speaking from experience.
Others who know more are going to have to weigh in on FunctionQuery.

Warning: I've just been in some (non-Lucene) code that tried to do its own
arbitrarily complex boolean logic by counting term frequency. Don't go there
if you want to keep your work minimal. In fact, I'd recommend against going
there at all <G>. If you can restrict the allowed syntax to simple AND,
you'd be all set (simple OR would be OK too). I suspect that as soon as you
start combining the two and allowing grouping, the effort increases
dramatically.
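
As a rough illustration of why the flat cases stay manageable, the sketch below sums per-document counts across several terms, treating the whole query as either a pure AND or a pure OR. The class and method names are hypothetical, and the per-term maps (docId -> frequency) are assumed to have been collected already. It deliberately handles no nesting or mixed operators, which is where the effort starts to climb.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: combines per-term hit counts for a flat AND or a flat OR query.
class BooleanCountSketch {

    /**
     * Sums the per-document counts over all terms.
     * requireAll = true  -> AND: keep only documents that contain every term.
     * requireAll = false -> OR:  keep every document that contains at least one term.
     */
    static Map<Integer, Integer> combine(List<Map<Integer, Integer>> perTermCounts, boolean requireAll) {
        Map<Integer, Integer> total = new TreeMap<>();        // docId -> summed hit count
        Map<Integer, Integer> termsMatched = new HashMap<>(); // docId -> number of terms present
        for (Map<Integer, Integer> counts : perTermCounts) {
            for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
                termsMatched.merge(e.getKey(), 1, Integer::sum);
            }
        }
        if (requireAll) {
            // Drop documents that are missing at least one of the terms.
            total.keySet().removeIf(doc -> termsMatched.get(doc) < perTermCounts.size());
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> first = new HashMap<>();
        first.put(0, 2);
        first.put(1, 1);
        Map<Integer, Integer> second = new HashMap<>();
        second.put(1, 3);
        second.put(2, 4);
        List<Map<Integer, Integer>> terms = new ArrayList<>();
        terms.add(first);
        terms.add(second);
        System.out.println("OR:  " + combine(terms, false));
        System.out.println("AND: " + combine(terms, true));
    }
}
```

The resulting totals could then be sorted in descending order to rank documents by hit count.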

Which probably argues for making your own scorer that just deals with
frequency.

Best
Erick

On 12/16/06, Mark Miller <markrmiller@gmail.com> wrote:
>
> I have not really looked into this yet, but maybe you can save me some
> time -- Is it feasible/simple to sort by the number of hits found per
> document? Would this require changing the scoring system (remove idf etc
> etc) and doing a normal relevancy search? Could it be done with
> functionquery? Any Hints? If it is a lot of work I am not interested in
> doing it, but if it is somewhat simple it would make a few customers feel
> fuzzy.
>
> Thanks,
>
> Mark
>
>
