lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcus Herou <marcus.he...@tailsweep.com>
Subject Re: exponential boosts
Date Thu, 23 Apr 2009 21:42:35 GMT
Thanks! (I started my reply and then saw that you added code snippets)

I think we are narrowing down the problem to the updating issue of the
PageRank score.

So what you basically are saying is that:

1. You have an index which contains data that is more or less static (no
updates) or you have another update interval than the PR interval.
2. A PR index which is rebuilt (from scratch ?) every X days/weeks/months.

Commenting inline.

Cheers and thanks for everything.

//Marcus

On Thu, Apr 23, 2009 at 11:28 PM, Steven Bethard <bethard@stanford.edu>wrote:

> On 4/23/2009 2:08 PM, Marcus Herou wrote:
> > But perhaps one could use a FieldCache somehow ?
>
> Some code snippets that may help. I add the PageRank value as a field of
> the documents I index with Lucene like this:
>
>    Document document = new Document();
>    double pageRank = this.pageRanks.getCount(article.getId());
>    document.add(new Field(
>        PAGE_RANK_FIELD_NAME, Float.toString((float)pageRank),
>        Field.Store.YES, Field.Index.NOT_ANALYZED));


Just as I do but I store the score/PageRank in the "main" index which is a
pain in the ass since I want to be able to update the score field quite
frequently at least every week.

>
>
> Then when I want to get the value back, I use a FieldScoreQuery, which
> just returns the field value as the document score, like this:
>
>  new FieldScoreQuery(PAGE_RANK_FIELD_NAME, FieldScoreQuery.Type.FLOAT);


Great. Never seen that query type before. Will it use that fields score
solely as the base for rank ? Can it be combinable with other Query types or
am I missing something ?
What if I instead sorted on that column ? which is most efficient ?
Example: searcher.search(q, null, 100000, new Sort("score", true));
Then we get to the issue of how to open two indexes and use one
IndexSearcher across them and perhaps a MultiFieldQueryParser to produce the
Query.
Pointers ?


>
>
> If you want to combine the PageRank score with another Query score, then
> you can look at CustomScoreQuery to do so.

You mean by perhaps combining it with user supplied boost like field:term^4
?
Will the score passed in be the score of the FieldScoreQuery if used or the
built in one ?


>
>
> Steve
>
> > On Thu, Apr 23, 2009 at 11:07 PM, Marcus Herou
> > <marcus.herou@tailsweep.com>wrote:
> >
> >> Yes I have considered it for 30 minutes :)
> >>
> >> How do one apply that in the real world ?
> >>
> >> If the only thing I get access to is the actual docId would it not be
> >> really expensive to get the Document itself from the index and later use
> >> some field in it as external lookup in some optimized structure for this
> ?
> >>
> >> Example, pseudo:
> >>
> >> *public* *float* customScore(*int* doc, *float* subQueryScore, *float*
> valSrcScore)
> >>
> >> {
> >>         *Document document = indexSearcher.doc(doc);
> >>         float score =
> MyOptimalHashStructure.getScore(document.get("someId"));
> >>         return score**subQueryScore*;*
> >>
> >> }
> >>
> >> This would not scale well right ? I mean gathering scores through 100M
> docs
> >> would take some time I guess ? Or even 1M docs...
> >>
> >> Please push me in the right direction.
> >>
> >> Cheers
> >>
> >> //Marcus
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Apr 23, 2009 at 10:58 PM, Doron Cohen <cdoronc@gmail.com>
> wrote:
> >>
> >>>> I think we are doing similar things, at least I am trying to implement
> >>>> document boosting with pagerank. Having issues of howto appky the
> >>> scoring
> >>>> of
> >>>> specific docs without actually reindex them. I feel something should
> be
> >>>> done
> >>>> at query time which looks at external data but do not know howto
> >>> implement
> >>>> that. Do you ?
> >>>>
> >>> Have you considered CustomScoreQuery in o.a.l.search.function ? It
> should
> >>> allow
> >>> incorporating external scores.
> >>>
> >>> Doron
> >>>
> >>
> >>
> >> --
> >> Marcus Herou CTO and co-founder Tailsweep AB
> >> +46702561312
> >> marcus.herou@tailsweep.com
> >> http://www.tailsweep.com/
> >> http://blogg.tailsweep.com/
> >>
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message