lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcus Herou <marcus.he...@tailsweep.com>
Subject Re: exponential boosts
Date Thu, 23 Apr 2009 21:47:26 GMT
Never mind of how to open the ParallellReader stuff (I am an idiot): RTFM:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/ParallelReader.html

But the rest is of course interesting :)

/M

On Thu, Apr 23, 2009 at 11:42 PM, Marcus Herou
<marcus.herou@tailsweep.com>wrote:

> Thanks! (I started my reply and then saw that you added code snippets)
>
> I think we are narrowing down the problem to the updating issue of the
> PageRank score.
>
> So what you basically are saying is that:
>
> 1. You have an index which contains data that is more or less static (no
> updates) or you have another update interval than the PR interval.
> 2. A PR index which is rebuilt (from scratch ?) every X days/weeks/months.
>
> Commenting inline.
>
> Cheers and thanks for everything.
>
> //Marcus
>
> On Thu, Apr 23, 2009 at 11:28 PM, Steven Bethard <bethard@stanford.edu>wrote:
>
>> On 4/23/2009 2:08 PM, Marcus Herou wrote:
>> > But perhaps one could use a FieldCache somehow ?
>>
>> Some code snippets that may help. I add the PageRank value as a field of
>> the documents I index with Lucene like this:
>>
>>    Document document = new Document();
>>    double pageRank = this.pageRanks.getCount(article.getId());
>>    document.add(new Field(
>>        PAGE_RANK_FIELD_NAME, Float.toString((float)pageRank),
>>        Field.Store.YES, Field.Index.NOT_ANALYZED));
>
>
> Just as I do but I store the score/PageRank in the "main" index which is a
> pain in the ass since I want to be able to update the score field quite
> frequently at least every week.
>
>>
>>
>> Then when I want to get the value back, I use a FieldScoreQuery, which
>> just returns the field value as the document score, like this:
>>
>>  new FieldScoreQuery(PAGE_RANK_FIELD_NAME, FieldScoreQuery.Type.FLOAT);
>
>
> Great. Never seen that query type before. Will it use that fields score
> solely as the base for rank ? Can it be combinable with other Query types or
> am I missing something ?
> What if I instead sorted on that column ? which is most efficient ?
> Example: searcher.search(q, null, 100000, new Sort("score", true));
> Then we get to the issue of how to open two indexes and use one
> IndexSearcher across them and perhaps a MultiFieldQueryParser to produce the
> Query.
> Pointers ?
>
>
>>
>>
>> If you want to combine the PageRank score with another Query score, then
>> you can look at CustomScoreQuery to do so.
>
> You mean by perhaps combining it with user supplied boost like field:term^4
> ?
> Will the score passed in be the score of the FieldScoreQuery if used or the
> built in one ?
>
>
>>
>>
>> Steve
>>
>> > On Thu, Apr 23, 2009 at 11:07 PM, Marcus Herou
>> > <marcus.herou@tailsweep.com>wrote:
>> >
>> >> Yes I have considered it for 30 minutes :)
>> >>
>> >> How do one apply that in the real world ?
>> >>
>> >> If the only thing I get access to is the actual docId would it not be
>> >> really expensive to get the Document itself from the index and later
>> use
>> >> some field in it as external lookup in some optimized structure for
>> this ?
>> >>
>> >> Example, pseudo:
>> >>
>> >> *public* *float* customScore(*int* doc, *float* subQueryScore, *float*
>> valSrcScore)
>> >>
>> >> {
>> >>         *Document document = indexSearcher.doc(doc);
>> >>         float score =
>> MyOptimalHashStructure.getScore(document.get("someId"));
>> >>         return score**subQueryScore*;*
>> >>
>> >> }
>> >>
>> >> This would not scale well right ? I mean gathering scores through 100M
>> docs
>> >> would take some time I guess ? Or even 1M docs...
>> >>
>> >> Please push me in the right direction.
>> >>
>> >> Cheers
>> >>
>> >> //Marcus
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Apr 23, 2009 at 10:58 PM, Doron Cohen <cdoronc@gmail.com>
>> wrote:
>> >>
>> >>>> I think we are doing similar things, at least I am trying to
>> implement
>> >>>> document boosting with pagerank. Having issues of howto appky the
>> >>> scoring
>> >>>> of
>> >>>> specific docs without actually reindex them. I feel something should
>> be
>> >>>> done
>> >>>> at query time which looks at external data but do not know howto
>> >>> implement
>> >>>> that. Do you ?
>> >>>>
>> >>> Have you considered CustomScoreQuery in o.a.l.search.function ? It
>> should
>> >>> allow
>> >>> incorporating external scores.
>> >>>
>> >>> Doron
>> >>>
>> >>
>> >>
>> >> --
>> >> Marcus Herou CTO and co-founder Tailsweep AB
>> >> +46702561312
>> >> marcus.herou@tailsweep.com
>> >> http://www.tailsweep.com/
>> >> http://blogg.tailsweep.com/
>> >>
>> >
>> >
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.herou@tailsweep.com
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message