lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven Bethard <beth...@stanford.edu>
Subject Re: exponential boosts
Date Thu, 23 Apr 2009 21:28:15 GMT
On 4/23/2009 2:08 PM, Marcus Herou wrote:
> But perhaps one could use a FieldCache somehow ?

Some code snippets that may help. I add the PageRank value as a field of
the documents I index with Lucene like this:

    Document document = new Document();
    double pageRank = this.pageRanks.getCount(article.getId());
    document.add(new Field(
        PAGE_RANK_FIELD_NAME, Float.toString((float)pageRank),
        Field.Store.YES, Field.Index.NOT_ANALYZED));

Then when I want to get the value back, I use a FieldScoreQuery, which
just returns the field value as the document score, like this:

  new FieldScoreQuery(PAGE_RANK_FIELD_NAME, FieldScoreQuery.Type.FLOAT);

If you want to combine the PageRank score with another Query score, then
you can look at CustomScoreQuery to do so.

Steve

> On Thu, Apr 23, 2009 at 11:07 PM, Marcus Herou
> <marcus.herou@tailsweep.com>wrote:
> 
>> Yes I have considered it for 30 minutes :)
>>
>> How do one apply that in the real world ?
>>
>> If the only thing I get access to is the actual docId would it not be
>> really expensive to get the Document itself from the index and later use
>> some field in it as external lookup in some optimized structure for this ?
>>
>> Example, pseudo:
>>
>> *public* *float* customScore(*int* doc, *float* subQueryScore, *float* valSrcScore)
>>
>> {
>>         *Document document = indexSearcher.doc(doc);
>>         float score = MyOptimalHashStructure.getScore(document.get("someId"));
>>         return score**subQueryScore*;*
>>
>> }
>>
>> This would not scale well right ? I mean gathering scores through 100M docs
>> would take some time I guess ? Or even 1M docs...
>>
>> Please push me in the right direction.
>>
>> Cheers
>>
>> //Marcus
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 23, 2009 at 10:58 PM, Doron Cohen <cdoronc@gmail.com> wrote:
>>
>>>> I think we are doing similar things, at least I am trying to implement
>>>> document boosting with pagerank. Having issues of howto appky the
>>> scoring
>>>> of
>>>> specific docs without actually reindex them. I feel something should be
>>>> done
>>>> at query time which looks at external data but do not know howto
>>> implement
>>>> that. Do you ?
>>>>
>>> Have you considered CustomScoreQuery in o.a.l.search.function ? It should
>>> allow
>>> incorporating external scores.
>>>
>>> Doron
>>>
>>
>>
>> --
>> Marcus Herou CTO and co-founder Tailsweep AB
>> +46702561312
>> marcus.herou@tailsweep.com
>> http://www.tailsweep.com/
>> http://blogg.tailsweep.com/
>>
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message