lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Boost more recent document
Date Thu, 01 Dec 2011 19:20:47 GMT
On Thu, Dec 1, 2011 at 7:36 AM, Zhang, Lisheng
<Lisheng.Zhang@broadvision.com> wrote:
> Hi Simon,
>
> Sorry, I found that I cannot use payloads for this purpose because a payload
> can be accessed only through term positions, and we do not use the timestamp
> in queries. Ideally we would have some doc-level "payload"
> accessible through docId?

Lucene 4 has a feature called IndexDocValues, which is essentially a
payload per document per field.

You can read about it here:
http://www.searchworkings.org/blog/-/blogs/introducing-lucene-index-doc-values
http://www.searchworkings.org/blog/-/blogs/apache-lucene-flexiblescoring-with-indexdocvalues
http://www.searchworkings.org/blog/-/blogs/indexdocvalues-their-applications
>
> Then your initial suggestion to use CustomScoreQuery would be our solution.
> From the source code I see that sorting is implemented with FieldCache, and its
> performance seems OK even though we did not cache the reader. So we will use
> CustomScoreQuery without a cache for now (truncating the timestamp to the hour
> or day may help); if that is too slow, we may consider caching selectively.

What do you mean by "cache readers"?

simon
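
For the "cutting time stamp to hour or day" idea mentioned above, here is a minimal plain-Java sketch. It assumes timestamps are stored as epoch milliseconds in UTC; the class and method names are hypothetical and nothing here is a Lucene API. Coarser values mean fewer distinct timestamps to index and load.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Lucene): coarsens timestamps before indexing. */
final class TimestampBuckets {
    private static final long HOUR_MILLIS = TimeUnit.HOURS.toMillis(1);
    private static final long DAY_MILLIS = TimeUnit.DAYS.toMillis(1);

    private TimestampBuckets() {}

    /** Rounds epoch milliseconds down to the start of the containing UTC hour. */
    static long truncateToHour(long epochMillis) {
        // floorDiv rounds toward negative infinity, so pre-1970 values also round down.
        return Math.floorDiv(epochMillis, HOUR_MILLIS) * HOUR_MILLIS;
    }

    /** Rounds epoch milliseconds down to the start of the containing UTC day. */
    static long truncateToDay(long epochMillis) {
        return Math.floorDiv(epochMillis, DAY_MILLIS) * DAY_MILLIS;
    }
}
```

The truncated value would then be indexed as the untokenized timestamp field in place of the full-resolution one. Whether hour or day granularity is enough depends on how finely the boost needs to distinguish documents.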
>
> Thanks very much for all your great help; please point out anything wrong
> in the statements above.
>
> Best regards, Lisheng
>
> -----Original Message-----
> From: Zhang, Lisheng [mailto:Lisheng.Zhang@BroadVision.com]
> Sent: Wednesday, November 30, 2011 1:40 PM
> To: java-user@lucene.apache.org; simon.willnauer@gmail.com
> Subject: RE: Boost more recent document
>
>
> Hi,
>
> Thanks for the very interesting idea!
>
> Currently we use Lucene 2.3.2 with the default merge policy (at
> any time we have a few segments, and after some accumulation small segments
> are merged into big ones). I need to double-check whether docId reflects doc
> age.
>
> But I have one concern: docId may not reflect the true age interval; a docId
> difference of 2 may correspond to 2 minutes or 1 hour. If there is no better
> choice, I may just use payloads and adapt a few query classes?
>
> Thanks very much for the help, Lisheng
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
> Sent: Wednesday, November 30, 2011 1:02 PM
> To: java-user@lucene.apache.org
> Subject: Re: Boost more recent document
>
>
> If you use LogMergePolicy, i.e. do merges in order, you could use the
> absolute docID as a relative age value. Smaller docIDs mean older
> documents. Maybe this works for you?
>
> simon
>
> On Wed, Nov 30, 2011 at 9:08 PM, Zhang, Lisheng
> <Lisheng.Zhang@broadvision.com> wrote:
>> Thanks very much for your help! I got the point; the only problem is that
>> I cannot afford to use FieldCache because our app has many
>> Lucene index data folders. Is there another simple way?
>>
>> Thanks again, Lisheng
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>> Sent: Wednesday, November 30, 2011 11:40 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Boost more recent document
>>
>>
>> On Wed, Nov 30, 2011 at 6:59 PM, Zhang, Lisheng
>> <Lisheng.Zhang@broadvision.com> wrote:
>>> Hi,
>>>
>>> We need to boost documents that are more recent (each doc has a timestamp
>>> attribute). It seems that we cannot use doc boost at index time because it
>>> is condensed into one byte (and cannot differentiate 365 days), so we may
>>> use a payload (saving the timestamp as the payload) to boost at search time.
>>>
>>> In our app we let the user enter a query in the browser and use QueryParser
>>> to generate the query. The query can be of different types (TermQuery,
>>> BooleanQuery, WildcardQuery, ...), so it seems we would need to create a
>>> customized query class similar to PayloadTermQuery for each type. Is there
>>> another, simpler way?
>>
>> You can simply index your timestamp (untokenized) and wrap your query
>> in a CustomScoreQuery. This query accepts your user query and a
>> ValueSource. During search, CustomScoreQuery calls your ValueSource for
>> each document that the user query scores and multiplies the result of
>> the ValueSource into the score. Inside your ValueSource you can simply
>> get the timestamps from the FieldCache and calculate your custom
>> boost...
>>
>> hope that helps
>>
>> simon
>>>
>>> Thanks very much for the help, Lisheng
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
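
The thread describes CustomScoreQuery multiplying a per-document value into the score of the user query. As a rough illustration of that arithmetic only, here is a plain-Java sketch that uses no Lucene classes. The reciprocal decay formula, the `halfLifeDays` parameter, and all names are assumptions chosen for the example; the thread does not specify any particular boost function.

```java
/** Hypothetical sketch (no Lucene classes): a recency boost multiplied into a query score. */
final class RecencyBoost {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    private RecencyBoost() {}

    /**
     * Returns a boost in (0, 1]: 1.0 for a document stamped at or after nowMillis,
     * 0.5 for a document that is exactly halfLifeDays old, approaching 0 with age.
     */
    static double boost(long docMillis, long nowMillis, double halfLifeDays) {
        // Clamp future timestamps to age zero so the boost never exceeds 1.0.
        double ageDays = Math.max(0L, nowMillis - docMillis) / (double) MILLIS_PER_DAY;
        return 1.0 / (1.0 + ageDays / halfLifeDays);
    }

    /** Mirrors the multiplication described in the thread: query score times the boost. */
    static double customScore(double queryScore, long docMillis, long nowMillis, double halfLifeDays) {
        return queryScore * boost(docMillis, nowMillis, halfLifeDays);
    }
}
```

In a real setup the timestamp lookup would come from whatever per-document source is available (the FieldCache in the approach described above), and the decay curve would be tuned to how strongly recency should outweigh relevance.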
