lucene-java-user mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@BroadVision.com>
Subject RE: Boost more recent document
Date Thu, 01 Dec 2011 19:30:18 GMT
Hi Simon,

1) Thanks for suggesting the Lucene 4.0 feature; we will make use of it as soon as
   we upgrade Lucene.

2) Currently we recreate the IndexSearcher for each query, which means we also
   recreate the underlying IndexReader for each query (I should have said
   IndexReader). Sort performance is still OK, so I would like to try
   CustomScoreQuery without a cache first.

Thanks very much for your help, Lisheng

-----Original Message-----
From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
Sent: Thursday, December 01, 2011 11:21 AM
To: Zhang, Lisheng
Cc: java-user@lucene.apache.org
Subject: Re: Boost more recent document


On Thu, Dec 1, 2011 at 7:36 AM, Zhang, Lisheng
<Lisheng.Zhang@broadvision.com> wrote:
> Hi Simon,
>
> Sorry, I found that I cannot use payloads for this purpose, because a payload
> can be accessed only through term positions and we do not query on the
> timestamp. Ideally it would be great if we could have some doc-level "payload"
> accessible through the docId?

Lucene 4 has a feature called IndexDocValues, which is essentially a
payload per document per field.

you can read about it here:
http://www.searchworkings.org/blog/-/blogs/introducing-lucene-index-doc-values
http://www.searchworkings.org/blog/-/blogs/apache-lucene-flexiblescoring-with-indexdocvalues
http://www.searchworkings.org/blog/-/blogs/indexdocvalues-their-applications
>
> Then your initial suggestion to use CustomScoreQuery would be our solution.
> From the source code I see that sort is implemented with FieldCache, and its
> performance seems OK even though we did not cache the reader. So we will use
> CustomScoreQuery without a cache for now (truncating the time stamp to the hour
> or day may help); if it is too slow we may consider caching selectively.

what do you mean by cache readers?

simon
>
> Thanks very much for all your great help; please point out anything you see
> that is wrong in the above statements.
>
> Best regards, Lisheng
>
> -----Original Message-----
> From: Zhang, Lisheng [mailto:Lisheng.Zhang@BroadVision.com]
> Sent: Wednesday, November 30, 2011 1:40 PM
> To: java-user@lucene.apache.org; simon.willnauer@gmail.com
> Subject: RE: Boost more recent document
>
>
> Hi,
>
> Thanks for the very interesting idea!
>
> Currently we use lucene 2.3.2 and we just use default merge policy (at
> any time we have a few segments and after some accumulation small segments
> are merged into big ones). I need to double check if docId can reflect doc
> age.
>
> But I have one concern: the docId may not reflect the true age interval. For
> example, a docId difference of 2 may correspond to 2 minutes or to 1 hour. If
> there is no better choice I may just use payloads and adapt a few query classes?
>
> Thanks very much for your help, Lisheng
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
> Sent: Wednesday, November 30, 2011 1:02 PM
> To: java-user@lucene.apache.org
> Subject: Re: Boost more recent document
>
>
> If you use LogMergePolicy, i.e. do merges in order, you could use the
> absolute docID as a relative age value. Smaller docIDs mean older
> documents. Maybe this works for you?
>
> simon
>
> On Wed, Nov 30, 2011 at 9:08 PM, Zhang, Lisheng
> <Lisheng.Zhang@broadvision.com> wrote:
>> Thanks very much for your help! I got the point. The only problem is that
>> I cannot afford to use FieldCache, because in our app we have many
>> Lucene index data folders. Is there another simple way?
>>
>> Thanks again, Lisheng
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>> Sent: Wednesday, November 30, 2011 11:40 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Boost more recent document
>>
>>
>> On Wed, Nov 30, 2011 at 6:59 PM, Zhang, Lisheng
>> <Lisheng.Zhang@broadvision.com> wrote:
>>> Hi,
>>>
>>> We need to boost document which is more recent (each doc has time stamp
>>> attribute). It seems that we cannot use doc boost at index time because it
>>> will be condensed into one byte (cannot differentiate 365 days), so we may
>>> use payload (save time stamp as payload) to boost at search time.
>>>
>>> In our app we let user enter query at browser and use QueryParser to
>>> generate query, the query can be different types (TermQuery, BooleanQuery,
>>> WildcardQuery, ...), then it seems we need to create each customized query
>>> class similar to PayloadTermQuery, is there another simpler way?
>>
>> You can simply index your timestamp (untokenized) and wrap your query
>> in a CustomScoreQuery. This query accepts your user query and a
>> ValueSource. During search, CustomScoreQuery calls your ValueSource for
>> each document that the user query scores and multiplies the result of
>> the ValueSource into the score. Inside your ValueSource you can simply
>> get the timestamps from the FieldCache and calculate your custom
>> boost...
>>
>> hope that helps
>>
>> simon
>>>
>>> Thanks very much for your help, Lisheng
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
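
For readers following the thread, the scoring idea discussed above (multiply the user query's score by a value derived from each document's timestamp, optionally truncated to whole days) can be sketched in plain Java. This is only an illustration and deliberately does not call any Lucene API: the `RecencyBoost` class, its method names, the half-life decay formula, and the `long[]` array standing in for per-document timestamps (what a ValueSource would read from the FieldCache or, in Lucene 4, from IndexDocValues) are all assumptions made for the example, not something taken from the thread or from Lucene itself.

```java
// Sketch only: plain Java with no Lucene dependency. All names here are
// hypothetical; the array stands in for per-document timestamp values.
final class RecencyBoost {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Truncate an epoch-millisecond timestamp to whole days, so the
    // per-document value has far fewer distinct values to hold.
    static long toDay(long epochMillis) {
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY);
    }

    // Assumed decay function: 1.0 for a document from today, halving
    // every halfLifeDays. Documents dated in the future are treated as age 0.
    static double boost(long docDay, long nowDay, double halfLifeDays) {
        long ageDays = Math.max(0L, nowDay - docDay);
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    // Multiplicative combination as described in the thread:
    // final score = query score * per-document value.
    static double customScore(double queryScore, int docId,
                              long[] dayByDocId, long nowDay,
                              double halfLifeDays) {
        return queryScore * boost(dayByDocId[docId], nowDay, halfLifeDays);
    }

    public static void main(String[] args) {
        long nowDay = toDay(System.currentTimeMillis());
        // docId 0 is 60 days old, docId 1 is from today.
        long[] dayByDocId = { nowDay - 60, nowDay };
        for (int docId = 0; docId < dayByDocId.length; docId++) {
            System.out.println("doc " + docId + " -> "
                + customScore(2.0, docId, dayByDocId, nowDay, 30.0));
        }
    }
}
```

With a real index, the array lookup would be replaced by whatever per-document value access the Lucene version in use provides, and the decay formula would be tuned to the application; a 30-day half-life is just a placeholder.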