lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Glen Newton" <glen.new...@gmail.com>
Subject Re: How to add PageRank score with lucene's relevant score in sorting
Date Wed, 28 May 2008 14:51:33 GMT
You should consider keeping the PageRank (and any other more dynamic
data) in a separate index (with the documents in the same oder as your
bigger, more static index) and then use a ParallelReader on both of
them. See:
  http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReader.html

-Glen

2008/5/28 过佳 <nttstar@gmail.com>:
> I think this is not suitable for my system since the num of pages is very
> large that will cost much time for reindex
>
> 2008/5/28, Ian Lea <ian.lea@gmail.com>:
>>
>> Yes.  But you'd have to do that anyway if you are storing pagerank in the
>> index.
>>
>> One point on your 20s response time for sorting - is that for the
>> first sort or subsequent ones?
>> I believe that the first one will usually be substantially slower.
>> But sorting is always likely to be slower than not sorting.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, May 28, 2008 at 12:47 PM, 过佳 <nttstar@gmail.com> wrote:
>> > thanks lan, but this means that i must reindex these pages while the
>> > pagerank score changed?
>> >
>> > 在08-5-28,Ian Lea <ian.lea@gmail.com> 写道:
>> >>
>> >> Hi
>> >>
>> >>
>> >> Maybe you could use the pagerank score, possibly modified, as document
>> >> boost at indexing time.  From the javadocs for
>> >> Document.setBoost(boost)
>> >>
>> >> "Sets a boost factor for hits on any field of this document. This
>> >> value will be multiplied into the score of all hits on this document"
>> >>
>> >> so will give you P * R rather than P + R.  Should be quick, though.
>> >>
>> >>
>> >> --
>> >> Ian.
>> >>
>> >>
>> >> On Wed, May 28, 2008 at 11:02 AM, 过佳 <nttstar@gmail.com> wrote:
>> >> > hi all ,
>> >> >     I have a problem that how to "combine" two score to sort the
>> search
>> >> > result documents.
>> >> >     for example I  have 10 million pages in lucene index , and i know
>> >> their
>> >> > pagerank scores. i give a query to it , every docs returned have a
>> >> > lucene-score, mark it as R (relevant score), and  i  also  have its
>> >> > pagerank score, mark it as P,  what i need is i want to sort the
>> search
>> >> > result base on the value "P+R".  You know if i store the pagerank
>> score
>> >> in
>> >> > index and get it every search time , then compute P+R , then sort it
,
>> >> this
>> >> > way is too slow. in my system , when the search hits 500000 result
,
>> the
>> >> > sort may cost about 20s.
>> >> >   Sorry for my poor english.  Anyone has a good idea?
>> >> >
>> >> > Best
>> >> > Jarvis
>> >> >
>> >>
>> >
>>
>



-- 

-
Mime
View raw message