lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cam Bazz" <camb...@gmail.com>
Subject Re: How to add PageRank score with lucene's relevant score in sorting
Date Thu, 29 May 2008 14:44:16 GMT
Hello,

little off topic, but how did you obtain the pagerank for each page. did you
calculate it, or have you obtained it with some other way while getting a
specific site.

Best.

On Thu, May 29, 2008 at 3:28 PM, 过佳 <nttstar@gmail.com> wrote:

> thanks Glen , we have tried it , but the bottleneck is to get the document
> (indexReader.document(num)), so it is not efficient enough .
>
> 2008/5/28, Glen Newton <glen.newton@gmail.com>:
> >
> > You should consider keeping the PageRank (and any other more dynamic
> > data) in a separate index (with the documents in the same oder as your
> > bigger, more static index) and then use a ParallelReader on both of
> > them. See:
> >
> >
> http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReader.html
> >
> > -Glen
> >
> > 2008/5/28 过佳 <nttstar@gmail.com>:
> > > I think this is not suitable for my system since the num of pages is
> very
> > > large that will cost much time for reindex
> > >
> > > 2008/5/28, Ian Lea <ian.lea@gmail.com>:
> > >>
> > >> Yes.  But you'd have to do that anyway if you are storing pagerank in
> > the
> > >> index.
> > >>
> > >> One point on your 20s response time for sorting - is that for the
> > >> first sort or subsequent ones?
> > >> I believe that the first one will usually be substantially slower.
> > >> But sorting is always likely to be slower than not sorting.
> > >>
> > >>
> > >> --
> > >> Ian.
> > >>
> > >>
> > >> On Wed, May 28, 2008 at 12:47 PM, 过佳 <nttstar@gmail.com> wrote:
> > >> > thanks lan, but this means that i must reindex these pages while the
> > >> > pagerank score changed?
> > >> >
> > >> > 在08-5-28,Ian Lea <ian.lea@gmail.com> 写道:
> > >> >>
> > >> >> Hi
> > >> >>
> > >> >>
> > >> >> Maybe you could use the pagerank score, possibly modified, as
> > document
> > >> >> boost at indexing time.  From the javadocs for
> > >> >> Document.setBoost(boost)
> > >> >>
> > >> >> "Sets a boost factor for hits on any field of this document. This
> > >> >> value will be multiplied into the score of all hits on this
> document"
> > >> >>
> > >> >> so will give you P * R rather than P + R.  Should be quick, though.
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Ian.
> > >> >>
> > >> >>
> > >> >> On Wed, May 28, 2008 at 11:02 AM, 过佳 <nttstar@gmail.com>
wrote:
> > >> >> > hi all ,
> > >> >> >     I have a problem that how to "combine" two score to sort
the
> > >> search
> > >> >> > result documents.
> > >> >> >     for example I  have 10 million pages in lucene index
, and i
> > know
> > >> >> their
> > >> >> > pagerank scores. i give a query to it , every docs returned
have
> a
> > >> >> > lucene-score, mark it as R (relevant score), and  i  also
 have
> its
> > >> >> > pagerank score, mark it as P,  what i need is i want to sort
the
> > >> search
> > >> >> > result base on the value "P+R".  You know if i store the
pagerank
> > >> score
> > >> >> in
> > >> >> > index and get it every search time , then compute P+R , then
sort
> > it ,
> > >> >> this
> > >> >> > way is too slow. in my system , when the search hits 500000
> result
> > ,
> > >> the
> > >> >> > sort may cost about 20s.
> > >> >> >   Sorry for my poor english.  Anyone has a good idea?
> > >> >> >
> > >> >> > Best
> > >> >> > Jarvis
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
> >
> >
> > --
> >
> > -
> >
>
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message