lucene-solr-dev mailing list archives

From jason rutherglen <jasonhus...@yahoo.com>
Subject Re: Making RemoteSearchable like client for Solr
Date Fri, 19 May 2006 01:21:05 GMT
It uses Jakarta HTTP Client, and implements a PriorityQueue-like structure using the java.util.concurrent
queues and a thread pool for merging results.  Perhaps the global IDF is not a big deal?  The
idea is to distribute the documents evenly over all the machines.  However, when a new server
comes online this may present a problem, as it would start at 0 documents.  The goal would
be to allow scaling by simply adding hardware, with the software taking care of the rest.
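
A minimal sketch of the merge step described above, not the actual client code: each shard is queried on its own thread and a bounded heap keeps the best n hits overall. The names ShardMerge, Hit and topN are hypothetical, each Callable stands in for an HTTP request to one Solr server, and the lambdas assume Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ShardMerge {
    /** One hit as returned by a shard; the fields are illustrative only. */
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    /** Runs every shard query in parallel and returns the n best hits, best first. */
    static List<Hit> topN(List<Callable<List<Hit>>> shards, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Min-heap on score: the head is the weakest hit currently kept.
            PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
            // invokeAll blocks until every shard has answered (or failed).
            for (Future<List<Hit>> f : pool.invokeAll(shards)) {
                for (Hit h : f.get()) {
                    heap.offer(h);
                    if (heap.size() > n) {
                        heap.poll(); // evict the weakest hit to stay at n entries
                    }
                }
            }
            List<Hit> merged = new ArrayList<>(heap);
            merged.sort((a, b) -> Float.compare(b.score, a.score)); // highest score first
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that the merge compares raw scores across shards, which is only meaningful if every shard scores with comparable term statistics; that is where the global IDF question comes in.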

I probably would not cache the global IDF; I would simply merge it on each request.  I do
not yet fully understand what the global IDF means and need to dig more deeply into it.
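
For illustration, here is a sketch of what "merging" the IDF could look like, under the assumption that each server can report a term's document frequency and its own document count (Solr has no such call in this thread; ShardStats and GlobalIdf are hypothetical names). It sums the per-shard counts, which is how Lucene's MultiSearcher combines docFreq and maxDoc, and applies the classic Lucene idf formula log(numDocs / (docFreq + 1)) + 1.

```java
import java.util.List;

final class GlobalIdf {
    /** Per-shard statistics for one term; how a shard reports these is an assumption. */
    static final class ShardStats {
        final int docFreq; // documents on this shard that contain the term
        final int maxDoc;  // total documents on this shard
        ShardStats(int docFreq, int maxDoc) { this.docFreq = docFreq; this.maxDoc = maxDoc; }
    }

    /** Classic Lucene-style idf: log(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Global idf: sum the counts over all shards first, then apply the formula once. */
    static double globalIdf(List<ShardStats> shards) {
        int docFreq = 0;
        int numDocs = 0;
        for (ShardStats s : shards) {
            docFreq += s.docFreq;
            numDocs += s.maxDoc;
        }
        return idf(docFreq, numDocs);
    }
}
```

When documents are spread evenly, each shard's local idf is already close to the global value, which is why it may not be a big deal. A freshly added, nearly empty server is the case where local and global values diverge the most.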

> I don't think everything can be done in a single call since by the
> time you score docs against a query you have lost how you arrived at
> the composite score.

I'm not sure what "you have lost how you arrived at the composite score" means.
Could you explain?

Anyway, thanks for doing Solr. It's quite cool and has been working quite well.

Jason

----- Original Message ----
From: Yonik Seeley <yseeley@gmail.com>
To: solr-dev@lucene.apache.org
Sent: Thursday, May 18, 2006 6:04:50 PM
Subject: Re: Making RemoteSearchable like client for Solr

On 5/18/06, jason rutherglen <jasonhusong@yahoo.com> wrote:
> I used the XML, I think using HTTP is important.

Is this written in Java?  Using HTTPClient?  Anything you will be able to share?

> No caching on the client yet, that is a good idea, however my personal
> goal is to have an index that is updated every 30 seconds or less and
> so am not sure about caching on the client.  The caching can be
> handled by the Solr servers, that should be fine.  If it works
> correctly then the architecture is very simple requiring 2 layers.
> The first is a Solr layer, the second is the client layer essentially
> running many threads in parallel per request.  Seems like this would
> scale cheaply by adding more hardware on both layers.
>
> > If you are using RMI you could
> > either borrow from or subclass Lucene's MultiSearcher that implements
> > this stuff.
>
> Yeah this is the real issue, if there are any general outlines of the best way to do
> this with Solr.  Perhaps a separate Solr call for the docFreqs?  Or could this be returned
> in the current /select call?  I'm still trying to figure this part out.

Using XML, there would definitely have to be some more API calls to
return idf related stuff.
I don't think everything can be done in a single call since by the
time you score docs against a query you have lost how you arrived at
the composite score.

It might be nice to be able to turn the distributed idf off
though... people with large index segments and documents that are
randomly distributed probably won't see much of a difference in
scoring, but will see a performance increase.

We also need to be careful of caching scores at the local level... if
one remote searcher changes, the scores cached on the others
become invalid because of the global idf (yuck).

-Yonik



