incubator-lucy-user mailing list archives

From lance bowler <lanceb...@gmail.com>
Subject Re: [lucy-user] Collapsing and highlighting
Date Sun, 11 Dec 2011 04:24:53 GMT
On Sun, Dec 11, 2011 at 4:39 AM, Peter Karman <peter@peknet.com> wrote:
> > - I want to do something similar to Google's cached pages: ie, display a
> > website page from my store with the search terms highlighted (ie, not using
> > Lucy's normal excerpt/highlighter, since the page is not coming from the
> > index itself but from cached pages).  Lucy's highlighter does a great job
> > of highlighting words based on stemming; can this somehow be hooked into
> > so it can highlight a page loaded from disk?
>
> Check out HTML::HiLiter and/or Search::Tools::HiLiter (which HTML::HiLiter uses
> underneath). I wrote that explicitly for the purpose you're describing.

Great, thanks - I'll check those out.
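
Just so I'm sure I've understood the approach, here's a minimal sketch of
what I have in mind, assuming Search::Tools::HiLiter's new()/light()
interface works as in its synopsis (the cache path and query string below
are made up):

    use strict;
    use warnings;
    use Search::Tools::HiLiter;

    # Load the cached copy of the page from disk (path is made up).
    open my $fh, '<:encoding(UTF-8)', '/var/cache/pages/example.html'
        or die "can't read cached page: $!";
    my $html = do { local $/; <$fh> };

    # Build a hiliter from the raw query string the user submitted;
    # light() wraps matching terms in the HTML with highlight tags.
    my $query_string = 'whatever the user searched for';
    my $hiliter = Search::Tools::HiLiter->new( query => $query_string );
    print $hiliter->light($html);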

>
> >
> > - I read somewhere on the list about concurrency maybe being a problem (ie,
> > remote cluster searchers can only perform 1 search at a time).  surely if
> > you have 10 or more visitors performing parallel searches they'll block
> > until each one completes...?
>
> If I'm understanding your concern correctly, I don't think concurrency as you're
> describing it is a problem. Multiple Searchers can be open against the same
> index simultaneously, each handling parallel searches. The issue you might be
> referring to with the now-deprecated LucyX::Remote::Search(Server|Client) was
> that requests to the server were being executed serially rather than in
> parallel. That issue should be addressed now in trunk with the new ClusterSearcher.
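
(For context, the simple per-process case is clear enough to me - something
like the stock Lucy tutorial usage, with each web worker opening its own
searcher against the same index; the path and query here are placeholders:)

    use strict;
    use warnings;
    use Lucy::Search::IndexSearcher;

    # Each worker/process can open its own searcher against the same
    # on-disk index; readers don't block one another.
    my $searcher = Lucy::Search::IndexSearcher->new(
        index => '/path/to/index',
    );
    my $hits = $searcher->hits(
        query      => 'foo bar',
        num_wanted => 10,
    );
    while ( my $hit = $hits->next ) {
        print "$hit->{title}\n";
    }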

I had a gander at the ClusterSearcher code and it does indeed only accept
one search request at a time - unless my Perl grokking is failing me.

At a basic level, a server should accept inbound connections and
fork/thread off to handle each request concurrently (a la xinetd,
etc.).  Even if each request takes only 0.2s to complete, 10 such
requests (or 50...) handled serially would rapidly push the wait up
to 2s (or 10s...) -- or am I missing the mark here?
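
Roughly this shape of accept-and-fork loop is what I mean - a bare-sockets
sketch only, not ClusterSearcher's actual wire protocol, and
handle_search_request() is just a placeholder:

    use strict;
    use warnings;
    use IO::Socket::INET;

    $SIG{CHLD} = 'IGNORE';    # auto-reap finished children

    my $listener = IO::Socket::INET->new(
        LocalPort => 7890,    # made-up port
        Proto     => 'tcp',
        Listen    => 5,
        Reuse     => 1,
    ) or die "can't listen: $!";

    while ( my $client = $listener->accept ) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: handle this one search while the parent loops
            # straight back to accept() for the next connection.
            handle_search_request($client);
            exit 0;
        }
        close $client;    # parent has no use for the client socket
    }

    sub handle_search_request {
        my ($sock) = @_;
        # Placeholder: run the query against a local searcher and
        # write the serialized results back over $sock.
        print {$sock} "results would go here\n";
        close $sock;
    }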

We have spikes of traffic, and serialized searching would leave tens -
or hundreds - of users staring at a spinning wheel while each search
client waits its turn...  bad.
