lucy-user mailing list archives

From Dag Lem <...@nimrod.no>
Subject Re: [lucy-user] ClusterSearcher - prefetching hits
Date Thu, 25 Oct 2012 15:14:53 GMT

Marvin Humphrey <marvin@rectangular.com> writes:

[...]

> We could modify Hits by giving it a `prefetch_count` member variable and a
> `set_prefetch_count()` method.  The default for `prefetch_count` would be 0,
> preserving the current behavior, but ClusterSearcher could set that count
> before returning the Hits object so that all documents are prefetched by
> default on the first call to `next()`.  The result will be to cut down fetches
> from one round-trip per hit to one round-trip per shard-with-hits.

This will undoubtedly be a big win!
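
To make the saving concrete, here is a minimal sketch in plain Perl that
counts round-trips under both schemes. It does not use Lucy at all: the
hit records, the `shard` key, and both function names are hypothetical,
and it assumes one batched fetch per shard is enough to retrieve every
document that shard holds.

```perl
use strict;
use warnings;

# Hypothetical model: each hit records which shard holds its document.

# Without prefetching: one fetch (one round-trip) per hit.
sub round_trips_per_hit {
    my ($hits) = @_;
    return scalar @$hits;
}

# With prefetching: one batched fetch per shard that has at least one hit.
sub round_trips_per_shard {
    my ($hits) = @_;
    my %shards_with_hits;
    $shards_with_hits{ $_->{shard} } = 1 for @$hits;
    return scalar keys %shards_with_hits;
}

# Ten hits spread over three shards (assumed distribution).
my @hits = map { +{ doc_id => $_, shard => $_ % 3 } } 1 .. 10;

printf "per hit: %d, per shard-with-hits: %d\n",
    round_trips_per_hit( \@hits ), round_trips_per_shard( \@hits );
```

The gap widens with the number of hits requested, since the per-shard
count can never exceed the number of shards.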

> There's no need to make `set_prefetch_count()` public yet -- it can remain
> an implementation detail for the time being.
> 
> The question of what to do about `fetch_doc_vec()` is harder.  Highlighter is
> the only place that calls `fetch_doc_vec()`, but it can't prefetch because it
> only deals with one hit at a time.
> 
> Perhaps we ought to explore integrating Highlighter with Hits instead of
> limiting it to dealing with individual Doc objects.  That way, Hits could
> assume responsibility for prefetching both Doc and DocVector objects at the
> same time.

This sounds very reasonable to me. To avoid unnecessary fetches for
applications without the need for highlighting, you could conceivably
control whether highlights should be prefetched by adding a second
member variable to Hits, e.g. 'prefetch_highlights'.

Stealing from the documentation of Lucy::Highlight::Highlighter,
perhaps you'd end up with something like the example below? This is
assuming that you want to give the user control over the process,
while also allowing him to shoot himself in the foot, of course :-)

    my $highlighter = Lucy::Highlight::Highlighter->new(
        searcher => $searcher,
        query    => $query,
        field    => 'body',
    );
    my $hits = $searcher->hits(
        query               => $query,
        prefetch_count      => 100,
        prefetch_highlights => 1,
    );
    while ( my $hit = $hits->next ) {
        my $excerpt = $highlighter->create_excerpt($hit);
        ...
    }

I think the defaults you outlined for prefetch_count are sound. The
default for prefetch_highlights should probably be 1 for Hits returned
from ClusterSearcher. A user would have to set prefetch_highlights to
0 in order to squeeze the last bit of performance out of an
application without highlighting, but on the other hand he wouldn't
inadvertently end up with gazillions of network round-trips in an
application that does use highlighting.
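
The defaults discussed above could be resolved roughly as in the sketch
below. Everything in it is an assumption made for illustration, not
Lucy's API: the function name, the `clustered` and `num_wanted` keys,
treating "prefetch all documents" as "prefetch num_wanted documents",
and defaulting prefetch_highlights to 0 for non-cluster Hits.

```perl
use strict;
use warnings;

# Hypothetical helper: pick prefetch settings for a new Hits object.
# Explicit caller settings always win over the defaults.
sub resolve_prefetch_options {
    my (%args) = @_;
    my $clustered  = $args{clustered} ? 1 : 0;
    my $num_wanted = $args{num_wanted} // 10;

    my %opts = (
        # 0 preserves the current fetch-on-demand behavior locally;
        # a cluster prefetches everything that was asked for.
        prefetch_count      => $clustered ? $num_wanted : 0,
        # On by default for a cluster, so that highlighting does not
        # silently cost one round-trip per hit.
        prefetch_highlights => $clustered ? 1 : 0,
    );

    for my $key ( keys %opts ) {
        $opts{$key} = $args{$key} if defined $args{$key};
    }
    return \%opts;
}

# An application without highlighting opts out explicitly.
my $opts = resolve_prefetch_options(
    clustered           => 1,
    num_wanted          => 100,
    prefetch_highlights => 0,
);
printf "prefetch_count=%d prefetch_highlights=%d\n",
    $opts->{prefetch_count}, $opts->{prefetch_highlights};
```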

-- 
Best regards,

Dag Lem
