lucy-user mailing list archives

From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-user] Unable to retrieve records using Proximity query
Date Sat, 23 Jun 2012 05:02:40 GMT
Saurabh Vasekar wrote on 6/21/12 3:01 PM:

> 
> For queries such as "content:jakarta AND content:apache" or
> "+content:apache AND -content:retrieval",
> I compared the search results with other indexing libraries (Ferret,
> Lucene, etc.) and they gave the same results.
> 
> But for the query "content:\"jakarta apache\"~4", the results shown by Lucene
> and Ferret are accurate, while I am not getting any records with Lucy.
> 

Thanks for the full code examples. They were helpful.

It took me a few hours of playing with it to figure out why it wasn't working as
you (and I) expected. Your indexing code is fine. The searching code assumes (as
did I at first) that the terms in a ProximityQuery object would be analyzed
(stemmed). They aren't. Only the QueryParser does the analyzing. When you
construct a Query object manually, you have to do the analysis yourself;
otherwise an unstemmed term will not match the stemmed terms in the index.
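
A minimal sketch of doing that analysis by hand, in case it helps. The two
helper names (analyzed_terms, build_proximity_query) are hypothetical and not
part of Lucy, and the sketch assumes the Analyzer object returned by
fetch_analyzer() exposes a split() method that returns an array ref of
analyzed token texts in your installed version. Treat it as an illustration
under those assumptions rather than the one right way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lucy): run a raw phrase through the
# analyzer attached to $field and return the resulting terms as an array ref.
# Assumes the Analyzer object provides split().
sub analyzed_terms {
    my ( $schema, $field, $phrase ) = @_;
    my $analyzer = $schema->fetch_analyzer($field);

    # No analyzer for this field: pass the text through as one literal term.
    return [$phrase] unless $analyzer;
    return $analyzer->split($phrase);
}

# Hypothetical helper: build a ProximityQuery from the analyzed terms
# instead of the raw words typed by the user.
sub build_proximity_query {
    my ( $schema, $field, $phrase, $within ) = @_;
    require LucyX::Search::ProximityQuery;    # loaded only when needed
    return LucyX::Search::ProximityQuery->new(
        field  => $field,
        terms  => analyzed_terms( $schema, $field, $phrase ),
        within => $within,
    );
}
```

With the $searcher from the script below, passing the result of
build_proximity_query( $searcher->get_schema, 'content', 'jakarta apache', 4 )
to $searcher->hits( query => ... ) should then search on the analyzed forms of
both terms.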

Unfortunately, the core Lucy::Search::QueryParser class doesn't handle the
proximity syntax, since ProximityQuery is an extension to the core.

Fortunately, Search::Query::Parser handles more advanced query syntax than does
the core class. (This is no knock against the Lucy parser -- as Marvin and I
have discussed in the past, it is a thankless task to try and create a parser
that is all things to all people.)

I've included example searcher code below. It shows both approaches: using a
query parser and constructing the query objects manually.

use strict;
use warnings;

my $path_to_index = 'lucy_store';

use Lucy::Search::QueryParser;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;
use LucyX::Search::ProximityQuery;
use Search::Query;

my $searcher = Lucy::Search::IndexSearcher->new( index => $path_to_index, );

# Hand-built TermQuery: the term is used exactly as given, with no analysis.
TERM: {
    my $term_query = Lucy::Search::TermQuery->new(
        field => 'content',
        term  => 'apache',
    );
    my $hits = $searcher->hits( query => $term_query, );

    my $hit_count = $hits->total_hits;

    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};

        print("Content : $content\n");

        print("\n");
    }

    printf( "TERM Hit Count :$hit_count for query %s\n",
        $term_query->to_string );

}

# Same term via the core QueryParser, which analyzes it using the schema.
TERMPARSED: {
    my $qp = Lucy::Search::QueryParser->new(
        schema => $searcher->get_schema,
        fields => [qw( content )],
    );
    my $term_query = $qp->parse('apache');
    my $hits = $searcher->hits( query => $term_query, );

    my $hit_count = $hits->total_hits;

    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};

        print("Content : $content\n");

        print("\n");
    }

    printf( "TERMPARSED Hit Count :$hit_count for query %s\n",
        $term_query->to_string );

}

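# Hand-built ProximityQuery: the terms are not analyzed, so they only match
# if they are already in the same form as the indexed terms.
PROX: {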
    my $proximity_query = LucyX::Search::ProximityQuery->new(
        field  => 'content',
        terms  => [qw( apache jakarta )],
        within => 4,
    );
    my $hits = $searcher->hits( query => $proximity_query );

    my $hit_count = $hits->total_hits;

    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};

        print("Content : $content\n");

        print("\n");
    }

    printf( "PROX Hit Count :$hit_count for query %s\n",
        $proximity_query->to_string );

}

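# Same proximity search parsed by Search::Query::Parser, which is given each
# field's type and analyzer from the schema.
PROXSQP: {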
    my $schema      = $searcher->get_schema();
    my $field_names = $schema->all_fields;
    my %fieldtypes;
    for my $name (@$field_names) {
        $fieldtypes{$name} = {
            type     => $schema->fetch_type($name),
            analyzer => $schema->fetch_analyzer($name)
        };
    }

    my $qp = Search::Query::Parser->new(
        dialect      => 'Lucy',
        fields       => \%fieldtypes,
        dialect_opts => { default_field => 'content' },  # just for example
    );

    my $proximity_query
        = $qp->parse('content:"apache jakarta"~4')->as_lucy_query;

    my $hits = $searcher->hits( query => $proximity_query );

    my $hit_count = $hits->total_hits;

    while ( my $hit = $hits->next ) {
        my $content = $hit->{content};

        print("Content : $content\n");

        print("\n");
    }

    printf( "PROXSQP Hit Count :$hit_count for query %s\n",
        $proximity_query->to_string );

}



-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com
