incubator-lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Promoting new analysis components
Date Fri, 10 Feb 2012 00:40:54 GMT
On Thu, Feb 09, 2012 at 09:09:08PM +0100, Nick Wellnhofer wrote:
> On 09/02/2012 18:49, Marvin Humphrey wrote:
>> Rehashing our short exchange on IRC for the benefit of the list... I suggest
>> using PolyReader#open for this, since it returns NULL rather than throwing an
>> exception on failure.  If open() is successful, the resulting reader can be
>> used as an argument to IndexSearcher#new:
>
> Actually, PolyReader#open doesn't return NULL if the index is empty.
>
> But the list of seg_readers will be empty, so we can test for that.

Ah, sorry for misremembering.  

That behavior seems sub-optimal in retrospect.

Nevertheless, your revised approach will work.

> See the attached patch for my second attempt.

There is still potential for a schema conflict here.  We have to get
$self->{type} out of the existing schema when the index exists.

    if ( !@{ $reader->seg_readers } ) {
        # index is empty, create new schema
        $self->{schema} = Lucy::Plan::Schema->new;
    }
    else {
        # get schema from reader
        my $schema = $self->{schema} = $reader->get_schema;
        my ($field) = @{ $schema->get_fields };
        $self->{type} = $schema->fetch_type($field);
    }

    # Create a new FieldType if we haven't discovered one yet.
    if ( !$self->{type} ) {
        my $analyzer = Lucy::Analysis::EasyAnalyzer->new(
            language => $language );
        $self->{type} = Lucy::Plan::FullTextType->new(
            analyzer => $analyzer, );
    }

Aside from that, the patch looked good to me.

> Nick

> diff --git a/perl/lib/Lucy/Simple.pm b/perl/lib/Lucy/Simple.pm
> index aeb92c4..f09301b 100644
> --- a/perl/lib/Lucy/Simple.pm
> +++ b/perl/lib/Lucy/Simple.pm
> @@ -50,7 +50,6 @@ sub new {
>      # Get type and schema.
>      my $analyzer = Lucy::Analysis::EasyAnalyzer->new( language => $language );
>      $self->{type} = Lucy::Plan::FullTextType->new( analyzer => $analyzer, );
> -    my $schema = $self->{schema} = Lucy::Plan::Schema->new;
>  
>      # Cache the object for later clean-up.
>      weaken( $obj_cache{ refaddr $self } = $self );
> @@ -61,6 +60,15 @@ sub new {
>  sub _lazily_create_indexer {
>      my $self = shift;
>      if ( !defined $self->{indexer} ) {
> +        my $reader = Lucy::Index::PolyReader->open( index => $self->{path} );
> +        if ( ! @{ $reader->seg_readers } ) {
> +            # index is empty, create new schema
> +            $self->{schema} = Lucy::Plan::Schema->new;
> +        }
> +        else {
> +            # get schema from reader
> +            $self->{schema} = $reader->get_schema;
> +        }
>          $self->{indexer} = Lucy::Index::Indexer->new(
>              schema => $self->{schema},
>              index  => $self->{path},
> @@ -70,11 +78,11 @@ sub _lazily_create_indexer {
>  
>  sub add_doc {
>      my ( $self, $hashref ) = @_;
> -    my $schema = $self->{schema};
>      my $type   = $self->{type};
>      croak("add_doc requires exactly one argument: a hashref")
>          unless ( @_ == 2 and reftype($hashref) eq 'HASH' );
>      $self->_lazily_create_indexer;
> +    my $schema = $self->{schema};
>      $schema->spec_field( name => $_, type => $type ) for keys %$hashref;
>      $self->{indexer}->add_doc($hashref);
>  }

