lucy-user mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-user] Doc id from hits and remove redundant documents
Date Wed, 23 Nov 2016 15:11:39 GMT
On 23/11/2016 15:33, Gupta, Rajiv wrote:
> 1. Which field do I use to get the document ID from hits?
>   my $hits = $searcher->hits(
>       query      => $query_parsed,
>       num_wanted => -1, # -1 equivalent to all results
> );
> while (my $hit = $hits->next()) {
>     print "Document id: " . $hit->{???};
> }

$hits->next() returns a single Lucy::Document::HitDoc object (or undef once
all hits have been returned):

     http://lucy.apache.org/docs/perl/Lucy/Document/HitDoc.html

HitDoc inherits from Lucy::Document::Doc which has a get_doc_id method:

     http://lucy.apache.org/docs/perl/Lucy/Document/Doc.html#get_doc_id

So you can get the doc ID with:

     my $doc_id = $hit->get_doc_id();
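A minimal, self-contained sketch of the whole round trip: index a few documents,
search, and print the doc ID of each hit. The schema, the field names `id` and
`title`, and the in-memory Lucy::Store::RAMFolder are illustrative assumptions,
not taken from your code; in practice you would pass a directory path as
`index`. Passing `$searcher->doc_max` as num_wanted is used here simply as an
upper bound so that every hit is returned.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Plan::StringType;
use Lucy::Analysis::EasyAnalyzer;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;

# Hypothetical schema: 'id' is an exact-match string, 'title' is full text.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );
$schema->spec_field(
    name => 'title',
    type => Lucy::Plan::FullTextType->new(
        analyzer => Lucy::Analysis::EasyAnalyzer->new( language => 'en' ),
    ),
);

# In-memory index so the sketch needs no files; use a directory path in practice.
my $folder  = Lucy::Store::RAMFolder->new;
my $indexer = Lucy::Index::Indexer->new(
    index  => $folder,
    schema => $schema,
    create => 1,
);
$indexer->add_doc( { id => "doc-$_", title => "lucy example $_" } ) for 1 .. 3;
$indexer->commit;

my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );
my $hits     = $searcher->hits(
    query      => 'lucy',
    num_wanted => $searcher->doc_max,    # upper bound on the number of docs
);

my @doc_ids;
while ( my $hit = $hits->next ) {    # one HitDoc per call, undef when done
    my $doc_id = $hit->get_doc_id;    # Lucy's internal document number
    push @doc_ids, $doc_id;
    printf "Document id: %d (stored id: %s, score: %.3f)\n",
        $doc_id, $hit->{id}, $hit->get_score;
}
```

Keep in mind that get_doc_id() returns Lucy's internal document number, which
can change when segments are merged. If you need an identifier to store outside
the index, a stored field of your own (like `id` above) is the safer choice.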

> 2. While inserting records, how can I avoid inserting duplicate records?

You have to delete the old documents, using one of the delete_* methods in 
Lucy::Index::Indexer:

     http://lucy.apache.org/docs/perl/Lucy/Index/Indexer.html

Typically, you use one of the fields in your schema as a primary key and delete
documents using delete_by_term, which takes the field name and the term as
named parameters:

     $indexer->delete_by_term(field => 'my_primary_key', term => $value);
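Here is a rough sketch of that delete-then-add pattern, assuming a StringType
primary-key field so the whole value is indexed as a single term. The field
name `my_primary_key`, the helper `upsert_records`, and the in-memory
RAMFolder are hypothetical choices for illustration. The sketch also assumes
each key appears at most once per call; deduplicate the input first if it
might not.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Plan::StringType;
use Lucy::Analysis::EasyAnalyzer;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# 'my_primary_key' is a hypothetical field name. StringType indexes the whole
# value as one term, so delete_by_term can match it exactly.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'my_primary_key', type => Lucy::Plan::StringType->new );
$schema->spec_field(
    name => 'content',
    type => Lucy::Plan::FullTextType->new(
        analyzer => Lucy::Analysis::EasyAnalyzer->new( language => 'en' ),
    ),
);

my $folder = Lucy::Store::RAMFolder->new;    # use a directory path in practice

# Hypothetical helper: for each record, delete any document already indexed
# under the same key, then add the new version.
sub upsert_records {
    my ( $create, @records ) = @_;
    my $indexer = Lucy::Index::Indexer->new(
        index  => $folder,
        schema => $schema,
        create => $create,
    );
    for my $record (@records) {
        $indexer->delete_by_term(
            field => 'my_primary_key',
            term  => $record->{my_primary_key},
        );
        $indexer->add_doc($record);
    }
    $indexer->commit;    # deletions and additions become visible together
}

upsert_records( 1, { my_primary_key => 'rec-1', content => 'first version' } );
upsert_records( 0,
    { my_primary_key => 'rec-1', content => 'second version' },
    { my_primary_key => 'rec-2', content => 'another record' },
);

# Look the key up again: only the latest version of rec-1 should remain.
my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );
my $hits     = $searcher->hits(
    query => Lucy::Search::TermQuery->new(
        field => 'my_primary_key',
        term  => 'rec-1',
    ),
);
my $total = $hits->total_hits;
my $hit   = $hits->next;
print "rec-1 copies: $total, content: $hit->{content}\n";
```

Calling delete_by_term for a key that isn't in the index yet is harmless (the
first call above does exactly that), so the same code path handles both new and
updated records.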

Nick

