lucy-user mailing list archives
From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-user] Re: Regarding document Ids
Date Wed, 16 Nov 2016 19:25:54 GMT
Serkan Mulayim wrote on 11/16/16, 1:17 PM:
> Hi guys,
>
> I think I need to simplify my question. After reading it one more time, I
> realized I touched on many things, and it seems confusing.
>
> It seems like if we index the same document twice, a new document is
> created. And as per http://lucy.apache.org/docs/c/Lucy/Docs/DocIDs.html, " If
> you truly need a primary key field, you must define it and populate it
> yourself". How can we do this? Are there any examples of it? Should I
> search for the document with the primary key before indexing and, if it
> exists, not index it?

What I do in all my apps is call delete_by_term on the document's key before adding the new version:
https://metacpan.org/pod/distribution/Lucy/lib/Lucy/Index/Indexer.pod#delete_by_term

I have my own primary key system that varies based on the application. Sometimes 
it is a URI, sometimes a db PK. I maintain the document integrity myself.
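To make the delete-then-add pattern concrete, here is a minimal sketch. It is a hypothetical in-memory toy index written in Python, not Lucy's API. It only mimics the behavior that matters here: every add gets a fresh internal doc ID, so adding the same logical document twice yields two entries unless you first delete by your own key field. The names `ToyIndex`, `upsert`, and the `id` key field are illustrative assumptions.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory stand-in for a search index (NOT Lucy's API).

    Like Lucy, it assigns a fresh internal doc ID on every add, so adding
    the same logical document twice produces two separate entries.
    """

    def __init__(self):
        self._docs = {}                    # internal doc ID -> fields dict
        self._postings = defaultdict(set)  # (field, exact value) -> doc IDs
        self._next_id = 1

    def add_doc(self, doc):
        # Always creates a new entry; no duplicate checking happens here.
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        for field, value in doc.items():
            self._postings[(field, value)].add(doc_id)
        return doc_id

    def delete_by_term(self, field, term):
        # Remove every doc whose `field` exactly matches `term`
        # (values are treated as single unanalyzed terms).
        for doc_id in self._postings.pop((field, term), set()):
            doc = self._docs.pop(doc_id)
            for f, v in doc.items():
                if (f, v) != (field, term):
                    self._postings[(f, v)].discard(doc_id)

    def count(self, field, term):
        # Number of live docs whose `field` equals `term`.
        return len(self._postings.get((field, term), ()))


def upsert(index, doc, key_field="id"):
    """Delete any doc sharing this primary key, then add the new version."""
    index.delete_by_term(key_field, doc[key_field])
    return index.add_doc(doc)


if __name__ == "__main__":
    url = "http://example.com/a"

    plain = ToyIndex()
    plain.add_doc({"id": url, "title": "v1"})
    plain.add_doc({"id": url, "title": "v2"})
    print(plain.count("id", url))  # 2: plain adds create a duplicate

    keyed = ToyIndex()
    upsert(keyed, {"id": url, "title": "v1"})
    upsert(keyed, {"id": url, "title": "v2"})
    print(keyed.count("id", url))  # 1: the old version was deleted first
```

There is no need to search before indexing. Unconditionally deleting by the key and then adding is simpler, and it also picks up changed content. In Lucy's Perl binding, the same pattern is `$indexer->delete_by_term( field => 'id', term => $id )` followed by `$indexer->add_doc(...)` and `$indexer->commit`, assuming your key lives in an indexed field (here hypothetically named `id`).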

One example of how Dezi solves this more generally:

https://metacpan.org/source/KARMAN/Dezi-App-0.014/lib/Dezi/Lucy/Indexer.pm#L451

Lucy isn't an RDBMS. It just tokenizes the fields you shove into it and 
retrieves them very quickly.


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com
