lucy-user mailing list archives

From Serkan Mulayim <serkanmula...@gmail.com>
Subject [lucy-user] Re: Regarding document Ids
Date Wed, 16 Nov 2016 19:17:33 GMT
Hi guys,

I think I need to simplify my question. After reading it one more time, I
realized I touched on many things, and it seems confusing.

It seems that if we index the same document twice, a second document is
created. Per http://lucy.apache.org/docs/c/Lucy/Docs/DocIDs.html, "If
you truly need a primary key field, you must define it and populate it
yourself". How can we do this? Are there any examples? Should I
search for the document by its primary key before indexing and skip
indexing if it already exists?

Thanks,
Serkan

On Tue, Nov 15, 2016 at 2:22 PM, Serkan Mulayim <serkanmulayim@gmail.com>
wrote:

> Hi,
>
> As far as I can see, if we add the same document twice, it creates a new
> document. Per http://lucy.apache.org/docs/c/Lucy/Docs/DocIDs.html, "If
> you truly need a primary key field, you must define it and populate it
> yourself". Can you please elaborate on this? Does it mean choosing a
> field to be the primary key, deleting the document with that key, and
> re-adding it? If so, the document might not have been created until we
> commit, so deletion would not be possible, right? Performance would be
> another issue as well.
>
> Another solution might be hashing the "primary key" and using it as the
> document ID (but the referenced page also says that doc IDs are ephemeral).
> If the ephemerality of the doc ID is not a problem, my concern is
> collisions, since I might need to have many documents in the
> same index. This boils down to the birthday problem, and we might not be
> safe within the range of an integer.
>
> Do you have any suggestions about this one?
>
> Thanks,
> Serkan
>
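
To put rough numbers on the birthday-problem concern above, here is a minimal
sketch. It assumes a uniformly distributed hash and a 32-bit integer range;
both are assumptions for illustration, not something the Lucy docs state. The
function names are hypothetical, and it uses the standard approximation
p ~ 1 - exp(-n(n-1)/(2N)).

```python
import math

def collision_probability(num_keys: int, space_size: int) -> float:
    """Approximate chance that at least two of num_keys uniformly
    hashed keys land on the same value in a space of space_size."""
    exponent = num_keys * (num_keys - 1) / (2 * space_size)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)

def keys_for_probability(target: float, space_size: int) -> float:
    """Approximate number of keys at which the collision probability
    reaches target (0 < target < 1), inverting the formula above."""
    return math.sqrt(2 * space_size * math.log(1 / (1 - target)))

if __name__ == "__main__":
    space = 2 ** 32  # assumed: hash values fit an unsigned 32-bit integer
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9} keys: p(collision) ~ {collision_probability(n, space):.6f}")
    print(f"~50% collision odds at about {keys_for_probability(0.5, space):.0f} keys")
```

Under these assumptions, collisions become likely well before an index reaches
millions of documents, which is what makes the plain integer range look too
small for hashed keys.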
