lucy-user mailing list archives

From Serkan Mulayim <serkanmula...@gmail.com>
Subject [lucy-user] Regarding document Ids
Date Tue, 15 Nov 2016 22:22:19 GMT
Hi,

As far as I can see, adding the same document twice creates a second
document rather than replacing the first. According to
http://lucy.apache.org/docs/c/Lucy/Docs/DocIDs.html, "If you truly need a
primary key field, you must define it and populate it yourself". Could you
please elaborate on this? Does it mean choosing a field to act as the
primary key, then deleting any document with that key and re-adding it? If
so, the document might not exist in the index until we commit, so deleting
it would not be possible, right? Performance would be another concern.

Another solution might be to hash the "primary key" and use the hash as the
document ID (although the referenced page also says that doc IDs are
ephemeral). Even if the ephemeral nature of doc IDs is not a problem, I am
concerned about collisions, since I may need to store many documents in the
same index. This boils down to the birthday problem, and we might not be
safe within the range of an integer.
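
To put a rough number on the collision concern, here is a minimal sketch of
the standard birthday-bound approximation. It assumes a uniformly
distributed hash and a 32-bit ID space; both are assumptions made for
illustration, and the function names are hypothetical, not part of Lucy.

```python
import math


def collision_probability(n_docs: int, bits: int = 32) -> float:
    """Approximate P(at least one collision) as 1 - exp(-n(n-1) / (2N)).

    N = 2**bits is the size of the ID space (32 bits is an assumption).
    """
    space = 2 ** bits
    return -math.expm1(-n_docs * (n_docs - 1) / (2 * space))


def docs_for_probability(p: float, bits: int = 32) -> int:
    """Approximate document count at which the collision probability reaches p."""
    space = 2 ** bits
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9} docs: P(collision) ~ {collision_probability(n):.4f}")
    print("docs for a 50% collision chance:", docs_for_probability(0.5))
```

Under these assumptions, a collision becomes more likely than not at
roughly 77,000 documents, and is practically certain at a million.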

Do you have any suggestions on how to handle this?

Thanks,
Serkan
