lucene-java-user mailing list archives

From Jason Rutherglen <>
Subject Re: Is there a way to check for field "uniqueness" when indexing?
Date Wed, 26 Aug 2009 22:36:50 GMT

You may want to look at SOLR-1375, which enables ID checking
using a BloomFilter (with a specified error rate for false
positives). Otherwise, for what you're trying to do, you'd
probably need to keep a hash map of the IDs yourself.
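
To make the hash map idea concrete, here is a minimal sketch, assuming the unique field is a plain String key and that all adds and deletes go through one code path. UniqueKeyTracker and its methods are hypothetical names for illustration, not part of Lucene or Solr; the actual IndexWriter calls are left out and only indicated in comments.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not a Lucene class): tracks the values of one
 * "unique" field in memory, alongside the index being written.
 */
final class UniqueKeyTracker {
    // Concurrent set so several indexing threads can share one tracker.
    private final Set<String> liveKeys = ConcurrentHashMap.newKeySet();

    /**
     * Call before adding a document to the index.
     * Returns true if the key was not present (safe to index),
     * false if it conflicts with a key that is already live.
     */
    boolean tryAdd(String key) {
        return liveKeys.add(key);
    }

    /** Call when a document with this key is deleted from the index. */
    void remove(String key) {
        liveKeys.remove(key);
    }

    /** Number of keys currently considered live. */
    int size() {
        return liveKeys.size();
    }
}
```

The idea would be to call tryAdd(id) right before adding the document and to reject the document if it returns false, and to call remove(id) wherever you delete by that field. For an existing index the set would have to be seeded once from the current terms of the field, and its memory use grows with the number of unique values, which is the trade-off a Bloom filter avoids at the cost of some false positives.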


On Thu, Aug 13, 2009 at 7:33 AM, Daniel Shane<> wrote:
> Hi all!
> I'm currently running a big Lucene index and one of my main concerns is the
> integrity of the data entered. A few things come to mind, like enforcing
> that certain fields be non-blank, forcing certain formats, etc.
> All these validations are easy to do with Lucene, since I can validate the
> document before it is indexed or when it is retrieved.
> The thing that I have a hard time with, however, is field uniqueness.
> Let's say I have a field and I really want it to be unique. I can't seem to
> find out how to do it during the indexing phase, since nothing that is
> added to the index is readable by an index reader until the index is
> closed.
> Add to that the fact that items can be deleted from the index during
> indexing, and the only way I have to check uniqueness is to go through every
> value of the unique field using TermEnums and check its docFreq.
> This has the major disadvantage that I cannot inform people who are using the
> library of a uniqueness conflict when it happens, only when the index is
> closed.
> Does anyone have an idea how I could check an index that is in the
> process of being written (things added, things deleted) for the uniqueness of
> a given field *at the time I index a document*?
> Daniel Shane
