lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Baranczak <mbara...@twcny.rr.com>
Subject globally unique field
Date Tue, 19 Apr 2005 23:44:35 GMT
First of all, a big thanks to all the Lucene hackers - I've only been 
using your product for a couple of weeks, and I've been very impressed 
by what I've seen.

Here's my question: I have an index with a little over 3 million 
documents in it, with more on the way. Each document has an "URL" field 
(which is not indexed). I want to guarantee that each URL is unique; 
that is, when I'm adding a new document, I have to check if another 
existing document has the same value for the URL field. What's the best 
way to do it? I can think of two possible approaches:

1 - Open an IndexReader and iterate over all the Documents that it 
contains, checking the value of the "URL" field for each Document. This 
seems a little inefficient, since I only care about one field, and I 
don't want to have to retrieve all of the fields.

2 - Rebuild the index such that the URL field is indexed. Then, I could 
just do a normal search for the value of the URL. But since the URL 
field will never be searched under any other circumstances, this seems 
like kind of a waste of disk space.

I'm sure somebody else has had to do something like this before. Is 
there a better way to do it than what I've described above? If not, 
then which of the two approaches will give me the best results?

Thanks in advance.

-MB


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message