lucene-java-user mailing list archives

From "Wesley MacDonald" <...@extremesoftware.ca>
Subject RE: globally unique field
Date Wed, 20 Apr 2005 02:09:55 GMT
Hi,

I posted this in the past:

Java has GUID classes called java.rmi.server.UID and
java.rmi.dgc.VMID.

The UID class can generate identifiers that are unique over time on the
host where they are generated. The VMID class provides uniqueness across
all JVMs.

UID consists of a unique number based on a hashcode, system time and a
counter, and a VMID contains a UID and adds a SHA hash based on IP
address.
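
As a rough sketch of how these two classes might be used to produce an
ID string for a document: the class name UniqueIdSketch and its two
helper methods are made up for illustration; only UID, VMID and their
no-argument constructors and toString() come from the JDK. Note that the
VMID guarantee depends on the host having an address that is unique and
stays constant, so treat this as a starting point rather than a proven
scheme.

```java
import java.rmi.dgc.VMID;
import java.rmi.server.UID;

// Hypothetical helper class; the name and methods are illustrative only.
class UniqueIdSketch {

    // Returns an identifier that is unique over time on the generating host.
    static String newHostLocalId() {
        return new UID().toString();
    }

    // Returns an identifier meant to be unique across JVMs:
    // a UID combined with a hash of the host's address.
    static String newGlobalId() {
        return new VMID().toString();
    }

    public static void main(String[] args) {
        // The printed values differ on every run and every machine.
        System.out.println("UID:  " + newHostLocalId());
        System.out.println("VMID: " + newGlobalId());
    }
}
```

The resulting string could be stored in a field of its own on each
document when it is added.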


Then you won't need to check to see if your URL is unique...it will be.


Wes.  

-----Original Message-----
From: Mike Baranczak [mailto:mbarancz@twcny.rr.com] 
Sent: April 19, 2005 7:45 PM
To: java-user@lucene.apache.org
Subject: globally unique field

First of all, a big thanks to all the Lucene hackers - I've only been
using your product for a couple of weeks, and I've been very impressed
by what I've seen.

Here's my question: I have an index with a little over 3 million
documents in it, with more on the way. Each document has an "URL" field
(which is not indexed). I want to guarantee that each URL is unique;
that is, when I'm adding a new document, I have to check if another
existing document has the same value for the URL field. What's the best
way to do it? I can think of two possible approaches:

1 - Open an IndexReader and iterate over all the Documents that it
contains, checking the value of the "URL" field for each Document. This
seems a little inefficient, since I only care about one field, and I
don't want to have to retrieve all of the fields.

2 - Rebuild the index such that the URL field is indexed. Then, I could
just do a normal search for the value of the URL. But since the URL
field will never be searched under any other circumstances, this seems
like kind of a waste of disk space.

I'm sure somebody else has had to do something like this before. Is
there a better way to do it than what I've described above? If not, then
which of the two approaches will give me the best results?

Thanks in advance.

-MB


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


