lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Lu" <>
Subject duplication checking while indexing
Date Mon, 29 Dec 2008 09:55:14 GMT
I am wondering whether there is an easy way to avoid duplication while
indexing, just using the index being created, without creating other data
In some cases, the incoming document list can have duplicates. For example,
when creating spell checking indexes for phrases. Each phrase is one
document. So I want to check whether the phrase is already indexed or not.

To do so, I can either create a hash map for all the indexed phrases. But
the hash map would consume a lot of memory.
A possible alternative is to search existing index. But remember the index
is being created, and not all contents are flushed to disk yet.

Is it possible to query the not-yet-closed index?

Chris Lu
Instant Scalable Full-Text Search On Any Database/Application
Lucene Database Search in 3 minutes:
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message