lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Index not recreated
Date Mon, 14 Aug 2006 14:19:41 GMT
My first suspicion is that you have duplicate documents on the *input* side,
or are somehow adding documents more than once. I use code similar to yours
and it works just fine for me.....

How big is the index before and after you re-create it? If it's about twice
the size, you're appending; if it isn't, something else is going on.....
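
For the size comparison, something along these lines would do. It is only a
rough sketch in plain Java with nothing Lucene-specific in it; the class and
method names are made up for illustration, and it assumes the index lives in a
single flat directory of files:

```java
import java.io.File;

// Hypothetical helper: compares index directory sizes before and after a rebuild.
class IndexSizeCheck {

    /** Sums the lengths of the regular files directly inside dir (no recursion). */
    static long directorySize(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // dir does not exist, is not a directory, or is unreadable
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** Returns after/before; a value near 2.0 suggests the rebuild appended. */
    static double growthRatio(long before, long after) {
        if (before <= 0) {
            throw new IllegalArgumentException("need a non-empty index to compare against");
        }
        return after / (double) before;
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : "index"); // path is an assumption
        System.out.println("index size in bytes: " + directorySize(indexDir));
    }
}
```

Run it once before the nightly job and once after, then feed both numbers to
growthRatio.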

Are you absolutely sure that you're not somehow adding documents more than
once? I can imagine that this could occur by processing the source multiple
times (don't know how you get your input) or adding the document multiple
times through some logic error. I've also had my SQL queries return the same
row more than once upon occasion, usually cured with the "distinct"
qualifier.

If you have some sort of unique ID, I can imagine debug code with a set of
IDs and error reporting when you add a doc (row) already in your index.....
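
The debug code I have in mind is roughly the following. Treat it as a sketch:
DuplicateIdTracker and its methods are invented names, the IDs are assumed to
be strings, and the place where you would actually call addDocument is only
marked with a comment:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical debugging aid: remembers every ID seen during one indexing run.
class DuplicateIdTracker {
    private final Set<String> seen = new HashSet<String>();
    private final List<String> duplicates = new ArrayList<String>();

    /** Returns true the first time an ID is seen; reports and returns false afterwards. */
    boolean register(String id) {
        if (seen.add(id)) {
            return true;
        }
        duplicates.add(id);
        System.err.println("Duplicate ID on the input side: " + id);
        return false;
    }

    /** IDs that were offered more than once, in the order the repeats arrived. */
    List<String> duplicates() {
        return duplicates;
    }

    public static void main(String[] args) {
        DuplicateIdTracker tracker = new DuplicateIdTracker();
        String[] rowIds = {"1", "2", "2", "3"}; // stand-in for IDs coming from the database
        for (String id : rowIds) {
            if (tracker.register(id)) {
                // this is where the document for the row would be added to the index
            }
        }
        System.out.println("repeated IDs: " + tracker.duplicates());
    }
}
```

If the tracker never complains during a run and the index still ends up with
every document twice, the input side is probably not the culprit.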

Luke will help you examine your index to see if it's what you think is
there. Perhaps another way to test this would be to add (again for
debugging) a timestamp field in your index. That way, you would know when
you added your duplicate rows.

Finally, you might try creating an index in a new directory that you *know*
is empty and seeing what you get and how it compares against your current
process. Although I'd expect your indexwriter code to barf if you had file
locking issues and couldn't empty the index, I suppose it's possible....
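
To take the old directory out of the picture entirely, you could build into a
directory that did not exist a moment ago. A minimal sketch, again plain Java
with made-up names (FreshIndexDir, createFresh, isEmptyDirectory); where the
parent directory lives is up to you:

```java
import java.io.File;

// Hypothetical helper: hands out a brand-new, empty directory for a test index.
class FreshIndexDir {

    /** Creates a directory under parent whose name has not been used before. */
    static File createFresh(File parent, String prefix) {
        File dir;
        int attempt = 0;
        do {
            dir = new File(parent, prefix + "-" + System.currentTimeMillis() + "-" + attempt++);
        } while (dir.exists());
        if (!dir.mkdirs()) {
            throw new IllegalStateException("Could not create " + dir);
        }
        return dir;
    }

    /** True only if dir exists, is a directory, and contains no entries. */
    static boolean isEmptyDirectory(File dir) {
        String[] names = dir.list();
        return names != null && names.length == 0;
    }

    public static void main(String[] args) {
        File parent = new File(System.getProperty("java.io.tmpdir"));
        File dir = createFresh(parent, "index-test");
        System.out.println(dir + " empty? " + isEmptyDirectory(dir));
    }
}
```

Point your existing IndexWriter code at the returned directory and compare the
document count with what the nightly job produces.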

I guess I'm suggesting that you're really re-processing your input since
I've never had a problem with the code creating a new index, and haven't
seen it discussed on the mailing list, so I *strongly* suspect pilot error
here <G>. But I've been wrong before....

And this is Lucene 1.9? 2.0?

Best
Erick

On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
>
> Hi,
>
> I'm experiencing the problem that my index does not seem to be
> recreated, despite using the correct flags. The result is that documents
> that represent equal database rows occur multiple times in the index. I
> recreate my entire index each night.
>
> My IndexDirectory/IndexWriter construction code looks like this:
>
>    File indexDirectory = new File(indexPath);
>    FSDirectory luceneIndexDirectory = FSDirectory.getDirectory(indexDirectory, true);
>    IndexWriter indexWriter = new IndexWriter(luceneIndexDirectory, analyzer, true);
>
> This code should take care of recreating my index, but it does not seem
> to be working properly. It looks like the old index is not removed and
> the same documents are added to my index again.
>
> I have strong reasons to not suspect other code to add duplicate
> documents. First, if no index has yet been created, no duplicate
> documents are added. Second, if an old index does exist, after
> recreating the index all documents exist exactly twice (and the
> following night they exist three times, etc.). It is not the case that
> some documents are duplicated.
>
> Does anyone have any ideas?
>
> Thanks in advance,
> Ronald.
>
