lucene-java-user mailing list archives

From "Ronald Wildenberg" <r.wildenb...@kennisnet.org>
Subject RE: Index not recreated
Date Mon, 14 Aug 2006 14:42:19 GMT
Thanks for your response; my comments are below. I'm using Lucene 1.9.1.


> From: Erick Erickson [mailto:erickerickson@gmail.com] 
> Sent: Monday, 14 August 2006 16:20
> Subject: Re: Index not recreated
> 
> My first suspicion is that you have duplicate documents on 
> the *input* side, or are somehow adding documents more than 
> once. I use code similar to yours and it works just fine for me.....


This was my first suspicion as well, but the facts seem to rule it out. When I create an index
from scratch (with no previous index present), everything is fine: there are no duplicates.
The duplicates only appear on later runs, when an old index already exists. So first I'm going
to determine whether the index is really deleted after calling
FSDirectory.getDirectory(indexDirectory, true). If it is, I'll check whether I add duplicates myself.
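
To verify the first step, something like the following could be logged right before and right after the directory and writer are constructed. This is only a sketch: the class name IndexDirSnapshot and its methods are made up for illustration, it uses plain java.io.File with no Lucene calls, and it assumes a Java 5+ runtime.

```java
import java.io.File;
import java.util.Arrays;

/** Debug helper (hypothetical name): reports what an index directory contains. */
final class IndexDirSnapshot {

    /** Sorted entry names in dir; empty if dir is missing or unreadable. */
    static String[] entryNames(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return new String[0];
        }
        Arrays.sort(names);
        return names;
    }

    /** Total size in bytes of the regular files directly inside dir. */
    static long totalBytes(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L;
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** One log line: path, entry count, total size and entry names. */
    static String describe(File dir) {
        String[] names = entryNames(dir);
        return dir.getPath() + ": " + names.length + " entries, "
                + totalBytes(dir) + " bytes " + Arrays.toString(names);
    }
}
```

Logging describe(indexDirectory) before the FSDirectory.getDirectory(indexDirectory, true) call and again after the IndexWriter is created should show whether the old files are really gone. It would also answer the size question once the nightly job has finished.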


> How big is the index before and after you re-create it? Twice 
> the size and you're appending, not twice then.....


An additional problem is that my issue is only reproducible in the production environment,
where I have very limited access, so I cannot answer this right away. Furthermore, the problem
does not always occur, which makes it even more fun ;-)

 
> Are you absolutely sure that you're not somehow, adding 
> documents more than once? I can imagine that this could occur 
> by processing the source multiple times (don't know how you 
> get your input) or adding the document multiple times through 
> some logic error. I've also had my SQL queries return the 
> same row more than once upon occasion, usually cured with the 
> "distinct" qualifier.
> 
> If you have some sort of unique ID, I can imagine debug code 
> with a set of IDs and error reporting when you add a doc 
> (row) already in your index.....


Once I'm absolutely sure that the original index is removed by calling
FSDirectory.getDirectory(indexDirectory, true), I'm going to explore this possibility and add
extensive logging to the code that adds documents (I do have a unique id, so this can be checked).
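
The check on unique ids could look roughly like this. It is a sketch under assumptions: DuplicateIdTracker is a made-up name, the id is assumed to be available as a String, and the code needs Java 8+ (for Map.merge). It only watches what the indexing run itself adds and does not look inside the Lucene index.

```java
import java.util.HashMap;
import java.util.Map;

/** Debug helper (hypothetical name): counts how often each id is added in one run. */
final class DuplicateIdTracker {

    private final Map<String, Integer> counts = new HashMap<>();

    /**
     * Records one add for the given id and logs a warning when the id
     * was already added earlier in the same run.
     *
     * @return true if this id had been seen before
     */
    boolean recordAdd(String id) {
        int seen = counts.merge(id, 1, Integer::sum);
        boolean duplicate = seen > 1;
        if (duplicate) {
            System.err.println("Duplicate add for id " + id
                    + " (added " + seen + " times in this run)");
        }
        return duplicate;
    }

    /** Number of distinct ids added so far. */
    int distinctIds() {
        return counts.size();
    }

    /** Number of ids that were added more than once. */
    int duplicatedIds() {
        int n = 0;
        for (int c : counts.values()) {
            if (c > 1) {
                n++;
            }
        }
        return n;
    }
}
```

Calling recordAdd(id) just before each document is added, and logging distinctIds() and duplicatedIds() at the end of the run, would show whether the indexing code ever adds the same row twice. If duplicatedIds() stays at zero while the index still contains every document twice, the old index was most likely not removed.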


> Luke will help you examine your index to see if it's what you 
> think is there. Perhaps another way to test this would be to 
> add (again for debugging) a timestamp field in your index. That way, you 
> would know when you added your duplicate rows.


I haven't looked at the index with Luke yet because, unfortunately, I haven't been able to get
my hands on the actual index.


> Finally, you might try creating an index in a new directory 
> that you *know* is empty and seeing what you get and how it 
> compares against your current process. Although I'd expect 
> your indexwriter code to barf if you had file locking issues 
> and couldn't empty the index, I suppose it's possible....


That's a good solution if all my other attempts fail :)
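
For that fallback, a directory that is known to be empty could be created along these lines. Again just a sketch: FreshIndexDir and its create method are made-up names, and the timestamp suffix is my own choice for keeping each night's directory distinct.

```java
import java.io.File;
import java.io.IOException;

/** Debug helper (hypothetical name): creates a new, verified-empty index directory. */
final class FreshIndexDir {

    /**
     * Creates parent/prefix-<millis> and fails if it already exists
     * or is not empty after creation.
     */
    static File create(File parent, String prefix) throws IOException {
        File dir = new File(parent, prefix + "-" + System.currentTimeMillis());
        // mkdirs() returns false when the directory already exists or cannot be created.
        if (!dir.mkdirs()) {
            throw new IOException("Could not create new directory " + dir);
        }
        String[] entries = dir.list();
        if (entries == null || entries.length != 0) {
            throw new IOException("Directory is not empty or not readable: " + dir);
        }
        return dir;
    }
}
```

The returned File could then be passed to FSDirectory.getDirectory(..., true) in place of the usual index directory, and the resulting index compared with the one the normal process produces.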



> On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> >
> > Hi,
> >
> > I'm experiencing the problem that my index does not seem to be 
> > recreated, despite using the correct flags. The result is that 
> > documents that represent equal database rows occur multiple times in 
> > the index. I recreate my entire index each night.
> >
> > My IndexDirectory/IndexWriter construction code looks like this:
> >
> >    File indexDirectory = new File(indexPath);
> >    FSDirectory luceneIndexDirectory = FSDirectory.getDirectory(indexDirectory, true);
> >    IndexWriter indexWriter = new IndexWriter(luceneIndexDirectory, analyzer, true);
> >
> > This code should take care of recreating my index, but it does not 
> > seem to be working properly. It looks like the old index is not 
> > removed and the same documents are added to my index again.
> >
> > I have strong reasons to not suspect other code to add duplicate 
> > documents. First, if no index has yet been created, no duplicate 
> > documents are added. Second, if an old index does exist, after 
> > recreating the index all documents exist exactly twice (and the 
> > following night they exist three times, etc.). It is not the case that 
> > only some documents are duplicated.
> >
> > Does anyone have any ideas?
> >
> > Thanks in advance,
> > Ronald.
> >
> >
> > DISCLAIMER:
> >
> > This message (with attachments) is given in good faith. Kennisnet 
> > cannot assume any responsibility for the accuracy or reliability of 
> > the information contained in this message (with attachments), nor 
> > shall the information be construed as constituting any obligation on 
> > the part of Kennisnet. The information contained in this message (with 
> > attachments) may be confidential or privileged and is only intended 
> > for the use of the named addressee. If you are not the intended 
> > recipient, you are requested by Kennisnet to delete this message (with 
> > attachments) without opening it and you are notified by Kennisnet that 
> > any disclosure, copying or distribution of the information contained 
> > in this message (with attachments) is strictly prohibited and unlawful.
> >
> >
> >
> 


