lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ronald Wildenberg" <r.wildenb...@kennisnet.org>
Subject RE: Index not recreated
Date Tue, 15 Aug 2006 06:35:34 GMT
> Van: Erick Erickson [mailto:erickerickson@gmail.com] 
> Verzonden: maandag 14 augustus 2006 16:52
> Aan: java-user@lucene.apache.org
> Onderwerp: Re: Index not recreated
> 
> You have all my sympathy. Let me see if I can restate your 
> problem.....
> 
> "Hey Ron. The indexing process doesn't work. We can't/won't 
> let you look at the process or the results. We can't/won't 
> let you look at the finished product. We can't/won't let you 
> on the machine where it fails. Now fix it"
> <G>.....


That's more or less the current situation :)


> There has been some discussion on the threads about 
> "interesing" behavior with NFS mounts, mostly having to do 
> with locking issues. But if you're using an NFS mount (or 
> other non-local filesystem), I can imagine that there could 
> be other kinds of issues, you might want to search the 
> archive if this applies to you....


Hmm, I'll look into this. I don't know whether the index is
stored on a non-local filesystem.

Another strange thing is that indexing in the application has
been working fine for about a year. Things went wrong only
recently. I suddenly do remember that we had an upgrade from
JDK 1.5.0_04 to 1.5.0_07. Could this have anything to do with
it (rather unlikely I think).


> Good luck!
> Erick


Thanks,
Ronald.



> 
> On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> >
> > Thanks for your response, comments are below. I'm using 
> Lucene 1.9.1.
> >
> >
> > > Van: Erick Erickson [mailto:erickerickson@gmail.com]
> > > Verzonden: maandag 14 augustus 2006 16:20
> > > Onderwerp: Re: Index not recreated
> > >
> > > My first suspicion is that you have duplicate documents on the 
> > > *input* side, or are somehow adding documents more than 
> once. I use 
> > > code similar to yours and it works just fine for me.....
> >
> >
> > This was my first suspicion also, but the facts seem to 
> rule out this 
> > possibility. When I create an index from scratch (without having a 
> > previous, old one), everything is ok (no duplicates). This 
> only happens the next time.
> > So first I'm going to determine whether the index is really deleted 
> > after calling FSDirectory.getDirectory(indexDirectory, 
> true). If this 
> > is the case, I'm going to check whether I add duplicates myself.
> >
> >
> > > How big is the index before and after you re-create it? Twice the 
> > > size and you're appending, not twice then.....
> >
> >
> > An additional problem is that my issue is only reproducable on the 
> > production environment and I have very limited access 
> there. I cannot 
> > answer this right away. Furthermore, the problem does not occur 
> > always, which makes it even more fun ;-)
> >
> >
> > > Are you absolutely sure that you're not somehow, adding documents 
> > > more than once? I can imagine that this could occur by processing 
> > > the source multiple times (don't know how you get your input) or 
> > > adding the document multiple times through some logic error. I've 
> > > also had my SQL queries return the same row more than once upon 
> > > occasion, usually cured with the "distinct"
> > > qualifier.
> > >
> > > If you have some sort of unique ID, I can imagine debug 
> code with a 
> > > set of IDs and error reporting when you add a doc
> > > (row) already in your index.....
> >
> >
> > If I'm absolutely positive that the original index is removed by 
> > calling FSDirectory.getDirectory(indexDirectory, true), I'm 
> going to 
> > explore this possibility and add some extensive logging to 
> the pieces 
> > of code where documents are added (I do have a unique id, 
> so this can be checked).
> >
> >
> > > Luke will help you examine your index to see if it's what 
> you think 
> > > is there. Perhaps another way to test this would be to add (again 
> > > for
> > > debugging) a timestamp field in your index. That way, you 
> would know 
> > > when you added your duplicate rows.
> >
> >
> > I haven't tried Luke yet to look at the index, since I haven't been 
> > able to get my hands on the actual index unfortunately.
> >
> >
> > > Finally, you might try creating an index in a new 
> directory that you 
> > > *know* is empty and seeing what you get and how it 
> compares against 
> > > your current process. Although I'd expect your 
> indexwriter code to 
> > > barf if you had file locking issues and couldn't empty 
> the index, I 
> > > suppose it's possible....
> >
> >
> > That's a good solution if all my other attempts fail :)
> >
> >
> >
> > > On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm experiencing the problem that my index does not seem to be 
> > > > recreated, despite using the correct flags. The result is that 
> > > > documents that represent equal database rows occur multiple
> > > times in
> > > > the index. I recreate my entire index each night.
> > > >
> > > > My IndexDirectory/IndexWriter construction code looks like this:
> > > >
> > > >    File indexDirectory = new File(indexPath);
> > > >    FSDirectory luceneIndexDirectory = 
> > > > FSDirectory.getDirectory(indexDirectory, true);
> > > >    IndexWriter indexWriter = new 
> IndexWriter(luceneIndexDirectory, 
> > > > analyzer, true);
> > > >
> > > > This code should take care of recreating my index, but 
> it does not 
> > > > seem to be working properly. It looks like the old index is not 
> > > > removed and the same documents are added to my index again.
> > > >
> > > > I have strong reasons to not suspect other code to add 
> duplicate 
> > > > documents. First, if no index has yet been created, no 
> duplicate 
> > > > documents are added. Second, if an old index does exist, after 
> > > > recreating the index all documents exist exactly twice (and the 
> > > > following night they exist three times, etc.). It is not
> > > the case that
> > > > some documents are duplicated.
> > > >
> > > > Does anyone have any ideas?
> > > >
> > > > Thanks in advance,
> > > > Ronald.
> > > >
> > > >
> > > > DISCLAIMER:
> > > >
> > > > Dit bericht (met bijlagen) is met grote zorgvuldigheid
> > > samengesteld.
> > > > Voor mogelijke onjuistheid en/of onvolledigheid van de hierin 
> > > > verstrekte informatie kan Kennisnet geen aansprakelijkheid
> > > aanvaarden,
> > > > evenmin kunnen aan de inhoud van dit bericht (met bijlagen) 
> > > > rechten worden ontleend. De inhoud van dit bericht (met 
> bijlagen) 
> > > > kan vertrouwelijke informatie bevatten en is uitsluitend
> > > bestemd voor de
> > > > geadresseerde van dit bericht. Indien u niet de beoogde
> > > ontvanger van
> > > > dit bericht bent, verzoekt Kennisnet u dit bericht te 
> verwijderen, 
> > > > eventuele bijlagen niet te openen en wijst Kennisnet u op de 
> > > > onrechtmatigheid van het gebruiken, kopiëren of verspreiden
> > > van de inhoud van dit bericht (met bijlagen).
> > > >
> > > > This message (with attachments) is given in good faith. 
> Kennisnet 
> > > > cannot assume any responsibility for the accuracy or 
> reliability 
> > > > of the information contained in this message (with 
> attachments), 
> > > > nor shall the information be construed as constituting any
> > > obligation on
> > > > the part of Kennisnet. The information contained in this
> > > message (with
> > > > attachments) may be confidential or privileged and is only 
> > > > intended for the use of the named addressee. If you are not the 
> > > > intended recipient, you are requested by Kennisnet to 
> delete this
> > > message (with
> > > > attachments) without opening it and you are notified by
> > > Kennisnet that
> > > > any disclosure, copying or distribution of the information
> > > contained
> > > > in this message (with attachments) is strictly prohibited
> > > and unlawful.
> > > >
> > > >
> > > >
> > >
> >
> >
> > DISCLAIMER:
> >
> > Dit bericht (met bijlagen) is met grote zorgvuldigheid 
> samengesteld. 
> > Voor mogelijke onjuistheid en/of onvolledigheid van de hierin 
> > verstrekte informatie kan Kennisnet geen aansprakelijkheid 
> aanvaarden, 
> > evenmin kunnen aan de inhoud van dit bericht (met bijlagen) rechten 
> > worden ontleend. De inhoud van dit bericht (met bijlagen) kan 
> > vertrouwelijke informatie bevatten en is uitsluitend 
> bestemd voor de 
> > geadresseerde van dit bericht. Indien u niet de beoogde 
> ontvanger van 
> > dit bericht bent, verzoekt Kennisnet u dit bericht te verwijderen, 
> > eventuele bijlagen niet te openen en wijst Kennisnet u op de 
> > onrechtmatigheid van het gebruiken, kopiëren of verspreiden 
> van de inhoud van dit bericht (met bijlagen).
> >
> > This message (with attachments) is given in good faith. Kennisnet 
> > cannot assume any responsibility for the accuracy or reliability of 
> > the information contained in this message (with attachments), nor 
> > shall the information be construed as constituting any 
> obligation on 
> > the part of Kennisnet. The information contained in this 
> message (with 
> > attachments) may be confidential or privileged and is only intended 
> > for the use of the named addressee. If you are not the intended 
> > recipient, you are requested by Kennisnet to delete this 
> message (with 
> > attachments) without opening it and you are notified by 
> Kennisnet that 
> > any disclosure, copying or distribution of the information 
> contained 
> > in this message (with attachments) is strictly prohibited 
> and unlawful.
> >
> >
> 


DISCLAIMER:

Dit bericht (met bijlagen) is met grote zorgvuldigheid samengesteld. Voor mogelijke onjuistheid
en/of onvolledigheid van de hierin verstrekte informatie kan Kennisnet geen aansprakelijkheid
aanvaarden, evenmin kunnen aan de inhoud van dit bericht (met bijlagen) rechten worden ontleend.
De inhoud van dit bericht (met bijlagen) kan vertrouwelijke informatie bevatten en is uitsluitend
bestemd voor de geadresseerde van dit bericht. Indien u niet de beoogde ontvanger van dit
bericht bent, verzoekt Kennisnet u dit bericht te verwijderen, eventuele bijlagen niet te
openen en wijst Kennisnet u op de onrechtmatigheid van het gebruiken, kopiëren of verspreiden
van de inhoud van dit bericht (met bijlagen).

This message (with attachments) is given in good faith. Kennisnet cannot assume any responsibility
for the accuracy or reliability of the information contained in this message (with attachments),
nor shall the information be construed as constituting any obligation on the part of Kennisnet.
The information contained in this message (with attachments) may be confidential or privileged
and is only intended for the use of the named addressee. If you are not the intended recipient,
you are requested by Kennisnet to delete this message (with attachments) without opening it
and you are notified by Kennisnet that any disclosure, copying or distribution of the information
contained in this message (with attachments) is strictly prohibited and unlawful.

Mime
View raw message