lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Index not recreated
Date Tue, 15 Aug 2006 13:47:08 GMT
I'm wondering if your best (and fastest) solution might be to just create a
fresh index each night in a new place and copy it over upon completion to
the "real" index location, then re-open your IndexReader(s). This has the
added benefit of always providing you with a backup in case, say, the power
goes out half way through creating your index. Your index may then be a day
older than you want, but at least your app does *something* useful....

Anyway, back to work....

Erick

On 8/15/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
>
> > Van: Erick Erickson [mailto:erickerickson@gmail.com]
> > Verzonden: maandag 14 augustus 2006 16:52
> > Aan: java-user@lucene.apache.org
> > Onderwerp: Re: Index not recreated
> >
> > You have all my sympathy. Let me see if I can restate your
> > problem.....
> >
> > "Hey Ron. The indexing process doesn't work. We can't/won't
> > let you look at the process or the results. We can't/won't
> > let you look at the finished product. We can't/won't let you
> > on the machine where it fails. Now fix it"
> > <G>.....
>
>
> That's more or less the current situation :)
>
>
> > There has been some discussion on the threads about
> > "interesing" behavior with NFS mounts, mostly having to do
> > with locking issues. But if you're using an NFS mount (or
> > other non-local filesystem), I can imagine that there could
> > be other kinds of issues, you might want to search the
> > archive if this applies to you....
>
>
> Hmm, I'll look into this. I don't know whether the index is
> stored on a non-local filesystem.
>
> Another strange thing is that indexing in the application has
> been working fine for about a year. Things went wrong only
> recently. I suddenly do remember that we had an upgrade from
> JDK 1.5.0_04 to 1.5.0_07. Could this have anything to do with
> it (rather unlikely I think).
>
>
> > Good luck!
> > Erick
>
>
> Thanks,
> Ronald.
>
>
>
> >
> > On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> > >
> > > Thanks for your response, comments are below. I'm using
> > Lucene 1.9.1.
> > >
> > >
> > > > Van: Erick Erickson [mailto:erickerickson@gmail.com]
> > > > Verzonden: maandag 14 augustus 2006 16:20
> > > > Onderwerp: Re: Index not recreated
> > > >
> > > > My first suspicion is that you have duplicate documents on the
> > > > *input* side, or are somehow adding documents more than
> > once. I use
> > > > code similar to yours and it works just fine for me.....
> > >
> > >
> > > This was my first suspicion also, but the facts seem to
> > rule out this
> > > possibility. When I create an index from scratch (without having a
> > > previous, old one), everything is ok (no duplicates). This
> > only happens the next time.
> > > So first I'm going to determine whether the index is really deleted
> > > after calling FSDirectory.getDirectory(indexDirectory,
> > true). If this
> > > is the case, I'm going to check whether I add duplicates myself.
> > >
> > >
> > > > How big is the index before and after you re-create it? Twice the
> > > > size and you're appending, not twice then.....
> > >
> > >
> > > An additional problem is that my issue is only reproducable on the
> > > production environment and I have very limited access
> > there. I cannot
> > > answer this right away. Furthermore, the problem does not occur
> > > always, which makes it even more fun ;-)
> > >
> > >
> > > > Are you absolutely sure that you're not somehow, adding documents
> > > > more than once? I can imagine that this could occur by processing
> > > > the source multiple times (don't know how you get your input) or
> > > > adding the document multiple times through some logic error. I've
> > > > also had my SQL queries return the same row more than once upon
> > > > occasion, usually cured with the "distinct"
> > > > qualifier.
> > > >
> > > > If you have some sort of unique ID, I can imagine debug
> > code with a
> > > > set of IDs and error reporting when you add a doc
> > > > (row) already in your index.....
> > >
> > >
> > > If I'm absolutely positive that the original index is removed by
> > > calling FSDirectory.getDirectory(indexDirectory, true), I'm
> > going to
> > > explore this possibility and add some extensive logging to
> > the pieces
> > > of code where documents are added (I do have a unique id,
> > so this can be checked).
> > >
> > >
> > > > Luke will help you examine your index to see if it's what
> > you think
> > > > is there. Perhaps another way to test this would be to add (again
> > > > for
> > > > debugging) a timestamp field in your index. That way, you
> > would know
> > > > when you added your duplicate rows.
> > >
> > >
> > > I haven't tried Luke yet to look at the index, since I haven't been
> > > able to get my hands on the actual index unfortunately.
> > >
> > >
> > > > Finally, you might try creating an index in a new
> > directory that you
> > > > *know* is empty and seeing what you get and how it
> > compares against
> > > > your current process. Although I'd expect your
> > indexwriter code to
> > > > barf if you had file locking issues and couldn't empty
> > the index, I
> > > > suppose it's possible....
> > >
> > >
> > > That's a good solution if all my other attempts fail :)
> > >
> > >
> > >
> > > > On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm experiencing the problem that my index does not seem to be
> > > > > recreated, despite using the correct flags. The result is that
> > > > > documents that represent equal database rows occur multiple
> > > > times in
> > > > > the index. I recreate my entire index each night.
> > > > >
> > > > > My IndexDirectory/IndexWriter construction code looks like this:
> > > > >
> > > > >    File indexDirectory = new File(indexPath);
> > > > >    FSDirectory luceneIndexDirectory =
> > > > > FSDirectory.getDirectory(indexDirectory, true);
> > > > >    IndexWriter indexWriter = new
> > IndexWriter(luceneIndexDirectory,
> > > > > analyzer, true);
> > > > >
> > > > > This code should take care of recreating my index, but
> > it does not
> > > > > seem to be working properly. It looks like the old index is not
> > > > > removed and the same documents are added to my index again.
> > > > >
> > > > > I have strong reasons to not suspect other code to add
> > duplicate
> > > > > documents. First, if no index has yet been created, no
> > duplicate
> > > > > documents are added. Second, if an old index does exist, after
> > > > > recreating the index all documents exist exactly twice (and the
> > > > > following night they exist three times, etc.). It is not
> > > > the case that
> > > > > some documents are duplicated.
> > > > >
> > > > > Does anyone have any ideas?
> > > > >
> > > > > Thanks in advance,
> > > > > Ronald.
> > > > >
> > > > >
> > > > > DISCLAIMER:
> > > > >
> > > > > Dit bericht (met bijlagen) is met grote zorgvuldigheid
> > > > samengesteld.
> > > > > Voor mogelijke onjuistheid en/of onvolledigheid van de hierin
> > > > > verstrekte informatie kan Kennisnet geen aansprakelijkheid
> > > > aanvaarden,
> > > > > evenmin kunnen aan de inhoud van dit bericht (met bijlagen)
> > > > > rechten worden ontleend. De inhoud van dit bericht (met
> > bijlagen)
> > > > > kan vertrouwelijke informatie bevatten en is uitsluitend
> > > > bestemd voor de
> > > > > geadresseerde van dit bericht. Indien u niet de beoogde
> > > > ontvanger van
> > > > > dit bericht bent, verzoekt Kennisnet u dit bericht te
> > verwijderen,
> > > > > eventuele bijlagen niet te openen en wijst Kennisnet u op de
> > > > > onrechtmatigheid van het gebruiken, kopiëren of verspreiden
> > > > van de inhoud van dit bericht (met bijlagen).
> > > > >
> > > > > This message (with attachments) is given in good faith.
> > Kennisnet
> > > > > cannot assume any responsibility for the accuracy or
> > reliability
> > > > > of the information contained in this message (with
> > attachments),
> > > > > nor shall the information be construed as constituting any
> > > > obligation on
> > > > > the part of Kennisnet. The information contained in this
> > > > message (with
> > > > > attachments) may be confidential or privileged and is only
> > > > > intended for the use of the named addressee. If you are not the
> > > > > intended recipient, you are requested by Kennisnet to
> > delete this
> > > > message (with
> > > > > attachments) without opening it and you are notified by
> > > > Kennisnet that
> > > > > any disclosure, copying or distribution of the information
> > > > contained
> > > > > in this message (with attachments) is strictly prohibited
> > > > and unlawful.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > > DISCLAIMER:
> > >
> > > Dit bericht (met bijlagen) is met grote zorgvuldigheid
> > samengesteld.
> > > Voor mogelijke onjuistheid en/of onvolledigheid van de hierin
> > > verstrekte informatie kan Kennisnet geen aansprakelijkheid
> > aanvaarden,
> > > evenmin kunnen aan de inhoud van dit bericht (met bijlagen) rechten
> > > worden ontleend. De inhoud van dit bericht (met bijlagen) kan
> > > vertrouwelijke informatie bevatten en is uitsluitend
> > bestemd voor de
> > > geadresseerde van dit bericht. Indien u niet de beoogde
> > ontvanger van
> > > dit bericht bent, verzoekt Kennisnet u dit bericht te verwijderen,
> > > eventuele bijlagen niet te openen en wijst Kennisnet u op de
> > > onrechtmatigheid van het gebruiken, kopiëren of verspreiden
> > van de inhoud van dit bericht (met bijlagen).
> > >
> > > This message (with attachments) is given in good faith. Kennisnet
> > > cannot assume any responsibility for the accuracy or reliability of
> > > the information contained in this message (with attachments), nor
> > > shall the information be construed as constituting any
> > obligation on
> > > the part of Kennisnet. The information contained in this
> > message (with
> > > attachments) may be confidential or privileged and is only intended
> > > for the use of the named addressee. If you are not the intended
> > > recipient, you are requested by Kennisnet to delete this
> > message (with
> > > attachments) without opening it and you are notified by
> > Kennisnet that
> > > any disclosure, copying or distribution of the information
> > contained
> > > in this message (with attachments) is strictly prohibited
> > and unlawful.
> > >
> > >
> >
>
>
> DISCLAIMER:
>
> Dit bericht (met bijlagen) is met grote zorgvuldigheid samengesteld. Voor
> mogelijke onjuistheid en/of onvolledigheid van de hierin verstrekte
> informatie kan Kennisnet geen aansprakelijkheid aanvaarden, evenmin kunnen
> aan de inhoud van dit bericht (met bijlagen) rechten worden ontleend. De
> inhoud van dit bericht (met bijlagen) kan vertrouwelijke informatie bevatten
> en is uitsluitend bestemd voor de geadresseerde van dit bericht. Indien u
> niet de beoogde ontvanger van dit bericht bent, verzoekt Kennisnet u dit
> bericht te verwijderen, eventuele bijlagen niet te openen en wijst Kennisnet
> u op de onrechtmatigheid van het gebruiken, kopiëren of verspreiden van de
> inhoud van dit bericht (met bijlagen).
>
> This message (with attachments) is given in good faith. Kennisnet cannot
> assume any responsibility for the accuracy or reliability of the information
> contained in this message (with attachments), nor shall the information be
> construed as constituting any obligation on the part of Kennisnet. The
> information contained in this message (with attachments) may be confidential
> or privileged and is only intended for the use of the named addressee. If
> you are not the intended recipient, you are requested by Kennisnet to delete
> this message (with attachments) without opening it and you are notified by
> Kennisnet that any disclosure, copying or distribution of the information
> contained in this message (with attachments) is strictly prohibited and
> unlawful.
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message