lucene-java-user mailing list archives

From "Jason Polites" <jason.poli...@gmail.com>
Subject Re: Index not recreated
Date Tue, 15 Aug 2006 00:28:05 GMT
PS...

The "intermittent" nature of your problem points to a concurrency issue.
Does the production environment have a greater number of users?  If so, this
likely translates to a greater number of threads acting upon the index.  I'd
be looking for possible conflicts between different threads accessing the
index.  This would also explain why you see the problem in production and
not testing.
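
If overlapping rebuilds inside one JVM are a suspect, one way to rule them out is to let only one thread run the rebuild at a time. The following is a minimal sketch in modern Java, not code from this thread: `IndexRebuildGuard` and `tryRebuild` are hypothetical names, and the `Runnable` stands in for whatever code creates the `IndexWriter` and adds the documents. It does nothing about a second process or machine touching the same directory.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: allows at most one thread at a time to rebuild the index.
final class IndexRebuildGuard {
    private final ReentrantLock lock = new ReentrantLock();

    /**
     * Runs the rebuild only if no other thread is currently rebuilding.
     * Returns true if the rebuild ran, false if it was skipped.
     */
    boolean tryRebuild(Runnable rebuild) {
        if (!lock.tryLock()) {
            // Another thread holds the lock, so a rebuild is already in progress.
            return false;
        }
        try {
            rebuild.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

Logging every `false` return would show directly whether two rebuilds ever try to run at the same time in production.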

On 8/15/06, Jason Polites <jason.polites@gmail.com> wrote:
>
> My advice would be the "back-to-basics" approach.  Create a test case
> which creates a simple index with a few documents, verify the index is as
> you expect, then re-create the index and verify again.  Run this test case
> on your production environment (if you are able).  This will determine once
> and for all whether it's the production environment, your code or Lucene
> causing the problem.  Then work back from this point.
>
> The alternative is that you spend hours looking into areas which are
> actually working fine.
>
>
> On 8/15/06, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > You have all my sympathy. Let me see if I can restate your problem.....
> >
> > "Hey Ron. The indexing process doesn't work. We can't/won't let you look
> > at the process or the results. We can't/won't let you look at the finished
> > product. We can't/won't let you on the machine where it fails. Now fix
> > it" <G>.....
> >
> > There has been some discussion on the threads about "interesting" behavior
> > with NFS mounts, mostly having to do with locking issues. But if you're
> > using an NFS mount (or other non-local filesystem), I can imagine that
> > there could be other kinds of issues. You might want to search the archive
> > if this applies to you....
> >
> > Good luck!
> > Erick
> >
> > On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> > >
> > > Thanks for your response, comments are below. I'm using Lucene 1.9.1.
> > >
> > >
> > > > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > > > Sent: Monday, 14 August 2006 16:20
> > > > Subject: Re: Index not recreated
> > > >
> > > > My first suspicion is that you have duplicate documents on
> > > > the *input* side, or are somehow adding documents more than
> > > > once. I use code similar to yours and it works just fine for me.....
> > >
> > >
> > > This was my first suspicion also, but the facts seem to rule out this
> > > possibility. When I create an index from scratch (without having a
> > > previous, old one), everything is ok (no duplicates). The duplicates only
> > > appear the next time. So first I'm going to determine whether the index is
> > > really deleted after calling FSDirectory.getDirectory(indexDirectory, true).
> > > If this is the case, I'm going to check whether I add duplicates myself.
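
A simple way to see whether old index files survive the recreate step is to list the directory contents at that point and log them. This is a minimal sketch using only `java.io.File`; `IndexDirectoryCheck` and `leftoverFiles` are hypothetical names, and it assumes the check is called right after the directory is (re)created and before any documents are added.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper: reports which files are present in the index directory.
final class IndexDirectoryCheck {
    /**
     * Returns the sorted names of all entries in dir.
     * Returns an empty array if dir does not exist or is not a directory.
     */
    static String[] leftoverFiles(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return new String[0];
        }
        Arrays.sort(names);
        return names;
    }
}
```

Logging the result of `leftoverFiles(indexDirectory)` on each nightly run would show whether files from the previous night are still there when the new index is written.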
> > >
> > >
> > > > How big is the index before and after you re-create it? If it is
> > > > twice the size, you're appending; if it is not twice the size, then.....
> > >
> > >
> > > An additional problem is that my issue is only reproducible on the
> > > production environment and I have very limited access there, so I cannot
> > > answer this right away. Furthermore, the problem does not always occur,
> > > which makes it even more fun ;-)
> > >
> > >
> > > > Are you absolutely sure that you're not somehow adding
> > > > documents more than once? I can imagine that this could occur
> > > > by processing the source multiple times (don't know how you
> > > > get your input) or adding the document multiple times through
> > > > some logic error. I've also had my SQL queries return the
> > > > same row more than once on occasion, usually cured with the
> > > > "distinct" qualifier.
> > > >
> > > > If you have some sort of unique ID, I can imagine debug code
> > > > with a set of IDs and error reporting when you add a doc
> > > > (row) already in your index.....
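
The debug check described above could look roughly like the sketch below. It is plain Java with no Lucene calls; `DuplicateIdTracker` and `register` are hypothetical names, and the id is assumed to be the unique database key available as a string at the point where each document is added.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical debug helper: remembers every id added during one indexing run.
final class DuplicateIdTracker {
    private final Set<String> seen = new HashSet<>();
    private final List<String> duplicates = new ArrayList<>();

    /**
     * Records the id. Returns true if it is new in this run;
     * otherwise reports it on stderr and returns false.
     */
    boolean register(String id) {
        if (seen.add(id)) {
            return true;
        }
        duplicates.add(id);
        System.err.println("Duplicate document id in this run: " + id);
        return false;
    }

    /** Ids that were registered more than once, in the order they recurred. */
    List<String> duplicates() {
        return Collections.unmodifiableList(duplicates);
    }
}
```

Calling `register(id)` just before each document is added separates the two cases: if it never reports a duplicate but the index still holds each document twice, the extra copies come from the old index rather than from the adding code.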
> > >
> > >
> > > If I'm absolutely positive that the original index is removed by
> > > calling FSDirectory.getDirectory(indexDirectory, true), I'm going to
> > > explore this possibility and add some extensive logging to the pieces of
> > > code where documents are added (I do have a unique id, so this can be
> > > checked).
> > >
> > >
> > > > Luke will help you examine your index to see if it's what you
> > > > think is there. Perhaps another way to test this would be to
> > > > add (again for debugging) a timestamp field in your index. That
> > > > way, you would know when you added your duplicate rows.
> > >
> > >
> > > I haven't tried Luke yet to look at the index, since I haven't been
> > > able to get my hands on the actual index unfortunately.
> > >
> > >
> > > > Finally, you might try creating an index in a new directory
> > > > that you *know* is empty and seeing what you get and how it
> > > > compares against your current process. Although I'd expect
> > > > your indexwriter code to barf if you had file locking issues
> > > > and couldn't empty the index, I suppose it's possible....
> > >
> > >
> > > That's a good solution if all my other attempts fail :)
> > >
> > >
> > >
> > > > On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > I'm experiencing the problem that my index does not seem to be
> > > > > > recreated, despite using the correct flags. The result is that
> > > > > > documents that represent equal database rows occur multiple
> > > > > > times in the index. I recreate my entire index each night.
> > > > >
> > > > > My IndexDirectory/IndexWriter construction code looks like this:
> > > > >
> > > > > >    File indexDirectory = new File(indexPath);
> > > > > >    FSDirectory luceneIndexDirectory =
> > > > > >        FSDirectory.getDirectory(indexDirectory, true);
> > > > > >    IndexWriter indexWriter =
> > > > > >        new IndexWriter(luceneIndexDirectory, analyzer, true);
> > > > >
> > > > > This code should take care of recreating my index, but it does not
> > > > > seem to be working properly. It looks like the old index is not
> > > > > removed and the same documents are added to my index again.
> > > > >
> > > > > > I have strong reasons not to suspect other code of adding duplicate
> > > > > > documents. First, if no index has yet been created, no duplicate
> > > > > > documents are added. Second, if an old index does exist, after
> > > > > > recreating the index all documents exist exactly twice (and the
> > > > > > following night they exist three times, etc.). It is not the case
> > > > > > that only some documents are duplicated.
> > > > >
> > > > > Does anyone have any ideas?
> > > > >
> > > > > Thanks in advance,
> > > > > Ronald.
> > > > >
> > > > >
> > > > > DISCLAIMER:
> > > > >
> > > > > > This message (with attachments) is given in good faith. Kennisnet
> > > > > > cannot assume any responsibility for the accuracy or reliability of
> > > > > > the information contained in this message (with attachments), nor
> > > > > > shall the information be construed as constituting any obligation on
> > > > > > the part of Kennisnet. The information contained in this message
> > > > > > (with attachments) may be confidential or privileged and is only
> > > > > > intended for the use of the named addressee. If you are not the
> > > > > > intended recipient, you are requested by Kennisnet to delete this
> > > > > > message (with attachments) without opening it and you are notified by
> > > > > > Kennisnet that any disclosure, copying or distribution of the
> > > > > > information contained in this message (with attachments) is strictly
> > > > > > prohibited and unlawful.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
> >
>
