lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Index not recreated
Date Mon, 14 Aug 2006 14:51:53 GMT
You have all my sympathy. Let me see if I can restate your problem.....

"Hey Ron. The indexing process doesn't work. We can't/won't let you look at
the process or the results. We can't/won't let you look at the finished
product. We can't/won't let you on the machine where it fails. Now fix it"
<G>.....

There has been some discussion on this list about "interesting" behavior
with NFS mounts, mostly having to do with locking issues. But if you're
using an NFS mount (or another non-local filesystem), I can imagine that there
could be other kinds of issues as well; you might want to search the archive if
this applies to you....

Good luck!
Erick

On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
>
> Thanks for your response, comments are below. I'm using Lucene 1.9.1.
>
>
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Monday, 14 August 2006 16:20
> > Subject: Re: Index not recreated
> >
> > My first suspicion is that you have duplicate documents on
> > the *input* side, or are somehow adding documents more than
> > once. I use code similar to yours and it works just fine for me.....
>
>
> This was my first suspicion also, but the facts seem to rule out this
> possibility. When I create an index from scratch (without having a previous,
> old one), everything is ok (no duplicates). This only happens the next time.
> So first I'm going to determine whether the index is really deleted after
> calling FSDirectory.getDirectory(indexDirectory, true). If this is the
> case, I'm going to check whether I add duplicates myself.
>
>
> > How big is the index before and after you re-create it? If it's twice
> > the size, you're appending; if not, then.....
>
>
> An additional problem is that my issue is only reproducible in the
> production environment, and I have very limited access there. I cannot answer
> this right away. Furthermore, the problem does not always occur, which makes
> it even more fun ;-)
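For the size question, a sketch that sums the file sizes in the index directory and compares the total against the size of an index built from scratch. Comparing against a clean baseline (rather than last night's size) matters here, because in the pattern Ronald describes the index would be roughly 2x the clean size after one bad rebuild and 3x after two. The names (IndexSizeCheck, directorySize, looksAppended) and the tolerance parameter are assumptions; segment merging and optimizing change file sizes, so a match is a hint, not proof.

```java
import java.io.File;

// Hypothetical helper (not part of Lucene): measures an index directory and
// applies a rough "are we appending?" heuristic.
class IndexSizeCheck {

    /** Sums the lengths of the regular files directly inside dir. */
    static long directorySize(File dir) {
        long total = 0L;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or unreadable
        }
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /**
     * True if currentBytes is within tolerance (a fraction, e.g. 0.1 for 10%)
     * of a whole multiple of at least 2 of cleanBytes, i.e. what you would
     * expect if the same documents had been appended one or more extra times.
     */
    static boolean looksAppended(long cleanBytes, long currentBytes, double tolerance) {
        if (cleanBytes <= 0) {
            return false;
        }
        double ratio = (double) currentBytes / cleanBytes;
        long nearest = Math.round(ratio);
        return nearest >= 2 && Math.abs(ratio - nearest) <= tolerance * nearest;
    }
}
```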
>
>
> > Are you absolutely sure that you're not somehow adding
> > documents more than once? I can imagine that this could occur
> > by processing the source multiple times (don't know how you
> > get your input) or adding the document multiple times through
> > some logic error. I've also had my SQL queries return the
> > same row more than once on occasion, usually cured with the
> > "distinct" qualifier.
> >
> > If you have some sort of unique ID, I can imagine debug code
> > with a set of IDs and error reporting when you add a doc
> > (row) already in your index.....
>
>
> If I'm absolutely positive that the original index is removed by calling
> FSDirectory.getDirectory(indexDirectory, true), I'm going to explore this
> possibility and add some extensive logging to the pieces of code where
> documents are added (I do have a unique id, so this can be checked).
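A minimal sketch of the ID check discussed above, assuming every document carries a unique string ID that is known just before it is handed to the IndexWriter. DuplicateIdTracker and its methods are hypothetical names. Note that it only sees duplicates produced within a single run: if a run reports none but the index still holds every document twice, the extra copies came from the old index rather than from the input.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical debugging aid: remembers every ID added during one indexing run
// and reports any ID that is added a second time.
class DuplicateIdTracker {
    private final Set<String> seen = new HashSet<>();
    private int duplicates = 0;

    /**
     * Call just before adding a document. Returns true the first time an ID is
     * seen in this run; logs to stderr and returns false on a repeat.
     */
    boolean checkAndRecord(String id) {
        if (!seen.add(id)) {
            duplicates++;
            System.err.println("Duplicate document ID in this run: " + id);
            return false;
        }
        return true;
    }

    int duplicateCount() {
        return duplicates;
    }

    int distinctCount() {
        return seen.size();
    }
}
```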
>
>
> > Luke will help you examine your index to see if it's what you
> > think is there. Perhaps another way to test this would be to
> > add (again for debugging) a timestamp field to your index. That way, you
> > would know when you added your duplicate rows.
>
>
> I haven't tried Luke yet to look at the index, since I haven't been able
> to get my hands on the actual index unfortunately.
>
>
> > Finally, you might try creating an index in a new directory
> > that you *know* is empty and seeing what you get and how it
> > compares against your current process. Although I'd expect
> > your IndexWriter code to barf if you had file-locking issues
> > and couldn't empty the index, I suppose it's possible....
>
>
> That's a good solution if all my other attempts fail :)
>
>
>
> > On 8/14/06, Ronald Wildenberg <r.wildenberg@kennisnet.org> wrote:
> > >
> > > Hi,
> > >
> > > I'm experiencing the problem that my index does not seem to be
> > > recreated, despite using the correct flags. The result is that
> > > documents that represent the same database rows occur multiple times
> > > in the index. I recreate my entire index each night.
> > >
> > > My FSDirectory/IndexWriter construction code looks like this:
> > >
> > >    // create == true: start a new index, discarding any existing one
> > >    File indexDirectory = new File(indexPath);
> > >    FSDirectory luceneIndexDirectory =
> > >        FSDirectory.getDirectory(indexDirectory, true);
> > >    IndexWriter indexWriter =
> > >        new IndexWriter(luceneIndexDirectory, analyzer, true);
> > >
> > > This code should take care of recreating my index, but it does not
> > > seem to be working properly. It looks like the old index is not
> > > removed and the same documents are added to my index again.
> > >
> > > I have strong reasons not to suspect that other code is adding duplicate
> > > documents. First, if no index has been created yet, no duplicate
> > > documents are added. Second, if an old index does exist, after
> > > recreating the index all documents exist exactly twice (and the
> > > following night they exist three times, etc.). It is not the case that
> > > only some documents are duplicated.
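The pattern described above (every document exactly twice, then three times) can be tested mechanically once the IDs stored in the index are available, for example exported with Luke. The sketch below is hypothetical (MultiplicityCheck and uniformMultiplicity are made-up names): it returns k if every ID occurs exactly k times and -1 otherwise. A uniform count that grows by one each night points at the old index surviving the rebuild; uneven counts point at duplicates on the input side.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check (not part of Lucene): do all IDs found in the index occur
// the same number of times?
class MultiplicityCheck {

    /** Returns k if every ID occurs exactly k times; -1 if counts differ or the list is empty. */
    static int uniformMultiplicity(List<String> idsInIndex) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : idsInIndex) {
            counts.merge(id, 1, Integer::sum); // count occurrences per ID
        }
        int k = -1;
        for (int c : counts.values()) {
            if (k == -1) {
                k = c;          // first count seen becomes the reference
            } else if (c != k) {
                return -1;      // some IDs are duplicated more than others
            }
        }
        return k;
    }
}
```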
> > >
> > > Does anyone have any ideas?
> > >
> > > Thanks in advance,
> > > Ronald.
> > >
> > >
> > > DISCLAIMER:
> > >
> > > This message (with attachments) is given in good faith. Kennisnet
> > > cannot assume any responsibility for the accuracy or reliability of
> > > the information contained in this message (with attachments), nor
> > > shall the information be construed as constituting any obligation on
> > > the part of Kennisnet. The information contained in this message (with
> > > attachments) may be confidential or privileged and is only intended
> > > for the use of the named addressee. If you are not the intended
> > > recipient, you are requested by Kennisnet to delete this message (with
> > > attachments) without opening it and you are notified by Kennisnet that
> > > any disclosure, copying or distribution of the information contained
> > > in this message (with attachments) is strictly prohibited and unlawful.
> > >
> > >
> > >
> >
>
>
>
