lucene-java-user mailing list archives

From "Mark Miller" <markrmil...@gmail.com>
Subject Re: Delete corrupted doc
Date Thu, 26 Jul 2007 17:52:46 GMT
From what I can tell, you shouldn't need to even try my first suggestion
(what happened to the experts on this question by the way?).

Returning true from isDeleted for the corrupt id should not matter.

It appears to me that deletes are handled by keeping a simple list of the
ids that are deleted. When a merge is done and a new segment is created,
the deleted ids are just not brought along for the ride. Instead, only docs
0 through maxDoc()-1 with isDeleted=false are put into the new segment. Your
corrupt id that is greater than maxDoc() should not make it into the new
segment, as it will never be retrieved.
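
To make that concrete, here is a minimal sketch of the merge behavior as I
understand it. It is a plain-Java model, not Lucene's actual merge code:
survivingDocs is a hypothetical helper, the corrupt id 999999 is made up, and
java.util.BitSet stands in for Lucene's BitVector (unlike the fixed-size
BitVector in the stack trace, BitSet grows on demand, so setting an
out-of-range bit does not throw).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class MergeSketch {
    // Model of a merge: visit doc ids 0 .. maxDoc-1 only and keep the ones
    // whose deleted bit is clear. Ids at or beyond maxDoc are never visited.
    static List<Integer> survivingDocs(int maxDoc, BitSet deleted) {
        List<Integer> kept = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!deleted.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);       // an ordinary deleted doc
        deleted.set(999999);  // hypothetical corrupt id, far beyond maxDoc
        System.out.println(survivingDocs(5, deleted));
    }
}
```

With maxDoc = 5 and doc 2 deleted, the loop keeps 0, 1, 3 and 4. Whether the
out-of-range id is marked deleted or not makes no difference, because the loop
never reaches it.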

Anyway, what this says to me (and I should have realized this before) is
that there is no document with your corrupt id, rather there is a term that
thinks it is in that invalid doc id. The corruption must be in the
term:docids inverted index.
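
If that is right, the bad entry would show up as a posting whose doc id is at
or beyond maxDoc(). A rough sketch of that check follows, again as a
plain-Java model rather than Lucene's API: the postings map and
termsWithInvalidDocs are hypothetical names, and a real check would have to
walk the index's actual term and posting data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PostingsCheck {
    // Return every term that has at least one posting outside [0, maxDoc).
    static List<String> termsWithInvalidDocs(Map<String, int[]> postings, int maxDoc) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (doc < 0 || doc >= maxDoc) {
                    bad.add(entry.getKey());
                    break; // one bad posting is enough to flag the term
                }
            }
        }
        return bad;
    }
}
```

Any term this flags points at a document that does not exist, which would fit
the symptom of a delete failing on an id that no stored document has.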

Getting that invalid number out of that file might be rather difficult.
There are some brilliant guys on the list that might have an idea how to do
it though. Certainly my approach in that first e-mail will not do it.

I will try to think of something if no one chimes in. Obviously, re-indexing
will be the easiest solution <g>

- Mark

On 7/26/07, Rafael Rossini <rafael.rossini@gmail.com> wrote:
>
> Yes, I optimized, but with SOLR. I don't know why, but when you optimize
> an index with SOLR, it leaves you with about 15 files instead of the 3...
> I'll try to optimize directly in Lucene and see what happens; if nothing
> happens, I'll try your suggestion. Thanks a lot, Mark!!
>
> On 7/26/07, Mark Miller <markrmiller@gmail.com> wrote:
> >
> > You know, on second thought, a merge shouldn't even try to access a doc
> > greater than maxDoc() (I think). Have you just tried an optimize?
> >
> > On 7/25/07, Rafael Rossini <rafael.rossini@gmail.com> wrote:
> > >
> > > Hi guys,
> > >
> > >     Is there a way of deleting a document that, because of some
> > > corruption, got a docID larger than maxDoc()? I'm trying to do this,
> > > but I get this exception:
> > >
> > > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 106577
> > >    at org.apache.lucene.util.BitVector.set(BitVector.java:53)
> > >    at org.apache.lucene.index.SegmentReader.doDelete(SegmentReader.java:301)
> > >    at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:674)
> > >    at org.apache.lucene.index.MultiReader.doDelete(MultiReader.java:125)
> > >    at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:674)
> > >    at teste.DeleteError.main(DeleteError.java:9)
> > >
> > > Thanks
> > >
> >
>
