lucene-java-user mailing list archives

From Chris <christu...@gmail.com>
Subject Re: corrupted index Lucene 4.4
Date Wed, 23 Oct 2013 14:58:29 GMT
Actually, it contains about 100 million webpages and was built out of a web
index for NLP processing :(

I did the indexing and crawling on a single small server, and researching
and getting it all to this stage took me this long... and now my index is
unusable :(
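
Before assuming the whole index is lost, a first diagnostic is to check whether the index directory still contains any commit point at all. The sketch below is a minimal illustration, assuming Lucene 4.x file naming, where each commit writes a `segments_N` file whose generation N is encoded in base 36 (`Character.MAX_RADIX`). `LatestCommitFinder` is a hypothetical helper, not part of Lucene, and finding such a file does not prove the index is intact: Lucene's own `CheckIndex` tool (`org.apache.lucene.index.CheckIndex`) is the authoritative check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

/** Hypothetical helper: reports the newest segments_N commit point in a Lucene 4.x index directory. */
public class LatestCommitFinder {
    private static final String PREFIX = "segments_";

    /** Returns the generation encoded in a name like "segments_a" (base 36), or -1 if it is not a commit file. */
    static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX) || fileName.length() == PREFIX.length()) {
            return -1; // e.g. "segments.gen" or "_0.cfs" are skipped
        }
        try {
            return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1; // suffix is not a valid base-36 number
        }
    }

    /** Scans the directory and returns the highest commit generation found, if any. */
    static OptionalLong latestGeneration(Path indexDir) throws IOException {
        long best = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path p : files) {
                best = Math.max(best, generationOf(p.getFileName().toString()));
            }
        }
        return best < 0 ? OptionalLong.empty() : OptionalLong.of(best);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        OptionalLong gen = latestGeneration(dir);
        if (gen.isPresent()) {
            System.out.println("latest commit point: segments_"
                    + Long.toString(gen.getAsLong(), Character.MAX_RADIX));
        } else {
            System.out.println("no segments_N file found: no commit point to open");
        }
    }
}
```

Run it as `java LatestCommitFinder /path/to/index`. If it reports no `segments_N` file, there is no commit point for Lucene (or Solr) to open, which would be consistent with Solr refusing to load the index.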


On Wed, Oct 23, 2013 at 8:16 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Wed, Oct 23, 2013 at 10:33 AM, Chris <christudas@gmail.com> wrote:
> > I am not exactly sure whether commit() was run, as I am inserting each
> > row and doing a commit right away. My Solr will not load the index...
>
> I'm confused: if you are doing a commit right away after every row
> (which is REALLY bad practice: that's incredibly slow and
> unnecessary), then surely you've had many commits succeed?
>
> > Is there any way I can fix this? I have a huge index and will lose
> > months if I try to reindex :( I didn't know Lucene was not stable; I
> > thought it was.
>
> Sorry, but no.
>
> In theory ... a tool could be created that would try to "reconstitute"
> a segments file by looking at all the various files that exist, but
> this is not in general easy (and may not be possible): the segments
> file has very important metadata, like which codec was used to write
> each segment, etc.
>
> Did it really take months to do this indexing?  That is really way too
> long; how many documents?
>
> Lucene (Solr) is stable, i.e. a successful commit should ensure your
> index survives power loss.  If somehow that was not the case here,
> then we need to figure out why and fix it ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
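
The quoted advice against committing after every row can be sketched as follows. This is a minimal sketch under stated assumptions: `CommitSink` is a hypothetical stand-in for the two `IndexWriter` calls involved (with a real Lucene 4.4 writer, `add` would wrap `addDocument` and `commit` would wrap `IndexWriter.commit()`), and the batch size of 10,000 is an arbitrary illustrative value, not a recommendation from this thread.

```java
/** Hypothetical stand-in for the two IndexWriter operations that matter here. */
interface CommitSink<T> {
    void add(T doc);
    void commit();
}

/** Adds rows one at a time but only commits every batchSize rows, plus once at the end. */
final class BatchedIndexer<T> {
    private final CommitSink<T> sink;
    private final int batchSize;
    private int pending = 0; // rows added since the last commit

    BatchedIndexer(CommitSink<T> sink, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(T doc) {
        sink.add(doc);
        if (++pending == batchSize) {
            sink.commit();
            pending = 0;
        }
    }

    /** Commits any leftover rows; call once when the crawl is finished. */
    void finish() {
        if (pending > 0) {
            sink.commit();
            pending = 0;
        }
    }
}

public class BatchCommitDemo {
    /** Counting sink used only to show how many commits happen. */
    static final class CountingSink implements CommitSink<String> {
        int added = 0;
        int commits = 0;
        public void add(String doc) { added++; }
        public void commit() { commits++; }
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        BatchedIndexer<String> indexer = new BatchedIndexer<>(sink, 10_000);
        for (int i = 0; i < 25_000; i++) {
            indexer.add("page-" + i);
        }
        indexer.finish();
        System.out.println(sink.added + " docs, " + sink.commits + " commits");
    }
}
```

Running `main` prints `25000 docs, 3 commits`, versus 25,000 commits if every row were committed. The trade-off is that documents added since the last successful commit are not durable and would be lost on a crash, so the batch size balances throughput against how much work you are willing to redo.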
