lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Generate Lucene segments_N file
Date Wed, 10 Feb 2016 14:16:51 GMT
I'm glad you iterated so quickly to a working solution!

I like your comment about bootstrapping to get the right segment ID :)
For future reference, there are also APIs in CodecUtil.java that can read
it from the index header of any of the segment's files.
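
As a rough sketch of that idea, the ID can also be pulled out of the header
by hand using only the JDK. This assumes the Lucene 5.x index header layout
(big-endian magic int, codec name as a vInt length plus UTF-8 bytes, a
version int, then the 16-byte ID). The class SegmentIdReader and its methods
are made up for illustration and are not part of Lucene.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not a Lucene class: parses the start of an index
// header, assuming the Lucene 5.x layout described above.
public class SegmentIdReader {
    // Same value as CodecUtil.CODEC_MAGIC.
    static final int CODEC_MAGIC = 0x3fd76c17;
    // Segment IDs are 16 bytes long (StringHelper.ID_LENGTH).
    static final int ID_LENGTH = 16;

    /** Reads the 16-byte ID from an index header at the start of the stream. */
    static byte[] readSegmentId(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt(); // big-endian int
        if (magic != CODEC_MAGIC) {
            throw new IOException("no codec header, magic=0x" + Integer.toHexString(magic));
        }
        int nameLength = readVInt(in); // codec name length
        if (nameLength < 0) {
            throw new IOException("malformed codec name length");
        }
        in.readFully(new byte[nameLength]); // skip the codec name bytes
        in.readInt();                       // skip the format version
        byte[] id = new byte[ID_LENGTH];
        in.readFully(id);
        return id;
    }

    /** Decodes a variable-length int: 7 data bits per byte, low bits first. */
    static int readVInt(DataInputStream in) throws IOException {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IOException("malformed vInt");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentIdReader <path to a segment file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(Arrays.toString(readSegmentId(in)));
        }
    }
}
```

Pointed at a file such as _1rpt.si, this prints the ID as a list of signed
bytes, which is the form a Java byte[] literal expects.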

Good job figuring out the info.setCodec call ;)

And thank you for sharing your solution so future unlucky googlers can benefit.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Feb 10, 2016 at 6:17 AM,  <khanh-lam.mai@bnf.fr> wrote:
> Hi,
>
> Mike, thanks a lot for your help. I adapted your code and it works
> great!
> Thanks for saving us weeks of work.
>
> Here is my code, in case it helps someone else:
>
>
> package org.apache.lucene.index;
>
> import java.io.IOException;
> import java.nio.file.Path;
> import java.nio.file.Paths;
>
> import org.apache.lucene.codecs.Codec;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.IOContext;
> import org.apache.lucene.store.SimpleFSDirectory;
>
> public class GenSegmentInfo {
>         public static void main(String[] args) throws IOException {
>                 Codec codec = Codec.getDefault();
>                 Path myPath = Paths.get("/tmp/index");
>                 Directory directory = new SimpleFSDirectory(myPath);
>
>                 // Run this once with an arbitrary segmentID value; then, in a
>                 // Java debugger, read the right segment ID by putting a
>                 // breakpoint on CodecUtil#checkIndexHeaderID(...)
>                 byte[] segmentID = {88, 55, 58, 78, -21, -55, 102, 99,
>                                 123, 34, 85, -38, -70, -120, 102, -67};
>
>                 // Read the SegmentInfo from the sole remaining .si file
>                 SegmentInfo info = codec.segmentInfoFormat().read(directory, "_1rpt",
>                                 segmentID, IOContext.READ);
>                 info.setCodec(codec);
>                 SegmentInfos infos = new SegmentInfos();
>                 // Arguments after info: delCount, delGen, fieldInfosGen, docValuesGen
>                 SegmentCommitInfo commit = new SegmentCommitInfo(info, 1, -1, -1, -1);
>                 infos.add(commit);
>                 // commit is package private, hence the org.apache.lucene.index package
>                 infos.commit(directory);
>         }
> }
>
>
> Regards,
>
> Khanh-Lam Mai
>
>
>
>
> From:    Michael McCandless <lucene@mikemccandless.com>
> To:      Lucene Users <java-user@lucene.apache.org>, khanh-lam.mai@bnf.fr
> Date:    10/02/2016 10:17
> Subject: Re: Generate Lucene segments_N file
>
>
>
> It'd be a challenge, but it is possible.  It's just software ;)
>
> You need something like this to read a SegmentInfo from your sole .si
> file, assuming you are on a recent 5.x release:
>
>   SegmentInfo info = codec.segmentInfoFormat().read(directory,
>       segName, segmentID, IOContext.READ);
>
> To get codec, assuming you used the default codec for indexing, use:
>
>   Codec codec = Codec.getDefault();
>
> Then do something like this:
>
>   SegmentInfos infos = new SegmentInfos();
>   infos.add(info);
>   infos.commit(directory);
>
> The commit method is package private, so your tool must live in the
> org.apache.lucene.index package, or use break-out-of-jail magic with
> Java's reflection APIs.
>
> Then run CheckIndex on that ... if it fails, iterate with the above code!
>
> Good luck,
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Feb 9, 2016 at 9:50 AM,  <khanh-lam.mai@bnf.fr> wrote:
>> Hello,
>>
>> First, I don't know if this is the right mailing list to ask for help;
>> if not, please accept my apologies for the inconvenience.
>>
>> While moving Lucene (5.3) index files from one server to another, I forgot
>> to move the segments_N file (because I used the pattern *.*).
>> Unfortunately, I have erased the original folder, and I only have these
>> files in my directory now:
>>
>> _1rpt.fdt
>> _1rpt.fdx
>> _1rpt.fnm
>> _1rpt.nvd
>> _1rpt.nvm
>> _1rpt.si
>> _1rpt_Lucene50_0.doc
>> _1rpt_Lucene50_0.dvd
>> _1rpt_Lucene50_0.dvm
>> _1rpt_Lucene50_0.pos
>> _1rpt_Lucene50_0.tim
>> _1rpt_Lucene50_0.tip
>> write.lock
>>
>> I am missing the segments_42u file, and without it I cannot even run
>> org.apache.lucene.index.CheckIndex:
>>
>> Exception in thread "main" org.apache.lucene.index.IndexNotFoundException:
>> no segments* file found in
>> MMapDirectory@/solr-5.3.1/nodes/node1/core/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@119d7047:
>> files: [write.lock, _1rpt.fdt, _1rpt.fdx, _1rpt.fnm, _1rpt.nvd,
>> _1rpt.nvm, _1rpt.si, _1rpt_Lucene50_0.doc, _1rpt_Lucene50_0.dvd,
>> _1rpt_Lucene50_0.dvm, _1rpt_Lucene50_0.pos, _1rpt_Lucene50_0.tim,
>> _1rpt_Lucene50_0.tip]
>>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:483)
>>     at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
>>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)
>>
>> The index is huge (> 800 GB) and would take weeks to rebuild.
>> Is there a way to regenerate the missing segments_N file?
>>
>> Thanks a lot for your help.
>>
>>
>> Khanh-Lam Mai
>> khanh-lam.mai@bnf.fr
>> Exhibition: De Rouge et de Noir. Les vases grecs de la collection de
>> Luynes, until 1 March 2016, BnF, Richelieu. Please consider the
>> environment before printing.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>


