lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Merging of index in Solr
Date Wed, 22 Nov 2017 15:42:17 GMT
Hi Emir,

Yes, I am running the merge on a Windows machine.
The disk is an SSD formatted with NTFS.

Regards,
Edwin
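
Since the diagnosis hinges on the file system, the type can also be confirmed from the JVM that runs the merge. This is a minimal sketch, assuming only the standard java.nio.file API. The class and method names (IndexFileStoreCheck, isNtfsType) are made up for illustration and are not part of Lucene or Solr, and the index directory is expected as the first argument:

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper, not part of Lucene or Solr.
final class IndexFileStoreCheck {

    /** True if the reported file-system type name is NTFS (case-insensitive). */
    static boolean isNtfsType(String fileStoreType) {
        return "NTFS".equalsIgnoreCase(fileStoreType);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the index directory is passed as the first argument;
        // the current directory is only a placeholder default.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");

        // Ask the JVM which file store backs the directory.
        FileStore store = Files.getFileStore(indexDir);
        String type = store.type();

        System.out.println(indexDir.toAbsolutePath() + " is on a " + type + " volume");
        if (isNtfsType(type)) {
            System.out.println("NTFS detected: heavy fragmentation of very large files may be relevant.");
        }
    }
}
```

The type string comes from the operating system, so the exact value may vary. The comparison is therefore case-insensitive.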

On 22 November 2017 at 16:50, Emir Arnautović <emir.arnautovic@sematext.com>
wrote:

> Hi Edwin,
> Quick googling suggests that this is an NTFS issue related to a large
> number of file fragments, caused by a large number of files in one
> directory or by huge files. Are you running this merge on a Windows machine?
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 22 Nov 2017, at 02:33, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > wrote:
> >
> > Hi,
> >
> > I encountered this error while merging the 3.5TB index.
> > What could have caused it?
> >
> > Exception in thread "main" Exception in thread "Lucene Merge Thread #8"
> > java.io.IOException: background merge hit exception:
> > _6f(6.5.1):C7256757 _6e(6.5.1):C6462072 _6d(6.5.1):C3750777
> > _6c(6.5.1):C2243594 _6b(6.5.1):C1015431 _6a(6.5.1):C1050220
> > _69(6.5.1):c273879 _28(6.4.1):c79011/84:delGen=84
> > _26(6.4.1):c44960/8149:delGen=100 _29(6.4.1):c73855/68:delGen=68
> > _5(6.4.1):C46672/31:delGen=31 _68(6.5.1):c66 into _6g [maxNumSegments=1]
> >        at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1931)
> >        at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1871)
> >        at org.apache.lucene.misc.IndexMergeTool.main(IndexMergeTool.java:57)
> > Caused by: java.io.IOException: The requested operation could not be
> > completed due to a file system limitation
> >        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> >        at sun.nio.ch.FileDispatcherImpl.write(Unknown Source)
> >        at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
> >        at sun.nio.ch.IOUtil.write(Unknown Source)
> >        at sun.nio.ch.FileChannelImpl.write(Unknown Source)
> >        at java.nio.channels.Channels.writeFullyImpl(Unknown Source)
> >        at java.nio.channels.Channels.writeFully(Unknown Source)
> >        at java.nio.channels.Channels.access$000(Unknown Source)
> >        at java.nio.channels.Channels$1.write(Unknown Source)
> >        at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)
> >        at java.util.zip.CheckedOutputStream.write(Unknown Source)
> >        at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
> >        at java.io.BufferedOutputStream.write(Unknown Source)
> >        at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
> >        at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
> >        at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)
> >        at org.apache.lucene.codecs.lucene50.ForUtil.writeBlock(ForUtil.java:175)
> >        at org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.addPosition(Lucene50PostingsWriter.java:286)
> >        at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:156)
> >        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:866)
> >        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
> >        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
> >        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:164)
> >        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:216)
> >        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:101)
> >        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4353)
> >        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3928)
> >        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
> >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
> >
> > org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
> > The requested operation could not be completed due to a file system
> > limitation
> >        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:703)
> >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:683)
> > Caused by: java.io.IOException: The requested operation could not be
> > completed due to a file system limitation
> >        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> >        at sun.nio.ch.FileDispatcherImpl.write(Unknown Source)
> >        at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
> >        at sun.nio.ch.IOUtil.write(Unknown Source)
> >        at sun.nio.ch.FileChannelImpl.write(Unknown Source)
> >        at java.nio.channels.Channels.writeFullyImpl(Unknown Source)
> >        at java.nio.channels.Channels.writeFully(Unknown Source)
> >        at java.nio.channels.Channels.access$000(Unknown Source)
> >        at java.nio.channels.Channels$1.write(Unknown Source)
> >
> > Regards,
> > Edwin
> >
> > On 22 November 2017 at 00:10, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > wrote:
> >
> > >> I am using the IndexMergeTool from Solr, with the command below:
> >>
> >> java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
> >> org.apache.lucene.misc.IndexMergeTool
> >>
> > >> The heap size is 32GB. There are more than 20 million documents in the
> > >> two cores.
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >>
> >> On 21 November 2017 at 21:54, Shawn Heisey <apache@elyograg.org> wrote:
> >>
> >>> On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
> >>>
> > >>>> Does anyone know how long merging in Solr usually takes?
> > >>>>
> > >>>> I am currently merging about 3.5TB of data. It has been running for
> > >>>> more than 28 hours and has not completed yet. The merging is running
> > >>>> on an SSD.
> >>>>
> >>>
> >>> The following will apply if you mean Solr's "optimize" feature when you
> >>> say "merging".
> >>>
> > >>> In my experience, merging proceeds at about 20 to 30 megabytes per
> > >>> second, even if the disks are capable of far faster data transfer.
> > >>> Merging is not just copying the data. Lucene is completely rebuilding
> > >>> very large data structures, and *not* including data from deleted
> > >>> documents as it does so. It takes a lot of CPU power and time.
> >>>
> > >>> If we average the data rates I've seen to 25 MB/s, then that would
> > >>> indicate that an optimize on a 3.5TB index is going to take about 39
> > >>> hours, and might take as long as 48 hours.  And if you're running
> > >>> SolrCloud with multiple replicas, multiply that by the number of copies
> > >>> of the 3.5TB index.  An optimize on a SolrCloud collection handles one
> > >>> shard replica at a time and works its way through the entire collection.
> >>>
> > >>> If you are merging different indexes *together*, which a later message
> > >>> seems to state, then the actual Lucene operation is probably nearly
> > >>> identical, but I'm not really familiar with it, so I cannot say for sure.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>
>
>
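
The duration estimate quoted above (3.5TB at roughly 20 to 30 MB/s) is simple arithmetic and can be reproduced with a few lines of Java. This is only a rough sketch under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant throughput for the whole merge. The class and method names are made up for illustration.

```java
// Back-of-the-envelope merge-time estimate; all names here are illustrative.
final class MergeTimeEstimate {

    // Assumption: decimal units, 1 TB = 10^12 bytes and 1 MB = 10^6 bytes.
    static final double BYTES_PER_TB = 1e12;
    static final double BYTES_PER_MB = 1e6;

    /** Hours needed to rewrite indexSizeTb terabytes at rateMbPerSec megabytes per second. */
    static double hoursToMerge(double indexSizeTb, double rateMbPerSec) {
        double seconds = (indexSizeTb * BYTES_PER_TB) / (rateMbPerSec * BYTES_PER_MB);
        return seconds / 3600.0;
    }

    public static void main(String[] args) {
        double sizeTb = 3.5; // index size quoted in the thread
        // Throughput range quoted in the thread: 20 to 30 MB/s, averaged to 25.
        for (double rate : new double[] {20.0, 25.0, 30.0}) {
            System.out.printf("%.0f MB/s -> %.1f hours%n", rate, hoursToMerge(sizeTb, rate));
        }
    }
}
```

At 25 MB/s this works out to just under 39 hours, and at 20 MB/s to a little over 48 hours, which matches the figures in the quoted reply. With several replicas of the same index, the total scales with the number of copies.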
