lucene-java-user mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?
Date Mon, 29 Jul 2013 20:30:33 GMT
We have very large indexes, almost a terabyte for a single index, and
normally a CheckIndex run takes overnight. I started a CheckIndex on
Friday, and today (Monday) it seems to be stuck testing term vectors, even
though we don't have term vectors turned on (see below).
The output file was last written Jul 27 02:28.
Note that in this 750 GB segment we have about 83 million docs with about
2.4 billion unique terms and about 110 billion tokens.
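The counts in the CheckIndex output can be put in perspective with a little arithmetic. This sketch copies the raw numbers reported for segment `_bch` and checks each one against `Integer.MAX_VALUE` (2^31 - 1), a common ceiling for 32-bit counters in Java code. The class and helper names are hypothetical, and a count that exceeds `int` range only hints at where an overflow could occur; it is not evidence that CheckIndex actually has such a limit.

```java
// Sketch: compare the raw counts CheckIndex printed for segment _bch
// against Java's int range. CheckIndexCounts and its helpers are
// hypothetical illustration code, not part of Lucene.
final class CheckIndexCounts {
    // Values copied verbatim from the CheckIndex output in this message.
    static final long DOCS = 82_946_896L;
    static final long UNIQUE_TERMS = 2_442_919_802L;
    static final long TERM_DOC_PAIRS = 73_922_320_413L;
    static final long TOKENS = 109_976_572_432L;

    /** True if the value cannot be stored in a Java int (max 2^31 - 1). */
    static boolean exceedsInt(long value) {
        return value > Integer.MAX_VALUE;
    }

    /** Average number of tokens per document in the segment. */
    static double avgTokensPerDoc() {
        return (double) TOKENS / DOCS;
    }

    public static void main(String[] args) {
        System.out.printf("docs           = %,d (exceeds int: %b)%n",
                DOCS, exceedsInt(DOCS));
        System.out.printf("unique terms   = %,d (exceeds int: %b)%n",
                UNIQUE_TERMS, exceedsInt(UNIQUE_TERMS));
        System.out.printf("term/doc pairs = %,d (exceeds int: %b)%n",
                TERM_DOC_PAIRS, exceedsInt(TERM_DOC_PAIRS));
        System.out.printf("tokens         = %,d (about %.1f billion)%n",
                TOKENS, TOKENS / 1e9);
        System.out.printf("avg tokens/doc = %.0f%n", avgTokensPerDoc());
    }
}
```

Run as-is, it reports roughly 110 billion tokens (about 1,326 per document) and shows that the unique-term and term/doc pair counts exceed `int` range while the document count does not.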

Have we hit a new CheckIndex limit?


Tom

-----------------------


Opening index @ /htsolr/lss-dev/solrs/4.2/3/core/data/index

Segments file=segments_e numSegments=2 version=4.2.1 format=
userData={commitTimeMSec=1374712392103}
  1 of 2: name=_bch docCount=82946896
    codec=Lucene42
    compound=false
    numFiles=12
    size (MB)=752,005.689
    diagnostics = {timestamp=1374657630506, os=Linux, os.version=2.6.18-348.12.1.el5, mergeFactor=16, source=merge, lucene.version=4.2.1 1461071 - mark - 2013-03-26 08:23:34, os.arch=amd64, mergeMaxNumSegments=2, java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [12 fields]
    test: field norms.........OK [3 fields]
    test: terms, freq, prox...OK [2442919802 terms; 73922320413 terms/docs pairs; 109976572432 tokens]
    test: stored fields.......OK [960417844 total field count; avg 11.579 fields per doc]
    test: term vectors........