lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?
Date Tue, 30 Jul 2013 12:41:29 GMT
I think that's ~ 110 billion, not trillion, tokens :)
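
A quick way to double-check the magnitude is a minimal sketch like the one below. It uses the token count from the quoted CheckIndex output (109976572432) and short-scale definitions of billion and trillion; the helper name `scale_name` is made up for this illustration and is not part of Lucene.

```python
# Token count copied from the quoted CheckIndex line "109976572432 tokens".
tokens = 109_976_572_432

BILLION = 10**9     # short-scale billion
TRILLION = 10**12   # short-scale trillion


def scale_name(n):
    """Return the largest of the two units above that n reaches (hypothetical helper)."""
    if n >= TRILLION:
        return "trillion"
    if n >= BILLION:
        return "billion"
    return "below a billion"


# Prints the unit name and the count expressed in billions.
print(scale_name(tokens), round(tokens / BILLION, 1))
```

The count falls three orders of magnitude short of a trillion, so it reads as roughly 110 billion tokens.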

Are you certain you don't have any term vectors?

Even if your index has no term vectors, CheckIndex goes through all
docIDs trying to load them, but that ought to be very fast, and then
you should see "test: doc values..." after that.


Mike McCandless

http://blog.mikemccandless.com


On Mon, Jul 29, 2013 at 4:30 PM, Tom Burton-West <tburtonw@umich.edu> wrote:
> We have very large indexes, almost a terabyte for a single index, and
> normally it takes overnight to run CheckIndex. I started a CheckIndex
> on Friday, and today (Monday) it seems to be stuck testing term vectors,
> although we don't have term vectors turned on (see below).
> The output file was last written Jul 27 02:28.
> Note that in this 750 GB segment we have about 83 million docs with about
> 2.4 billion unique terms and about 110 trillion tokens.
>
> Have we hit a new CheckIndex limit?
>
>
> Tom
>
> -----------------------
>
>
> Opening index @ /htsolr/lss-dev/solrs/4.2/3/core/data/index
>
> Segments file=segments_e numSegments=2 version=4.2.1 format= userData={commitTimeMSec=1374712392103}
>   1 of 2: name=_bch docCount=82946896
>     codec=Lucene42
>     compound=false
>     numFiles=12
>     size (MB)=752,005.689
>     diagnostics = {timestamp=1374657630506, os=Linux, os.version=2.6.18-348.12.1.el5, mergeFactor=16, source=merge, lucene.version=4.2.1 1461071 - mark - 2013-03-26 08:23:34, os.arch=amd64, mergeMaxNumSegments=2, java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
>     no deletions
>     test: open reader.........OK
>     test: fields..............OK [12 fields]
>     test: field norms.........OK [3 fields]
>     test: terms, freq, prox...OK [2442919802 terms; 73922320413 terms/docs pairs; 109976572432 tokens]
>     test: stored fields.......OK [960417844 total field count; avg 11.579 fields per doc]
>     test: term vectors........
