lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: bytecount as prefix
Date Wed, 12 Apr 2006 03:07:30 GMT

1) not only does ConstantScoreRangeQuery use a RangeFilter, but
TestConstantScoreRangeQuery and TestRangeFilter share a base class that
creates the index.

2) perhaps the issue is that corruption is happening when segments are
merged -- and most tests don't surface the problem because they tend to
operate on small simple indexes of one segment?

One thing i remember about the base class for those RangeFilter
tests is that it makes an index with several thousand docs -- enough that
the default indexer options are probably making/merging more than a few
segments.

i don't know anything about TestIndexModifier, but if i remember correctly
IndexModifier manages a reader and a writer and opens/closes them as
needed to do whatever operation you want -- so i'm guessing its test
would open/close a writer several times while adding docs, which may make
multiple segments .. and if it does an optimize, that would definitely
merge those segments.

i would start by twiddling the RangeFilter test base class to do a much
smaller number of documents .. if that fixes the problem, then try changing
the merge factor and min merge docs to be really low, and if that causes
the problem again, you'll be on to something.

you could probably make a simple test case where you add some docs to an
IndexWriter (with a couple of fields that have multibyte characters),
reopen the writer, add some more docs (ditto), then open a TermEnum
and record every term in the index, then optimize the index, and then open
a new TermEnum and assert that every term matches ... I'm guessing that
would fail for you at the moment. (but work against the trunk)
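
A rough sketch of just the comparison step in that test, in plain Java with
no Lucene calls. Everything named here is hypothetical: the class
TermListCheck and its two helpers are made up for illustration, the terms
are modelled as "field:text" strings like the ones in the log output quoted
below, and String.compareTo is only a stand-in for Lucene's own Term
ordering. In a real test the two lists would be filled by walking a
TermEnum before and after the optimize.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: not part of Lucene, and it makes no Lucene calls.
final class TermListCheck {

    // Index of the first position where the two term lists differ,
    // or -1 if they are identical (same length, same terms, same order).
    static int firstMismatch(List<String> before, List<String> after) {
        int n = Math.min(before.size(), after.size());
        for (int i = 0; i < n; i++) {
            if (!before.get(i).equals(after.get(i))) {
                return i;
            }
        }
        return before.size() == after.size() ? -1 : n;
    }

    // Index of the first term that sorts before its predecessor,
    // or -1 if the whole list is in non-decreasing order.
    static int firstOutOfOrder(List<String> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (terms.get(i - 1).compareTo(terms.get(i)) > 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Terms in the order they appear in the quoted TermInfosWriter output,
        // including the unexpected blank term ":".
        List<String> written = Arrays.asList(
            "rand:-00479945420",
            "rand:-00506239929",
            ":",
            "rand:-00512006124",
            "rand:-00526876979");
        System.out.println("first out-of-order index: " + firstOutOfOrder(written));

        // One term as written versus how it reads back in the quoted TermEnum dump.
        List<String> before = Arrays.asList("body:body", "rand:-00526876979");
        List<String> after = Arrays.asList("body:body", "rand:26876979");
        System.out.println("first mismatch index: " + firstMismatch(before, after));
    }
}
```

Reporting the index of the first bad term, rather than a bare pass/fail,
makes it easier to line the failure up with the writeTerm debug output.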



: Date: Tue, 11 Apr 2006 16:49:18 -0700
: From: Marvin Humphrey <marvin@rectangular.com>
: Reply-To: java-dev@lucene.apache.org
: To: java-dev@lucene.apache.org
: Subject: Re: bytecount as prefix
:
:
: On Apr 11, 2006, at 12:05 PM, Marvin Humphrey wrote:
:
: >  TestRangeFilter.
:
: A phantom blank Term shows up out of nowhere in the middle of the
: merge process.
:
: When you stick a System.err.println into TermInfosWriter's writeTerm,
: you ordinarily see it adding Terms in proper sort order:
:
:      [junit] TINFO: :
:      [junit] TINFO: body:body
:      [junit] TINFO: id:000000000000
:      [junit] TINFO: rand:-00953139433
:      [junit] TINFO: :
:      [junit] TINFO: body:body
:      [junit] TINFO: id:000000000001
:      [junit] TINFO: rand:000015869780
:
: Here's several docs being merged together:
:
:      [junit] TINFO: :
:      [junit] TINFO: body:body
:      [junit] TINFO: id:000000000009
:      [junit] TINFO: rand:-00563669564
:      [junit] TINFO: :
:      [junit] TINFO: body:body
:      [junit] TINFO: id:000000000000
:      [junit] TINFO: id:000000000001
:      [junit] TINFO: id:000000000002
:      [junit] TINFO: id:000000000003
:      [junit] TINFO: id:000000000004
:      [junit] TINFO: id:000000000005
:      [junit] TINFO: id:000000000006
:      [junit] TINFO: id:000000000007
:      [junit] TINFO: id:000000000008
:      [junit] TINFO: id:000000000009
:      [junit] TINFO: rand:-00072576061
:      [junit] TINFO: rand:-00260794310
:      [junit] TINFO: rand:-00563669564
:      [junit] TINFO: rand:-00953139433
:      [junit] TINFO: rand:-01094000683
:      [junit] TINFO: rand:-01481464619
:      [junit] TINFO: rand:-02099458317
:      [junit] TINFO: rand:000015869780
:      [junit] TINFO: rand:001019870061
:      [junit] TINFO: rand:001565603387
:      [junit] TINFO: :
:      [junit] TINFO: body:body
:      [junit] TINFO: id:000000000010
:      [junit] TINFO: rand:001271292228
:
: At some point, late in the merge process, this happens:
:
:      [junit] TermInfosWriter: rand:-00449774276
:      [junit] TermInfosWriter: rand:-00467363681
:      [junit] TermInfosWriter: rand:-00479945420
:      [junit] TermInfosWriter: rand:-00506239929
:      [junit] TermInfosWriter: :                  // Huh????
:      [junit] TermInfosWriter: rand:-00512006124
:      [junit] TermInfosWriter: rand:-00526876979  // <- look at this number
:      [junit] TermInfosWriter: rand:-00531589361
:      [junit] TermInfosWriter: rand:-00563669564
:      [junit] TermInfosWriter: rand:-00638261924
:
: Here's the first few terms coming off of a Term Enum, later.  As you
: can see, the sort order is messed up.  That's because the .tis stream
: has gotten out of sync somehow.
:
:      [junit] TERMS:
:      [junit] rand:26876979  // <- the last few digits of that number from earlier
:      [junit] rand:31589361
:      [junit] rand:63669564
:      [junit] rand:638261924
:      [junit] rand:733778983
:      [junit] rand:770310547
:      [junit] rand:806409190
:      [junit] rand:849606785
:      [junit] rand:869935672
:      [junit] rand:927974448
:      [junit] rand:953139433
:      [junit] rand:954514004
:      [junit] rand:961290394
:      [junit] rand:1067018129
:      [junit] rand:1081398108
:      [junit] rand:1094000683
:      [junit] rand:1139978555
:      [junit] rand:1231799109
:
: I'm stumped for now.
:
: Marvin Humphrey
: Rectangular Research
: http://www.rectangular.com/
:
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-dev-help@lucene.apache.org
:



-Hoss

