lucene-dev mailing list archives

From "Burton-West, Tom" <tburt...@umich.edu>
Subject RE: Fix to contrib/misc/HighFreqTerms.java
Date Fri, 16 Apr 2010 18:41:40 GMT
Hi Mike,

Thanks for making the fix and changing the display from bytes to utf8.  It needs a very minor
change:
The latest fix converts to utf8 if you give a field argument on the command line but still
shows bytes if you don't.

Line 89 should parallel line 70 and use term.utf8ToString() instead of term.toString():

70 	 tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
89 	 tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));
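Both lines feed a bounded priority queue through insertWithOverflow, which keeps only the N terms with the highest docFreq. The following is a minimal sketch of that behavior using java.util.PriorityQueue as a min-heap. TopTerms and TermStat are hypothetical names standing in for HighFreqTerms' own queue and TermInfo, not Lucene classes. Ties are resolved strictly here (an equal docFreq does not displace the minimum), which may differ from Lucene's own PriorityQueue. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopTerms {
    // Hypothetical stand-in for HighFreqTerms' TermInfo: a term plus its docFreq.
    record TermStat(String field, String text, int docFreq) {}

    private final int maxSize;
    // Min-heap on docFreq: the head is always the least frequent term kept so far.
    private final PriorityQueue<TermStat> heap =
            new PriorityQueue<>(Comparator.comparingInt(TermStat::docFreq));

    public TopTerms(int maxSize) { this.maxSize = maxSize; }

    // Roughly mimics insertWithOverflow: add while below capacity, otherwise
    // replace the current minimum only if the new entry is more frequent.
    // Returns the entry that was dropped (or rejected), or null if none.
    public TermStat insertWithOverflow(TermStat t) {
        if (heap.size() < maxSize) {
            heap.add(t);
            return null;
        }
        TermStat min = heap.peek();
        if (min != null && t.docFreq() > min.docFreq()) {
            heap.poll();
            heap.add(t);
            return min;
        }
        return t;
    }

    // Empties the heap and returns the kept terms from most to least frequent.
    public List<TermStat> drainDescending() {
        List<TermStat> out = new ArrayList<>();
        while (!heap.isEmpty()) out.add(0, heap.poll());
        return out;
    }

    public static void main(String[] args) {
        TopTerms top = new TopTerms(3);
        top.insertWithOverflow(new TermStat("body", "then", 525176));
        top.insertWithOverflow(new TermStat("body", "should", 515495));
        top.insertWithOverflow(new TermStat("body", "United", 543746));
        top.insertWithOverflow(new TermStat("body", "under", 536480)); // evicts "should"
        for (TermStat t : top.drainDescending()) {
            System.out.println(t.field() + ":" + t.text() + " " + t.docFreq());
        }
    }
}
```

Because the queue only ever sees the Term built from the text argument, whichever string conversion is used on that line (hex via toString() or decoded via utf8ToString()) is what ends up in the printed report.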

Tom

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: Wednesday, April 14, 2010 3:50 PM
To: java-dev@lucene.apache.org
Subject: Re: Bug in contrib/misc/HighFreqTerms.java?

OK I committed the fix.  I ran it on a flex wikipedia index I had...
it produces output like this:

body:[3c 21 2d 2d] 509050
body:[73 68 6f 75 6c 64] 515495
body:[74 68 65 6e] 525176
body:[74 69 74 6c 65] 525361
body:[5b 5b 55 6e 69 74 65 64] 532586
body:[6b 6e 6f 77 6e] 533558
body:[75 6e 64 65 72] 536480
body:[55 6e 69 74 65 64] 543746

Which is not very readable, but it does this because flex terms are
arbitrary byte[], not necessarily UTF-8... Maybe we should fix it to
print both the hex and the String if we assume the bytes are UTF-8?
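One way to print both safely is to decode strictly, so that a term whose bytes are not valid UTF-8 falls back to hex only instead of being silently mangled with replacement characters. This is a minimal sketch using the JDK's CharsetDecoder with CodingErrorAction.REPORT; TermFormat, toHex, tryUtf8 and format are hypothetical names, not part of HighFreqTerms or Lucene, and the bracketed hex style simply copies the output above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class TermFormat {
    // Renders bytes as "[3c 21 2d 2d]", the bracketed hex style printed above.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Strict UTF-8 decode: returns null instead of substituting U+FFFD
    // when the bytes are not valid UTF-8 (flex terms are arbitrary byte[]).
    static String tryUtf8(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return dec.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Always shows the hex bytes; prepends the decoded String when decoding succeeds.
    static String format(String field, byte[] term, int docFreq) {
        String text = tryUtf8(term);
        String shown = (text == null) ? toHex(term) : text + " " + toHex(term);
        return field + ":" + shown + " " + docFreq;
    }

    public static void main(String[] args) {
        byte[] term = {0x3c, 0x21, 0x2d, 0x2d};
        System.out.println(format("body", term, 509050));   // body:<!-- [3c 21 2d 2d] 509050
        byte[] notUtf8 = {(byte) 0xff, (byte) 0xfe};
        System.out.println(format("body", notUtf8, 1));     // body:[ff fe] 1
    }
}
```

Decoded this way, the sample terms above read as "<!--", "should", "then", "title", "[[United", "known", "under" and "United".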

Mike

On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Ugh, I'll fix this.
>
> With the new flex API, you can't ask a composite (Multi/DirReader) for
> its postings -- you have to go through the static methods on
> MultiFields.  I'm trying to put some distance between IndexReader and
> composite readers... because I'd like to eventually deprecate them.
> I.e., the composite readers should "hold" an ordered collection of
> sub-readers, but should not themselves implement IndexReader's API, I
> think.
>
> Thanks for raising this Tom,
>
> Mike
>
> On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom <tburtonw@umich.edu> wrote:
>> When I try to run HighFreqTerms.java in Lucene revision 933722, I get the
>> exception appended below.  I believe the line of code involved is a
>> result of the flex indexing merge. Should I post this as a comment to
>> LUCENE-2370 (Reintegrate flex branch into trunk)?
>>
>> Or is there simply something wrong with my configuration?
>>
>> Exception in thread "main" java.lang.UnsupportedOperationException: please
>> use MultiFields.getFields if you really need a top level Fields (NOTE that
>> it's usually better to work per segment instead)
>>         at org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
>>         at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
>>
>> Tom Burton-West
>>
>
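Mike's point in the thread, that a composite reader should hold an ordered collection of sub-readers rather than implement IndexReader's API itself (which is why DirectoryReader.fields throws and points to MultiFields.getFields), can be pictured with a toy model. This is a hypothetical sketch, not Lucene code: Segment, Composite and mergedDocFreqs are invented names, each segment is just a sorted map from term to docFreq, deletions are ignored, and records need Java 16 or later. The merged view is built by a separate static helper, in the spirit of going through MultiFields, rather than by the composite itself.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class SegmentedIndexToy {
    // Hypothetical toy segment: term text -> number of docs in this segment containing it.
    record Segment(String name, SortedMap<String, Integer> docFreqs) {}

    // The composite only holds an ordered list of sub-readers; it exposes no term API.
    record Composite(List<Segment> subs) {
        Composite { subs = List.copyOf(subs); }
    }

    // Static helper that builds a merged top-level view on demand by summing
    // per-segment docFreq for each term (deleted docs are not modeled).
    static SortedMap<String, Integer> mergedDocFreqs(Composite reader) {
        SortedMap<String, Integer> merged = new TreeMap<>();
        for (Segment s : reader.subs()) {
            for (Map.Entry<String, Integer> e : s.docFreqs().entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Segment s0 = new Segment("_0", new TreeMap<>(Map.of("then", 3, "under", 1)));
        Segment s1 = new Segment("_1", new TreeMap<>(Map.of("then", 2, "known", 4)));
        Composite reader = new Composite(List.of(s0, s1));

        // Per-segment work: each sub-reader is visited on its own.
        for (Segment s : reader.subs()) {
            System.out.println(s.name() + " " + s.docFreqs());
        }
        // Top-level view, assembled outside the composite.
        System.out.println("merged " + mergedDocFreqs(reader)); // merged {known=4, then=5, under=1}
    }
}
```

The per-segment loop is the style the exception message recommends; the merged map is the convenience a tool like HighFreqTerms wants when it ranks terms across the whole index.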

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

