From: "Burton-West, Tom"
To: "java-dev@lucene.apache.org"
Date: Fri, 16 Apr 2010 14:41:40 -0400
Subject: RE: Fix to contrib/misc/HighFreqTerms.java

Hi Mike,

Thanks for making the fix and changing the display from bytes to utf8.
It needs a very minor change:

The latest fix converts to utf8 if you give a field argument on the command line but still shows bytes if you don't. Line 89 should parallel line 70 and use term.utf8ToString() instead of term.toString():

70  tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
89  tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));

Tom

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wednesday, April 14, 2010 3:50 PM
To: java-dev@lucene.apache.org
Subject: Re: Bug in contrib/misc/HighFreqTerms.java?

OK I committed the fix.

I ran it on a flex wikipedia index I had... it produces output like this:

body:[3c 21 2d 2d] 509050
body:[73 68 6f 75 6c 64] 515495
body:[74 68 65 6e] 525176
body:[74 69 74 6c 65] 525361
body:[5b 5b 55 6e 69 74 65 64] 532586
body:[6b 6e 6f 77 6e] 533558
body:[75 6e 64 65 72] 536480
body:[55 6e 69 74 65 64] 543746

Which is not very readable, but, it does this because flex terms are arbitrary byte[], not necessarily utf8... maybe we should fix it to print both hex and String if we assume bytes are utf8?

Mike

On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless wrote:
> Ugh, I'll fix this.
>
> With the new flex API, you can't ask a composite (Multi/DirReader) for
> its postings -- you have to go through the static methods on
> MultiFields. I'm trying to put some distance b/w IndexReader and
> composite readers... because I'd like to eventually deprecate them.
> Ie, the composite readers should "hold" an ordered collection of
> sub-readers, but should not themselves implement IndexReader's API, I
> think.
>
> Thanks for raising this Tom,
>
> Mike
>
> On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom wrote:
>> When I try to run HighFreqTerms.java in Lucene Revision: 933722 I get
>> the exception appended below. I believe the line of code involved is a
>> result of the flex indexing merge.
>> Should I post this as a comment to
>> LUCENE-2370 (Reintegrate flex branch into trunk)?
>>
>> Or is there simply something wrong with my configuration?
>>
>> Exception in thread "main" java.lang.UnsupportedOperationException: please
>> use MultiFields.getFields if you really need a top level Fields (NOTE that
>> it's usually better to work per segment instead)
>>         at org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
>>         at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
>>
>> Tom Burton-West
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
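
On the idea raised in the thread of printing both hex and String when the term bytes happen to be utf8: below is a rough, JDK-only sketch of what that display logic might look like. It does not use or reproduce any Lucene API. The class name TermDisplay and the methods toHex, decodeUtf8OrNull and describe are hypothetical names made up for this illustration, and the zero-padded hex format is an assumption chosen to resemble the output quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene.
class TermDisplay {

    // Render bytes as bracketed, space-separated, zero-padded hex, e.g. [74 68 65 6e].
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Strictly decode the bytes as UTF-8; return null if they are not valid UTF-8.
    static String decodeUtf8OrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Show "field:text [hex] docFreq" when the bytes decode, else "field:[hex] docFreq".
    static String describe(String field, byte[] termBytes, int docFreq) {
        String hex = toHex(termBytes);
        String text = decodeUtf8OrNull(termBytes);
        String term = (text == null) ? hex : text + " " + hex;
        return field + ":" + term + " " + docFreq;
    }

    public static void main(String[] args) {
        byte[] then = {0x74, 0x68, 0x65, 0x6e};          // one of the terms quoted in the thread
        byte[] notUtf8 = {(byte) 0xff, (byte) 0xfe};     // not a valid UTF-8 sequence
        System.out.println(describe("body", then, 525176));
        System.out.println(describe("body", notUtf8, 1));
    }
}
```

The strict decoder is used so that terms which are not utf8 fall back to hex alone instead of being shown with replacement characters. Whether HighFreqTerms should behave this way is of course up to the committers.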