From: Paul Allan Hill
To: "java-user@lucene.apache.org"
Date: Mon, 6 Feb 2012 15:12:06 -0800
Subject: RE: recording a universal ID from DocID in a CustomScoreQuery

To complete
this thread, I read the document itself with a one-field fieldSelector, so as not to bother with anything but exactly what I needed at this point in the code (particularly not the text body).

Then I saved the primary key (the path) of documents that visited this CustomScoreQuery (function query) in a Set, seenDocs:

    seenDocs.add(reader.document(docId, fieldSelector).getFieldable(KEY_FIELD).stringValue());

If we do introduce a short, globally unique ID field, the code needs little change to move to a different field.

When the entire query has rounded up all the results, it asks which ones came through that function query by consulting the set of seenDocs.

I decided NOT to use the FieldCache for this particular application, because the number of documents that result from this part of the query is very small compared to all documents. Their rarity was the point of knowing, so that I could mark the result as 'special' for other parts of the application. Such special documents get different treatment in the UI, but that's not my concern; just identifying which ones they were was the useful part for the index layer.

As usual, thanks for the feedback.

-Paul

> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Monday, February 06, 2012 3:54 AM
> To: java-user@lucene.apache.org
> Subject: Re: recording a universal ID from DocID in a CustomScoreQuery
>
> int doc will be for the subreader, not for the entire index.
> oal.search.Collector has setNextReader(IndexReader reader, int
> docBase) which you might somehow be able to use. Failing that I'd go for FieldCache, or store the
> docids in a Set in a Map keyed by current Reader, if that would give you what you needed for the
> subsequent messing around.
>
>
> --
> Ian.
>
>
> On Sat, Feb 4, 2012 at 12:09 AM, Paul Allan Hill wrote:
> > My Index does NOT have a simple UID, it uses the file PATH to the file as the unique key.
> > I was implementing a CustomScoreQuery which not only tweaked the score it also wanted to write
> > down which documents had passed through this part of overall rebuilt query, so that I could further
> > mess with those particular documents later.
> > I was hoping to do it without using loading up all PATHs from my index into a field cache, but maybe
> > that is a false way to try to save memory.
> >
> > I thought I could write down the docId provided in the call to
> > customScore
> >
> > public float customScore(int doc, float subQueryScore, float valSrcScore) throws IOException {
> >     docIds.add(docId);
> >     return ...;
> > }
> >
> > private Set docIds = new HashSet();
> >
> > While I thought I had this working, apparently I had not taken into consideration the subreader and
> > segment problem.
> > The int called doc is not the docId for the entire index, just the local reader doc number. Is that
> > right?
> > So is there a standard way to convert back to the index wide DocID?
> >
> > If there is no standard way, I _might_ create a small subclass of IndexSearcher and provide a method
> > to:
> >
> > (1) Find the right reader by looping through all
> >     IndexSearcher.subReaders[] to find what reader called the
> >     CustomScoreQuery
> >
> > (2) Add an offset of the proper value from
> >     IndexSearcher.docStarts[iReader]
> >
> > But I'm am thinking this prone to the problem that subreader can be
> > made of more subreaders etc., so I really don't have a clue where to find the current reader and
> > then to map back to docStarts.
> >
> > I also think I'm doing this wrong, because ReaderUtil has nothing like this?
> >
> > Is there some way to note for later that a particular document came through this function query or
> > should I just accept the fact of using the field cache?
> >
> > -Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
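
For anyone finding this thread later: the offset arithmetic discussed above (a segment-local doc number plus that segment's starting offset, i.e. the docBase / docStarts idea) can be sketched in plain Java. This is only an illustration under stated assumptions. SegmentDocIds and its methods are made-up names, not Lucene classes, and the per-segment document counts are passed in by hand rather than read from an IndexReader.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, NOT part of Lucene: converts between segment-local
 * doc numbers and index-wide doc numbers using per-segment start offsets.
 */
final class SegmentDocIds {
    // docStarts[i] is the index-wide number of the first document in segment i.
    private final int[] docStarts;
    private final int totalDocs;

    /** @param docsPerSegment number of documents in each segment, in index order */
    SegmentDocIds(int[] docsPerSegment) {
        docStarts = new int[docsPerSegment.length];
        int base = 0;
        for (int i = 0; i < docsPerSegment.length; i++) {
            docStarts[i] = base;
            base += docsPerSegment[i];
        }
        totalDocs = base;
    }

    /** Index-wide doc number for a doc number that is local to one segment. */
    int toGlobal(int segment, int localDoc) {
        return docStarts[segment] + localDoc;
    }

    /** Position of the segment that contains the given index-wide doc number. */
    int segmentOf(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= totalDocs) {
            throw new IllegalArgumentException("doc out of range: " + globalDoc);
        }
        int pos = Arrays.binarySearch(docStarts, globalDoc);
        if (pos < 0) {
            // Not an exact start offset: the owner is the segment just before
            // the insertion point.
            return -pos - 2;
        }
        // Empty segments share a start offset with the next segment; the doc
        // belongs to the last segment that has this offset.
        while (pos + 1 < docStarts.length && docStarts[pos + 1] == globalDoc) {
            pos++;
        }
        return pos;
    }

    /** Segment-local doc number for an index-wide doc number. */
    int toLocal(int globalDoc) {
        return globalDoc - docStarts[segmentOf(globalDoc)];
    }
}
```

With segments of 5, 3 and 4 documents, local doc 2 of the second segment maps to toGlobal(1, 2). Note that index-wide numbers computed this way are only meaningful against the same reader they were computed from, which is one reason storing a stable key such as the path, as described in the message above, can be the simpler choice.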