From: Nicola Buso
Reply-To: nbuso@ebi.ac.uk
To: Shai Erera
Cc: "java-user@lucene.apache.org"
Subject: Re: Facet migration 4.6.1 to > 4.7.0
Date: Mon, 23 Jun 2014 15:01:06 +0100
Message-ID: <1403532066.2121.31.camel@linuxf.windows.ebi.ac.uk>
Organization: EMBL-EBI

Hi,

On Tue, 2014-06-17 at 17:51 +0300, Shai Erera wrote:
>         - we are extending FacetResultsHandler to change the order of
>         the facet results (i.e. date facets ordered by date instead of
>         count). How can I achieve this now?
>
> Now everything is a Facets. In your case, since you use the taxonomy,
> it's TaxonomyFacets. You can check the class-hierarchy, where you have
> IntTaxoFacets (to deal w/ integers) and then TaxoFacetCounts and
> FastTaxoFacetCounts. I think you want to extend either IntTaxoFacets,
> or just TaxonomyFacets. Then if you ask for the 'date' dimension,
> delegate to the one that sorts by the date value, otherwise to the
> default one?
>
> When you say you sort by date, do you count the topN and then sort
> them by date, or you sort by date the entire dimension and then return
> topN? If the latter, does it mean you resolve each ordinal to its Date
> value to sort by? It might be a bit expensive to resolve that ... I
> wonder if you could do that w/ a NumericDocValues too ... e.g. add
> Year, Month, Day numeric DV fields, then aggregate by their value
> instead of resolving them to ordinals ... it's probably more involved
> than that, i.e. counting 2013/March is more complicated, but there's
> got to be a solution, like maybe ask to count March, but filter the
> query by year:2013 ... need to think about that.

I had an abstract implementation of FacetResultsHandler that let
subclasses provide their own PriorityQueue; in my case the queue ordered
by label instead of by value. The previous API worked with an instance
of PriorityQueue, and FacetResultNode was a better container of
information than OrdAndValue (at least for my case). I will probably
need to reimplement this part.

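The first option Shai describes (count the topN, then sort them by date) can be sketched outside Lucene as a plain re-sort of the returned label/count pairs. This is only an illustration under stated assumptions: LabelCount is a hypothetical stand-in for a facet label with its count, not a Lucene class, and the date labels are assumed to be strings such as "2013-03" whose lexicographic order matches their chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a facet label with its count (not a Lucene class). */
final class LabelCount {
    final String label;
    final int count;

    LabelCount(String label, int count) {
        this.label = label;
        this.count = count;
    }
}

final class LabelOrder {
    /** Returns a new list with the same entries ordered by label, ascending. */
    static List<LabelCount> sortByLabel(List<LabelCount> topN) {
        // Copy first so the caller's count-ordered list is left untouched.
        List<LabelCount> sorted = new ArrayList<>(topN);
        sorted.sort(Comparator.comparing((LabelCount lc) -> lc.label));
        return sorted;
    }

    public static void main(String[] args) {
        // Top 3 children as they might come back, ordered by count descending.
        List<LabelCount> byCount = Arrays.asList(
                new LabelCount("2013-03", 42),
                new LabelCount("2011-11", 17),
                new LabelCount("2012-06", 9));
        for (LabelCount lc : sortByLabel(byCount)) {
            System.out.println(lc.label + " (" + lc.count + ")");
        }
    }
}
```

Sorting the whole dimension by date before cutting to topN is a different operation and would need access to every child, not just the returned top N.
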
>         - we have usual IndexReaders opened in groups with
>         MultiReader, then we're merging in RAM the TaxonomyReaders to
>         obtain a correspondence of the MultiReader for the taxonomies.
>         Do you think I can still do this?
>
> The taxonomy in general hasn't changed. Besides CategoryPath which was
> replaced by String[], it's more or less the same.

OK, I will try to adapt this part.

>         - at some point you removed the residue information from
>         facets and we calculated it differently; am I right I can now
>         calculate it as
>         FacetResult.childCount - FacetResult.labelValues.length?
>
> If the residue is the number of children that had counts>0 but are not
> in the topN, then yes, the above computation seems right.
> FR.childCount denotes how many child labels were encountered, while
> FR.labelValues.length is <= N, where N is topN that you ask to count.

Yes, your assumption is right. I already sorted out this part.

>         - we are extending TaxonomyFacetsAccumulator to provide:
>           - specific FacetResultsHandler(s) depending on the facet
>           - add facet other than the topk if the user selected some
>             facet values from the "residue".
>         where does the API permit to extends the behavior to achieve
>         this?
>
> FacetsCollector hasn't changed much and returns a List<MatchingDocs>.
> The entire additional chain (Accumulator, ResultHandler etc.) is now a
> Facets. So you basically either need to extend Facets (or
> TaxonomyFacets), or write your own class which just processes the
> List<MatchingDocs>.
>
> There's no "right way" to do it, it depends on what you want to
> achieve. If it's e.g. the different sort-order (date vs other), I would
> try to extend one of the existing classes (IntTaxoFacets). If it's
> something completely different, e.g. RangeFacetCounts, you should be
> able to just extend Facets. And if it's not a "Facets" thing at all,
> i.e. you don't need its API, just write your own interface to process
> the list of MatchingDocs.
>
> Hope that helps
>
> Shai

Nicola.

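The residue arithmetic confirmed above, together with the "add facets outside the topk when the user selected them" requirement, can be sketched as follows. This is a standalone illustration, not code from the thread: ResidueSketch, withSelected and the allCounts lookup are hypothetical names, and plain maps stand in for Lucene's FacetResult so the example runs on its own.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResidueSketch {
    /** Residue: children with count > 0 that did not make it into the top N. */
    static int residue(int childCount, int returnedLabels) {
        return childCount - returnedLabels;
    }

    /**
     * Returns the top N entries followed by any user-selected labels that are
     * missing from them. allCounts is a hypothetical label-to-count lookup.
     */
    static Map<String, Integer> withSelected(Map<String, Integer> topN,
                                             List<String> selected,
                                             Map<String, Integer> allCounts) {
        // LinkedHashMap keeps the top-N order and appends extras at the end.
        Map<String, Integer> result = new LinkedHashMap<>(topN);
        for (String label : selected) {
            Integer count = allCounts.get(label);
            if (count != null && !result.containsKey(label)) {
                result.put(label, count);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> topN = new LinkedHashMap<>();
        topN.put("human", 120);
        topN.put("mouse", 80);

        Map<String, Integer> allCounts = new HashMap<>(topN);
        allCounts.put("yeast", 5);
        allCounts.put("zebrafish", 3);

        // Four children had counts, two were returned.
        System.out.println("residue = " + residue(allCounts.size(), topN.size()));
        System.out.println(withSelected(topN, Arrays.asList("zebrafish"), allCounts).keySet());
    }
}
```

Labels without a known count are skipped rather than added with a zero, which keeps the merged result consistent with the "counts>0" definition of the residue.
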
> On Tue, Jun 17, 2014 at 5:30 PM, Nicola Buso wrote:
>         Hi,
>
>         I'm migrating from lucene 4.6.1 to 4.8.1 and I noticed some
>         Facet API changes happened on 4.7.0, probably mostly related
>         to this ticket:
>         http://issues.apache.org/jira/browse/LUCENE-5339
>
>         Here are a few questions about some customizations/extensions
>         we did that seem not to have a direct counterpart/extension
>         point in the new API; can someone help with these questions?
>
>         - we are extending FacetResultsHandler to change the order of
>         the facet results (i.e. date facets ordered by date instead of
>         count). How can I achieve this now?
>
>         - we have usual IndexReaders opened in groups with
>         MultiReader, then we're merging in RAM the TaxonomyReaders to
>         obtain a correspondence of the MultiReader for the taxonomies.
>         Do you think I can still do this?
>
>         - at some point you removed the residue information from
>         facets and we calculated it differently; am I right I can now
>         calculate it as
>         FacetResult.childCount - FacetResult.labelValues.length?
>
>         - we are extending TaxonomyFacetsAccumulator to provide:
>           - specific FacetResultsHandler(s) depending on the facet
>           - add facet other than the topk if the user selected some
>             facet values from the "residue".
>         where does the API permit to extends the behavior to achieve
>         this?
>
>         Any help will be really appreciated,
>
>         Nicola.

-- 
Nicola Buso
Software Engineer - Web Production Team

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory

Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk