lucene-java-user mailing list archives

From Nicola Buso <nb...@ebi.ac.uk>
Subject Re: Facet migration 4.6.1 to > 4.7.0
Date Mon, 23 Jun 2014 14:01:06 GMT
Hi,

On Tue, 2014-06-17 at 17:51 +0300, Shai Erera wrote:
>         - we are extending FacetResultsHandler to change the order of
>         the facet results (e.g. date facets ordered by date instead of
>         count). How can I achieve this now?
> 
> 
> Now everything is a Facets. In your case, since you use the taxonomy,
> it's TaxonomyFacets. You can check the class hierarchy, where you have
> IntTaxonomyFacets (to deal w/ integers) and then TaxonomyFacetCounts
> and FastTaxonomyFacetCounts. I think you want to extend either
> IntTaxonomyFacets, or just TaxonomyFacets. Then if you ask for the
> 'date' dimension, delegate to the one that sorts by the date value,
> otherwise to the default one?
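
The per-dimension delegation described above can be sketched in plain Java. This is only an illustration under stated assumptions: DimensionDispatcher, register and topChildren are hypothetical names, not Lucene classes or methods, and the handlers are plain functions standing in for whatever Facets implementations would be used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical dispatcher, not a Lucene class: it picks a handler registered
// for a dimension and falls back to a default handler otherwise.
final class DimensionDispatcher<R> {
    private final Map<String, BiFunction<String, Integer, R>> byDim = new HashMap<>();
    private final BiFunction<String, Integer, R> fallback;

    DimensionDispatcher(BiFunction<String, Integer, R> fallback) {
        this.fallback = fallback;
    }

    // Registers a special handler for one dimension (e.g. "date").
    void register(String dim, BiFunction<String, Integer, R> handler) {
        byDim.put(dim, handler);
    }

    // Delegates to the handler for this dimension, or to the default one.
    R topChildren(String dim, int topN) {
        return byDim.getOrDefault(dim, fallback).apply(dim, topN);
    }
}
```

In a real migration the two handlers would wrap a date-sorting facets implementation and the stock counting one; here they are left generic on purpose.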
> 
> 
> When you say you sort by date, do you count the topN and then sort
> them by date, or you sort by date the entire dimension and then return
> topN? If the latter, does it mean you resolve each ordinal to its Date
> value to sort by? It might be a bit expensive to resolve that ... I
> wonder if you could do that w/ a NumericDocValues too ... e.g. add
> Year, Month, Day numeric DV fields, then aggregate by their value
> instead of resolving them to ordinals ... it's probably more involved
> than that, i.e. counting 2013/March is more complicated, but there's
> got to be a solution, like maybe ask to count March, but filter the
> query by year:2013 ... need to think about that.
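
The "count March, but filter by year:2013" idea can be sketched without any Lucene code. This is a minimal model under stated assumptions: MonthCounts and countByMonth are hypothetical names, and per-document dates are plain LocalDate values standing in for the Year/Month numeric doc-values fields mentioned above.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: models per-document Year/Month
// values as LocalDate objects instead of NumericDocValues fields.
final class MonthCounts {
    // Counts documents per month, keeping only those in the given year.
    static Map<Integer, Integer> countByMonth(List<LocalDate> docDates, int year) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (LocalDate d : docDates) {
            if (d.getYear() != year) {
                continue; // plays the role of the year filter on the query
            }
            // Aggregate directly on the numeric month value, no ordinal lookup.
            counts.merge(d.getMonthValue(), 1, Integer::sum);
        }
        return counts;
    }
}
```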

I had an abstract implementation of FacetResultsHandler that let
extenders provide their own PriorityQueue, which in my case ordered by
label instead of value. The previous API worked with an instance of
PriorityQueue<FacetResultNode>, and FacetResultNode was a better
container of information compared to OrdAndValue (at least for my
case). I will probably need to reimplement this part.
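
A minimal sketch of the label-ordered queue, assuming ISO-8601 date labels (yyyy-MM-dd) so that string order equals date order. LabelCount and TopNByLabel are hypothetical stand-ins, not Lucene classes; the sketch uses java.util.PriorityQueue and sorts the whole dimension by label before keeping the top N.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in for a (label, count) pair; it is not a Lucene class.
final class LabelCount {
    final String label;
    final int count;

    LabelCount(String label, int count) {
        this.label = label;
        this.count = count;
    }
}

final class TopNByLabel {
    // Returns the n entries with the greatest labels, greatest first.
    static List<LabelCount> select(Map<String, Integer> counts, int n) {
        // Min-heap on label: the head is the smallest label kept so far.
        PriorityQueue<LabelCount> queue =
            new PriorityQueue<>(Comparator.comparing((LabelCount lc) -> lc.label));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            queue.offer(new LabelCount(e.getKey(), e.getValue()));
            if (queue.size() > n) {
                queue.poll(); // evict the smallest label, regardless of count
            }
        }
        List<LabelCount> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(queue.poll()); // drained in ascending label order
        }
        Collections.reverse(result); // most recent date first
        return result;
    }
}
```

Swapping the comparator to compare counts would give back the usual count-ordered top N, which is why a pluggable queue was convenient in the old handler.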

> 
>         - we have the usual IndexReaders opened in groups with
>         MultiReader, then we merge the TaxonomyReaders in RAM to obtain
>         a taxonomy that corresponds to the MultiReader. Do you think I
>         can still do this?
> 
> The taxonomy in general hasn't changed. Besides CategoryPath which was
> replaced by String[], it's more or less the same.

OK, I will try to adapt this part.
> 
>         - at some point you removed the residue information from
>         facets and we calculated it differently; am I right that I can
>         now calculate it as
>         FacetResult.childCount - FacetResult.labelValues.length?
> 
> 
> If the residue is the number of children that had counts>0 but are not
> in the topN, then yes, the above computation seems right.
> FR.childCount denotes how many child labels were encountered, while
> FR.labelValues.length is <= N, where N is topN that you ask to count.

Yes, your assumption is right; I have already sorted out this part.
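
For reference, the residue arithmetic discussed above as a small sketch. FacetSummary is a hypothetical mirror of the two FacetResult members involved (childCount and labelValues); it is not the Lucene class and only carries what the subtraction needs.

```java
// Hypothetical mirror of the two FacetResult members discussed above.
final class FacetSummary {
    final int childCount;     // child labels encountered with count > 0
    final String[] topLabels; // labels actually returned, at most topN

    FacetSummary(int childCount, String[] topLabels) {
        this.childCount = childCount;
        this.topLabels = topLabels;
    }

    // Number of children that had counts but did not make the top N.
    int residue() {
        return childCount - topLabels.length;
    }
}
```

With ten counted children and a top 3, the residue is seven; when fewer children than N have counts, it is zero.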

> 
> 
>         - we are extending TaxonomyFacetsAccumulator to provide:
>           - specific FacetResultsHandler(s) depending on the facet
>           - facets other than the top k, if the user selected some
>             facet values from the "residue".
>         Where does the API permit extending the behavior to achieve
>         this?
> 
> 
> FacetsCollector hasn't changed much and returns a List<MatchingDocs>.
> The entire additional chain (Accumulator, ResultHandler etc.) is now a
> Facets. So you basically either need to extend Facets (or
> TaxonomyFacets), or write your own class which just processes the
> List<MatchingDocs>.
> 
> There's no "right way" to do it, it depends on what you want to
> achieve. If it's e.g. the different sort-order (date vs other), I would
> try to extend one of the existing classes (IntTaxonomyFacets). If it's
> something completely different, e.g. RangeFacetCounts, you should be
> able to just extend Facets. And if it's not a "Facets" thing at all,
> i.e. you don't need its API, just write your own interface to process
> the list of MatchingDocs.
> 
> Hope that helps
> 
> 
> Shai

Nicola.
> 
> 
> 
> On Tue, Jun 17, 2014 at 5:30 PM, Nicola Buso <nbuso@ebi.ac.uk> wrote:
>         Hi,
>         
>         I'm migrating from Lucene 4.6.1 to 4.8.1 and I noticed some
>         Facet API changes that happened in 4.7.0, probably mostly
>         related to this ticket:
>         http://issues.apache.org/jira/browse/LUCENE-5339
>         
>         Here are a few questions about some customizations/extensions
>         we did that seem not to have a direct counterpart/extension
>         point in the new API; can someone help with these questions?
>         
>         - we are extending FacetResultsHandler to change the order of
>         the facet results (e.g. date facets ordered by date instead of
>         count). How can I achieve this now?
>         
>         - we have the usual IndexReaders opened in groups with
>         MultiReader, then we merge the TaxonomyReaders in RAM to obtain
>         a taxonomy that corresponds to the MultiReader. Do you think I
>         can still do this?
>         
>         - at some point you removed the residue information from
>         facets and we calculated it differently; am I right that I can
>         now calculate it as
>         FacetResult.childCount - FacetResult.labelValues.length?
>         
>         - we are extending TaxonomyFacetsAccumulator to provide:
>           - specific FacetResultsHandler(s) depending on the facet
>           - facets other than the top k, if the user selected some
>             facet values from the "residue".
>         Where does the API permit extending the behavior to achieve
>         this?
>         
>         
>         Any help will be really appreciated,
>         
>         
>         
>         Nicola.
>         
>         
>         
>         --
>         Nicola Buso
>         Software Engineer - Web Production Team
>         
>         European Bioinformatics Institute (EMBL-EBI)
>         European Molecular Biology Laboratory
>         
>         Wellcome Trust Genome Campus
>         Hinxton
>         Cambridge CB10 1SD
>         United Kingdom
>         
>         URL: http://www.ebi.ac.uk
>         
>         
>         ---------------------------------------------------------------------
>         To unsubscribe, e-mail:
>         java-user-unsubscribe@lucene.apache.org
>         For additional commands, e-mail:
>         java-user-help@lucene.apache.org
>         
> 
> 

-- 
Nicola Buso
Software Engineer - Web Production Team

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory

Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk



