lucene-dev mailing list archives

From "Shai Erera (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3262) Facet benchmarking
Date Thu, 06 Oct 2011 05:16:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121728#comment-13121728 ]

Shai Erera commented on LUCENE-3262:
------------------------------------

Patch looks good! I have a couple of initial comments:

* facets.alg: since I often use these .alg files as examples, I think it would be good if it
declared facet.source (set to random) explicitly.

* OpenTaxonomyReaderTask: I see that since PerfRunData incRef()s the incoming taxonomy, you
decRef() it. I also see that setIndexReader behaves the same way, but I find this confusing. Personally,
since this is not an application, I don't think we should 'hold a reference to IR/LTR just
in case the one who set it closes it'. But if we do that, can we at least document on setIR/LTR
that this is the case? I can certainly see myself opening IR/LTR and setting it on PerfRunData without
calling decRef()/close(). It would not occur to me that I should ...

* The abstraction of ItemSource is nice. But its jdocs still contain content.source.*. Since
we're not committed to backwards compatibility in benchmark, and in the interest of clarity,
perhaps we should rename them to item.source.*?

* ItemSource.resetInputs has a @SuppressWarnings("unused") -- is it a leftover from when it
was private?

* In the PerfRunData ctor you do a Class.forName using the String name of RandomFacetSource. Why
not use RandomFacetSource.class.getName()?
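
To illustrate the reference-counting concern raised in the OpenTaxonomyReaderTask comment above, here is a minimal sketch. It does not use the real Lucene or benchmark classes: RefCountedReader and RunData are hypothetical stand-ins for IR/LTR and PerfRunData, and the setter semantics are only an assumption based on the behaviour described above (the setter takes its own reference, so the caller must still release the one it got when opening the reader).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted reader such as IR/LTR (not a Lucene class). */
final class RefCountedReader {
    // The code that opens the reader holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    void incRef() {
        if (refCount.get() <= 0) {
            throw new IllegalStateException("reader is already closed");
        }
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("too many decRef() calls");
        }
    }

    int getRefCount() {
        return refCount.get();
    }

    boolean isClosed() {
        return refCount.get() == 0;
    }
}

/** Hypothetical holder whose setter takes its own reference, as described above. */
final class RunData {
    private RefCountedReader reader;

    /**
     * Takes an additional reference on the incoming reader.
     * The caller keeps its own reference and is still responsible for releasing it.
     */
    void setReader(RefCountedReader newReader) {
        if (newReader != null) {
            newReader.incRef();
        }
        if (reader != null) {
            reader.decRef(); // release the reference held on the previous reader
        }
        reader = newReader;
    }

    void close() {
        if (reader != null) {
            reader.decRef();
            reader = null;
        }
    }
}

final class RefCountDemo {
    public static void main(String[] args) {
        RefCountedReader reader = new RefCountedReader(); // refCount == 1 (caller)
        RunData data = new RunData();
        data.setReader(reader);                           // refCount == 2 (caller + holder)
        reader.decRef();                                  // caller releases: refCount == 1
        data.close();                                     // holder releases: refCount == 0
        System.out.println("closed: " + reader.isClosed());
    }
}
```

If the caller forgets the decRef() after setReader(), the count never reaches zero and the reader stays open, which is why documenting this contract on the setter matters.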

Looks very good. Now with FacetSource we can generate facets per the case we want to test
(dense hierarchies, Zipfian ...)
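
As a rough sketch of what a Zipfian facet distribution could look like, the sampler below draws category ranks so that rank 0 is the most frequent. It is not tied to the FacetSource API: ZipfSampler, its constructor parameters, and the inverse-CDF approach are all assumptions made for illustration.

```java
import java.util.Random;

/** Hypothetical helper: samples ranks in [0, n) with probability proportional to 1 / (rank + 1)^exponent. */
final class ZipfSampler {
    private final double[] cdf;   // cumulative distribution over ranks
    private final Random random;

    ZipfSampler(int n, double exponent, long seed) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        cdf = new double[n];
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, exponent);
            cdf[k - 1] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum;        // normalize so the last entry is 1.0
        }
        cdf[n - 1] = 1.0;         // guard against floating-point rounding
        random = new Random(seed);
    }

    /** Returns the next rank; binary search finds the first index whose CDF value is >= u. */
    int next() {
        double u = random.nextDouble();
        int lo = 0;
        int hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] < u) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

A facet source could map each returned rank to a category path, so a handful of categories would dominate the generated documents while most would be rare.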
                
> Facet benchmarking
> ------------------
>
>                 Key: LUCENE-3262
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3262
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/benchmark, modules/facet
>            Reporter: Shai Erera
>            Assignee: Doron Cohen
>         Attachments: CorpusGenerator.java, LUCENE-3262.patch, TestPerformanceHack.java
>
>
> A spin off from LUCENE-3079. We should define few benchmarks for faceting scenarios,
> so we can evaluate the new faceting module as well as any improvement we'd like to consider
> in the future (such as cutting over to docvalues, implement FST-based caches etc.).
> Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here as a starting
> point.
> We've also done some preliminary job for extending Benchmark for faceting, so I'll attach
> it here as well.
> We should perhaps create a Wiki page where we clearly describe the benchmark scenarios,
> then include results of 'default settings' and 'optimized settings', or something like that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

