lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3079) Faceting module
Date Mon, 27 Jun 2011 13:03:47 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3079:
-------------------------------

    Attachment: LUCENE-3079.patch

The attached patch includes the faceted search module. It currently compiles
against 3x, so I've put it under lucene/contrib, but after we port it to 
trunk, it should be under modules/.

There isn't a short way to describe the content of the patch (as you can 
see, it's huge), so instead I'll give a brief overview of some key 
packages: 

* src/examples: contains example code demonstrating the different capabilities of the facets module. I'd start w/ examples/simple.
* src/test: contains many tests; a great place to start too.
* o.a.l.facet.taxonomy: contains the taxonomy index management code. There are two interfaces, TaxonomyWriter and TaxonomyReader, with a LuceneTW/TR impl.
* o.a.l.facet.index: contains the indexing code for the different capabilities (e.g. simple, enhancements etc.)
* o.a.l.facet.search: contains the respective search code.
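
To make the role of a taxonomy a bit more concrete, here is a minimal in-memory sketch. It assumes that a taxonomy maps each category path to an integer ordinal and remembers each category's parent, so that counts can be rolled up the hierarchy. ToyTaxonomy and all of its methods are hypothetical names made up for this illustration; this is not the TaxonomyWriter/TaxonomyReader API from the patch, and it does not use a Lucene index at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class for illustration only; NOT the API from the patch.
// Assumes path components never contain '/', which is used as a separator.
class ToyTaxonomy {
    static final int ROOT = 0;

    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<String> paths = new ArrayList<>();
    private final List<Integer> parents = new ArrayList<>();

    ToyTaxonomy() {
        // The root category has the empty path and no parent (-1).
        ordinals.put("", ROOT);
        paths.add("");
        parents.add(-1);
    }

    // Adds a category and any missing ancestors; returns the category's ordinal.
    // Adding an existing category returns the ordinal assigned earlier.
    int addCategory(String... components) {
        int parent = ROOT;
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(component);
            String key = path.toString();
            Integer ord = ordinals.get(key);
            if (ord == null) {
                ord = paths.size();
                ordinals.put(key, ord);
                paths.add(key);
                parents.add(parent);
            }
            parent = ord;
        }
        return parent;
    }

    int getParent(int ordinal) {
        return parents.get(ordinal);
    }

    String getPath(int ordinal) {
        return paths.get(ordinal);
    }

    int size() {
        return paths.size();
    }

    // Given one category ordinal per matching document, counts hits per
    // category, crediting every ancestor up to and including the root.
    int[] countFacets(int[] docOrdinals) {
        int[] counts = new int[size()];
        for (int ord : docOrdinals) {
            for (int o = ord; o != -1; o = getParent(o)) {
                counts[o]++;
            }
        }
        return counts;
    }
}
```

For example, adding ("Author", "Mark Twain") and ("Author", "Jane Austen") creates three categories below the root, and counting over matching documents also yields a rolled-up total for "Author".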

A few points:
* I've put the ASL on all files.
* Marked all code as @lucene.experimental.
* After you apply the patch you can run 'ant eclipse' and it will build the facets code in Eclipse (no Maven integration yet - will need to look into it).
* Under o.a.l and o.a.l.util there are several utility classes that are not specific to faceted search, but which the facets code uses. I've kept them there so that we can review and decide whether we want to move them to lucene-core at some point.

TODOs:
* After it's on trunk, I think we should explore replacing the payloads w/ DocValues.
* Leverage Lucene's superb random testing framework!
* There are a few TODOs in the code which I think can be addressed following this issue.

I will open follow-on issues for those.

Given the amount of code, I am wondering if perhaps we should commit it 
as-is, and do more thorough reviews afterwards. The code does not modify 
any existing code (aside from a tiny change to LuceneTestCase), so I 
think there's no risk in doing so. I am also not sure that it's sane to 
review that amount of code (nearly 40K lines) in patch form. What do you 
think?

> Faceting module
> ---------------
>
>                 Key: LUCENE-3079
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3079
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-3079.patch, LUCENE-3079.patch
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> cut over later.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
