lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3129) Single-pass grouping collector based on doc blocks
Date Wed, 25 May 2011 12:30:47 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3129:
---------------------------------------

    Attachment: LUCENE-3129.patch


New patch attached; I think it's ready to commit.

I changed the approach, poaching an improvement from nested docs
(LUCENE-2454): instead of pulling a DocTermsIndex from the field
cache and detecting a new group when the ord changes, I now require
that the app provide a Filter to denote the transition between groups.
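
To illustrate the idea, here is a minimal sketch in plain Java, not
the code from the patch. It assumes the Filter can be pictured as a
bit set with one bit set for the last docID of each block, and that
hits arrive in increasing docID order. The class and method names
(BlockGroupingSketch, groupHits) are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's collector.
class BlockGroupingSketch {

    /**
     * Splits matching docIDs into groups in a single pass.
     *
     * @param hits           matching docIDs, in increasing order
     * @param lastDocInGroup one bit set for the last docID of every block
     */
    static List<List<Integer>> groupHits(int[] hits, BitSet lastDocInGroup) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int groupEnd = -1; // last docID of the block holding the current group

        for (int doc : hits) {
            if (doc > groupEnd) {
                // We moved past the end of the previous block: close that group.
                if (!current.isEmpty()) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
                // The next set bit at or after doc is the end of doc's block.
                groupEnd = lastDocInGroup.nextSetBit(doc);
                if (groupEnd < 0) {
                    throw new IllegalArgumentException(
                        "doc " + doc + " is not inside any block");
                }
            }
            current.add(doc);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        BitSet ends = new BitSet();
        ends.set(2); // block 0..2
        ends.set(4); // block 3..4
        ends.set(8); // block 5..8
        int[] hits = {1, 2, 4, 5, 7};
        System.out.println(groupHits(hits, ends)); // prints [[1, 2], [4], [5, 7]]
    }
}
```

Because each group's docs are contiguous, the sketch never needs a
per-document ord lookup; it only tracks where the current block ends.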

Not only is this better because it uses far less RAM, it's also more
general than the two-pass collector, in that the app is free to
arbitrarily set the groups by indexing the right doc blocks.  All
that's necessary is that the app has some way to create the Filter
marking the last doc in each group.  The grouping key need not be a
"single valued indexed field"...
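
As a rough picture of "noting the last doc in each group", the sketch
below computes which docIDs would be marked when blocks are appended
one after another. It is plain Java under stated assumptions: docIDs
are assigned contiguously starting at 0, no docs are deleted, and the
names (GroupEndMarkerSketch, markGroupEnds) are hypothetical. In a
real index the app would more likely tag the last doc of each block
while indexing and build the Filter from that tag.

```java
import java.util.BitSet;

// Hypothetical illustration only; a real app would derive this from the index.
class GroupEndMarkerSketch {

    /**
     * Returns a bit set with one bit set for the last docID of each block,
     * given the number of docs in each block in indexing order.
     */
    static BitSet markGroupEnds(int[] blockSizes) {
        BitSet ends = new BitSet();
        int nextDocId = 0; // docID the next appended doc would receive
        for (int size : blockSizes) {
            if (size <= 0) {
                throw new IllegalArgumentException("a block needs at least one doc");
            }
            nextDocId += size;
            ends.set(nextDocId - 1); // last doc of the block just appended
        }
        return ends;
    }

    public static void main(String[] args) {
        // Three blocks of 3, 2 and 4 docs occupy docIDs 0..2, 3..4 and 5..8.
        System.out.println(markGroupEnds(new int[] {3, 2, 4})); // prints {2, 4, 8}
    }
}
```

The block boundaries are whatever the app chooses at indexing time,
which is why the groups need not correspond to any one field's values.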

Performance is good: roughly 25-28% faster than the two-pass
collector with caching.


> Single-pass grouping collector based on doc blocks
> --------------------------------------------------
>
>                 Key: LUCENE-3129
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3129
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/grouping
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3129.patch, LUCENE-3129.patch
>
>
> LUCENE-3112 enables adding/updating a contiguous block of documents to
> the index, guaranteed (though experimental!) to retain adjacent docID
> assignment through the full life of the index as long as the app doesn't
> delete individual docs from the block.
> When an app does this, it can enable neat features like LUCENE-2454
> (nested documents) and post-group facet counting (LUCENE-3097).
> It also makes single-pass grouping possible, when you group by
> the "identifier" field shared by the doc block, since we know we will
> see a given group only once with all of its docs within one block.
> This should be faster than the fully general two-pass collectors we
> already have.
> I'm working on a patch but not quite there yet...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


