lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3199) Add non-destructive sort to BytesRefHash
Date Fri, 02 Sep 2011 16:07:10 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3199:
------------------------------------

    Attachment: LUCENE-3199.patch

Hey Jason, I actually took this a little further and added a read-only view to BytesRefHash.
This view provides next(), seekExact() and seekCeil() methods, just like TermsEnum does.

The view is sorted if needed and can be merged incrementally with a previously created
view.
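
A minimal sketch of how such a view could behave, not the code from the attached patch. The
class name SortedViewSketch is hypothetical, terms are plain Strings in an append-only list
instead of BytesRefs in a BytesRefHash, and seekCeil() returns the term it lands on rather
than a TermsEnum-style status. A new view sorts only the ids added since the previous view
and merges them with that view's existing order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a read-only sorted view; not the API from the patch. */
final class SortedViewSketch {
    private final List<String> terms; // shared, append-only term storage (id = list index)
    private final int[] sortedIds;    // the one array this view retains
    private int pos = -1;             // cursor into sortedIds, -1 = before the first term

    /** Opens a view over all current terms, reusing the order of a previous view if given. */
    SortedViewSketch(List<String> terms, SortedViewSketch previous) {
        this.terms = terms;
        int oldCount = previous == null ? 0 : previous.sortedIds.length;
        int size = terms.size();
        // Sort only the ids that were added after the previous view was opened.
        // Boxing keeps the sketch short; real code would sort primitives directly.
        Integer[] fresh = new Integer[size - oldCount];
        for (int n = 0; n < fresh.length; n++) fresh[n] = oldCount + n;
        Arrays.sort(fresh, (a, b) -> terms.get(a).compareTo(terms.get(b)));
        // Merge the previous sorted order with the freshly sorted ids.
        sortedIds = new int[size];
        int i = 0, j = 0, k = 0;
        while (i < oldCount && j < fresh.length) {
            int a = previous.sortedIds[i], b = fresh[j];
            if (terms.get(a).compareTo(terms.get(b)) <= 0) { sortedIds[k++] = a; i++; }
            else { sortedIds[k++] = b; j++; }
        }
        while (i < oldCount) sortedIds[k++] = previous.sortedIds[i++];
        while (j < fresh.length) sortedIds[k++] = fresh[j++];
    }

    /** Returns the term at the cursor, or null if the cursor is outside the view. */
    String term() {
        return pos >= 0 && pos < sortedIds.length ? terms.get(sortedIds[pos]) : null;
    }

    /** Advances to the next term in sorted order; returns null when exhausted. */
    String next() {
        if (pos < sortedIds.length) pos++;
        return term();
    }

    /** Positions on the smallest term >= target (binary search); null if there is none. */
    String seekCeil(String target) {
        int lo = 0, hi = sortedIds.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (terms.get(sortedIds[mid]).compareTo(target) < 0) lo = mid + 1; else hi = mid;
        }
        pos = lo;
        return term();
    }

    /** Returns true only if target itself is in the view; the cursor is left at its ceiling. */
    boolean seekExact(String target) {
        return target.equals(seekCeil(target));
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>(Arrays.asList("pear", "apple", "fig"));
        SortedViewSketch first = new SortedViewSketch(terms, null);
        terms.add("banana");
        terms.add("zebra");
        SortedViewSketch second = new SortedViewSketch(terms, first);
        System.out.println(second.seekCeil("b"));       // banana
        System.out.println(second.next());              // fig
        System.out.println(second.seekExact("grape"));  // false
        System.out.println(first.seekExact("banana"));  // false, the first view predates it
    }
}
```

Because term storage is append-only in this sketch, an older view stays valid while newer
terms are added; it simply never sees ids beyond the count it was opened with.
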
Initially I wondered whether this approach would be feasible performance-wise, but it is in
fact really fast. I ran some poor man's benchmarks in which I opened a new view every
500 to 1000 new unique terms, and this takes around 0.001 to 0.01 milliseconds on average. I
have never seen it take longer than 0.1 ms. I think it would be worthwhile to explore whether
we can keep it that simple and reopen such a view for each document while we are indexing. The
view allocates only one additional array and reuses all other references from the
BytesRefHash instance. That one additional int[] does not seem too bad, though.
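
To illustrate the "one additional int[]" point, here is a rough sketch under stated
assumptions; it is not Lucene code. The slot table and term array are hypothetical
miniatures of a hash's internals, with -1 marking an empty slot. The idea is that a
non-destructive sort copies the occupied ids into a fresh array and sorts the copy, so the
hash's own table stays usable for lookups and further additions.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical miniature of a hash's id table; not Lucene's BytesRefHash. */
final class NonDestructiveSortSketch {
    /**
     * Returns the ids stored in an open-addressing slot table, ordered by their terms.
     * The slot table itself is only read, never compacted or reordered.
     */
    static int[] sortedIds(int[] slots, String[] termsById) {
        // Streams and boxing keep the sketch short; real code would fill and sort
        // a single int[] directly to avoid the temporary objects.
        return Arrays.stream(slots)
            .filter(id -> id != -1)
            .boxed()
            .sorted(Comparator.comparing((Integer id) -> termsById[id]))
            .mapToInt(Integer::intValue)
            .toArray();
    }

    public static void main(String[] args) {
        int[] slots = {2, -1, 0, -1, 1, -1, -1, -1};   // term id per hash slot, -1 = empty
        String[] termsById = {"pear", "apple", "fig"};
        int[] before = slots.clone();
        int[] order = sortedIds(slots, termsById);
        System.out.println(Arrays.toString(order));        // [1, 2, 0]
        System.out.println(Arrays.equals(before, slots));  // true, the table is untouched
    }
}
```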

The patch is still rough. I will work on it further next week.

> Add non-destructive sort to BytesRefHash
> ----------------------------------------
>
>                 Key: LUCENE-3199
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3199
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-3199.patch, LUCENE-3199.patch, LUCENE-3199.patch
>
>
> Currently, sorting a BytesRefHash is destructive. We can add a method that returns a
> non-destructively generated int[].

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


