lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler (JIRA)" <>
Subject [jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
Date Sat, 23 Oct 2010 11:30:19 GMT


Uwe Schindler updated LUCENE-2719:

    Attachment: LUCENE-2719.patch

Modified all sorts to use >>>1 instead /2 (have no real opinion about that).

bq. Yep - Solr also has it's own version of quicksort - PrimUtils.sort() to deal with sorting
indexes (an int[]) instead of objects (parallel array sorting).

According to java.util.Arrays javadoc: The sorting algorithm is a tuned quicksort, adapted
from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice
and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log(n) performance
on many data sets that cause other quicksorts to degrade to quadratic performance. 

If I change to SorterTemplate, we will degrade to good old quicksort. We could also upgrade
SorterTemplate to use this algo, but I am not sure if thats easy because SorterTemplate only
allows swap(index, index) and compare(index, index). But we cannot retrieve e.g. the pivot
value. This was also one problem in porting BytesRefHash's quicksort, as it used the value
of the pivot (this is why there is an additional check below the commented out assert in the
current patch's algo).

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash
> ----------------------------------------------------------------------------------------------
>                 Key: LUCENE-2719
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed
after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is
slow, because it clones internally to ensure stable search. This component is much faster.
This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using
a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short
ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback
in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm
needs to be separated from the underlying data and you can implement a swap() and compare()
function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays,
it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied
modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB
SorterTemplate, which is no longer maintained.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message