lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Alphanumeric Field Comparison : Lucene 4.5
Date Wed, 27 Nov 2013 13:54:51 GMT
P.S. Meant to add that it's almost always a win,
if you have the space, to do any pre-computing you
can at index time: you only have to pay the cost
once.
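
As one possible illustration of that kind of index-time pre-computation, the sketch below splits each name into the two sort keys suggested in the quoted message further down (name_text_sort and name_int_sort). It is plain Java with no Lucene calls. The class name SplitSortKeys, the regular expression and the lowercasing are assumptions made for this example, and it assumes one trailing number that fits in an int.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: derives two sort keys from one name.
class SplitSortKeys {
    // Assumed pattern: a text prefix followed by a single trailing number, e.g. "Bay 10" or "Bay10".
    private static final Pattern NAME = Pattern.compile("^(.*?)\\s*(\\d+)$");

    final String original;
    final String textKey;  // what name_text_sort would hold
    final int numberKey;   // what name_int_sort would hold

    SplitSortKeys(String name) {
        original = name;
        Matcher m = NAME.matcher(name.trim());
        if (m.matches()) {
            textKey = m.group(1).toLowerCase(Locale.ROOT);
            numberKey = Integer.parseInt(m.group(2));
        } else {
            // No trailing number: sort on the text alone.
            textKey = name.trim().toLowerCase(Locale.ROOT);
            numberKey = 0;
        }
    }

    // Primary sort on the text key, secondary sort on the numeric key.
    static List<String> sortNames(List<String> names) {
        List<SplitSortKeys> keyed = new ArrayList<>();
        for (String name : names) {
            keyed.add(new SplitSortKeys(name));
        }
        keyed.sort(Comparator.comparing((SplitSortKeys k) -> k.textKey)
                .thenComparingInt(k -> k.numberKey));
        List<String> sorted = new ArrayList<>();
        for (SplitSortKeys k : keyed) {
            sorted.add(k.original);
        }
        return sorted;
    }
}
```

In an index the two keys would be stored in separate fields, computed once when the document is added, and the query would use a primary sort on the text field and a secondary sort on the numeric field.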

Or you could normalize the field by simply adding spaces, i.e.
left-pad each number with the appropriate number of spaces
so all digit sequences are the same width:
Bay  1
Bay 10

Or even left-pad the digits with 0; the users wouldn't see
that unless they used the terms component or something.
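
A minimal sketch of the zero-padding variant, in plain Java without any Lucene calls. The class name PaddedSortKey, the method sortKey, the fixed width parameter and the lowercasing are assumptions made for this example. The idea is that the padded key goes into a separate sort field at index time, so an ordinary string sort on that field gives the natural order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: builds a sortable key from a display value.
class PaddedSortKey {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Lowercases the value and left-pads every digit run with zeros up to `width`.
    // Digit runs longer than `width` are left unchanged.
    static String sortKey(String value, int width) {
        Matcher m = DIGITS.matcher(value.toLowerCase(Locale.ROOT));
        StringBuffer key = new StringBuffer();
        while (m.find()) {
            StringBuilder padded = new StringBuilder();
            for (int i = m.group().length(); i < width; i++) {
                padded.append('0');
            }
            padded.append(m.group());
            m.appendReplacement(key, padded.toString());
        }
        m.appendTail(key);
        return key.toString();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(
                Arrays.asList("Bay 1", "Bay 10", "Bay 11", "bay 2", "Bay 3"));
        // Plain string comparison on the padded keys.
        names.sort(Comparator.comparing(n -> sortKey(n, 4)));
        System.out.println(names);
    }
}
```

The width has to be at least as large as the longest number expected in the data. Values that differ only in spacing, such as "Bay10" and "Bay 10", would also need their whitespace normalized, because a space sorts before any digit.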




On Wed, Nov 27, 2013 at 8:51 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> If this is your complete pattern, can you index two
> different fields, one text and one numeric, say
> name_text_sort that holds Bay
> name_int_sort (make sure it's a number field!)
> that holds the 1, 2, 11, etc.
>
> Then just do a primary sort on name_text_sort and
> secondary sort on name_int_sort?
>
> FWIW,
> Erick
>
>
> On Tue, Nov 26, 2013 at 4:11 AM, Umashanker, Srividhya <
> srividhya.umashanker@hp.com> wrote:
>
>>
>> >>> What are you intending to do?
>>
>> [VIDHYA]  A field with the following values should be sorted in "Natural
>> Order"
>>
>> Name field has           Bay 1, Bay10, Bay 11, bay 2, Bay 3
>> should be sorted as    Bay 1, bay 2, Bay 3, Bay10, Bay 11
>>
>>
>> -----Original Message-----
>> From: Umashanker, Srividhya
>> Sent: Tuesday, November 26, 2013 2:26 PM
>> To: Uwe Schindler
>> Cc: java-user@lucene.apache.org
>> Subject: RE: Alphanumeric Field Comparison : Lucene 4.5
>>
>> We do have a duplicate field for every indexed field.
>>
>> 1> field stores  text with exact case (used for case sensitive search)
>>  2>lowercased text (used for case insensitive search)
>>
>> Let me find some example for the collator analyzer.
>>
>>
>>
>> Here is the lost .java attachment
>>
>>
>>
>> package index.search.util;
>>
>>
>>
>> import java.io.IOException;
>>
>> import java.io.Serializable;
>>
>> import java.util.Comparator;
>>
>>
>>
>> import org.apache.lucene.index.AtomicReaderContext;
>>
>> import org.apache.lucene.index.BinaryDocValues;
>>
>> import org.apache.lucene.search.FieldCache;
>>
>> import org.apache.lucene.search.FieldComparator;
>>
>> import org.apache.lucene.search.FieldComparatorSource;
>>
>> import org.apache.lucene.util.Bits;
>>
>> import org.apache.lucene.util.BytesRef;
>>
>>
>>
>> /**
>>
>> * This class is a FieldComparatorSource having an alpha numeric string
>>
>> * comparator. The comparator class will compare fields by alpha numeric
>> values
>>
>> * and gives the result to SortField which will handle the ascending and
>>
>> * descending order.
>>
>> */
>>
>> public class AlphaNumericFieldComparatorSource extends
>> FieldComparatorSource implements Serializable
>>
>> {
>>
>>
>>
>>     private static final long serialVersionUID = 1L;
>>
>>
>>
>>     /**
>>
>>      * Comparator class to compare the alpha-numeric field values.
>>
>>      */
>>
>>     private static class AlphaNumericFieldComparator extends
>> FieldComparator<String> implements Comparator<String>
>>
>>     {
>>
>>         private final String[] values;
>>
>>         private BinaryDocValues docTerms;
>>
>>         private final String field;
>>
>>         private String bottom;
>>
>>         private final BytesRef tempBR = new BytesRef();
>>
>>         private Bits docsWithField;
>>
>>         private int charIndex1, charIndex2, strLen1, strLen2;
>>
>>         // just used internally in this comparator
>>
>>         private static final byte[] MISSING_BYTES = new byte[0];
>>
>>
>>
>>         AlphaNumericFieldComparator()
>>
>>         {
>>
>>             values = null;
>>
>>             field = null;
>>
>>         }
>>
>>
>>
>>         AlphaNumericFieldComparator(final int numHits, final String field)
>>
>>         {
>>
>>             values = new String[numHits];
>>
>>             this.field = field;
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public int compareBottom(final int doc)
>>
>>         {
>>
>>             docTerms.get(doc, tempBR);
>>
>>             if (tempBR.length == 0 && docsWithField.get(doc) == false)
>>
>>             {
>>
>>                 tempBR.bytes = MISSING_BYTES;
>>
>>             }
>>
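>>             // copy() stores missing values as "". String.getBytes() returns a
>>             // fresh array, so an identity check against MISSING_BYTES never matches.
>>
>>             if (bottom == null || bottom.isEmpty())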
>>
>>             {
>>
>>                 if (tempBR.bytes == MISSING_BYTES)
>>
>>                 {
>>
>>                     return 0;
>>
>>                 }
>>
>>                 return -1;
>>
>>             }
>>
>>             else if (tempBR.bytes == MISSING_BYTES)
>>
>>             {
>>
>>                 return 1;
>>
>>             }
>>
>>             return compare(bottom, tempBR.utf8ToString());
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public void copy(final int slot, final int doc)
>>
>>         {
>>
>>             BytesRef ref = new BytesRef();
>>
>>             if (values[slot] != null && values[slot].length() > 0)
>>
>>             {
>>
>>                 ref = new BytesRef(values[slot].getBytes());
>>
>>             }
>>
>>             docTerms.get(doc, ref);
>>
>>             if (ref.length == 0 && docsWithField.get(doc) == false)
>>
>>             {
>>
>>                 values[slot] = "";
>>
>>             }
>>
>>             else
>>
>>             {
>>
>>                 values[slot] = ref.utf8ToString();
>>
>>             }
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public FieldComparator<String> setNextReader(final
>> AtomicReaderContext context)
>>
>>                 throws IOException
>>
>>         {
>>
>>             docTerms = FieldCache.DEFAULT.getTerms(context.reader(),
>> field, true);
>>
>>             docsWithField =
>> FieldCache.DEFAULT.getDocsWithField(context.reader(), field);
>>
>>             return this;
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public void setBottom(final int bottom)
>>
>>         {
>>
>>             this.bottom = values[bottom];
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public String value(final int slot)
>>
>>         {
>>
>>             return values[slot];
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public int compareValues(final String val1, final String val2)
>>
>>         {
>>
>>             if (val1 == null)
>>
>>             {
>>
>>                 if (val2 == null)
>>
>>                 {
>>
>>                     return 0;
>>
>>                 }
>>
>>                 return -1;
>>
>>             }
>>
>>            else if (val2 == null)
>>
>>             {
>>
>>                 return 1;
>>
>>             }
>>
>>             return compare(val1, val2);
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public int compareDocToValue(final int doc, final String value)
>>
>>         {
>>
>>             docTerms.get(doc, tempBR);
>>
>>             if (tempBR.length == 0 && docsWithField.get(doc) == false)
>>
>>             {
>>
>>                 tempBR.bytes = MISSING_BYTES;
>>
>>             }
>>
>>             return compare(tempBR.utf8ToString(), value);
>>
>>         }
>>
>>
>>
>>         /**
>>
>>          * Method to compare 2 alpha numeric strings
>>
>>          *
>>
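>>          * @param string1 first value
>>
>>          * @param string2 second value
>>
>>          * @return negative, zero or positive as string1 sorts before, equal to or after string2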
>>
>>          */
>>
>>         @Override
>>
>>         public int compare(final String string1, final String string2)
>>
>>         {
>>
>>             final String strVal1 = string1;
>>
>>             final String strVal2 = string2;
>>
>>             strLen1 = strVal1.length();
>>
>>             strLen2 = strVal2.length();
>>
>>             charIndex1 = charIndex2 = 0;
>>
>>
>>
>>             if (strLen1 == 0)
>>
>>             {
>>
>>                 return strLen2 == 0 ? 0 : -1;
>>
>>             }
>>
>>             else if (strLen2 == 0)
>>
>>             {
>>
>>                 return 1;
>>
>>             }
>>
>>
>>
>>             while (charIndex1 < strLen1 && charIndex2 < strLen2)
>>
>>             {
>>
>>                 final char char1 = strVal1.charAt(charIndex1);
>>
>>                 final char char2 = strVal2.charAt(charIndex2);
>>
>>                 int result = 0;
>>
>>
>>
>>                 if (Character.isDigit(char1))
>>
>>                 {
>>
>>                     result = Character.isDigit(char2) ?
>> compareDigits(strVal1, strVal2) : -1;
>>
>>                 }
>>
>>                 else if (Character.isLetter(char1))
>>
>>                 {
>>
>>                     result = Character.isLetter(char2) ?
>> compareAlphabetsAndOthers(strVal1, strVal2) : 1;
>>
>>                 }
>>
>>                 else
>>
>>                 {
>>
>>                     result = Character.isDigit(char2) ? 1 :
>> Character.isLetter(char2) ? -1 :
>>
>>                             compareAlphabetsAndOthers(strVal1, strVal2);
>>
>>                 }
>>
>>
>>
>>                 if (result != 0)
>>
>>                 {
>>
>>                     return result;
>>
>>                 }
>>
>>             }
>>
>>             return strLen1 - strLen2;
>>
>>         }
>>
>>
>>
>>         /**
>>
>>          * Method to compare only digits
>>
>>          *
>>
>>          * @return
>>
>>          */
>>
>>         private int compareDigits(final String string1, final String
>> string2)
>>
>>         {
>>
>>             int diff = 0;
>>
>>             int zeroCount1 = 0, zeroCount2 = 0;
>>
>>             char char1 = (char) 0, char2 = (char) 0;
>>
>>
>>
>>             // Count the leading zeros and compare it later.
>>
>>             while (charIndex1 < strLen1 && (char1 =
>> string1.charAt(charIndex1++)) == '0')
>>
>>             {
>>
>>                 zeroCount1++;
>>
>>             }
>>
>>             while (charIndex2 < strLen2 && (char2 =
>> string2.charAt(charIndex2++)) == '0')
>>
>>             {
>>
>>                 zeroCount2++;
>>
>>             }
>>
>>
>>
>>             while (true)
>>
>>             {
>>
>>                 final boolean endOfDigits1 = (char1 == 0) ||
>> !Character.isDigit(char1);
>>
>>                 final boolean endOfDigits2 = (char2 == 0) ||
>> !Character.isDigit(char2);
>>
>>
>>
>>                 /*
>>
>>                  * If one sequence contains more significant digits than
>> the
>>
>>                  * other, it's a larger number. In case the sequences
>> have
>>
>>                  * equal lengths, we need to compare digits at each
>> position;
>>
>>                  * the first
>>
>>                  * unequal pair determines which is the bigger number.
>>
>>                  */
>>
>>
>>
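>>                 if (endOfDigits1 && endOfDigits2)
>>
>>                 {
>>
>>                     // Step back onto the non-digit characters that ended each run
>>                     // so that compare() does not skip them ("1a" vs "1b").
>>                     if (char1 != 0) charIndex1--;
>>                     if (char2 != 0) charIndex2--;
>>
>>                     return diff != 0 ? diff : -(zeroCount1 - zeroCount2);
>>
>>                 }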
>>
>>                 else if (endOfDigits1)
>>
>>                 {
>>
>>                     return -1;
>>
>>                 }
>>
>>                 else if (endOfDigits2)
>>
>>                 {
>>
>>                     return 1;
>>
>>                 }
>>
>>                 else if (diff == 0 && char1 != char2)
>>
>>                 {
>>
>>                     diff = char1 - char2;
>>
>>                 }
>>
>>
>>
>>                 char1 = charIndex1 < strLen1 ?
>> string1.charAt(charIndex1++) : (char) 0;
>>
>>                 char2 = charIndex2 < strLen2 ?
>> string2.charAt(charIndex2++) : (char) 0;
>>
>>             }
>>
>>         }
>>
>>
>>
>>         /**
>>
>>          * Method to compare letters and special characters
>>
>>          *
>>
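>>          * @param string1 first value
>>
>>          * @param string2 second value
>>
>>          * @return difference between the characters at the current positions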
>>
>>          */
>>
>>         private int compareAlphabetsAndOthers(final String string1, final
>> String string2)
>>
>>         {
>>
>>             final char char1 = string1.charAt(charIndex1++);
>>
>>             final char char2 = string2.charAt(charIndex2++);
>>
>>
>>
>>             return (char1 == char2) ? 0 : (char1 - char2);
>>
>>         }
>>
>>
>>
>>         @Override
>>
>>         public int compare(final int slot1, final int slot2)
>>
>>         {
>>
>>             final String val1 = values[slot1];
>>
>>             final String val2 = values[slot2];
>>
>>             if (val1 == null)
>>
>>             {
>>
>>                 if (val2 == null)
>>
>>                 {
>>
>>                     return 0;
>>
>>                 }
>>
>>                 return -1;
>>
>>             }
>>
>>             else if (val2 == null)
>>
>>             {
>>
>>                 return 1;
>>
>>             }
>>
>>
>>
>>             return compare(val1, val2);
>>
>>         }
>>
>>     }
>>
>>
>>
>>     /*
>>
>>      * @see org.apache.lucene.search.FieldComparatorSource#newComparator(java.lang.String,
>>
>>      * int, int, boolean)
>>
>>      */
>>
>>     @Override
>>
>>     public FieldComparator<String> newComparator(final String fieldname,
>> final int numHits, final int sortPos, final boolean reversed)
>>
>>             throws IOException
>>
>>     {
>>
>>         return new AlphaNumericFieldComparator(numHits, fieldname);
>>
>>     }
>>
>>
>>
>>     /**
>>
>>      * Method to return alpha-numeric comparator for collection sort
>>
>>      *
>>
>>      * @return comparator
>>
>>      */
>>
>>     public Comparator<String> getAlphaNumericComparator()
>>
>>     {
>>
>>         return new AlphaNumericFieldComparator();
>>
>>     }
>>
>> }
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Tuesday, November 12, 2013 1:57 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Alphanumeric Field Comparison : Lucene 4.5
>>
>>
>>
>> Hi,
>>
>>
>>
>>
>>
>> What are you intending to do? The example code is lost!
>>
>>
>>
>> In general, to sort alphanumeric/lexical on a human readable field, you
>> would use the collation functionalities (needs a separate field for sorting
>> containing the collation keys) provided by Lucene.
>>
>>
>>
>> Use
>> http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/collation/CollationKeyAnalyzer.html
>> to index the field and then you can do a simple native sort on this field
>> (SortField.STRING).
>>
>>
>>
>>
>>
>> Uwe
>>
>>
>>
>>
>>
>> -----
>>
>>
>>
>> Uwe Schindler
>>
>>
>>
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>
>>
>>
>> <http://www.thetaphi.de/> http://www.thetaphi.de
>>
>>
>>
>> eMail: uwe@thetaphi.de<mailto:uwe@thetaphi.de>
>>
>>
>>
>>
>>
>> From: Umashanker, Srividhya [mailto:srividhya.umashanker@hp.com]
>>
>> Sent: Tuesday, November 12, 2013 5:00 AM
>>
>> To: java-user@lucene.apache.org<mailto:java-user@lucene.apache.org>
>>
>> Subject: Alphanumeric Field Comparison : Lucene 4.5
>>
>>
>>
>>
>>
>> Group –
>>
>>
>>
>>
>>
>> We are looking at sorting Lucene docs based on a field in alphanumeric
>> order, as we expect fields to have alphanumeric characters.
>>
>>
>>
>> Attached is the AlphaNumericFieldComparatorSource and below is the
>> snippet of its usage.
>>
>>
>>
>>
>>
>> final SortField sortField_id = new SortField(FieldName._id.name(), new
>> AlphaNumericFieldComparatorSource(), false);
>>
>>
>>
>>
>>
>> Is anyone using an easier approach? Please share any other alternatives
>> that you have tried.
>>
>>
>>
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>> -Vidhya
>>
>>
>>
>
