lucene-solr-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: special Title Sorting etc
Date Sat, 24 May 2014 14:50:09 GMT
Hi Harry,

For sorting you should be using solr.StrField, or solr.TextField with KeywordTokenizer; otherwise you’ll
get multiple tokens, and a sort field needs exactly one token per document.
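For comparison, the StrField option needs no analyzer at all. A sketch of such a type is below (the name titleSortRaw is hypothetical). Because StrField indexes the whole value verbatim as one term, it sorts case-sensitively and cannot drop a leading article, so “The Book” would still sort under T. That is why the field types below use TextField with KeywordTokenizer instead.

```xml
<!-- Hypothetical type name. StrField indexes the whole value as a single,
     unanalyzed term, so sorting is verbatim and case-sensitive. -->
<fieldType name="titleSortRaw" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
```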

Here’s one way to get what you want: use copyField to copy your title into a sort field whose field type
looks something like this (untested):

<fieldType name="titleSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(?i)(a|an|the)\s+"
                replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
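The copyField part might look like the sketch below. It rests on assumptions: the existing field is presumed to be named title and to be single-valued (a multi-valued source would put several values into the sort field, which can’t be sorted on), and it reuses BIB_title_sort, the sort-field name from the original question.

```xml
<!-- Sketch: assumes an existing single-valued "title" field.
     The sort field is indexed (needed for sorting) but not stored. -->
<field name="BIB_title_sort" type="titleSort" indexed="true" stored="false" multiValued="false"/>
<copyField source="title" dest="BIB_title_sort"/>
```

Queries would then sort with a request parameter such as sort=BIB_title_sort asc (or desc), while results still display the original stored title.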

The “(?i)” flag at the start of the pattern makes the match case-insensitive.
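Since the original schema also lists lang/stopwords_el.txt, Greek titles may need the same treatment. One caveat: the pattern is a java.util.regex pattern, and Java’s (?i) is ASCII-only by default, so it won’t match Greek capitals such as “Ο” against “ο”. Adding the u flag, i.e. (?iu), turns on Unicode-aware case-insensitive matching. Below is an untested variant of the first field type; the type name titleSortMulti and the short list of Greek definite articles (nominative forms only) are illustrative assumptions, not a complete article list.

```xml
<!-- Untested sketch: hypothetical type name; the Greek article list
     (ο, η, το, οι, τα) is illustrative and incomplete. (?iu) makes the
     case-insensitive match Unicode-aware, which (?i) alone is not. -->
<fieldType name="titleSortMulti" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(?iu)(a|an|the|ο|η|το|οι|τα)\s+"
                replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```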

A common strategy for sorting titles while ignoring initial articles is to move the article
to the end, separated by a comma, e.g. “Book, The” and “Wallet, A”. This keeps “Book”, “The Book”,
and “A Book” sorted consistently next to one another. Here’s a slightly different version of the
above field type that does this (again, untested):

<fieldType name="titleSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(?i)(a|an|the)\s+(.*)"
                replacement="$2, $1"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
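To check the behavior, you could index a few test documents in Solr’s XML update format, like the hypothetical ones below. The id and title field names are assumptions, and BIB_title_sort (the field name from the original question) is assumed to be filled from title via copyField. With the second field type, the sort keys should come out as “book, the”, “wallet, a”, “book” and “book, a” (ICUFoldingFilterFactory also lowercases), so sort=BIB_title_sort asc should return Book, A Book, The Book, A Wallet. That order follows from plain term ordering and has not been run.

```xml
<!-- Hypothetical test documents; field names "id" and "title" are assumed. -->
<add>
  <doc><field name="id">1</field><field name="title">The Book</field></doc>
  <doc><field name="id">2</field><field name="title">A Wallet</field></doc>
  <doc><field name="id">3</field><field name="title">Book</field></doc>
  <doc><field name="id">4</field><field name="title">A Book</field></doc>
</add>
```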

Steve

On May 24, 2014, at 9:56 AM, HL <freemail.grharry@gmail.com> wrote:

> I am trying to sort by the title field, asc or desc,
> in a manner that is influenced by the stopwords list of a language.
> 
> For instance, I would like the titles
> "The Book" and "A Wallet", when sorted, to appear as
> 
> title
> ---------
> The Book
> A Wallet
> 
> but so far I have only managed to smash my head against the Solr wall,
> with NO SUCCESS whatsoever!
> 
> 
> So far I've tried to do this in Solr with various fieldType definitions, either by copying
> the contents of title to BIB_title_sort,
> or via a dynamicField with a suffix or a prefix,
> or even by importing the title straight into the field.
> 
> Here is my last FAILED attempt to do that
> 
> <fieldType name="sortString" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>        <analyzer type="index">
>            <tokenizer class="solr.StandardTokenizerFactory"/>
>            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>            <filter class="solr.ICUFoldingFilterFactory"/>
>            <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_el.txt,lang/stopwords_en.txt" enablePositionIncrements="true"/>
>            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        </analyzer>
>      </fieldType>
> 
> My question is:
> 
> Is there a way to do that in Solr?
> OR
> Do I HAVE TO remove the STOP WORDS and so on during the IMPORT process, only by writing
> custom scripts??
> Thanks in advance,
> Harry

