lucene-dev mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: New Lucene features and Solr indexes
Date Sat, 16 Feb 2013 19:58:54 GMT
It seems as if you are using the text field analyzer to "clean up" or 
"normalize" the values for that field, but an analyzer generally maps 
source terms to index terms, with the expectation that the index term(s) 
may be radically different from the source terms, and it usually 
tokenizes the input stream as well.

Maybe this is simply a question of best practices: using analyzers for 
"analysis", as opposed to the cleanup/normalization that an update processor 
would normally do. In other words, these are situations where the analyzer 
is used as a poor man's update processor for what otherwise would/should be 
simple string fields.
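
As a rough sketch of that alternative (not a tested configuration): the 
field becomes a plain string field and the trimming moves into an update 
processor chain. This assumes the docvalues support from SOLR-3855 ends up 
exposed as a docValues="true" attribute on StrField; the field name "id" 
and the chain name "normalize-id" are made up for illustration. Case 
folding is not shown and would have to be handled separately, for example 
in the client or in a custom processor.

     <!-- schema.xml: plain string key; the docValues attribute is an assumption -->
     <fieldType name="string" class="solr.StrField"
                sortMissingLast="true" docValues="true"/>
     <field name="id" type="string" indexed="true" stored="true"
            required="true"/>

     <!-- solrconfig.xml: trim the value before it is indexed and stored -->
     <updateRequestProcessorChain name="normalize-id">
       <processor class="solr.TrimFieldUpdateProcessorFactory">
         <str name="fieldName">id</str>
       </processor>
       <processor class="solr.LogUpdateProcessorFactory"/>
       <processor class="solr.RunUpdateProcessorFactory"/>
     </updateRequestProcessorChain>

Unlike an analyzer, the update processor changes the stored value as well 
as the indexed one, so whatever is returned in the first phase of a 
distributed search is already normalized.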

-- Jack Krupansky

-----Original Message----- 
From: Shawn Heisey
Sent: Saturday, February 16, 2013 11:43 AM
To: dev@lucene.apache.org
Subject: Re: New Lucene features and Solr indexes

On 2/14/2013 8:26 AM, Adrien Grand wrote:
>> This suggests that adding docvalues to the uniqueKey field would be a
>> good idea for distributed searching in general, since the first phase of a
>> distributed search only retrieves that field and score.  That assumes of
>> course that the docvalues are fully utilized for retrieving fields during
>> that initial phase.
>
> Right, this would likely improve performance given that doc values
> (even if disk-based) are more likely to be in memory than stored
> fields. Another (better?) approach would be to use the internal Lucene
> doc IDs for distributed search (I assumed there was an open JIRA issue
> to do that but I can't find it).

Related to this ... I have been watching SOLR-3855.  I notice that
TextField is not listed among the supported types.  Is that likely to
change in the future, or is there a fundamental issue there?

My uniqueKey field uses the following fieldType definition:

     <!-- lowercases the entire field value -->
     <fieldType name="lowercase" class="solr.TextField"
                sortMissingLast="true" positionIncrementGap="0" omitNorms="true">
       <analyzer>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.ICUFoldingFilterFactory"/>
         <filter class="solr.TrimFilterFactory"/>
       </analyzer>
     </fieldType>

I'm about 95% sure that the source value from MySQL will never contain
lowercase characters and probably does not actually need to be trimmed,
but we want to be able to search when an uppercase value is entered.
Would I have to give up that capability to get docvalues on this field?
Does the current SOLR-3855 patch take advantage of docvalues for the
first phase of a distributed search when they are present, as we
discussed earlier?

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org 



