lucene-dev mailing list archives

From "Ryan McKinley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1997) analyzed field: Store internal value instead of input one
Date Wed, 11 Aug 2010 05:02:17 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897139#action_12897139 ]

Ryan McKinley commented on SOLR-1997:
-------------------------------------

See the much older issue SOLR-314, which did the same thing.

I'm still torn on whether this is a good idea or not...


> analyzed field: Store internal value instead of input one
> ---------------------------------------------------------
>
>                 Key: SOLR-1997
>                 URL: https://issues.apache.org/jira/browse/SOLR-1997
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4, 1.4.1, 1.5
>            Reporter: Joan Codina
>             Fix For: 1.4, 1.4.1, 1.5
>
>         Attachments: SOLR-1997-1.4.patch, SOLR-1997-1.5.patch
>
>
> Solr implements a set of filters and tokenizers for filtering and processing text, but
when a field is set to be stored, the stored text is the raw input. This is useful when the
end user reads the stored value, but not in other cases, for example when there are payloads
and the text looks like A|2.0 good|1.0 day|3.0, or when the result of a query is processed
by something like Carrot2.
> So this is a simple new kind of field that takes as input the output of a given type
(source), and then performs the normal processing with the desired tokenizers and filters.
The difference is that the stored value is the output of the source type, and this is what
is retrieved when getting the document.
> The field type is named AnalyzedField. It is introduced in the schema as follows, creating
the analyzedSourceType from the SourceType:
> 		<fieldType name="SourceType" class="solr.TextField"  >
> 			<analyzer type="index">
> 				<tokenizer class="solr.StandardTokenizerFactory" />
> 				<filter class......." />
> 			</analyzer>
> 			<analyzer type="query">
> 				<tokenizer class="solr.StandardTokenizerFactory" />
> 				<filter ....." />
> 			</analyzer>
> 		</fieldType>
>  <fieldType name="analyzedSourceType" class="solr.AnalyzedField" positionIncrementGap="100" preProcessType="SourceType">
>              <analyzer>
>                  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>              </analyzer>
>  </fieldType>
> Often only the WhitespaceTokenizerFactory is needed, as the tokens have already been
split by the SourceType.
> Finally, a field can be declared as
> <field name="analyzedData" type="analyzedSourceType" indexed="true" stored="true" termVectors="true" multiValued="true"/>
> which can be written directly or can be defined as a copy of the source one:
> <field name="Data" type="analyzedSourceType" indexed="true" stored="true" termVectors="true" multiValued="true"/>
> ...
> <copyField source="Data" dest="analyzedData"/>
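
The stored-versus-indexed distinction described in the issue can be sketched in a few lines of Python. This is only an illustration of the idea, not the patch itself: the functions source_analyzer, whitespace_analyzer, and analyzed_field are hypothetical stand-ins, and joining the pre-processed tokens with a single space is an assumption, since the description does not say how the stored value is serialized.

```python
from typing import Callable, List, Tuple

Analyzer = Callable[[str], List[str]]


def source_analyzer(text: str) -> List[str]:
    """Hypothetical stand-in for SourceType: strip punctuation and lowercase."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def whitespace_analyzer(text: str) -> List[str]:
    """Stand-in for a whitespace tokenizer: split on runs of whitespace."""
    return text.split()


def analyzed_field(raw: str, pre_process: Analyzer,
                   index_analyzer: Analyzer) -> Tuple[str, List[str]]:
    """Return (stored value, indexed tokens) for an AnalyzedField-like field.

    Assumption: the stored value is the pre-processed tokens joined by one
    space. A regular stored text field would keep `raw` unchanged instead.
    """
    stored = " ".join(pre_process(raw))
    indexed = index_analyzer(stored)
    return stored, indexed


stored, indexed = analyzed_field("A good day!", source_analyzer, whitespace_analyzer)
print(stored)   # the value a document fetch would return under this sketch
print(indexed)  # the tokens that would go into the index
```

Under these assumptions the second analyzer only needs to split on whitespace, which matches the remark in the description that the tokens have already been split by the SourceType.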

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

