lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Copy field a source of copy field
Date Tue, 18 Jul 2017 00:26:13 GMT
In a word, "no". Copy fields are not chained together. I'm not at all
sure what you're trying to accomplish with those filter chains anyway.
By shingling _then_ applying the keepwords filter, you'll have input like
abies durangensis

become

abies
abies_durangensis
durangensis

Then you put that through your keepwords filter, which presumably only
has species in it, so it would throw out abies and abies_durangensis
unless those are in your keepwords file. Seems a waste.

That aside, you can construct one long analysis chain that combines
the genus and species chains and just copy from attr_content* into
both. You wouldn't get the different tokenization, but presumably you
don't particularly need it on the second part of the chain.
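
As a rough sketch of one way to read the above (not a tested
configuration): both destination fields copy directly from
attr_content*, and the genus type reuses the species char filters and
whitespace tokenizer before its own keepwords filter. The "genus" field
definition and its attributes are assumptions, since the original
message only shows the genus field type. Note the limitation: this
keeps any token found in genus.txt, whether or not it belongs to a
species matched in the species field.

```xml
<!-- Both destinations read the original source value;
     a copyField destination is never used as a source here. -->
<copyField source="attr_content*" dest="species"/>
<copyField source="attr_content*" dest="genus"/>

<!-- Assumed field definition; not shown in the original message. -->
<field name="genus" type="genus_type" stored="true" indexed="true"/>

<fieldType name="genus_type" class="solr.TextField"
           positionIncrementGap="0">
  <analyzer type="index">
    <!-- Same character cleanup and tokenizer as species_type -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping/mapping-ISOLatin1Accent.txt"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[0-9]+|(\-)(\s*)" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keep only single tokens listed in genus.txt, e.g. "abies" -->
    <filter class="solr.KeepWordFilterFactory" words="genus.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With input such as "abies durangensis", the index-time chain above
would be expected to leave only "abies" in genus, provided "abies" is
listed in genus.txt.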

Best,
Erick

On Mon, Jul 17, 2017 at 3:26 PM, tstusr <ulfrheimr@gmail.com> wrote:
> Hi
>
> We want to use a copy field as a source for another copy field or some kind
> of post processing of a field.
>
> Here is the problem. We have a field whose content is captured from
> the text by a copy field, like this:
>
> <copyField source="attr_content*" dest="species"/>
>
> which has (at the end of the processing) just the words in a field.
>
> <field name="species" type="species_type" stored="true" indexed="true"
> termVectors="true" termPositions="true" termOffsets="true"/>
>
> <fieldType name="species_type" class="solr.TextField"
> positionIncrementGap="0">
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping/mapping-ISOLatin1Accent.txt"/>
>       <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="[0-9]+|(\-)(\s*)" replacement=""/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
> outputUnigrams="true"/>
>       <filter class="solr.KeepWordFilterFactory" words="species.txt"
> ignoreCase="true"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
> outputUnigrams="false"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> So, what we want to do now is to implement faceting based on some post
> processing of this field, by using it as the source for another field.
>
> <copyField source="species" dest="genus"/>
>
> <fieldType name="genus_type" class="solr.TextField"
> positionIncrementGap="0">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.KeepWordFilterFactory" words="genus.txt"
> ignoreCase="true"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
>
> As far as I understand, we don't have a value in genus because the chain
> has ended. Nevertheless, we are also not able to do two processing passes:
> first capture the words in species, and then make a new capture for the
> genus.
>
> As an example imagine we have on species
>
> abies durangensis
> abies flinckii
>
> so, after post processing, we expect to have only
> abies
>
> which is a word in genus files
>
> I was as clear as possible with the problem, but maybe there are some black
> holes in the explanation.
>
> Hope you can help me.
>
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Copy-field-a-source-of-copy-field-tp4346425.html
> Sent from the Solr - User mailing list archive at Nabble.com.
