lucene-solr-user mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: Distinct values in multivalued fields
Date Mon, 01 Jul 2013 13:57:23 GMT
Have a look at the DedupUpdateProcessorFactory, which may help you.
Although, I'm not sure if it works with multivalued fields.

Upayavira

On Mon, Jul 1, 2013, at 02:34 PM, tuedel wrote:
> Hello everybody,
> 
> I have tried to use the UniqFieldsUpdateProcessorFactory in
> order to achieve distinct values in multivalued fields. Example below:
> 
> <updateRequestProcessorChain name="uniq_fields"> 
>    <processor 
> class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory"> 
>      <lst name="fields"> 
>        <str>title</str> 
>        <str>tag_type</str> 
>      </lst> 
>    </processor> 
>    <processor class="solr.RunUpdateProcessorFactory" /> 
> </updateRequestProcessorChain> 
> 
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>    <lst name="defaults">
>      <str name="update.chain">uniq_fields</str>
>    </lst>
> </requestHandler>
> 
> However, the data is being indexed one document at a time, and a
> document may get an additional tag in a future update. To make sure
> there are no duplicate tags, I was hoping the UpdateProcessorFactory
> would do what I want to achieve. Unfortunately, to actually add a tag,
> I am sending
>
> "tag_type" :{"add":"foo"}, which still adds the tag without checking
> whether it is already part of the field. How can I achieve distinct
> values on the Solr side?
> 
> I suspect that writing my own processor might be a solution. However,
> I am uncertain how to do it and whether it's the proper way.
> Imagine an incoming update, e.g. an update of an existing document that
> has several multivalued fields, sent without specifying "add" or "set".
> This would cause the corresponding document to be dropped and re-indexed
> without keeping any previously added values in the multivalued fields.
> Therefore, when a field is updated and the value is not yet part of the
> index, the processor should add the value; otherwise it should ignore it.
> The processor needs to decide whether a value is added to the index
> depending on what is already in the existing index. Is that achievable
> on the Solr side?
> Below is my current, pretty empty processor class:
> 
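> import java.io.IOException;
> import java.util.Collection;
>
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.solr.request.SolrQueryRequest;
> import org.apache.solr.response.SolrQueryResponse;
> import org.apache.solr.update.AddUpdateCommand;
> import org.apache.solr.update.processor.UpdateRequestProcessor;
> import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
>
> public class ConditionalSolrUniqFieldValuesProcessorFactory
>         extends UpdateRequestProcessorFactory {
>
>     @Override
>     public UpdateRequestProcessor getInstance(SolrQueryRequest req,
>             SolrQueryResponse rsp, UpdateRequestProcessor next) {
>         return new ConditionalUniqFieldValuesProcessor(next);
>     }
>
>     static class ConditionalUniqFieldValuesProcessor
>             extends UpdateRequestProcessor {
>
>         public ConditionalUniqFieldValuesProcessor(UpdateRequestProcessor next) {
>             super(next);
>         }
>
>         @Override
>         public void processAdd(AddUpdateCommand cmd) throws IOException {
>             SolrInputDocument doc = cmd.getSolrInputDocument();
>
>             Collection<String> incomingFieldNames = doc.getFieldNames();
>             for (String name : incomingFieldNames) {
>                 // TODO: if the field is multivalued and the value is
>                 // already part of the index, drop the duplicate;
>                 // otherwise add it to the multivalued field.
>             }
>
>             // Hand the document to the next processor in the chain;
>             // without this call the update stops here.
>             super.processAdd(cmd);
>         }
>     }
> }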
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Distinct-values-in-multivalued-fields-tp4074337.html
> Sent from the Solr - User mailing list archive at Nabble.com.
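
The "add only if not already present" step that the quoted message describes can be sketched in plain Java, independent of Solr. This is only an illustration under assumptions: `DistinctValues` and `merge` are hypothetical names, not part of Solr, and the sketch leaves open how a processor would obtain the values already stored for a document. It also assumes the incoming values are plain values; with an atomic update such as {"add":"foo"} the field value would first have to be unwrapped.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not a Solr class): merges multivalued field values without duplicates. */
final class DistinctValues {

    private DistinctValues() {
        // static helper, not meant to be instantiated
    }

    /**
     * Returns the stored values followed by every incoming value that is not
     * already present. First-seen order is kept; either argument may be null.
     */
    static List<Object> merge(Collection<?> stored, Collection<?> incoming) {
        // LinkedHashSet drops duplicates (by equals/hashCode) and keeps insertion order.
        Set<Object> distinct = new LinkedHashSet<>();
        if (stored != null) {
            distinct.addAll(stored);
        }
        if (incoming != null) {
            distinct.addAll(incoming);
        }
        return new ArrayList<>(distinct);
    }
}
```

Inside a processor like the one quoted above, the result of `DistinctValues.merge(storedValues, incomingValues)` could then be written back to the document before it is passed down the chain, where `storedValues` and `incomingValues` are placeholders for whatever the processor has looked up. Duplicates are detected with `equals`, so "foo" and "Foo" count as different tags.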
