lucene-solr-user mailing list archives

From Jean-Sebastien Vachon <jean-sebastien.vac...@wantedanalytics.com>
Subject RE: Were changes made to facetting on multivalued fields recently?
Date Fri, 11 Apr 2014 14:52:48 GMT
Thanks to both of you. I finally found the issue and you were right (again) ;)

The problem was not coming from the full indexing code containing the SQL replace statement
but from another process whose job is to keep our index up to date. This process had no
idea that commas were to be replaced by spaces for some fields (and it should not need to
know about this either).

I changed the tokenizer used for the field to the following, and everything is fine now.
    <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
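
For context, here is a minimal sketch of how that tokenizer line might sit inside a complete field type. The type names text_comma and text_comma_cf are hypothetical and are not taken from the poster's schema. The second definition illustrates the index-time char filter alternative that Erick suggests in the quoted message below, and is shown only for comparison.

```xml
<!-- Option A: split on commas in the tokenizer, so "4,1" is indexed as
     the two terms "4" and "1". The name text_comma is hypothetical. -->
<fieldType name="text_comma" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
  </analyzer>
</fieldType>

<!-- Option B: turn commas into spaces before tokenizing, then split on
     whitespace. The name text_comma_cf is hypothetical. -->
<fieldType name="text_comma_cf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="," replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Option A leaves a value that is already space-separated upstream, such as "4 1", as a single term, whereas Option B handles both forms. Either change only affects documents indexed after the schema is updated, so existing documents would need to be reindexed before the facet counts reflect it.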

Thanks for your help

> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: April-10-14 1:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Were changes made to facetting on multivalued fields recently?
> 
> bq: The SQL query contains a Replace statement that does this
> 
> Well, I suspect that's where the issue is. The facet values being reported
> include:
> <int name="4,1">134826</int>
> which indicates that the incoming text to Solr still has the commas.
> Solr is seeing the commas and all.
> 
> You can cure this by using PatternReplaceCharFilterFactory and doing the
> substitution at index time if you want to.
> 
> That doesn't clarify why the behavior has changed though, but my
> supposition is that it has nothing to do with Solr, and something about your
> SQL statement is different.
> 
> Best,
> Erick
> 
> On Thu, Apr 10, 2014 at 9:33 AM, Jean-Sebastien Vachon
> <jean-sebastien.vachon@wantedanalytics.com> wrote:
> > The SQL query contains a Replace statement that does this
> >
> >> -----Original Message-----
> >> From: Shawn Heisey [mailto:solr@elyograg.org]
> >> Sent: April-10-14 11:30 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Were changes made to facetting on multivalued fields recently?
> >>
> >> On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
> >> > Here are the field definitions for both our old and new index... as
> >> > you can see, they are identical. We've been using this chain and field
> >> > type starting with Solr 1.4 and never had any problem. As for the
> >> > documents, both indexes are using the same data source. They could be
> >> > slightly out of sync from time to time but we tend to index them on a
> >> > daily basis. Both indexes are also using the same code (indexing
> >> > through SolrJ) to index their content.
> >> >
> >> > The source is a column in MySQL that contains entries such as "4,1"
> >> > that get stored in a multivalued field after replacing commas with
> >> > spaces
> >> >
> >> > OLD (4.6.1):
> >> >     <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
> >> >       <analyzer>
> >> >         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >       </analyzer>
> >> >     </fieldType>
> >> >
> >> >     <field name="ad_job_type_id" type="text_ws" indexed="true"
> >> >            stored="true" required="false" multiValued="true" />
> >>
> >> Just so you know, there's nothing here that would require the field
> >> to be multivalued.  WhitespaceTokenizerFactory does not create
> >> multiple field values, it creates multiple terms.  If you are
> >> actually inserting multiple values for the field in SolrJ, then you
> >> would need a multivalued field.
> >>
> >> What is replacing the commas with spaces?  I don't see anything here
> >> that would do that.  It sounds like that part of your indexing is not
> >> working.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >> -----
> >> No virus found in this message.
> >> Checked by AVG - www.avg.fr
> >> Version: 2014.0.4355 / Virus database: 3882/7323 - Date: 09/04/2014
> 
