lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Problem with faceting
Date Sat, 07 Feb 2015 19:38:14 GMT
You might try sending the request to each individual shard with
&distrib=false appended to see if you somehow indexed
the same docs to each shard individually, although that doesn't
really make sense given the fact that numFound is 2.
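
As a rough illustration of the check suggested above, the following Python sketch builds one non-distributed request URL per shard. The second host/port and the helper name `per_shard_urls` are assumptions made for this example; substitute the real core URLs of your own shards. Comparing `numFound` and the facet counts returned by each URL should show whether both shards hold the same documents.

```python
from urllib.parse import urlencode


def per_shard_urls(shard_bases, params):
    """Return one /select URL per shard with distrib=false appended."""
    # safe="*:" keeps the match-all query readable as q=*:*
    query = urlencode({**params, "distrib": "false"}, safe="*:")
    return [f"{base}/select?{query}" for base in shard_bases]


# Hypothetical shard base URLs; only localhost:8081 appears in the thread.
shards = ["http://localhost:8081/solr", "http://localhost:8082/solr"]

urls = per_shard_urls(
    shards,
    {"q": "*:*", "rows": 0, "facet": "true", "facet.field": "ID_bent", "wt": "json"},
)

for url in urls:
    print(url)
```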

Best,
Erick

On Fri, Feb 6, 2015 at 8:10 AM, Alvaro Cabrerizo <toporniz@gmail.com> wrote:
> Hi,
>
> Totally agree about the schema. It couldn't be the issue.
>
> On the other hand, I ran a naive test. Using the schema and the data you
> attached, I searched the same Solr instance as if it were two
> different shards.
>
> For example, searching for:
>
> http://localhost:8081/solr/select?q=*:*&wt=json&indent=true&facet=true&facet.field=ID_bent
>
> gives the correct response (2 hits + nice facet count),
>
>
>    "docs": [
>       {
>          "id": "1",
>          "ID_bent": "#77762702P#77762953Y#77768200D#77763320M#77760725D#",
>          "score": 2.5645533
>       },
>       {
>          "id": "2",
>          "ID_bent": "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z#12345655Z#77763320M#11100336Z#Y4222970X#",
>          "score": 2.5645533
>       }
>    ]
> },
> "facet_counts": {
>    "facet_queries": { },
>    "facet_fields": {
>       "ID_bent": [
>          "77760725D",
>          2,
>          "77762702P",
>          2,
>          "77762953Y",
>          2,
>          "77763320M",
>          2,
>          "77768200D",
>          2,
>          "00020080J",
>
>
>
> but when searching for:
> http://localhost:8081/solr/select?q=*:*&wt=json&indent=true&facet=true&facet.field=ID_bent&shards=localhost:8081/solr,localhost:8081/solr
>
> It gives me the following response. The documents found are fine, since
> they can be "merged" using the id, but the facet counts are wrong, since
> the counts from each shard can't be merged that way and are simply added up.
>
>    "numFound": 2,
>    "start": 0,
>    "maxScore": 2.5645533,
>    "docs": [
>       {
>          "id": "1",
>          "ID_bent": "#77762702P#77762953Y#77768200D#77763320M#77760725D#",
>          "score": 2.5645533
>       },
>       {
>          "id": "2",
>          "ID_bent": "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z#12345655Z#77763320M#11100336Z#Y4222970X#",
>          "score": 2.5645533
>       }
>    ]
> },
> "facet_counts": {
>    "facet_queries": { },
>    "facet_fields": {
>       "ID_bent": [
>          "77760725D",
>          4,
>          "77762702P",
>          4,
>          "77762953Y",
>          4,
>          "77763320M",
>          4,
>          "77768200D",
>          4,
>          "00020080J",
>
>
> So there could be two different issues:
>
>    - Duplicated documents (one copy on each shard)
>    - A bad request that lists the same shard twice (I would bet
>    the other one is the real cause)
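
The effect described above can be reproduced with a toy model in Python. This is only a sketch of the idea, not Solr's actual merge code: it assumes that documents from different shards are deduplicated by `id` while per-shard facet counts are summed. The helper names and the abbreviated `ID_bent` values are made up for the example.

```python
from collections import Counter


def shard_response(docs):
    """Simulate one shard: return its docs and its local facet counts."""
    counts = Counter()
    for doc in docs:
        # Count each ID at most once per document, skipping empty tokens.
        counts.update({tok for tok in doc["ID_bent"].split("#") if tok})
    return {"docs": docs, "facets": counts}


def merge(responses):
    """Merge shard responses: dedupe docs by id, add facet counts together."""
    merged_docs = {}
    merged_facets = Counter()
    for resp in responses:
        for doc in resp["docs"]:
            merged_docs.setdefault(doc["id"], doc)
        merged_facets.update(resp["facets"])  # Counter.update adds counts
    return {"numFound": len(merged_docs), "facets": merged_facets}


# Abbreviated sample data, shortened from the values in this thread.
index = [
    {"id": "1", "ID_bent": "#77762702P#77760725D#"},
    {"id": "2", "ID_bent": "#77760631F#77760725D#77762702P#"},
]

single = merge([shard_response(index)])
doubled = merge([shard_response(index), shard_response(index)])

print(single["numFound"], single["facets"]["77760725D"])
print(doubled["numFound"], doubled["facets"]["77760725D"])
```

In this model the number of documents found stays at 2 in both cases, while every facet count doubles once the same data is seen through two shards, which matches the 2 → 4 and 1 → 2 pattern reported in the thread.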
>
> Hope it helps.
>
> On Fri, Feb 6, 2015 at 1:27 PM, <david.davila@correo.aeat.es> wrote:
>
>> Hi Alvaro,
>>
>> this is the definition:
>>
>>     <fieldType name="entidades" class="solr.TextField">
>>         <analyzer type="index">
>>             <tokenizer class="solr.PatternTokenizerFactory" pattern="#"/>
>>         </analyzer>
>>     </fieldType>
>>
>>
>> As you can see, we store all the IDs separated by a #. Normally this has
>> worked fine, and I think that the problem has nothing to do with the
>> definition.
>> Besides, I have seen that when the correct value in the facet field would
>> be 2, Solr shows 4, and when it would be 1 it shows 2. In conclusion, for
>> some reason the values are being duplicated. Why? I have no idea. And this
>> doesn't always happen; in fact, it happens only with some queries or some
>> documents. It's very weird. Maybe SolrCloud is merging the results from
>> the two shards incorrectly in some situations, but I have no idea.
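
For reference, the expected facet counts can be worked out by hand from the two `ID_bent` values shown in this thread. The Python sketch below approximates the `#` pattern tokenizer with a plain string split that drops empty tokens (an assumption about how the leading and trailing `#` are handled) and counts each ID once per document.

```python
from collections import Counter


def tokenize(value, sep="#"):
    """Split on the separator and drop the empty leading/trailing tokens."""
    return [tok for tok in value.split(sep) if tok]


def facet_counts(docs, field):
    """Count, for each token, the number of documents that contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc[field])))
    return counts


# The two documents returned by the query in this thread.
docs = [
    {"ID_bent": "#77762702P#77762953Y#77768200D#77763320M#77760725D#"},
    {"ID_bent": "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P"
                "#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T"
                "#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z"
                "#12345655Z#77763320M#11100336Z#Y4222970X#"},
]

expected = facet_counts(docs, "ID_bent")
for token, count in expected.most_common(6):
    print(token, count)
```

With these two documents, the five IDs shared by both get a count of 2 and every other ID gets 1, exactly half of what Solr reported.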
>>
>> Regards,
>>
>>
>> David Dávila Atienza
>> AEAT - Departamento de Informática Tributaria
>> Subdirección de Tecnologías de Análisis de la Información e Investigación
>> del Fraude
>> Teléfono: 915828763
>> Extensión: 36763
>>
>>
>>
>> From:    Alvaro Cabrerizo <toporniz@gmail.com>
>> To:      "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>,
>> Date:    06/02/2015 12:34
>> Subject: Re: Problem with faceting
>>
>>
>>
>> Hi David,
>>
>> Yes, it sounds weird.
>>
>> Just for testing purposes, it would be nice to have the ID_bent fieldType
>> definition.
>>
>> Regards.
>>
>> On Fri, Feb 6, 2015 at 9:05 AM, <david.davila@correo.aeat.es> wrote:
>>
>> > Hello,
>> >
>> > we have been using faceting for a long time, but I have now discovered a
>> > problem that I can't understand.
>> >
>> > In a query with 2 results, Solr reports a count of 4 for some facet
>> > values. Faceting only applies to the result documents, so I think
>> > this makes no sense.
>> >
>> > This is the query and its response:
>> >
>> >
>> >   "responseHeader": {
>> >     "status": 0,
>> >     "QTime": 330,
>> >     "params": {
>> >       "facet": "true",
>> >       "fl": "ID_bent",
>> >       "indent": "true",
>> >       "q": "aitana",
>> >       "_": "1423207958751",
>> >       "facet.field": "ID_bent",
>> >       "wt": "json",
>> >       "fq": "ee_Procedimiento:ZZ12 AND ee_Referencia:\"CURSO\" AND doc_FormatoDocumento:PDF"
>> >     }
>> >   },
>> >   "response": {
>> >     "numFound": 2,
>> >     "start": 0,
>> >     "maxScore": 0.17735688,
>> >     "docs": [
>> >       {
>> >         "ID_bent": "#77762702P#77762953Y#77768200D#77763320M#77760725D#"
>> >       },
>> >       {
>> >         "ID_bent": "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z#12345655Z#77763320M#11100336Z#Y4222970X#"
>> >       }
>> >     ]
>> >   },
>> >   "facet_counts": {
>> >     "facet_queries": {},
>> >     "facet_fields": {
>> >       "ID_bent": [
>> >         "77760725D",
>> >         4,
>> >         "77762702P",
>> >         4,
>> >         "77762953Y",
>> >         4,
>> >         "77763320M",
>> >         4,
>> >         "77768200D",
>> >         4,
>> >         "00000336Z",
>> >         2,
>> >         "00020000J",
>> >         2,
>> >         "04889446Z",
>> >         2,
>> >         "11111111H",
>> >         2,
>> >         "12312312K",
>> >         2,
>> >         "12345655Z",
>> >         2,
>> >         "48261207P",
>> >         2,
>> >         "77760302T",
>> >         2,
>> >         "77760631F",
>> >         2,
>> >         "77763453T",
>> >         2,
>> >         "77765788N",
>> >         2,
>> >
>> >
>> > We are using Solr 4.7 in a cloud configuration with 2 shards. Any idea
>> > what is happening?
>> >
>> > Thanks in advance,
>> >
>> > David Dávila Atienza
>> > AEAT - Departamento de Informática Tributaria
>> > Subdirección de Tecnologías de Análisis de la Información e Investigación
>> > del Fraude
>> >
>>
>>
