lucene-solr-user mailing list archives

From david.dav...@correo.aeat.es
Subject Re: Problem with faceting
Date Mon, 09 Feb 2015 11:50:07 GMT
Hi,

that was the problem. I don't know why, but some documents are duplicated
across the two shards; maybe our config file was wrong at some point.

Thank you very much,


David Dávila Atienza
AEAT



From:    Erick Erickson <erickerickson@gmail.com>
To:      solr-user@lucene.apache.org
Date:    08/02/2015 11:21
Subject: Re: Problem with faceting



You might try sending the request to each individual shard with
&distrib=false appended to see if you somehow indexed
the same docs to each shard individually, although that doesn't
really make sense given the fact that numFound is 2.
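
The per-shard check suggested above could be scripted roughly as follows. This is only a sketch: the helper names, the choice of `fl=id`, and the `rows` limit are assumptions made for illustration, and the example leaves out the HTTP request itself (each URL would be fetched and the returned ids collected per shard).

```python
from collections import Counter
from urllib.parse import urlencode


def shard_query_url(core_url, query="*:*", rows=1000):
    """Build a select URL that asks one core only (distrib=false)."""
    params = {
        "q": query,
        "fl": "id",          # assumption: the uniqueKey field is called "id"
        "rows": rows,        # assumption: small enough index to list every id
        "wt": "json",
        "distrib": "false",  # do not fan the request out to other shards
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def ids_on_multiple_shards(ids_by_shard):
    """Return the ids that were returned by more than one shard."""
    seen = Counter()
    for ids in ids_by_shard.values():
        seen.update(set(ids))  # count each id at most once per shard
    return sorted(doc_id for doc_id, n in seen.items() if n > 1)
```

Any id reported by `ids_on_multiple_shards` was indexed into more than one shard and is a candidate for the inflated facet counts.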

Best,
Erick

On Fri, Feb 6, 2015 at 8:10 AM, Alvaro Cabrerizo <toporniz@gmail.com> wrote:
> Hi,
>
> Totally agree about the schema. It couldn't be the issue.
>
> On the other hand, I made a naive test. Using the schema and the data you
> attached, I tried searching the same Solr instance as if it were two
> different shards.
>
> For example, searching for:
>
> http://localhost:8081/solr/select?q=*:*&wt=json&indent=true&facet=true&facet.field=ID_bent
>
> gives the correct response (2 hits + nice facet count),
>
>
>     "docs": [
>       {
>         "id": "1",
>         "ID_bent": "#77762702P#77762953Y#77768200D#77763320M#77760725D#",
>         "score": 2.5645533
>       },
>       {
>         "id": "2",
>         "ID_bent": "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z#12345655Z#77763320M#11100336Z#Y4222970X#",
>         "score": 2.5645533
>       }
>     ]
>   },
>   "facet_counts": {
>     "facet_queries": {},
>     "facet_fields": {
>       "ID_bent": [
>         "77760725D",
>         2,
>         "77762702P",
>         2,
>         "77762953Y",
>         2,
>         "77763320M",
>         2,
>         "77768200D",
>         2,
>         "00020080J",
>
>
>
> but when searching for:
>
> http://localhost:8081/solr/select?q=*:*&wt=json&indent=true&facet=true&facet.field=ID_bent&shards=localhost:8081/solr,localhost:8081/solr
>
> it gives me the following response: the documents found are correct, as they
> can be "merged" using the id, but the facet counts are wrong, as the facet
> values can't be merged in the same way.
>
>     "numFound": 2,
>     "start": 0,
>     "maxScore": 2.5645533,
>     "docs": [
>       {
>         "id": "1",
>         "ID_bent": "#77762702P#77762953Y#77768200D#77763320M#77760725D#",
>         "score": 2.5645533
>       },
>       {
>         "id": "2",
>         "ID_bent": "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z#12345655Z#77763320M#11100336Z#Y4222970X#",
>         "score": 2.5645533
>       }
>     ]
>   },
>   "facet_counts": {
>     "facet_queries": {},
>     "facet_fields": {
>       "ID_bent": [
>         "77760725D",
>         4,
>         "77762702P",
>         4,
>         "77762953Y",
>         4,
>         "77763320M",
>         4,
>         "77768200D",
>         4,
>         "00020080J",
>
>
> So there could be two different issues:
>
>    - Duplicated documents (one copy on each shard)
>    - A bad request that includes the same shard twice in the shards list (I
>    would bet the other one is the real cause)
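
The behaviour seen in this test can be reproduced with a toy model of the merge step. This is a simplified sketch, not Solr's actual code: the function name and the response layout are made up for illustration. It assumes only what the two responses above show, namely that documents are deduplicated by id while facet counts from each shard are added together.

```python
from collections import Counter


def merge_shard_responses(shard_responses):
    """Toy merge: dedupe documents by id, sum facet counts across shards."""
    docs_by_id = {}
    facet_totals = Counter()
    for response in shard_responses:
        for doc in response["docs"]:
            # A document seen on an earlier shard is kept only once.
            docs_by_id.setdefault(doc["id"], doc)
        # Facet counts carry no document ids, so they can only be added up.
        facet_totals.update(response["facets"])
    return {"numFound": len(docs_by_id), "facets": dict(facet_totals)}


# The same "shard" listed twice, as in the shards=... request above.
shard = {
    "docs": [{"id": "1"}, {"id": "2"}],
    "facets": {"77760725D": 2, "00020080J": 1},
}
merged = merge_shard_responses([shard, shard])
print(merged["numFound"], merged["facets"])
```

With the same shard listed twice, the model reports 2 documents but doubles every facet count, which matches both the test above and the original report.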
>
> Hope it helps.
>
>
>
>
>
>
>
>
> On Fri, Feb 6, 2015 at 1:27 PM, <david.davila@correo.aeat.es> wrote:
>
>> Hi Alvaro,
>>
>> this is the definition:
>>
>>   <fieldType name="entidades" class="solr.TextField">
>>     <analyzer type="index">
>>       <tokenizer class="solr.PatternTokenizerFactory" pattern="#"/>
>>     </analyzer>
>>   </fieldType>
>>
>>
>> As you can see, we store all the IDs separated by a #. Normally this has
>> worked fine, and I think the problem has nothing to do with the
>> definition.
>> Besides, I have seen that when the correct value in the facet field would
>> be 2, Solr shows 4, and when it would be 1, it shows 2. In conclusion, for
>> some reason the values are being doubled. Why? I have no idea. And this
>> doesn't always happen; in fact, it happens only with some queries or some
>> documents. It's very weird. Maybe SolrCloud is merging the results from
>> the two shards incorrectly in some situations, but I have no idea.
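
As a sanity check on the expected numbers, the effect of splitting on # and faceting over the two result documents can be approximated in a few lines. This is a rough sketch, not the Solr analysis chain: the helper names are made up, and it assumes that empty tokens (before the leading # and after the trailing #) are discarded and that a value is counted once per document.

```python
from collections import Counter


def split_ids(value, delimiter="#"):
    """Split a stored value on the delimiter, dropping empty tokens."""
    return [token for token in value.split(delimiter) if token]


def facet_counts(values):
    """Count, for each token, the number of documents that contain it."""
    counts = Counter()
    for value in values:
        counts.update(set(split_ids(value)))  # once per document
    return counts


doc1 = "#77762702P#77762953Y#77768200D#77763320M#77760725D#"
doc2 = (
    "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P"
    "#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T"
    "#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z"
    "#12345655Z#77763320M#11100336Z#Y4222970X#"
)
counts = facet_counts([doc1, doc2])
print(counts.most_common(5))
```

For these two documents no value can have a count above 2, so a reported 4 points to the same documents being counted twice rather than to the field definition.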
>>
>> Regards,
>>
>>
>> David Dávila Atienza
>> AEAT - Departamento de Informática Tributaria
>> Subdirección de Tecnologías de Análisis de la Información e Investigación
>> del Fraude
>> Teléfono: 915828763
>> Extensión: 36763
>>
>>
>>
>> From:    Alvaro Cabrerizo <toporniz@gmail.com>
>> To:      "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>> Date:    06/02/2015 12:34
>> Subject: Re: Problem with faceting
>>
>>
>>
>> Hi David,
>>
>> Yes, it sounds weird.
>>
>> Just for testing purposes, it would be nice to have the ID_bent fieldType
>> definition.
>>
>> Regards.
>>
>> On Fri, Feb 6, 2015 at 9:05 AM, <david.davila@correo.aeat.es> wrote:
>>
>> > Hello,
>> >
>> > we have been using faceting for a long time, but now I have discovered a
>> > problem that I can't understand:
>> >
>> > the issue is that, for a query with 2 results, Solr reports a count of 4
>> > for some facet values. But faceting only applies to the result
>> > documents, so I think this makes no sense.
>> >
>> > This is the query and its response:
>> >
>> >
>> >   "responseHeader": {
>> >     "status": 0,
>> >     "QTime": 330,
>> >     "params": {
>> >       "facet": "true",
>> >       "fl": "ID_bent",
>> >       "indent": "true",
>> >       "q": "aitana",
>> >       "_": "1423207958751",
>> >       "facet.field": "ID_bent",
>> >       "wt": "json",
>> >       "fq": "ee_Procedimiento:ZZ12 AND ee_Referencia:\"CURSO\" AND
>> > doc_FormatoDocumento:PDF"
>> >     }
>> >   },
>> >   "response": {
>> >     "numFound": 2,
>> >     "start": 0,
>> >     "maxScore": 0.17735688,
>> >     "docs": [
>> >       {
>> >         "ID_bent": "#77762702P#77762953Y#77768200D#77763320M#77760725D#"
>> >       },
>> >       {
>> >         "ID_bent": "#77760631F#77766156N#77760725D#77762702P#77765788N#48991207P#77762953Y#77760302T#12312312K#89890001K#77768200D#89890003T#11111111H#77763453T#99999999R#00020080J#Y4332393N#04889446Z#12345655Z#77763320M#11100336Z#Y4222970X#"
>> >       }
>> >     ]
>> >   },
>> >   "facet_counts": {
>> >     "facet_queries": {},
>> >     "facet_fields": {
>> >       "ID_bent": [
>> >         "77760725D",
>> >         4,
>> >         "77762702P",
>> >         4,
>> >         "77762953Y",
>> >         4,
>> >         "77763320M",
>> >         4,
>> >         "77768200D",
>> >         4,
>> >         "00000336Z",
>> >         2,
>> >         "00020000J",
>> >         2,
>> >         "04889446Z",
>> >         2,
>> >         "11111111H",
>> >         2,
>> >         "12312312K",
>> >         2,
>> >         "12345655Z",
>> >         2,
>> >         "48261207P",
>> >         2,
>> >         "77760302T",
>> >         2,
>> >         "77760631F",
>> >         2,
>> >         "77763453T",
>> >         2,
>> >         "77765788N",
>> >         2,
>> >
>> >
>> > We are using Solr 4.7 in cloud configuration with 2 shards. Any idea
>> > what is happening?
>> >
>> > Thanks in advance,
>> >
>> > David Dávila Atienza
>> > AEAT - Departamento de Informática Tributaria
>> > Subdirección de Tecnologías de Análisis de la Información e Investigación
>> > del Fraude
>> >
>>
>>

