lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Applying Tokenizers and Filters to CopyFields
Date Thu, 26 Mar 2015 16:48:38 GMT
Glad it worked out...

Looking back, I can't believe I didn't mention adding &debug=query to
the URL. That would have shown you exactly what the parsed query
looked like and you'd have seen right off that it wasn't searching
against the field you thought it was. It's one of the first things I
do when queries don't return what I expect.
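
As a rough sketch of that debugging step, the Python below builds a select URL with debug=query and pulls the parsed query out of a decoded JSON response. The host, port and core name ("windex") are taken from the URLs quoted further down in this thread; the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint, copied from the URLs quoted later in this thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/windex/select"


def build_debug_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL that asks Solr to report the parsed query."""
    params = {"q": query, "wt": "json", "indent": "true", "debug": "query"}
    return base_url + "?" + urlencode(params)


def parsed_query(response):
    """Return the parsed query from a decoded JSON response, or None."""
    return response.get("debug", {}).get("parsedquery")
```

Fetching build_debug_url("Sprache") and passing the decoded JSON to parsed_query() should return the query as Solr parsed it; the field name in front of the term shows which field was actually searched.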

Glad it's working for you!
Erick

On Thu, Mar 26, 2015 at 8:24 AM, Michael Della Bitta
<michael.della.bitta@appinions.com> wrote:
> Glad you are sorted out!
>
> Michael Della Bitta
>
> Senior Software Engineer
>
> o: +1 646 532 3062
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions <https://twitter.com/Appinions> | g+:
> plus.google.com/appinions
> <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
> w: appinions.com <http://www.appinions.com/>
>
> On Thu, Mar 26, 2015 at 10:09 AM, Martin Wunderlich <martin_wu@gmx.net>
> wrote:
>
>> Thanks so much, Erick and Michael, for all the additional explanation.
>> The crucial information in the end turned out to be the one about the
>> default search field ("df"). In solrconfig.xml this parameter was set to point
>> to the field with the original text, which is why the expanded queries didn’t work. When I
>> set the df parameter to one of the fields with the expanded text, the
>> search works fine. I have also removed the copyField declarations.
>>
>> It’s all working as expected now. Thanks again for the help.
>>
>> Cheers,
>>
>> Martin
>>
>>
>>
>>
>> > On 25.03.2015 at 23:43, Erick Erickson <erickerickson@gmail.com> wrote:
>> >
>> > Martin:
>> > Perhaps this would help
>> >
>> > indexed=true, stored=true
>> > The field can be searched. The raw input (not analyzed in any way) can be
>> > shown to the user in the results list.
>> >
>> > indexed=true, stored=false
>> > The field can be searched. However, the field can't be returned in the
>> > results list with the document.
>> >
>> > indexed=false, stored=true
>> > The field cannot be searched, but the contents can be returned in the
>> > results list with the document. There are some use-cases where this is
>> > desirable behavior.
>> >
>> > indexed=false, stored=false
>> > The entire field is thrown out, it's just as if you didn't send the
>> > field to be indexed at all.
>> >
>> > And one other thing, the copyField gets the _raw_ data not the
>> > analyzed data. Let's say you have two fields, "src" and "dst".
>> > Copying from src to dst in schema.xml is identical to
>> > <add>
>> >   <doc>
>> >     <field name="src">original text</field>
>> >     <field name="dst">original text</field>
>> >   </doc>
>> > </add>
>> >
>> > that is, dst receives the raw text rather than the analyzed output of
>> > src, and copyField directives are not chained.
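
The behaviour described here can be modelled with a small toy function. This is only an illustration in Python, not Solr code: the function name and the (source, dest) pair format are invented, and it overwrites the destination where a real multi-valued field would collect values.

```python
def apply_copy_fields(doc, directives):
    """Toy model of copyField: every directive reads the raw submitted document.

    `directives` is a list of (source, dest) pairs. Simplified illustration,
    not how Solr is implemented.
    """
    result = dict(doc)
    for source, dest in directives:
        # Read from the original input, never from the output of another copy.
        if source in doc:
            result[dest] = doc[source]
    return result


doc = {"src": "Original Text"}
# The second directive tries to chain off "dst", which was not in the input.
directives = [("src", "dst"), ("dst", "dst2")]
expanded = apply_copy_fields(doc, directives)
```

In this model "dst" ends up with the same unanalyzed text as "src", and "dst2" is never populated because "dst" was not part of the submitted document.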
>> >
>> > Also, watch out for your query syntax. Michael's comments are spot-on,
>> > I'd just add this:
>> >
>> >
>> > http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true
>> >
>> > is kind of odd. Let's assume you mean "qf" rather than "fq". That
>> > _only_ matters if your query parser is "dismax" or "edismax"; it'll be
>> > ignored in this case, I believe.
>> >
>> > You'd want something like
>> > q=src:Sprache
>> > or
>> > q=dst:Sprache
>> > or even
>> > http://localhost:8983/solr/windex/select?q=Sprache&df=src
>> > http://localhost:8983/solr/windex/select?q=Sprache&df=dst
>> >
>> > where "df" is "default field" and the search is applied against that
>> > field in the absence of a field qualification like my first two
>> > examples.
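
A minimal sketch of the two query styles, assuming the same host and core as the URLs above. The helper names are hypothetical, and the term is not escaped for query-syntax characters, so this only suits simple single-word terms.

```python
from urllib.parse import urlencode

# Assumed endpoint, copied from the URLs in this thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/windex/select"


def field_qualified_url(field, term, base_url=SOLR_SELECT_URL):
    """Search one field by naming it inside q, e.g. q=src:Sprache."""
    return base_url + "?" + urlencode({"q": f"{field}:{term}"})


def default_field_url(field, term, base_url=SOLR_SELECT_URL):
    """Search one field by leaving q unqualified and setting df."""
    return base_url + "?" + urlencode({"q": term, "df": field})
```

Both helpers target the same field; the only visible difference is that urlencode() percent-encodes the colon in the field-qualified form.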
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Mar 25, 2015 at 2:52 PM, Michael Della Bitta
>> > <michael.della.bitta@appinions.com> wrote:
>> >> I agree the terminology is possibly a little confusing.
>> >>
>> >> Stored refers to values that are stored verbatim. You can retrieve them
>> >> verbatim. Analysis does not affect stored values.
>> >> Indexed values are tokenized/transformed and stored inverted. You can't
>> >> recover the literal analyzed version (at least, not easily).
>> >>
>> >> If what you really want is to store and retrieve case folded versions of
>> >> your data as well as the original, you need to use something like an
>> >> UpdateRequestProcessor, which I personally am less familiar with.
>> >>
>> >>
>> >> On Wed, Mar 25, 2015 at 5:28 PM, Martin Wunderlich <martin_wu@gmx.net>
>> >> wrote:
>> >>
>> >>> So, the pre-processing steps are applied under <analyzer type="index">.
>> >>> And this point is not quite clear to me: Assuming that I have a simple
>> >>> case-folding step applied to the target of the copyField: How or where are
>> >>> the lower-case tokens stored, if the text isn’t added to the index? How is
>> >>> the query supposed to retrieve the lower-case version?
>> >>> (sorry, if this sounds like a naive question, but I have a feeling that I
>> >>> am missing something really basic here).
>> >>>
>> >>
>> >>
>>
>>
