lucene-solr-user mailing list archives

From "David Bowen" <davidlbo...@gmail.com>
Subject Re: highlighting and stemming
Date Tue, 23 Dec 2008 18:36:00 GMT
I've filed a ticket on this so it doesn't get lost:

https://issues.apache.org/jira/browse/SOLR-937

On Mon, Dec 22, 2008 at 11:53 AM, David Bowen <davidlbowen@gmail.com> wrote:

> Yonik, thanks for looking into this.
>
> Here is a better example of the problem, using the example data from the
> latest dev version.  Add the words "electronics" and "connector" to the
> features field of the first doc in ipod_other.xml.  Now the following query:
>
> http://localhost:8983/solr/select/?q=electronics&hl=true&hl.fl=features+cat
>
> will show "electronics" highlighted in the features field but not in the
> cat field.  If you search instead for "connector", it is highlighted in
> both.
>
>
> On Sun, Dec 21, 2008 at 8:53 PM, Yonik Seeley <yseeley@gmail.com> wrote:
>
>> On Fri, Dec 19, 2008 at 8:44 PM, David Bowen <davidlbowen@gmail.com> wrote:
>> > We have two text fields, one for author names, and the other for the
>> > body of the document.  It often happens that the author names also
>> > appear in the body of the document.  We turned off stemming for the
>> > author field to avoid unexpected matches when searching by author.
>> >
>> > Now, suppose we have an author named "Joe Bloggs" whose name appears in
>> > both the fields.  If the user searches for him by author, we get correct
>> > highlighting in the author field, but only "Joe" and not "Bloggs" is
>> > highlighted in the main body field.  Conversely, if the user searches
>> > for "Joe Bloggs" in the main body field, the highlighting is correct in
>> > that field but this time only "Joe" is highlighted in the author field.
>> >
>> > Any suggestions on how we could make this work as we expected (name
>> > properly highlighted in both fields)? Is it a bug that the query isn't
>> > re-tokenized when highlighting a field that has different tokenization
>> > specified than was used for the search?
>>
>> That's not the problem (or at least it's not a general problem).  I
>> just tried this with the example (and latest dev version of Solr)
>> using the "cat" field (unstemmed) and the "features" field (stemmed),
>> and both were highlighted at the same time as expected.  I even put in
>> "Joe Bloggs", verified that it was searching for "Bloggs" in the cat
>> field and "blogg" in the features field.
>>
>> -Yonik
>>
>
>

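For anyone trying to reproduce the report, here is a minimal Python sketch of the two queries described in the thread ("electronics" and "connector", highlighted in the features and cat fields). The base URL and parameters come from the example query above; the function names are hypothetical, and the second helper assumes the highlighting data has already been parsed into a dict shaped {doc_id: {field: [snippets]}}, which is an assumption and not something stated in the thread.

```python
from urllib.parse import urlencode

# Base URL taken from the example query in the thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/select/"


def build_highlight_url(query, fields, base_url=SOLR_SELECT_URL):
    """Build a select URL asking Solr to highlight `query` in each of `fields`."""
    params = [
        ("q", query),
        ("hl", "true"),
        # hl.fl is sent as a space-separated list; urlencode writes the space as "+".
        ("hl.fl", " ".join(fields)),
    ]
    return base_url + "?" + urlencode(params)


def fields_without_highlight(highlighting, doc_id, fields):
    """Return the requested fields that have no snippets for `doc_id`.

    Assumption: `highlighting` is a dict shaped {doc_id: {field: [snippet, ...]}}.
    """
    doc_snippets = highlighting.get(doc_id, {})
    return [f for f in fields if not doc_snippets.get(f)]


if __name__ == "__main__":
    # The two search terms compared in the report.
    for term in ("electronics", "connector"):
        print(build_highlight_url(term, ["features", "cat"]))
```

Running the script prints the two URLs to paste into a browser. Per the report, the "electronics" query should come back with a highlight in features only, while "connector" should be highlighted in both fields.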