lucene-dev mailing list archives

From "eric casteleijn (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-5759) increasing hl.fragsize loses part of the search term
Date Fri, 21 Feb 2014 01:20:22 GMT

     [ https://issues.apache.org/jira/browse/SOLR-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

eric casteleijn updated SOLR-5759:
----------------------------------

    Description: 
When using the highlighter and increasing hl.fragsize from 100 (the default) to 200, the
search term is sometimes no longer entirely contained in the returned fragment, even though
it was in the smaller snippet.

For instance:

http://host/solr/index/select?q=("Tony+Yet"+AND+exact_text:"Tony+Yet")&wt=json&indent=true&hl=true&hl.fl=title,summary,extracted_text&hl.simple.pre=<em>&hl.simple.post=</em>&hl.fragsize=100

results in the fragment:

"7618861":{
      "extracted_text":[" enterprise forward.\n\n<em>Tony</em> <em>Yet</em>, one of the centre's organisers, explains: \"I think what Hong Kong needs"]},

whereas:

http://host/solr/index/select?q=("Tony+Yet"+AND+exact_text:"Tony+Yet")&wt=json&indent=true&hl=true&hl.fl=title,summary,extracted_text&hl.simple.pre=<em>&hl.simple.post=</em>&hl.fragsize=200

results in:

"7618861":{
      "extracted_text":[" interested in social issues, as well as mentorship for upcoming enterprises.\n\nAs in the UK, it is also creating the community of people, skills and ideas that is needed to push social enterprise forward.\n\n<em>Tony</em>"]},

Both fragments come from roughly the same position in the same field, but I can't for the
life of me imagine why the larger fragment would shift so far to the left that it drops half
of the search term.
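
To make the difference between the two fragments concrete, here is a minimal sketch that checks whether a returned fragment still contains every word of the phrase as a highlighted term. The function names are hypothetical helpers, not part of Solr, and the fragment strings are the two snippets quoted above with the wrapped lines rejoined:

```python
import re

# Matches whatever sits between the hl.simple.pre / hl.simple.post markers
# used in the requests above (<em> and </em>).
EM_TERM = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(fragment):
    """Return the highlighted terms in a fragment, in order of appearance."""
    return EM_TERM.findall(fragment)


def phrase_fully_highlighted(fragment, phrase):
    """True if the words of `phrase` appear as consecutive highlighted terms."""
    words = phrase.split()
    terms = highlighted_terms(fragment)
    return any(
        terms[i:i + len(words)] == words
        for i in range(len(terms) - len(words) + 1)
    )


# Fragment returned with hl.fragsize=100 (copied from the report).
frag_100 = (
    " enterprise forward.\n\n<em>Tony</em> <em>Yet</em>, "
    "one of the centre's organisers, explains: \"I think what Hong Kong needs"
)

# Fragment returned with hl.fragsize=200 (copied from the report).
frag_200 = (
    " interested in social issues, as well as mentorship for upcoming "
    "enterprises.\n\nAs in the UK, it is also creating the community of "
    "people, skills and ideas that is needed to push social enterprise "
    "forward.\n\n<em>Tony</em>"
)

print(phrase_fully_highlighted(frag_100, "Tony Yet"))  # True
print(phrase_fully_highlighted(frag_200, "Tony Yet"))  # False
```

The check only looks at the highlight markers, so it says nothing about why the fragmenter chose a different window; it just flags fragments where the phrase was cut.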

If it would help, I can upload the entire JSON results for both requests.

Let me know if there is any other information I can supply, or checks I can perform.
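
One check that might help narrow this down is to issue the same request with several hl.fragsize values and compare the fragments. The sketch below only builds the request URLs; `http://host/solr/index/select` is the placeholder from the requests above, `highlight_url` is a hypothetical helper, and the intermediate value 150 is an assumption added for illustration:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint, as used in the example requests in this report.
BASE_URL = "http://host/solr/index/select"


def highlight_url(fragsize):
    """Build the highlighting request from the report for a given hl.fragsize."""
    params = [
        ("q", '("Tony Yet" AND exact_text:"Tony Yet")'),
        ("wt", "json"),
        ("indent", "true"),
        ("hl", "true"),
        ("hl.fl", "title,summary,extracted_text"),
        ("hl.simple.pre", "<em>"),
        ("hl.simple.post", "</em>"),
        ("hl.fragsize", str(fragsize)),
    ]
    # urlencode percent-encodes the quotes, parentheses and angle brackets.
    return BASE_URL + "?" + urlencode(params)


# 100 and 200 are the values from the report; 150 is an illustrative extra.
urls = {size: highlight_url(size) for size in (100, 150, 200)}

for size, url in urls.items():
    print(size, url)
```

Only hl.fragsize differs between the generated URLs, so any change in the returned fragments can be attributed to that parameter.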


> increasing hl.fragsize loses part of the search term
> ----------------------------------------------------
>
>                 Key: SOLR-5759
>                 URL: https://issues.apache.org/jira/browse/SOLR-5759
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.4
>         Environment: Ubuntu 12.04
>            Reporter: eric casteleijn
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

