lucene-solr-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: Highlight results in Arabic are backword
Date Thu, 06 Feb 2014 15:12:44 GMT
Hi Fatima,

I don’t think there’s an actual problem. It only looks like one because the program you’re
using to view the JSON lays out the highlighting results differently than it lays out the
field values.

In fact, the characters are the same, and in the same order, in both the “author” field
text and the highlighting text, except that some space characters are ASCII space (U+0020)
in one and non-breaking space (U+00A0) in the other.
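
One way to check this yourself is to compare the two strings code point by code point rather than by eye. The following is a minimal Python sketch, not anything Solr provides: the helper names `describe` and `same_apart_from_spaces` are hypothetical, and the position of the U+00A0 in `highlight_raw` is assumed purely for illustration.

```python
import unicodedata

NBSP = "\u00a0"  # NO-BREAK SPACE

def describe(text):
    """Return (code point, Unicode name) pairs, one per character, in logical order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

def same_apart_from_spaces(a, b):
    """True if a and b are identical once no-break spaces are treated as ASCII spaces."""
    return a.replace(NBSP, " ") == b.replace(NBSP, " ")

# Field value as it appears in the response.
field_value = "د. فيشر السعر"

# Highlight snippet; the U+00A0 after the period is an assumed position.
highlight_raw = "د.\u00a0<em>فيشر</em> السعر"

# Strip the highlighter's pre/post tags before comparing.
highlight_text = highlight_raw.replace("<em>", "").replace("</em>", "")

if __name__ == "__main__":
    print(same_apart_from_spaces(field_value, highlight_text))
    for left, right in zip(describe(field_value), describe(highlight_text)):
        marker = "" if left == right else "  <-- differs"
        print(left, right, marker)
```

Because Python strings hold characters in logical (storage) order, this comparison is unaffected by how a viewer chooses to draw right-to-left text.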

By the way, I see the same thing as you in my email client (OS X Mail.app).  I assume there
is a rule shared by our programs about complex layout like this, where right-to-left text
is mixed with left-to-right text, likely based on the proportion of each, that triggers a
left-to-right word sequencing instead of the expected right-to-left word sequencing.
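
Which rule a given program applies is a guess here. For reference, the Unicode Bidirectional Algorithm (UAX #9), when no higher-level protocol sets the direction, takes a paragraph's base direction from its first strong character (bidi class L, R, or AL), not from the proportion of each script. The sketch below illustrates only that one rule in Python; `base_direction` is a hypothetical helper, it ignores isolate handling, and it is not a full bidi implementation.

```python
import unicodedata

def base_direction(text):
    """Guess a paragraph's base direction from its first strong character.

    Simplified version of UAX #9 rules P2 and P3: it does not skip
    characters inside directional isolates.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found: default to left-to-right.
    return "ltr"

if __name__ == "__main__":
    # The Latin key comes first, so the whole line gets a left-to-right base.
    print(base_direction('"author": "د. فيشر السعر"'))
    # The quote is neutral; the first strong character is Arabic.
    print(base_direction('"د. <em>فيشر</em> السعر"'))
```

Under a left-to-right base direction, the Latin `<em>` and `</em>` tags split the Arabic text into separate right-to-left runs, and those runs are then placed left to right, which would make the words look reversed even though the stored order is unchanged.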

Anyway, I pulled the author field and highlighting texts out into an HTML document and viewed
it in my browser (Safari), and both are laid out the same way (except for the emphasis
given to the highlighted word):

——
<html>
<body>
<p>"author": "د. فيشر السعر",</p>
<p>"highlighting": { "1": { "author": [ "د. <em>فيشر</em> السعر"
] } }</p>
</body>
</html>
——

Steve

On Feb 6, 2014, at 8:23 AM, Fatima Issawi <issawif@qu.edu.qa> wrote:

> Hello,
> 
> I am getting highlight results in Arabic, but the order of the words is backwards. Querying
> on that field gives me the correct result, though. Is there a setting I’m missing?
> 
> An extract from an example query from my Solr Console is below:
> 
> {
>  "responseHeader": {
>    "status": 0,
>    "QTime": 1,
>    "params": {
>      "indent": "true",
>      "q": "author:\"فيشر\"",
>      "_": "1391692704242",
>      "hl.simple.pre": "<em>",
>      "hl.simple.post": "</em>",
>      "hl.fl": "author",
>      "wt": "json",
>      "hl": "true"
>    }
>  },
>  "response": {
>    "numFound": 4,
>    "start": 0,
>    "docs": [
>      {
>        "pagenumber": 1,
>        "id": "1",
>        "author": "د. فيشر السعر",
>        "author_s": "د. فيشر السعر",
>        "collector": "فاطمة عيساوي",
>  },
>  "highlighting": {
>    "1": {
>      "author": [
>        "د. <em>فيشر</em> السعر"
>      ]

