lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: Unicode Character Problem
Date Mon, 12 Dec 2016 15:54:41 GMT
Hi Ahmet,

I don't see any weird character when I manual copy it to any text editor.

On Sat, Dec 10, 2016 at 6:19 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> Hi Furkan,
>
> I am pretty sure this is a pdf extraction thing.
> Turkish characters caused us trouble in the past during extracting text
> from pdf files.
> You can confirm by performing manual copy-paste from original pdf file.
>
> Ahmet
>
>
> On Friday, December 9, 2016 8:44 PM, Furkan KAMACI <furkankamaci@gmail.com>
> wrote:
> Hi,
>
> I'm trying to index Turkish characters. These are what I see at my index (I
> see both of them at different places of my content):
>
> aç �klama
> açıklama
>
> These are same words but indexed different (same weird character at first
> one). I see that there is not a weird character when I check the original
> PDF file.
>
> What do you think about it. Is it related to Solr or Tika?
>
> PS: I use text_general for analyser of content field.
>
> Kind Regards,
> Furkan KAMACI
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message