lucene-solr-user mailing list archives

From Retro <holste...@mail.ru.INVALID>
Subject Re: regarding Extracting text from Images
Date Wed, 22 Jan 2020 08:37:32 GMT
Good day,
We solved the situation. Here is what was used and changed:
In our installation we used Tesseract version 3.05, Tika version 1.17, and
Solr version 7.4. (We actually had Tika version 1.17, not 1.18.)
1. Changed the output type from HOCR to TXT in the file parseContext.xml:
   <property name="outputType" value="TXT"/>
2. Had to start Solr as the root user.
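
For anyone following along, here is a rough sketch of where the property from step 1 might sit inside parseContext.xml. Only the outputType line comes from our setup; the surrounding <entries>/<entry> wrapper and the TesseractOCRConfig class name are assumptions based on the usual Solr Cell parseContext layout, so check them against your own file and Tika version.

```xml
<!-- Sketch only: the wrapper elements and class name are assumed, not copied from our file -->
<entries>
  <entry class="org.apache.tika.parser.ocr.TesseractOCRConfig"
         impl="org.apache.tika.parser.ocr.TesseractOCRConfig">
    <!-- The change from step 1: plain text output instead of HOCR -->
    <property name="outputType" value="TXT"/>
  </entry>
</entries>
```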
Tesseract version 4.1.1 is not compatible with Tika 1.17, so we will upgrade
Solr to version 7.7 and Tika to version 1.19, and will then try to install
Tesseract 4.1.1.
<https://lucene.472066.n3.nabble.com/file/t495209/Capture.png> 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
