lucene-solr-user mailing list archives

From "Martin Frank Hansen (MHQ)" <...@kmd.dk>
Subject Tesseract language
Date Thu, 18 Oct 2018 11:30:12 GMT
Hi,

I have been using Tesseract through the data-import-handler in Solr, and it works very well
with English. As the documents are in Danish, I need to change the Tesseract language setting
to Danish as well. Is that possible from Solr?

I was using the /update/extract handler to import single files into Solr, and it worked for
a single file. How would I import several files from a file system?

Here is the request-handler I used:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">false</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>
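
For the several-files question, one possible approach is a small client-side script that walks
a directory and posts each file to the /update/extract handler above. This is only a sketch
under assumptions, not a confirmed answer: the Solr URL, the core name "mycore", the directory
path, the use of the file path as literal.id, and all function names are hypothetical, and it
has not been run against a live Solr instance.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed base URL and core name; neither appears in the original message.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def build_extract_url(core_url, doc_id, commit=False):
    """Build the /update/extract URL for one document.

    literal.id sets the document's id field; commit=true asks Solr to
    commit after this request.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return core_url.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def iter_files(root):
    """Yield every regular file below root, sorted by path."""
    return (p for p in sorted(Path(root).rglob("*")) if p.is_file())


def post_file(core_url, path, commit=False):
    """POST one file's raw bytes to the extract handler; return the HTTP status."""
    request = urllib.request.Request(
        build_extract_url(core_url, str(path), commit),
        data=Path(path).read_bytes(),
        # Generic content type (an assumption): type detection is left to Tika.
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_directory(core_url, root):
    """Post all files under root, committing only with the last one."""
    files = list(iter_files(root))
    for position, path in enumerate(files):
        post_file(core_url, path, commit=(position == len(files) - 1))
    return len(files)


# Example call (needs a running Solr and a real directory):
# index_directory(SOLR_CORE_URL, "/path/to/documents")
```

Committing once at the end, rather than per file, is a design choice to keep indexing cheap; a
failed request raises an exception and stops the loop, which may or may not be what is wanted.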


Martin Frank Hansen, Senior Data Analyst

Data, IM & Analytics

Lautrupparken 40-42, DK-2750 Ballerup
E-mail mhq@kmd.dk  Web www.kmd.dk
Mobile +4525571418

