lucene-solr-user mailing list archives

From Tim Allison <talli...@apache.org>
Subject Re: Reading data using Tika to Solr
Date Fri, 26 Oct 2018 10:48:25 GMT
IIRC, somewhere between 1.14 and now (1.19.1), we changed the default behavior
for the AutoDetectParser from skipping attachments to including attachments.

So, two options: 1) upgrade to 1.19.1 and use the AutoDetectParser or 2)
pass an AutoDetectParser via the ParseContext to be used for attachments.
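
Option 2 might look roughly like the following. This is only a minimal sketch, assuming tika-app (or tika-core plus the parsers) is on the classpath; the class name MsgWithAttachments and the method extractText are hypothetical names chosen for illustration, not anything from Tika or Solr.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
public class MsgWithAttachments {

    // Returns the extracted text of the file, including any embedded
    // documents (attachments) that Tika is able to parse.
    public static String extractText(Path file)
            throws IOException, SAXException, TikaException {
        Parser parser = new AutoDetectParser();

        // Registering a Parser in the ParseContext tells Tika which parser
        // to use for embedded documents such as msg attachments.
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        // A write limit of -1 disables the default character limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();

        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata, context);
        }
        return handler.toString();
    }
}
```

The returned string should then contain the message body followed by the text of the attachments, and can be added to a Solr document field in the usual way.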

If you’re wondering why you might upgrade to 1.19.1, look no further than:
https://tika.apache.org/security.html



On Fri, Oct 26, 2018 at 4:14 AM Martin Frank Hansen (MHQ) <MHQ@kmd.dk>
wrote:

> Hi Tim,
>
> It is msg files and I added tika-app-1.14.jar to the build path - and now
> it works 😊 But how do I get it to read the attachments as well?
>
> -----Original Message-----
> From: Tim Allison <tallison@apache.org>
> Sent: 25. oktober 2018 21:57
> To: solr-user@lucene.apache.org
> Subject: Re: Reading data using Tika to Solr
>
> If you’re processing actual msg files (not eml), you’ll also need poi and
> poi-scratchpad and their dependencies, but then those msgs could have
> attachments, at which point you may as well just add tika-app. :D
>
