beam-commits mailing list archives

From "Sergey Beryozkin (JIRA)" <>
Subject [jira] [Comment Edited] (BEAM-2328) Introduce Apache Tika Input component
Date Thu, 01 Jun 2017 13:06:04 GMT


Sergey Beryozkin edited comment on BEAM-2328 at 6/1/17 1:05 PM:

Sorry, Tika already reports the characters. I got confused for a moment and thought the default output
coder was not used there, but of course that output coder is for converting String to the output...
As far as Tika is concerned, it is already possible to pass custom Metadata to TikaInput.Read;
I'll just update that to also accept TikaConfig.

was (Author: sergey_beryozkin):
Sorry, Tika already reports the characters...

> Introduce Apache Tika Input component
> -------------------------------------
>                 Key: BEAM-2328
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
> Apache Tika is a popular project that offers extensive support for parsing a variety
> of file formats. It is used in many projects, including Lucene and Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest to many users.
> PR is to follow

This message was sent by Atlassian JIRA
