beam-commits mailing list archives

From "Sergey Beryozkin (JIRA)" <>
Subject [jira] [Commented] (BEAM-2328) Introduce Apache Tika Input component
Date Wed, 14 Jun 2017 11:10:00 GMT


Sergey Beryozkin commented on BEAM-2328:

Hi JB, All,
I'm now ready to create the initial PR. As I said earlier, I realize it won't be perfect from
the start, and I have some tasks to do once the PR is accepted (making commons-compress 1.14
managed, and a couple of possible refactorings which would affect the outer Beam source and
help minimize the duplication of FileBased-related utility code inside the Tika component).
For now, though, I'm just trying to keep this initial contribution as simple and as
self-contained as possible.

The only immediate question I have is how this artifact should be named. At the moment
it is "beam-sdks-java-io-tika", but I wonder whether it should really be
"beam-sdks-java-input-tika", given that output cannot be supported.


> Introduce Apache Tika Input component
> -------------------------------------
>                 Key: BEAM-2328
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
> Apache Tika is a popular project that offers extensive support for parsing a variety
> of file formats. It is used in many projects, including Lucene and Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest to many users.
> A PR is to follow.

This message was sent by Atlassian JIRA
