beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2328) Introduce Apache Tika Input component
Date Fri, 16 Jun 2017 12:40:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051839#comment-16051839 ]

ASF GitHub Bot commented on BEAM-2328:
--------------------------------------

GitHub user sberyozkin opened a pull request:

    https://github.com/apache/beam/pull/3378

    [BEAM-2328] Add TikaIO component

    R: @jbonofre
    
    Adding TikaSource and TikaReader tests
    Updating TikaReader to use TikaInputStream as suggested by Tim Allison
    Supporting the customization of TikaConfig
    Cleanup:
    Moving 'tika' above 'xml' in io/pom.xml to keep the correct order
    Renaming TikaInput to TikaIO, adding Read.withOptions, throwing NoSuchElementException if the current is null
    Removing redundant test annotations
    Fixing TikaIO JavaDoc typo


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sberyozkin/beam tikaio

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3378
    
----
commit 8c63d91c0a088e2d90d5572051f736f24ea338b5
Author: Sergey Beryozkin <sberyozkin@gmail.com>
Date:   2017-05-25T15:47:59Z

    Adding TikaIO component
    Enforcing that start is called before advance
    Adding TikaSource and TikaReader tests
    Updating TikaReader to use TikaInputStream as suggested by Tim Allison
    Supporting the customization of TikaConfig
    Moving 'tika' above 'xml' in io/pom.xml to keep the correct order
    Renaming TikaInput to TikaIO, adding Read.withOptions, throwing NoSuchElementException if the current is null
    Removing redundant test annotations
    Fixing TikaIO JavaDoc typo
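
The two reader-related items in the commit message above (enforcing that start is called before advance, and throwing NoSuchElementException if the current is null) can be illustrated with a minimal sketch. This is not the code from the pull request: SketchReader is a hypothetical stand-in that reads from an in-memory list rather than from Tika, and the use of IllegalStateException for the ordering check is an assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical reader following a start/advance/getCurrent contract; not the PR's TikaReader. */
class SketchReader {
    private final Iterator<String> records;
    private boolean started;   // true once start() has been called
    private String current;    // null before the first read and after the input is exhausted

    /** Assumes the supplied records are non-null. */
    SketchReader(List<String> records) {
        this.records = records.iterator();
    }

    /** Positions the reader on the first record, if there is one. */
    boolean start() {
        if (started) {
            throw new IllegalStateException("start() has already been called");
        }
        started = true;
        return readNext();
    }

    /** Moves to the next record; rejects calls made before start(). */
    boolean advance() {
        if (!started) {
            // Assumed exception type; the PR only says the ordering is enforced.
            throw new IllegalStateException("start() must be called before advance()");
        }
        return readNext();
    }

    /** Returns the current record, or throws if there is none. */
    String getCurrent() {
        if (current == null) {
            throw new NoSuchElementException("no current record");
        }
        return current;
    }

    private boolean readNext() {
        current = records.hasNext() ? records.next() : null;
        return current != null;
    }
}
```

A caller would typically loop with `for (boolean more = reader.start(); more; more = reader.advance())` and call `getCurrent()` inside the loop body.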

----


> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers extensive support for parsing a wide variety of file formats. It is used in many projects, including Lucene and Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest to many users.
> A PR is to follow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
