Date: Sun, 22 Jan 2017 20:49:09 -0800
Subject: Re: Apache Tika Component
From: Chris Mattmann
Reply-To: dev@camel.apache.org
Message-ID: <8DE2629C-7F89-4385-8D49-3EEBD12466CE@gmail.com>

Great job, Bob! ☺

On 1/22/17, 8:17 PM, "Bob Paulin" wrote:

    Hi,

    I'd like to propose an Apache Tika[1] connector for Apache Camel.
    I see Camel uses a number of Tika components like PDFBox, but it could be
    interesting to have a full assortment of file parsers to convert files
    to text.

    The basic configuration would allow MIME type detection and parsing
    files to text.

    tika:detect

        File/InputStream -> camel-tika -> MIME type

    tika:parse

        File/InputStream -> camel-tika -> OutputStream in text

    I have a basic implementation that I'd be happy to send in a PR, but I
    wanted to see if this was something the community was interested in. I
    think it might be interesting to combine a project that integrates
    everything with the project that parses everything. I also think having
    a camel-tika component might help achieve some of Tika's 2.0 goals[2].

    - Bob Paulin

    [1] https://tika.apache.org/
    [2] https://wiki.apache.org/tika/Tika2_0RoadMap
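
For readers who want to see the shape of the two operations described above (stream in, MIME type out; stream in, text out), here is a minimal sketch using only the JDK. It is not the proposed camel-tika code and does not call Tika. The class name TikaEndpointSketch and both method names are hypothetical. Detection uses the JDK's URLConnection.guessContentTypeFromStream as a stand-in for Tika's detector, and the "parser" is a placeholder that copies bytes through unchanged, where a real component would extract text from PDFs, Office files and so on.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the tika:detect / tika:parse contracts.
class TikaEndpointSketch {

    // Stand-in for tika:detect: InputStream in, MIME type out.
    static String detect(InputStream in) throws IOException {
        // The JDK helper peeks at the leading bytes, so it needs mark/reset support.
        InputStream peekable = in.markSupported() ? in : new BufferedInputStream(in);
        String type = URLConnection.guessContentTypeFromStream(peekable);
        // Assumption: fall back to a generic binary type when nothing matches.
        return type != null ? type : "application/octet-stream";
    }

    // Stand-in for tika:parse: InputStream in, text written to an OutputStream.
    static void parse(InputStream in, OutputStream out) throws IOException {
        // Placeholder only: copies the bytes through without any real parsing.
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] gifHeader = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(new ByteArrayInputStream(gifHeader)));

        ByteArrayOutputStream text = new ByteArrayOutputStream();
        parse(new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)), text);
        System.out.println(new String(text.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In a real component the two static methods would presumably delegate to Tika rather than to these stand-ins, but the input and output types would follow the flows listed in the proposal.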