Date: Tue, 14 Feb 2017 19:52:41 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@nifi.apache.org
Subject: [jira] [Commented] (NIFI-2613) Support extracting content from Microsoft Excel (.xlxs) documents

    [ https://issues.apache.org/jira/browse/NIFI-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866508#comment-15866508 ]

ASF GitHub Bot commented on NIFI-2613:
--------------------------------------

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/929

    @jdye64 will you have time to address @jvwing findings? Would like to get this across the line. We're trying to get the list of stale PRs burned down.

> Support extracting content from Microsoft Excel (.xlxs) documents
> -----------------------------------------------------------------
>
>                 Key: NIFI-2613
>                 URL: https://issues.apache.org/jira/browse/NIFI-2613
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
>
> Microsoft Excel is a wildly popular application that businesses rely heavily on to store, visualize, and calculate data. Any single company most likely has thousands of Excel documents containing data that could be very valuable if ingested via NiFi and combined with other datasources.
> Apache POI is a popular 100% Java library for parsing several Microsoft document formats, including Excel. Apache POI is extremely flexible and can do several things. This issue would focus solely on using Apache POI to parse an incoming .xlsx document and convert it to CSV. The processor should be capable of limiting which Excel sheets are converted. CSV seems like the natural choice for outputting each row, since Excel itself can already export to CSV and the format maps naturally onto most Excel sheet designs.
> This capability should most likely introduce a new "poi" module, as I envision many more capabilities around parsing Microsoft documents could come from this base effort.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
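
The conversion the issue describes (turn each selected sheet into CSV, skipping sheets that were not requested) might look roughly like the sketch below. It is only an illustration under stated assumptions: the workbook is modelled as a plain map from sheet name to rows of cell strings, standing in for whatever Apache POI would hand back, and the class and method names (`SheetCsvSketch`, `toCsv`, `rowToCsv`, `escapeCell`) are hypothetical and are not taken from NiFi, Apache POI, or the pull request. An empty sheet filter is assumed here to mean "convert every sheet".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of NiFi or Apache POI.
final class SheetCsvSketch {

    // Quote a cell only when it contains a comma, a double quote, or a line
    // break, doubling any embedded double quotes (RFC 4180 style).
    static String escapeCell(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        String escaped = cell.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join the escaped cells of one row with commas.
    static String rowToCsv(List<String> row) {
        return row.stream()
                .map(SheetCsvSketch::escapeCell)
                .collect(Collectors.joining(","));
    }

    // Convert sheets to CSV text keyed by sheet name. The workbook map is an
    // assumed stand-in for parsed spreadsheet data. An empty wantedSheets set
    // is assumed to mean "all sheets".
    static Map<String, String> toCsv(Map<String, List<List<String>>> workbook,
                                     Set<String> wantedSheets) {
        Map<String, String> csvBySheet = new LinkedHashMap<>();
        for (Map.Entry<String, List<List<String>>> sheet : workbook.entrySet()) {
            if (!wantedSheets.isEmpty() && !wantedSheets.contains(sheet.getKey())) {
                continue; // sheet was not requested
            }
            String csv = sheet.getValue().stream()
                    .map(SheetCsvSketch::rowToCsv)
                    .collect(Collectors.joining("\n"));
            csvBySheet.put(sheet.getKey(), csv);
        }
        return csvBySheet;
    }
}
```

Keeping the result keyed by sheet name leaves it to the caller to decide whether each sheet becomes its own output or whether the sheets are concatenated; the issue text does not say which is intended.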