nifi-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (NIFI-2613) Support extracting content from Microsoft Excel (.xlxs) documents
Date Tue, 14 Feb 2017 19:52:41 GMT


ASF GitHub Bot commented on NIFI-2613:

Github user joewitt commented on the issue:
    @jdye64 will you have time to address @jvwing's findings?  Would like to get this across
the line.  We're trying to burn down the list of stale PRs.

> Support extracting content from Microsoft Excel (.xlxs) documents
> -----------------------------------------------------------------
>                 Key: NIFI-2613
>                 URL:
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
> Microsoft Excel is a wildly popular application that businesses rely heavily on to store,
visualize, and calculate data. Any single company most likely has thousands of Excel documents
containing data that could be very valuable if ingested via NiFi and combined with other data sources.
Apache POI is a popular 100% Java library for parsing several Microsoft document formats, including
Excel. Apache POI is extremely flexible and supports many operations beyond this use case. This issue
would focus solely on using Apache POI to parse an incoming .xlsx document and convert it to CSV. The
processor should be able to limit which Excel sheets are converted. CSV seems like the natural choice
for outputting each row, since Excel already supports exporting to CSV and the format maps naturally
onto most Excel sheet designs.
> This capability should most likely introduce a new "poi" module, as I envision many more
capabilities for parsing Microsoft documents growing out of this base effort.

This message was sent by Atlassian JIRA
