[ https://issues.apache.org/jira/browse/STREAMS-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309171#comment-15309171 ]
Steve Blackmon commented on STREAMS-51:
---------------------------------------
A semi-working version of this module exists in a branch and is worth wrapping up and merging,
IMO. It can scrape HTML or other Tika-supported document types to turn a URL into a structured
document, then populate actor.name, content, published, and other Streams fields based on
what's in the page.
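
As a rough illustration of the field-population step only, here is a minimal sketch in plain Java. It is not the code from the branch. It assumes the Tika parse has already produced a flat map of metadata plus the extracted body text. The class name, the helper names, and the metadata keys checked ("dc:creator", "Author", "dcterms:created", "article:published_time") are all assumptions made for illustration; the keys a real page yields depend on the document type and on the page itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps already-extracted page metadata onto
// activity-style fields (actor.name, content, published).
public class PageToActivitySketch {

    // Returns the first non-blank value found under any of the given keys, or null.
    static String firstPresent(Map<String, String> metadata, String... keys) {
        for (String key : keys) {
            String value = metadata.get(key);
            if (value != null && !value.trim().isEmpty()) {
                return value.trim();
            }
        }
        return null;
    }

    // Builds a nested map standing in for a Streams activity document.
    // Fields are only set when the page actually supplied a value.
    static Map<String, Object> toActivity(String url,
                                          Map<String, String> metadata,
                                          String bodyText) {
        Map<String, Object> activity = new LinkedHashMap<>();
        activity.put("url", url);

        // Assumed author keys; adjust to whatever the parser really emits.
        String author = firstPresent(metadata, "dc:creator", "Author", "author");
        if (author != null) {
            Map<String, Object> actor = new LinkedHashMap<>();
            actor.put("name", author);
            activity.put("actor", actor);
        }

        // Assumed publication-date keys; the value is passed through unparsed.
        String published = firstPresent(metadata, "dcterms:created", "article:published_time");
        if (published != null) {
            activity.put("published", published);
        }

        if (bodyText != null && !bodyText.trim().isEmpty()) {
            activity.put("content", bodyText.trim());
        }
        return activity;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("dc:creator", "Jane Doe");
        metadata.put("dcterms:created", "2016-05-31T12:00:00Z");

        Map<String, Object> activity =
                toActivity("http://example.com/post", metadata, "  Hello world.  ");
        System.out.println(activity);
    }
}
```

Leaving a field unset when the page does not supply it, rather than writing an empty string, keeps it possible for downstream processors to tell "missing" apart from "blank".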
> Complete, test, and document tika processor
> -------------------------------------------
>
> Key: STREAMS-51
> URL: https://issues.apache.org/jira/browse/STREAMS-51
> Project: Streams
> Issue Type: Story
> Reporter: Steve Blackmon
>
> Complete, test, and document tika processor
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)