manifoldcf-dev mailing list archives

From "Donald Van den Driessche (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1557) HTML Tag extractor
Date Wed, 21 Nov 2018 08:53:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694408#comment-16694408 ]

Donald Van den Driessche commented on CONNECTORS-1557:
------------------------------------------------------

Hi

Thank you for your e-mail. I'll be out of the office until July 12.
I have limited access to my e-mail. Your message will not be forwarded.

If you need urgent assistance, please contact Ken Mampaey (ken.mampaey@formica.digital) or
Tom De Bruyn (tom.debruyn@formica.digital).

Best regards
Donald Van den Driessche


> HTML Tag extractor
> ------------------
>
>                 Key: CONNECTORS-1557
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1557
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Donald Van den Driessche
>            Assignee: Karl Wright
>            Priority: Major
>
> I wrote an HTML Tag extractor, based on the HTML Extractor.
> I needed to extract specific HTML tags and transfer them to their own fields in my output repository.
> Input
>  * Englobing tag (CSS selector)
>  * Blacklist (CSS selector)
>  * Fieldmapping (CSS selector)
>  * Strip HTML
> Process
>  * Retrieve Englobing tag
>  * Remove blacklist
>  * Map the elements matched by each Fieldmapping CSS selector to their fields (as arrays if there are multiple matches) + strip HTML (if requested)
>  * Englobing tag minus blacklist: strip HTML (if requested) and return as output (content)
> How can I best deliver the source code?
>  
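The process in the issue can be illustrated with a minimal sketch. It is not the reporter's connector code (none is attached here), and every name in it (`extract`, `select`, `matches`, `TreeBuilder`, `Node`) is hypothetical. To stay self-contained it uses Python's standard `html.parser` instead of a real CSS engine, so it only understands simple selectors (`tag`, `#id`, `.class`, and combinations such as `p.tag`), uses only the first element matched by the Englobing selector, and, when stripping HTML, joins text nodes with single spaces. All of these are assumptions.

```python
import re
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag, so the tree builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Deliberately tiny selector syntax (an assumption): tag, #id, .class, or tag#id.a.b
SIMPLE_SELECTOR = re.compile(r"^([a-zA-Z][\w-]*)?(?:#([\w-]+))?((?:\.[\w-]+)*)$")


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # mix of Node objects and text strings

    def text_pieces(self):
        # Yield every text node below this element, in document order.
        for child in self.children:
            yield from [child] if isinstance(child, str) else child.text_pieces()

    def outer_html(self):
        attrs = "".join(f' {k}="{escape(v)}"' if v is not None else f" {k}"
                        for k, v in self.attrs.items())
        if self.tag in VOID_TAGS:
            return f"<{self.tag}{attrs}>"
        inner = "".join(escape(c, quote=False) if isinstance(c, str) else c.outer_html()
                        for c in self.children)
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; self-closing tags are handled by the default
    handle_startendtag, which calls handle_starttag and then handle_endtag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def matches(node, selector):
    sel = selector.strip()
    m = SIMPLE_SELECTOR.match(sel)
    if not sel or not m:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, ident, classes = m.groups()
    node_classes = (node.attrs.get("class") or "").split()
    return ((tag is None or node.tag == tag.lower())
            and (ident is None or node.attrs.get("id") == ident)
            and all(c in node_classes for c in classes.split(".") if c))


def select(node, selector):
    """All descendants of node matching selector, in document order."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if matches(child, selector):
                found.append(child)
            found.extend(select(child, selector))
    return found


def extract(document, englobing, blacklist, field_mapping, strip_html=True):
    builder = TreeBuilder()
    builder.feed(document)
    builder.close()
    scopes = select(builder.root, englobing)  # 1. retrieve the Englobing tag
    if not scopes:
        return None
    scope = scopes[0]  # assumption: only the first Englobing match is used
    for selector in blacklist:  # 2. remove blacklisted elements
        for node in select(scope, selector):
            node.parent.children.remove(node)

    def render(node):
        if strip_html:  # collapse whitespace; text nodes are joined with spaces
            return " ".join(" ".join(node.text_pieces()).split())
        return node.outer_html()

    fields = {name: [render(n) for n in select(scope, sel)]  # 3. field mapping
              for name, sel in field_mapping.items()}
    fields["content"] = render(scope)  # 4. Englobing tag minus blacklist
    return fields
```

For example, `extract(page, "#main", [".menu"], {"title": "h1", "tags": "p.tag"})` returns a dict with `title` and `tags` lists plus a `content` string for the `#main` element with `.menu` removed. A production connector would need a full CSS selector engine (for instance jsoup's `Element.select` in Java) rather than this toy matcher.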



--
This message was sent by Atlassian JIRA (v7.6.3#76005)
