manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CONNECTORS-1550) HTML Tag mapping
Date Fri, 19 Oct 2018 11:32:00 GMT

     [ https://issues.apache.org/jira/browse/CONNECTORS-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1550.
-------------------------------------
    Resolution: Not A Problem

Hi [~DonaldVdD], please post questions like this to the users@manifoldcf.apache.org mailing
list.  Jira is meant for bugs and enhancement requests.  Thank you!


> HTML Tag mapping
> ----------------
>
>                 Key: CONNECTORS-1550
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1550
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Elastic Search connector, Tika extractor, Web connector
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Donald Van den Driessche
>            Priority: Major
>
> I’ll be crawling a website with the standard Web connector. I want to extract only
> certain HTML tags, such as <h1>, <h2> and <p>.
> I’ve set up an HTML extractor transformation connector and the internal Tika transformation
> connector, but I can’t find anywhere to map these tags to the output.
>  
> Do I have to write my own transformation connector to extract the content of these tags,
> or is there a built-in solution?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
