manifoldcf-dev mailing list archives

From "Donald Van den Driessche (JIRA)" <>
Subject [jira] [Created] (CONNECTORS-1550) HTML Tag mapping
Date Fri, 19 Oct 2018 11:26:00 GMT
Donald Van den Driessche created CONNECTORS-1550:

             Summary: HTML Tag mapping
                 Key: CONNECTORS-1550
             Project: ManifoldCF
          Issue Type: Wish
          Components: Elastic Search connector, Tika extractor, Web connector
    Affects Versions: ManifoldCF 2.10
            Reporter: Donald Van den Driessche

I’ll be crawling a website with the standard Web connector. I want to extract only certain
HTML tags, such as <h1>, <h2> and <p>.
I’ve set up an HTML extractor transformation connector and the internal Tika transformation
connector, but I can’t find anywhere to map the content of these tags to the output.
Do I have to write my own transformation connector to extract the content of these tags, or
is there a built-in solution?
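
To make the request concrete, here is a minimal standalone sketch of the kind of extraction meant: keep only the text inside <h1>, <h2> and <p>, grouped by tag name so that each group could be mapped to an output field. It uses only Python's standard html.parser module and is not ManifoldCF code. The names TagTextExtractor, WANTED_TAGS and extract_fields are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser

# Tags whose text should be kept (assumption: the three tags named in the request).
WANTED_TAGS = {"h1", "h2", "p"}


class TagTextExtractor(HTMLParser):
    """Hypothetical helper: collects the text inside each wanted tag, grouped by tag name."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {tag: [] for tag in sorted(WANTED_TAGS)}
        self._open = []  # stack of (tag, text fragments) for wanted tags still open

    def handle_starttag(self, tag, attrs):
        if tag in WANTED_TAGS:
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text belongs to the innermost wanted tag that is still open;
        # text outside any wanted tag is dropped.
        if self._open:
            self._open[-1][1].append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, parts = self._open.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            if text:
                self.fields[tag].append(text)


def extract_fields(html):
    """Return a dict mapping each wanted tag name to the list of texts found in it."""
    parser = TagTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    sample = "<h1>Title</h1><div>menu</div><p>Some <b>bold</b> text</p>"
    print(extract_fields(sample))
```

The sketch assumes reasonably well-formed markup: a <p> that is never closed stays open and absorbs later text, so real pages would likely need a more tolerant parser.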

This message was sent by Atlassian JIRA
