any23-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <>
Subject [jira] [Created] (ANY23-280) Restructure ContentExtractor to improve extraction flexibility
Date Sat, 02 Apr 2016 20:01:25 GMT
Lewis John McGibbney created ANY23-280:

             Summary: Restructure ContentExtractor to improve extraction flexibility
                 Key: ANY23-280
             Project: Apache Any23
          Issue Type: Improvement
          Components: core, extractors
    Affects Versions: 1.1
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
            Priority: Critical
             Fix For: 1.2

As discussed on ANY23-247, the ContentExtractor
is simply not fit for purpose. The problem has plagued our builds ever since it was
discovered. Any extractors which implement BaseRDFExtractor
are based on Extractor.ContentExtractor and hence work off an 'unfixed' raw data stream,
as opposed to a more flexible model such as the TagSoupDOMExtractor.
This issue should restructure the RDF extractors to enable more flexibility and to avoid the
issues we encounter with the strict SAX parsing logic.
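
To illustrate the kind of failure meant by "strict SAX parsing" on an unfixed raw stream, here is a minimal sketch. It uses only the JDK's built-in SAX parser and is not Any23 code; the class name StrictSaxSketch and the helper parsesStrictly are hypothetical names chosen for this example, and the sample documents are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: shows how a strict SAX parse reacts to
// raw input that has not been repaired first. Not part of Any23.
class StrictSaxSketch {

    /** Returns true if the JDK SAX parser accepts the raw input unchanged. */
    public static boolean parsesStrictly(String rawXml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        byte[] bytes = rawXml.getBytes(StandardCharsets.UTF_8);
        try {
            // DefaultHandler rethrows fatal errors, so malformed input
            // surfaces here as a SAXException.
            factory.newSAXParser().parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Strict parsing gives up at the first well-formedness error.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("unexpected parser setup or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        String ns = "xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"";
        // Well-formed RDF/XML skeleton.
        String wellFormed = "<rdf:RDF " + ns + "><rdf:Description/></rdf:RDF>";
        // Same document with an unclosed rdf:Description element.
        String malformed = "<rdf:RDF " + ns + "><rdf:Description></rdf:RDF>";

        System.out.println("well-formed accepted: " + parsesStrictly(wellFormed));
        System.out.println("malformed accepted:   " + parsesStrictly(malformed));
    }
}
```

An extractor fed the raw stream has no chance to recover from the second document, whereas a model that first repairs the markup into a DOM (as the TagSoup-based extractors do) could still hand something usable to the RDF extraction step.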

This message was sent by Atlassian JIRA