any23-dev mailing list archives

From lewismc <...@git.apache.org>
Subject [GitHub] any23 pull request: Initial move towards addressing ANY23-280 Refa...
Date Wed, 06 Apr 2016 19:50:17 GMT
GitHub user lewismc opened a pull request:

    https://github.com/apache/any23/pull/24

    Initial move towards addressing ANY23-280 Refactor ContentExtractor to improve extraction flexibility

    Hi Folks,
    This is an initial crack at addressing https://issues.apache.org/jira/browse/ANY23-280.
    Essentially, the main API difference is the complete removal of ```public interface ContentExtractor extends Extractor<InputStream>``` from the Extractor interface in the api module.
    This patch has a long way to go, with numerous failing tests; however, I wanted to post it for feedback.
    Although Any23 still builds with -DskipTests, without that flag the failing tests are as follows:
    ```
    Results :
    
    Failed tests:
      Any23Test.testDemoCodeSnippet1:201
      Any23Test.testN3Detection1:92->assertDetection:661
      Any23Test.testN3Detection2:97->assertDetection:661
      Any23Test.testTTLDetection:87->assertDetection:661
      RoverTest.testRunMultiURLs:104->runWithMultiSourcesAndVerify:134 Unexpected number of statements.
    Tests in error:
      Any23Test.testProgrammaticExtraction:279 » NullPointer
      CSVExtractorTest.testExtractionCommaSeparated:49->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testExtractionEmptyValue:112->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testExtractionSemicolonSeparated:64->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testExtractionTabSeparated:79->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      CSVExtractorTest.testTypeManagement:94->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
      RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer
      RDFaExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer
    Tests run: 403, Failures: 5, Errors: 8, Skipped: 11
    ```
    You will see that some of the tests concern https://issues.apache.org/jira/browse/ANY23-267 as well.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/any23 ANY23-280

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #24
    
----
commit 801f2f93967bfd1295700223085eef3f54181517
Author: Lewis John McGibbney <lewis.j.mcgibbney@jpl.nasa.gov>
Date:   2016-04-06T19:44:35Z

    Initial move towards addressing ANY23-280 Refactor ContentExtractor to improve extraction flexibility

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
