From: lewismc
To: dev@any23.apache.org
Reply-To: dev@any23.apache.org
Date: Wed, 6 Apr 2016 19:50:17 +0000 (UTC)
Subject: [GitHub] any23 pull request: Initial move towards addressing ANY23-280 Refa...

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/24

Initial move towards addressing ANY23-280 Refactor ContentExtractor to improve extraction flexibility

Hi Folks,

This is an initial crack at addressing https://issues.apache.org/jira/browse/ANY23-280.

Essentially, the main API difference is the complete removal of the nested `public interface ContentExtractor extends Extractor` from the `Extractor` interface in the api module. The patch still has a long way to go and many tests currently fail; however, I wanted to post it early for feedback.
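To make the shape of this API change concrete, here is a simplified, hypothetical Java sketch. It is not the actual Any23 source: the `getName()` and `run(InputStream)` signatures, the `LineCountExtractor` class and the `Dispatcher` helper are all invented for illustration. It only mirrors what the PR states, namely that `ContentExtractor extends Extractor` is declared inside `Extractor`, and it shows why every caller that names `Extractor.ContentExtractor` (for example in an `instanceof` check) has to change once the nested type is removed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified sketch: NOT the real Any23 API.
// It mirrors only the structure described in the PR: a sub-interface
// ContentExtractor nested inside the Extractor interface.
interface Extractor {
    String getName();

    // The nested sub-interface that the patch removes.
    interface ContentExtractor extends Extractor {
        // Extractors of this kind consume raw bytes from a stream.
        String run(InputStream in) throws IOException;
    }
}

// Hypothetical extractor that reports how many lines its input has.
final class LineCountExtractor implements Extractor.ContentExtractor {
    @Override
    public String getName() {
        return "line-count";
    }

    @Override
    public String run(InputStream in) throws IOException {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        return Long.toString(text.lines().count());
    }
}

// Hypothetical caller: it picks a code path by testing for the nested type.
// Once Extractor.ContentExtractor is deleted, this branch no longer compiles
// and must be rewritten against whatever replaces it.
final class Dispatcher {
    static String dispatch(Extractor extractor, InputStream in) throws IOException {
        if (extractor instanceof Extractor.ContentExtractor) {
            return ((Extractor.ContentExtractor) extractor).run(in);
        }
        throw new IllegalArgumentException("unsupported extractor: " + extractor.getName());
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(Dispatcher.dispatch(new LineCountExtractor(), in)); // prints 3
    }
}
```

The sketch needs Java 11 or later (`InputStream.readAllBytes` and `String.lines`). The compile-time dependency on the nested type in `Dispatcher` is the kind of coupling that a refactoring like ANY23-280 has to unwind across the codebase.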
Any23 still builds with `-DskipTests`; without that flag, the following tests fail:

```
Results :

Failed tests:
  Any23Test.testDemoCodeSnippet1:201
  Any23Test.testN3Detection1:92->assertDetection:661
  Any23Test.testN3Detection2:97->assertDetection:661
  Any23Test.testTTLDetection:87->assertDetection:661
  RoverTest.testRunMultiURLs:104->runWithMultiSourcesAndVerify:134 Unexpected number of statements.

Tests in error:
  Any23Test.testProgrammaticExtraction:279 » NullPointer
  CSVExtractorTest.testExtractionCommaSeparated:49->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionEmptyValue:112->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionSemicolonSeparated:64->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionTabSeparated:79->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testTypeManagement:94->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer
  RDFaExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer

Tests run: 403, Failures: 5, Errors: 8, Skipped: 11
```

Some of these failures also relate to https://issues.apache.org/jira/browse/ANY23-267.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/any23 ANY23-280

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/any23/pull/24.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #24

----
commit 801f2f93967bfd1295700223085eef3f54181517
Author: Lewis John McGibbney
Date: 2016-04-06T19:44:35Z

    Initial move towards addressing ANY23-280 Refactor ContentExtractor to improve extraction flexibility

----