incubator-any23-dev mailing list archives

From "Lewis John McGibbney (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (ANY23-37) LGPL'ed components cannot be included in distribution packages
Date Sat, 25 Feb 2012 20:19:49 GMT

    [ https://issues.apache.org/jira/browse/ANY23-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216527#comment-13216527 ]

Lewis John McGibbney edited comment on ANY23-37 at 2/25/12 8:18 PM:
--------------------------------------------------------------------

OK, so this patch also removes the DSIutils and fastutil libraries from the basic-crawler
pom.xml.

There will still be a compile-time error, because getHTML() is deprecated in the newer
version of Crawler4j.
Around lines 89-98 of Crawler.java [0], instead of calling page.getHTML() (line 96), we
should use something like:

{code}
// Only pages whose parse data is HTML are passed on for extraction.
if (page.getParseData() instanceof HtmlParseData) {
    HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
    String html = htmlParseData.getHtml();

    Crawler.super.performExtraction(
        new StringDocumentSource(html, pageURL)
    );
}
{code}  
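
To make the guard above easier to try in isolation, here is a minimal, self-contained sketch of the same instanceof-then-cast pattern. It does not use the real Crawler4j classes: Page, ParseData, HtmlParseData, OtherParseData and the helper htmlOf are hypothetical stand-ins defined only for this example, and the real types may differ in detail.

```java
import java.util.Optional;

public class ParseDataGuardSketch {

    // Hypothetical stand-in for the parse-data type returned by a page.
    interface ParseData {}

    // Hypothetical stand-in for parse data that carries HTML markup.
    static final class HtmlParseData implements ParseData {
        private final String html;
        HtmlParseData(String html) { this.html = html; }
        String getHtml() { return html; }
    }

    // Hypothetical stand-in for any non-HTML parse data (images, PDFs, ...).
    static final class OtherParseData implements ParseData {}

    // Hypothetical stand-in for a crawled page.
    static final class Page {
        private final ParseData parseData;
        Page(ParseData parseData) { this.parseData = parseData; }
        ParseData getParseData() { return parseData; }
    }

    // Returns the HTML of the page, or empty when the page is not HTML.
    static Optional<String> htmlOf(Page page) {
        ParseData data = page.getParseData();
        if (data instanceof HtmlParseData) {
            return Optional.ofNullable(((HtmlParseData) data).getHtml());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Page htmlPage = new Page(new HtmlParseData("<html><body>hi</body></html>"));
        Page otherPage = new Page(new OtherParseData());

        // Only the HTML page yields markup that could be handed to extraction.
        htmlOf(htmlPage).ifPresent(html -> System.out.println("extract: " + html));
        System.out.println("non-HTML page skipped: " + !htmlOf(otherPage).isPresent());
    }
}
```

The point of the guard is that non-HTML pages are skipped silently rather than causing a ClassCastException at the cast.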

I got totally sidetracked from this after last weekend, so apologies for the half-baked patch.
More details can be found at [1].

[0] https://svn.apache.org/viewvc/incubator/any23/trunk/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java?view=markup
[1] http://code.google.com/p/crawler4j/
                
> LGPL'ed components cannot be included in distribution packages
> --------------------------------------------------------------
>
>                 Key: ANY23-37
>                 URL: https://issues.apache.org/jira/browse/ANY23-37
>             Project: Apache Any23
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Simone Tripodi
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: ANY23-37-v2.patch, ANY23-37.patch
>
>
> While reviewing dependency licenses, I noticed that the it.unimi.dsi:dsiutils:2.0.1
> transitive dependency is released under the LGPL, so it cannot be included in the
> non-Maven binary archives.
> A first workaround could be to avoid including it and to report this in the README.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
