nutch-user mailing list archives

From Nana Pandiawan <nana.pandia...@solusi247.com.INVALID>
Subject Re: Remove Header from content
Date Mon, 04 Jul 2016 13:14:13 GMT
Hi Markus,
I want to try Apache Nutch 1.12, but I got the following error when
indexing the data to Apache Solr 5.5.2.

    16/07/04 19:49:53 INFO mapreduce.Job: Task Id : attempt_1467576953324_0090_r_000000_0, Status : FAILED
    Error: Bad return type
    Exception Details:
      Location:
        org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; @58: areturn
      Reason:
        Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
      Current Frame:
        bci: @58
        flags: { }
        locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/http/conn/ClientConnectionManager', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/DefaultHttpClient' }
        stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
      Bytecode:
        0000000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
        0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
        0000020: b600 0a2c b600 0bb6 000c b900 0d02 002b
        0000030: b800 104e 2d2c b800 0f2d b0
      Stackmap Table:
        append_frame(@47,Object[#143])


What should I do? Please help.
Regards
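
A "Bad return type" verifier error of this shape (DefaultHttpClient not assignable to CloseableHttpClient) is often a sign that two incompatible Apache HttpClient versions are visible to the task: CloseableHttpClient only exists from HttpClient 4.3 on, so an older httpclient jar picked up first on the classpath would likely produce exactly this message. That diagnosis is an assumption, not something confirmed in this thread. A minimal sketch for checking it, using only the JDK: the class name ClassOrigin and the method locate are hypothetical helpers, and the program has to be run with the same classpath as the failing reduce task to be meaningful.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Returns the jar or directory a class is loaded from, or a short note if unknown. */
    static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap/unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // The three classes named in the error message above.
        String[] names = {
            "org.apache.http.impl.client.DefaultHttpClient",
            "org.apache.http.impl.client.CloseableHttpClient",
            "org.apache.solr.client.solrj.impl.HttpClientUtil"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If DefaultHttpClient resolves to an httpclient jar older than 4.3, or CloseableHttpClient comes from a different jar than DefaultHttpClient, the classpath ordering would be the first thing to look at.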

On 04/07/16 16:37, Markus Jelsma wrote:
> Hello - there is no Boilerpipe support for 2.x.
> Markus
>
>   
>   
> -----Original message-----
>> From:Nana Pandiawan <nana.pandiawan@solusi247.com.INVALID>
>> Sent: Monday 4th July 2016 6:16
>> To: user@nutch.apache.org
>> Subject: Re: Remove Header from content
>>
>> Hi Markus Jelsma,
>>
>> Is Boilerpipe supported in Apache Nutch 2.3.1? I have tried
>> https://issues.apache.org/jira/secure/attachment/12708817/nutch-2.x-boilerpipe.patch,
>> but it doesn't work.
>>
>> regards
>>
>> On 29/06/16 17:06, Markus Jelsma wrote:
>>> Manish - you're in luck. Nutch 1.12 was released and has Boilerpipe support. Check:
>>> https://issues.apache.org/jira/browse/NUTCH-961
>>>
>>> Markus
>>>
>>>    
>>>    
>>> -----Original message-----
>>>> From:Manish Verma <m_verma@apple.com>
>>>> Sent: Tuesday 28th June 2016 23:46
>>>> To: user@nutch.apache.org
>>>> Subject: Remove Header from content
>>>>
>>>> Hi,
>>>>
>>>> I don’t want to index the header and footer of the content. I know we can make
>>>> changes in HtmlParser.java, but I don’t want to change Nutch core code. Is there any
>>>> other way (a plugin) to eliminate the header div from the content?
>>>>
>>>> Thanks MV
>>>>
>>>>
>>

