gora-dev mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: Nutch crawler issue with more depth value
Date Thu, 24 Jan 2019 08:46:26 GMT
Hi there,

Can I ask you which backend you are using?
If it is HBase, then you have to update the maximum KeyValue size setting.
It is configured in the hbase-site.xml file, and its default value is 10 MB.
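
A minimal sketch of such an override, assuming the setting meant here is
HBase's client-side property hbase.client.keyvalue.maxsize (default 10485760
bytes). The 50 MB value is only an illustration, not a recommendation; pick a
limit that fits the largest page you expect to store.

```xml
<!-- hbase-site.xml: goes in the configuration directory your crawler reads -->
<configuration>
  <property>
    <!-- Assumed property name: the client-side KeyValue size limit -->
    <name>hbase.client.keyvalue.maxsize</name>
    <!-- Illustrative value: 50 MB in bytes (50 * 1024 * 1024) -->
    <value>52428800</value>
  </property>
</configuration>
```

According to the HBase default configuration, setting this property to zero or
less disables the check. Newer HBase versions may also enforce a matching
server-side limit, so the region servers may need the same change.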


I am also copying the Gora mailing list, as they might have alternative
solutions.


Renato M.

On Wed, 23 Jan 2019 at 19:37, Gomathi Palanisamy (<
gpalanisamy@worldbankgroup.org>) wrote:
> Hi,
> We are using Nutch 2.3.1-src and running the crawl command with a depth of
> 200. After a few iterations, fetching fails with the runtime exception
> below.
> java.lang.RuntimeException: java.lang.IllegalArgumentException: KeyValue size too large
> Exception at GoraRecordWriter.class while writing to datastore: KeyValue size too large
> Crawl command:
> /Data/Apache/apache-nutch-2.3.1/runtime/local/bin/crawl /Data/Apache/apache-nutch-2.3.1/runtime/local/urls crawl-nutch http://localhost:9200/test/ 200
> Any suggestions?
> Thanks.
