nutch-user mailing list archives

From Jorge Betancourt
Subject Re: Injection from webservice
Date Mon, 16 Sep 2019 17:14:36 GMT
Hi Roannel,

The current implementation of the injector only accepts a path (actually an
org.apache.hadoop.fs.Path). This means there is no way to feed it a URL
directly; you have to download the content first.
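
A minimal sketch of that download-first workaround, in Python. The helper
names (fetch_seed_file, inject_command), the seed.txt filename, and the
example paths are assumptions made for illustration, not part of Nutch. It
also assumes Nutch runs in local mode, so the injector can read the seed
directory from the local filesystem:

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_seed_file(seed_url, seed_dir, filename="seed.txt"):
    """Download a remote seed list into a local directory (hypothetical helper)."""
    target_dir = Path(seed_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # Stream the response body to disk rather than holding it all in memory.
    with urllib.request.urlopen(seed_url, timeout=30) as response, \
            open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def inject_command(crawldb, seed_dir):
    """Build the `bin/nutch inject <crawldb> <url_dir>` argument list."""
    return ["bin/nutch", "inject", crawldb, seed_dir]
```

The resulting list can be passed to subprocess.run(..., check=True) from the
Nutch runtime directory once the download has succeeded. In distributed mode
the seed directory would presumably have to be copied to HDFS first.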

If you use the REST API, you can send the seed file through the API endpoint.
Otherwise, you could write your own injector with the logic needed to handle
a list of URLs fetched from a URL.

The REST API implementation just writes the content in the expected format (

Best Regards,

On Mon, Sep 16, 2019 at 4:59 PM Roannel Fernandez Hernandez wrote:

> Hi folks,
> Is there any way in Nutch 1.15 to inject a remote seed file (accessible
> via http or https)?
> I mean this, for instance:
> bin/nutch inject crawl
> Regards
