any23-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Concurrent HTTP requests?
Date Mon, 02 Mar 2015 21:27:51 GMT
Hey Luca,

On Mon, Mar 2, 2015 at 1:08 PM, <user-digest-help@any23.apache.org> wrote:

>
> I'm new to using Any23, and it's already been a great library to use.
>

great


> However I'm stuck with something rather basic. I followed this example
> on how to simply GET a URL and return the triples it contains:
> http://any23.apache.org/dev-data-extraction.html
>

OK


>
> I'd like to run many HTTP requests in a non-blocking fashion,
> concurrently. Are there facilities to do this using the HTTP code
> contained in Any23?
>

There is no code in Any23 for this. You may wish to investigate the Any23
Basic HTTP crawler plugin, however:
https://github.com/apache/any23/tree/master/plugins/basic-crawler
You can define the number of crawlers on the command line:
https://github.com/apache/any23/blob/master/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java#L67
As an alternative, you could investigate using something like Crawler
Commons [0] or Apache Nutch [1] for dealing with the HTTP logic.
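
If a full crawler is more than you need, a plain thread pool wrapped around
the per-URL extraction may be enough. The following is only a minimal sketch
and not something shipped with Any23: ConcurrentExtraction, extractAll and
the extractor function are hypothetical names, and the extractor is a
stand-in for the extraction code from the example page linked above. It runs
blocking calls on several worker threads rather than doing true non-blocking
I/O, and it assumes each task creates whatever Any23 objects it needs itself,
since this thread does not establish whether they can be shared across
threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ConcurrentExtraction {

    // Runs `extractor` once per URL on a fixed-size thread pool and returns
    // the results keyed by URL, in the order the URLs were given.
    // `extractor` is a hypothetical placeholder for the per-URL extraction.
    public static Map<String, String> extractAll(
            List<String> urls, int threads, Function<String, String> extractor)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every URL first so the tasks can run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> extractor.apply(url);
                futures.add(pool.submit(task));
            }
            // Collect results; get() blocks until the matching task is done.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            // Stop accepting new tasks and let the worker threads exit.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = Arrays.asList(
                "http://example.org/a", "http://example.org/b");
        // Placeholder: a real extractor would fetch the URL and
        // serialize the extracted triples to a String.
        Function<String, String> placeholder = url -> "triples-for:" + url;
        extractAll(urls, 4, placeholder)
                .forEach((u, r) -> System.out.println(u + " -> " + r));
    }
}
```

Calling extractAll(urls, 4, extractor) runs up to four extractions at a time.
If any task throws, the failure surfaces as an ExecutionException from the
corresponding get() call.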

[0] https://code.google.com/p/crawler-commons/
[1] http://nutch.apache.org
