nutch-user mailing list archives

From Lewis John Mcgibbney
Subject Anyone using the 2.X REST API to retrieve crawl results as JSON
Date Tue, 10 Jul 2012 21:42:00 GMT

I am looking to create a dataset for an example scenario in which I
want to reproduce the kind of product catalogue you would typically
find in the online Amazon store, e.g. loads of products with different
categories, prices, titles, availability, condition, etc. One way I
was thinking of doing this was to use the REST API written into
Nutch 2.X to get the crawl results as JSON. These could then hopefully
be loaded into the product table in my datastore, and we could begin
to build up the database of products.
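
To make the loading step concrete, here is a rough Python sketch of
what I have in mind. Everything in it is an assumption for
illustration: the JSON shape (a flat array of records), the field
names, the SQLite "product" table and the load_products helper are all
hypothetical, and the actual output of the Nutch 2.X REST API may look
quite different. It does not call Nutch at all; it only shows parsing
JSON text and inserting rows.

```python
import json
import sqlite3

# Hypothetical sample input. The real Nutch 2.X REST output may be
# shaped differently; these field names are assumptions.
SAMPLE_JSON = """
[
  {"url": "http://example.com/p/1", "title": "Widget",
   "category": "Tools", "price": "9.99",
   "availability": "in stock", "condition": "new"},
  {"url": "http://example.com/p/2", "title": "Gadget",
   "category": "Electronics", "price": "24.50",
   "availability": "out of stock", "condition": "used"}
]
"""

# JSON keys to read, in the same order as the table columns below.
FIELDS = ("url", "title", "category", "price", "availability", "condition")


def load_products(conn, json_text):
    """Parse a JSON array of records and store them in the product table.

    Returns the number of records written. Records with the same URL
    replace earlier ones, so re-running a load does not duplicate rows.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product ("
        "url TEXT PRIMARY KEY, title TEXT, category TEXT, "
        "price REAL, availability TEXT, item_condition TEXT)"
    )
    rows = []
    for record in json.loads(json_text):
        values = [record.get(field) for field in FIELDS]
        # Price is at index 3; convert it to a number when present.
        if values[3] is not None:
            values[3] = float(values[3])
        rows.append(tuple(values))
    conn.executemany(
        "INSERT OR REPLACE INTO product VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_products(conn, SAMPLE_JSON)
```

Keying the table on the URL is just one possible choice; it means a
re-crawl of the same page overwrites the earlier row instead of adding
a second one.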

Having never used the REST API directly, I wonder if anyone has any
information on this and can give me some direction on producing my
crawl results as JSON. I'm also going to look into Andrzej's patch in
NUTCH-932, so I'll try to update this thread once I make some progress
with it.

Thanks in advance for any sharing of experiences with this one.


