oodt-dev mailing list archives

From Chris Mattmann <chris.mattm...@gmail.com>
Subject Re: CAS Crawler Crawling Code
Date Thu, 01 May 2014 18:38:56 GMT
Hey Lewis,

That's because Crawler doesn't make HTTP connections.
PushPull is the component where that happens. We
specifically made Crawler handle only local data
and refactored the protocol layer into PushPull.
The two communicate through a shared 'staging'
directory structure and through Crawler
preconditions and Actions.
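Here is a minimal sketch of the split described above. It assumes nothing about the real OODT APIs: `push_to_staging`, `crawl_staging`, `not_partial` and `not_empty` are hypothetical names, not PushPull or CAS Crawler classes. The transport is reduced to a `fetch` callable, and only the push side ever calls it. The crawler reads nothing but the local staging directory and uses preconditions to skip in-flight `.part` files. The atomic `os.replace` rename is one way to keep a crawler from picking up half-written files. Whether PushPull stages files exactly this way is an assumption.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


# Hypothetical stand-in for PushPull: the ONLY place a transport is used.
def push_to_staging(fetch: Callable[[], bytes], name: str, staging: Path) -> Path:
    """Fetch bytes via `fetch` (e.g. an HTTP/FTP download) and stage them atomically."""
    data = fetch()
    fd, tmp = tempfile.mkstemp(dir=str(staging), suffix=".part")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    final = staging / name
    os.replace(tmp, final)  # atomic rename: readers never see a half-written file
    return final


# Hypothetical crawler preconditions: they inspect local files only.
def not_partial(p: Path) -> bool:
    return p.suffix != ".part"


def not_empty(p: Path) -> bool:
    return p.stat().st_size > 0


def crawl_staging(staging: Path,
                  preconditions: Iterable[Callable[[Path], bool]],
                  ingest: Callable[[Path], None]) -> dict:
    """Walk the local staging dir and ingest eligible files; no network I/O here."""
    checks = list(preconditions)
    results = {"ingested": [], "skipped": [], "failed": []}
    for p in sorted(staging.iterdir()):
        if not p.is_file() or not all(check(p) for check in checks):
            results["skipped"].append(p.name)   # rejected before ingest
            continue
        try:
            ingest(p)
            results["ingested"].append(p.name)  # post-ingest success path
        except Exception:
            results["failed"].append(p.name)    # post-ingest failure path
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        staging = Path(d)
        push_to_staging(lambda: b"granule bytes", "granule_001.dat", staging)
        (staging / "incoming.part").write_bytes(b"half")  # simulated in-flight download
        catalog = []
        print(crawl_staging(staging, [not_partial, not_empty],
                            lambda p: catalog.append(p.name)))
        # prints {'ingested': ['granule_001.dat'], 'skipped': ['incoming.part'], 'failed': []}
```

The point of the design is that the crawler can be tested and reasoned about with no network at all. Swapping HTTP for FTP (or anything else) only touches the push side.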

Scope out PushPull and then we can discuss.

Thanks dude.

Cheers,
Chris

------------------------
Chris Mattmann
chris.mattmann@gmail.com

-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
Reply-To: <user@oodt.apache.org>
Date: Thursday, May 1, 2014 10:35 AM
To: <user@oodt.apache.org>
Subject: CAS Crawler Crawling Code

>Hi Folks,
>I'm jumping between ProductCrawler and StdIngester trying to pinpoint
>_exactly_ where product fetching actually happens.
>I'm aware of the triple-headed nature of crawler workflows, i.e.
>preIngestion, postIngestionSuccess and postIngestionFailure. I can see
>the logic within the ProductCrawler code; what I cannot locate is where
>HTTP/transport socket connections are created and used.
>
>Can anyone please point this out?
>Thanks
>Lewis 
