nutch-dev mailing list archives

From "Fuad Efendi" <>
Subject FW: Fetcher, ParseText, ParseData - need to modify
Date Mon, 15 Aug 2005 17:26:48 GMT
1. This is part of ParseText:
Any Accessories Backup Devices & Media Barebone Systems Camcorder
Accessories Camcorders Cases & External Enclosures CD / DVD Drives &
Media Cooling Devices Digital Camera Accessories Digital Cameras

- this is the content of a dropdown (the <option> elements in the HTML)

2. Some of the text in ParseText appears to be anchor text; I checked it
visually against the web page.

-----Original Message-----
From: Fuad Efendi [] 
Sent: Monday, August 15, 2005 1:20 PM
Subject: Fetcher, ParseText, ParseData - need to modify

I just caught some output from Fetcher.FetcherThread.outputPage(.) and
noticed that some anchor text ends up in the text, and so does the
content of some <option> tags.
"ParseText = "+text);
"ParseData = "+ parseData);

I'd like to modify this behaviour: ParseText should contain only the
subset of the text that I need, and ParseData should contain all the
anchors.

Where should I start? It would be nice to have plugins that modify the
Fetcher.
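
To make the desired split concrete, here is a minimal standalone sketch in
Java. It is not Nutch code and uses no Nutch classes: ParseSplitSketch,
Anchor, Result, split and stripTags are hypothetical names made up for this
illustration, and Result is only a stand-in for the ParseText / ParseData
pair. It also assumes that dropdown content and anchor text should both be
left out of the text, and it uses regular expressions instead of a real HTML
parser, so it will miss unquoted href values, nested markup edge cases and
malformed pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone helper; not a Nutch class and not a real HTML parser.
final class ParseSplitSketch {

    // One extracted link: its href attribute and its visible anchor text.
    static final class Anchor {
        final String href;
        final String text;
        Anchor(String href, String text) { this.href = href; this.text = text; }
    }

    // Stand-in for the ParseText / ParseData pair discussed in the message.
    static final class Result {
        final String text;          // page text without dropdowns and anchors
        final List<Anchor> anchors; // every quoted-href link found in the page
        Result(String text, List<Anchor> anchors) { this.text = text; this.anchors = anchors; }
    }

    // Whole <select>...</select> blocks, including their <option> children.
    private static final Pattern SELECT =
        Pattern.compile("(?is)<select\\b.*?</select\\s*>");
    // <a ... href="..."> ... </a>; group 1 is the href, group 2 the inner markup.
    private static final Pattern ANCHOR = Pattern.compile(
        "(?is)<a\\b[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a\\s*>");
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");

    static Result split(String html) {
        // Step 1: drop dropdown content so option labels never reach the text.
        String noSelect = SELECT.matcher(html).replaceAll(" ");

        // Step 2: move each anchor out of the text and into the anchor list.
        List<Anchor> anchors = new ArrayList<Anchor>();
        Matcher m = ANCHOR.matcher(noSelect);
        StringBuffer rest = new StringBuffer();
        while (m.find()) {
            anchors.add(new Anchor(m.group(1), stripTags(m.group(2))));
            m.appendReplacement(rest, " ");
        }
        m.appendTail(rest);

        // Step 3: what is left, minus tags, is the text.
        return new Result(stripTags(rest.toString()), anchors);
    }

    // Replace tags with spaces, then collapse runs of whitespace.
    static String stripTags(String s) {
        return TAG.matcher(s).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        Result r = split("<p>Digital cameras on sale</p>"
            + "<select><option>Any</option><option>Camcorders</option></select>"
            + "<a href=\"/cases\">Cases</a>");
        System.out.println("text    = " + r.text);
        for (Anchor a : r.anchors) {
            System.out.println("anchor  = " + a.href + " -> " + a.text);
        }
    }
}
```

In a real crawler the same three steps would more likely be done while
walking the parsed DOM of the page than with regular expressions; the sketch
only shows which content goes where.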
