nutch-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: URL built by JavaScript Function - Can this be Crawled
Date Mon, 14 Sep 2009 16:15:33 GMT
JavaScript code that creates dynamic URLs is always a problem for web  
crawlers.

Most web sites try to make their content crawlable by creating  
alternative static links to the content.

I think Google now does some analysis/execution of JS code, but it's a  
tricky problem.

I would suggest modifying the HTML parser to explicitly look for calls  
to your function and generate the corresponding outlinks.
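Here is a minimal Java sketch of that idea. It assumes a hypothetical site function `buildUrl(section, id)` that sends the browser to `/<section>?id=<id>`. The class name, the function name and the URL pattern are all made up for illustration, so a real version has to mirror whatever your own function actually builds. In Nutch, the strings returned here would then be added as outlinks from a custom HTML parse filter. That filter interface differs between Nutch versions, so the wiring is not shown.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: scan page source for calls to a site-specific URL-building
 * JavaScript function and reconstruct the URLs it would navigate to.
 * ASSUMPTION: the hypothetical function buildUrl(section, id) goes to
 * "/" + section + "?id=" + id on the same host.
 */
class JsCallOutlinkExtractor {

    // Matches buildUrl('news', '42') or buildUrl("news", "42"), whitespace tolerant.
    // \b keeps names such as rebuildUrl(...) from matching.
    private static final Pattern CALL = Pattern.compile(
        "\\bbuildUrl\\(\\s*['\"]([^'\"]+)['\"]\\s*,\\s*['\"]([^'\"]+)['\"]\\s*\\)");

    /** Returns absolute URLs for every literal-argument call found in source. */
    static List<String> extract(String pageUrl, String source) {
        List<String> urls = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = CALL.matcher(source);
        while (m.find()) {
            // Rebuild the path exactly as the (assumed) JS function would,
            // then resolve it against the page URL to get an absolute outlink.
            String path = "/" + m.group(1) + "?id=" + m.group(2);
            urls.add(base.resolve(path).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"#\" onclick=\"buildUrl('news', '42')\">News</a>\n"
            + "<span onclick='buildUrl(\"docs\", \"7\")'>Docs</span>";
        for (String url : extract("http://www.example.com/index.html", html)) {
            System.out.println(url);
        }
    }
}
```

Run on the sample page, this prints `http://www.example.com/news?id=42` and `http://www.example.com/docs?id=7`. A pattern match only recovers calls whose arguments are literals in the page source. If the arguments are computed at runtime, no static scan will find them, and you would need to emit static links instead.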

-- Ken


On Sep 14, 2009, at 8:04am, Mohamed Parvez wrote:

> Can anyone please throw some light on this?
>
> Thanks/Regards,
> Parvez
>
>
> On Fri, Sep 11, 2009 at 3:23 PM, Mohamed Parvez <parvez@gmail.com>  
> wrote:
>
>> We have a JavaScript function that takes some params, builds a URL,
>> and then uses window.location to send the user to that URL.
>>
>> Our website uses this feature a lot, and most of our URLs are built
>> using this function.
>>
>> I am trying to crawl using Nutch and I am also using the parse-js  
>> plugin.
>>
>> But it does not look like Nutch is able to crawl these URLs.
>>
>> Am I doing something wrong, or is Nutch not able to crawl URLs built
>> by a JavaScript function?
>>
>> ----
>> Thanks/Regards,
>> Parvez
>>
>>

--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378

