nutch-user mailing list archives

From Mohamed Parvez <par...@gmail.com>
Subject Re: URL built by JavaScript Function - Can this be Crawled
Date Mon, 14 Sep 2009 16:35:48 GMT
Thanks, Ken.
If even Google has not fully implemented JavaScript analysis/execution
for crawling, I am going to stay away from it and look for an alternative
solution.

Thanks/Regards,
Parvez



On Mon, Sep 14, 2009 at 11:15 AM, Ken Krugler
<kkrugler_lists@transpac.com> wrote:

> JavaScript code that creates dynamic URLs is always a problem for web
> crawlers.
>
> Most web sites try to make their content crawlable by creating alternative
> static links to the content.
>
> I think Google now does some analysis/execution of JS code, but it's a
> tricky problem.
>
> I would suggest modifying the HTML parser to explicitly look for calls
> being made to your function, and generate appropriate outlinks.
>
> -- Ken
>
>
>
> On Sep 14, 2009, at 8:04am, Mohamed Parvez wrote:
>
>> Can anyone please throw some light on this?
>>
>> Thanks/Regards,
>> Parvez
>>
>>
>> On Fri, Sep 11, 2009 at 3:23 PM, Mohamed Parvez <parvez@gmail.com> wrote:
>>
>>> We have a JavaScript function that takes some params and builds a URL,
>>> then uses window.location to send the user to that URL.
>>>
>>> Our website uses this feature a lot, and most of the URLs are built using
>>> this function.
>>>
>>> I am trying to crawl using Nutch and I am also using the parse-js plugin.
>>>
>>> But it does not look like Nutch is able to crawl these URLs.
>>>
>>> Am I doing something wrong, or is Nutch not able to crawl URLs built by
>>> a JavaScript function?
>>>
>>> ----
>>> Thanks/Regards,
>>> Parvez
>>>
>>>
>>>
> --------------------------
> Ken Krugler
> TransPac Software, Inc.
> <http://www.transpac.com>
> +1 530-210-6378
>
>
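
Ken's suggestion (look for calls to the site's own function in the HTML
and generate outlinks from them) could be sketched roughly as below. This
is only an illustration under stated assumptions: the function name
openPage, its two quoted string arguments, and the URL layout in buildUrl
are all hypothetical, since the thread never shows the real function. The
sketch is plain Java and does not show how to wire the logic into a Nutch
parser plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsOutlinkSketch {

    // Hypothetical: matches calls such as openPage('products', '42').
    // The real function name and argument shape would have to be substituted.
    private static final Pattern CALL = Pattern.compile(
            "openPage\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*\\)");

    // Hypothetical URL layout; it must mirror what the JavaScript function
    // actually builds before assigning to window.location.
    static String buildUrl(String base, String section, String id) {
        return base + "/" + section + "?id=" + id;
    }

    // Scans raw HTML for calls to the function and returns one URL per call.
    static List<String> extractOutlinks(String html, String base) {
        List<String> outlinks = new ArrayList<>();
        Matcher m = CALL.matcher(html);
        while (m.find()) {
            outlinks.add(buildUrl(base, m.group(1), m.group(2)));
        }
        return outlinks;
    }

    public static void main(String[] args) {
        String html =
                "<a href=\"#\" onclick=\"openPage('products', '42')\">Item</a>";
        System.out.println(extractOutlinks(html, "http://www.example.com"));
    }
}
```

A regular expression like this only catches calls whose arguments are
string literals in the page source; arguments computed at runtime would
still need real JavaScript execution, which is the hard part Ken mentions.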
