httpd-users mailing list archives

From Abu Hurayrah <abu_huray...@almaghrib.org>
Subject Re: [users@httpd] MSN-Bot doesn't finish a download?
Date Tue, 03 May 2005 09:43:57 GMT
Thank you for your reply!

Joshua Slive wrote:

>On 5/2/05, Abu Hurayrah <abu_hurayrah@almaghrib.org> wrote:
>  
>
>>Greets to all!
>>
>>I've just noticed that when the MSNbot crawls my website and hits some of my
>>downloads, it doesn't download the whole file.
>>    
>>
>
>Most search engines are only interested in the first x bytes of the
>file, so the bot may simply be dropping the connection after it gets
>what it wants.
>  
>
That's a fair assumption; however, the amount that is downloaded is
ALWAYS exactly the same as the $chunk_size value in my download script.

>  
>
>>MSN seems to catch only ONE chunk, no matter what size I make it, which I
>>find very strange, because I cannot see why my implementation would
>>matter to MSN at all.
>>    
>>
>
>What is the smallest "chunk" size you have tried?  You are probably
>just not detecting the dropped connection until you have sent a chunk,
>so you don't really know what the bot is accepting.
>
>Joshua.
>  
>
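To illustrate Joshua's point, here is a minimal Python sketch (the original script's language isn't shown; stream_in_chunks and OneChunkClient are hypothetical names). A sender that writes one chunk at a time only learns that the client has gone away when a write fails, so the amount it records as "sent" comes out as a whole number of chunks. The simulated client below hangs up after the first chunk; a real TCP socket may buffer one or more further writes before the error surfaces, so real logs can even overshoot what the bot actually read.

```python
import io
from typing import BinaryIO, Callable


def stream_in_chunks(src: BinaryIO, write: Callable[[bytes], object], chunk_size: int) -> int:
    """Copy src to write() one chunk at a time; return the bytes successfully written.

    A dropped client is only noticed when write() raises, so the count
    reflects whole chunks handed off, not what the client actually kept.
    """
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of file reached
        try:
            write(chunk)
        except (BrokenPipeError, ConnectionResetError):
            break  # client went away; stop reading the file
        sent += len(chunk)
    return sent


class OneChunkClient:
    """Hypothetical stand-in for a socket whose peer hangs up after the first chunk."""

    def __init__(self) -> None:
        self.received = bytearray()
        self.calls = 0

    def write(self, data: bytes) -> None:
        self.calls += 1
        if self.calls > 1:
            raise BrokenPipeError("peer closed connection")
        self.received += data


data = b"x" * 1_000_000
for size in (50_000, 500_000):
    print(size, stream_in_chunks(io.BytesIO(data), OneChunkClient().write, size))
# prints "50000 50000" then "500000 500000"
```

Whatever chunk size is chosen, the sender's tally equals exactly one chunk, which matches the pattern described in this thread even though the simulated client may have wanted less.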
I've tried sizes ranging from 50,000 bytes to 500,000 bytes, and in every
case MSN gets exactly one chunk's worth.

Previously, when I was sending each file all at once, MSN would download
the ENTIRE file.  I cannot understand what would stop it from
downloading the whole file now that I split the download into discrete
chunks.  As far as I know I am not altering the data in any way; I am
simply sending it in chunks to reduce the memory footprint of each
instance of my download script.
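For comparison, here is a minimal chunked download handler sketched as a Python WSGI app. It is not the poster's script (which isn't shown), and make_download_app, file_chunks and run_app are hypothetical names. Memory use stays bounded by chunk_size, while the Content-Length header still advertises the size of the whole file, so from the client's side the response is one complete file rather than a series of separate pieces.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator, List, Tuple


def file_chunks(path: str, chunk_size: int) -> Iterator[bytes]:
    """Yield the file one chunk at a time so only chunk_size bytes sit in memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def make_download_app(path: str, chunk_size: int = 50_000) -> Callable:
    """Hypothetical WSGI download handler with a lazily produced body."""
    def app(environ, start_response) -> Iterable[bytes]:
        size = os.path.getsize(path)
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(size)),  # the whole file, not one chunk
        ])
        return file_chunks(path, chunk_size)
    return app


def run_app(app: Callable) -> Tuple[str, List[Tuple[str, str]], List[bytes]]:
    """Call a WSGI app directly (no server) and collect status, headers and body chunks."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers

    chunks = list(app({}, start_response))  # consumes the generator, closing the file
    return captured["status"], captured["headers"], chunks


# Usage: serve a 120,000-byte temporary file in 50,000-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 120_000)
status, headers, chunks = run_app(make_download_app(tmp.name, 50_000))
os.remove(tmp.name)
print(status, dict(headers)["Content-Length"], [len(c) for c in chunks])
# prints "200 OK 120000 [50000, 50000, 20000]"
```

Producing the body from a generator keeps the per-request footprint to roughly one chunk, the same goal described above, without changing what the client sees on the wire.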

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

