nutch-agent mailing list archives

From Daniele Menozzi <me...@ngi.it>
Subject Pages/s rate decreasing
Date Mon, 26 Sep 2005 16:23:56 GMT
Hi all, I'm trying to fetch several million pages, but I've run into some
performance problems.
I'm using a 1.7 GHz Pentium 4 with 768 MB of RAM and a 10 Mbit/s connection.
I've changed these configuration values in nutch-site.xml:

<property>
  <name>fetcher.threads.fetch</name>
  <value>25</value>
</property>

<property>
  <name>http.max.delays</name>
  <value>1</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
</property>

<property>
  <name>io.sort.factor</name>
  <value>10</value>
</property>

<property>
  <name>io.sort.mb</name>
  <value>1</value>
</property>

<property>
  <name>indexer.maxMergeDocs</name>
  <value>20</value>
</property>

<property>
  <name>indexer.termIndexInterval</name>
  <value>64</value>
</property>

and I've also added the following line to bin/nutch:
JAVA_HEAP_MAX=-Xmx750M

This seems like a good configuration. When I run the fetch command, I get these log messages:

050926 181531 status: segment 20050924151836, 100 pages, 11 errors, 1277608 bytes, 11755 ms
050926 181531 status: 8.507018 pages/s, 849.11206 kb/s, 12776.08 bytes/page
050926 181537 status: segment 20050924151836, 200 pages, 17 errors, 2620277 bytes, 18157 ms
050926 181537 status: 11.015036 pages/s, 1127.4392 kb/s, 13101.385 bytes/page
050926 181548 status: segment 20050924151836, 300 pages, 26 errors, 4243689 bytes, 28657 ms
050926 181548 status: 10.468647 pages/s, 1156.9187 kb/s, 14145.63 bytes/page
050926 181557 status: segment 20050924151836, 400 pages, 32 errors, 5515098 bytes, 38102 ms
050926 181557 status: 10.4981365 pages/s, 1130.8252 kb/s, 13787.745 bytes/page
050926 181607 status: segment 20050924151836, 500 pages, 44 errors, 6678319 bytes, 48464 ms
050926 181607 status: 10.3169365 pages/s, 1076.5592 kb/s, 13356.638 bytes/page
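The rate figures in each pair of status lines can be reproduced from the cumulative counters on the line before them. The sketch below uses formulas inferred from the logged numbers, not taken from the Nutch fetcher source: pages/s = pages / elapsed seconds, kb/s = bytes * 8 / 1024 / elapsed seconds (i.e. 1024-based kilobits), and bytes/page = bytes / pages. The function name `reported_rates` is hypothetical.

```python
def reported_rates(pages, total_bytes, elapsed_ms):
    """Recompute the rates printed on a fetcher status line.

    Assumption (inferred from the logged values, not from the Nutch
    source): 'kb/s' means kilobits per second with 1 kb = 1024 bits.
    """
    seconds = elapsed_ms / 1000.0
    pages_per_s = pages / seconds
    kbits_per_s = total_bytes * 8 / 1024 / seconds
    bytes_per_page = total_bytes / pages
    return pages_per_s, kbits_per_s, bytes_per_page


# First status line of the run: 100 pages, 1277608 bytes, 11755 ms
p, kb, bpp = reported_rates(100, 1277608, 11755)
print(f"{p:.3f} pages/s, {kb:.2f} kb/s, {bpp:.2f} bytes/page")
# prints: 8.507 pages/s, 849.11 kb/s, 12776.08 bytes/page
```

Because every figure is divided by the total time elapsed since the segment started, the printed rates are running averages over the whole fetch rather than the current speed.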

But after a few thousand pages, the rates decrease steadily:

050926 180746 status: segment 20050924151836, 6400 pages, 566 errors, 85809551 bytes, 853401 ms
050926 180746 status: 7.4994054 pages/s, 785.5476 kb/s, 13407.742 bytes/page
050926 180807 status: segment 20050924151836, 6500 pages, 581 errors, 87133135 bytes, 874799 ms
050926 180807 status: 7.4302783 pages/s, 778.1532 kb/s, 13405.098 bytes/page
050926 180823 status: segment 20050924151836, 6600 pages, 589 errors, 88789053 bytes, 890686 ms
050926 180823 status: 7.410019 pages/s, 778.79803 kb/s, 13452.888 bytes/page
050926 180841 status: segment 20050924151836, 6700 pages, 594 errors, 90286731 bytes, 908720 ms
050926 180841 status: 7.3730083 pages/s, 776.21826 kb/s, 13475.631 bytes/page
050926 180901 status: segment 20050924151836, 6800 pages, 601 errors, 91663461 bytes, 928498 ms
050926 180901 status: 7.323656 pages/s, 771.268 kb/s, 13479.921 bytes/page
050926 181014 status: segment 20050924151836, 7200 pages, 627 errors, 96922711 bytes, 1001732 ms
050926 181014 status: 7.187551 pages/s, 755.8995 kb/s, 13461.487 bytes/page
050926 181037 status: segment 20050924151836, 7300 pages, 637 errors, 98478215 bytes, 1024844 ms
050926 181037 status: 7.1230354 pages/s, 750.7104 kb/s, 13490.167 bytes/page
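Assuming the printed rates are cumulative averages since the segment started (which the counters are consistent with), the slide from about 7.50 to 7.12 pages/s understates how slow the recent pages were. A minimal sketch that computes the rate between two checkpoints from the raw counters follows; the regular expression and the helper names `parse_status` and `interval_rate` are hypothetical and only assume the status-line layout shown in this log (tolerating a missing space after the comma or a line break before "ms").

```python
import re

# Hypothetical pattern for a "status: segment ..." line as it appears in
# the log above: captures cumulative pages, bytes and elapsed milliseconds.
STATUS_RE = re.compile(
    r"segment \S+, (\d+) pages, \d+ errors,\s*(\d+) bytes,\s*(\d+)\s*ms"
)


def parse_status(line):
    """Return (pages, bytes, ms) from one cumulative status line."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError(f"not a segment status line: {line!r}")
    pages, total_bytes, ms = (int(g) for g in m.groups())
    return pages, total_bytes, ms


def interval_rate(earlier, later):
    """Pages/s and kbit/s (1024-based) between two cumulative checkpoints."""
    p0, b0, t0 = parse_status(earlier)
    p1, b1, t1 = parse_status(later)
    seconds = (t1 - t0) / 1000.0
    return (p1 - p0) / seconds, (b1 - b0) * 8 / 1024 / seconds


first = ("050926 180746 status: segment 20050924151836, 6400 pages, "
         "566 errors, 85809551 bytes, 853401 ms")
last = ("050926 181037 status: segment 20050924151836, 7300 pages, "
        "637 errors, 98478215 bytes, 1024844 ms")
pages_s, kbit_s = interval_rate(first, last)
print(f"{pages_s:.2f} pages/s, {kbit_s:.1f} kb/s over this interval")
# prints: 5.25 pages/s, 577.3 kb/s over this interval
```

Between 6400 and 7300 pages this gives about 5.25 pages/s, well below the 7.12 pages/s cumulative figure printed on the last line.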


I can't figure out how to sustain a steady 10 pages/s (or an even higher rate!). I've read this page:
http://wiki.apache.org/nutch/HardwareRequirements
and it states that, with hardware similar to mine and 25 fetchers, it is possible to download at roughly 4 Mbit/s.
So, how can I set up Nutch to fetch at a higher rate?
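To express the wiki's 4 Mbit/s figure in pages, it can be divided by the average page size seen in the log (about 13,400 bytes/page). This back-of-the-envelope sketch assumes 1 Mbit = 1,000,000 bits and ignores protocol overhead and per-host politeness delays, so it is only a bandwidth-based upper bound; `pages_per_second` is a hypothetical helper.

```python
def pages_per_second(link_mbit_per_s, avg_bytes_per_page):
    """Fetch rate if link bandwidth were the only limit.

    Assumes 1 Mbit = 1,000,000 bits; ignores HTTP/TCP overhead,
    DNS lookups and per-host delays.
    """
    bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return bytes_per_s / avg_bytes_per_page


AVG_PAGE = 13_400  # approximate bytes/page reported in the log

for mbit in (4, 10):
    print(f"{mbit} Mbit/s: ~{pages_per_second(mbit, AVG_PAGE):.0f} pages/s")
# prints:
# 4 Mbit/s: ~37 pages/s
# 10 Mbit/s: ~93 pages/s
```

By the same arithmetic, a steady 10 pages/s at this page size needs only about 1.07 Mbit/s.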


Thank you so much!!!!!
	Menoz


-- 
		      Free Software Enthusiast
		 Debian Powered Linux User #332564 
		     http://menoz.homelinux.org
