nutch-user mailing list archives

From Piotr Kosiorowski <pkosiorow...@gmail.com>
Subject Nutch - new public server
Date Wed, 06 Apr 2005 20:20:39 GMT
Hello all,

I would like to thank all Nutch developers and users for the high-quality
code and support that helped us deploy the beta version of a travel-related
web search engine on the www.igougo.com site. The decision to base our
solution on Nutch was a perfect one: the quality of the Nutch code allowed
us to build a prototype quickly and integrate our own code easily.

The search engine runs on Opteron boxes with Linux and a 64-bit 1.5 JVM.
It is based on the Nutch code with some modifications. The current
solution uses the latest patch for using the host name and title in
ranking. We use a customized set of field boosts (I will send a separate
email about it, as I promised some time ago).
We have our own implementation of WebDB, based on MySQL. It suited our
purposes well, as it allowed us to easily integrate page classification
and the additional information we needed. However, as we grow our index
we will run into performance problems, so we will have to replace it in
the near future (we are interested in the MapReduce implementation here).
Pages were classified during fetching using Support Vector Machines.

It is released as a beta to allow users to interact with it, but there is
still a lot of work to do, especially in the areas of relevancy and spam
removal. Changes will be introduced gradually over the following months.

I will add a link to our search engine to the Nutch Wiki as soon as the
Wiki has been fully transferred to Apache, to avoid problems in the middle
of the transition.

Once again, thank you all for a high-quality search engine. I am looking
forward to using Nutch in the future.

Regards
Piotr Kosiorowski
Senior Software Developer
Travel Search Technologies
Sabre Holdings

PS. For those interested, the Sabre Holdings press release is here:
	http://www.forbes.com/home/feeds/ap/2005/04/05/ap1925698.html
