lucene-java-user mailing list archives

From Ype Kingma <ykin...@xs4all.nl>
Subject Re: Italian web sites
Date Wed, 24 Apr 2002 18:39:01 GMT
Laura

>Hi all,
>
>I'm using Jobo for spidering web sites and Lucene for indexing. The
>problem is that I'd like to spider only Italian web sites.
>How can I discover the country of a web site?
>
>Do you know of a method that you can suggest?

The best method I know identifies the language of a page rather than
its country: take n-grams of characters and compare the frequencies
of the most frequent n-grams against those of known Italian text:
http://citeseer.nj.nec.com/context/698873/68861
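
As a rough illustration of the idea, here is a minimal sketch in Python.
It is my own assumption of how such a check could look, not code taken
from the reference above: it ranks character n-grams by frequency and
compares a page's ranking with per-language rankings using a simple
rank-difference distance. The function names, the profile size and the
two tiny sample texts are all hypothetical; a usable detector would need
much larger training text per language.

```python
from collections import Counter

def ngram_profile(text, n_max=3, top=300):
    """Return the most frequent character n-grams (1..n_max), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically, so ties rank deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )

def guess_language(text, profiles):
    """Return the key of the language profile closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Toy training text (an assumption for illustration only; far too small for real use).
SAMPLES = {
    "it": "il sito contiene pagine scritte in lingua italiana e la ricerca "
          "funziona per tutte le parole che sono presenti nei documenti",
    "en": "the site contains pages written in the english language and the "
          "search works for all of the words that are present in the documents",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}
```

The spider could then call guess_language(page_text, PROFILES) on the
text of each fetched page and index the page only when the result is
"it". Ranking n-grams instead of comparing raw counts keeps the
comparison independent of page length.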

Regards,
Ype


