lucene-java-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Deletion of words in articles of Wikipedia
Date Wed, 02 Sep 2009 10:08:20 GMT
>>I need to start off with this project where we can find the ranking of
>>controversial articles. Could anyone kindly help me how to start?

Check out the Wikipedia "logging" dumps, which record the reasons for actions taken on page
titles (including IP blocks and deletes) without the bulk of the full-text changes.
e.g. http://download.wikimedia.org/enwiki/20090827/enwiki-20090827-pages-logging.xml.gz

Once you have this indexed in Lucene, "Luke" can help you explore the index and pinpoint the
pages that are key targets for vandalism.
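
As a rough first pass before indexing anything, the log entries can be tallied per page title.
The sketch below is only an illustration under assumptions: it assumes each log entry is a
<logitem> element with <type> and <logtitle> children, and the count_actions helper, the
choice of log types, and the sample data are all hypothetical. Check the element names
against the real dump before relying on it.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}logitem' -> 'logitem'."""
    return tag.rsplit("}", 1)[-1]


def count_actions(stream, log_types=("delete", "protect", "block")):
    """Count log entries of the given types per page title.

    Assumes (not verified against the real dump) that every entry is a
    <logitem> element with <type> and <logtitle> children.
    """
    counts = Counter()
    # iterparse streams the file, so the whole dump is never loaded at once.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "logitem":
            continue
        fields = {local(child.tag): (child.text or "") for child in elem}
        if fields.get("type") in log_types and fields.get("logtitle"):
            counts[fields["logtitle"]] += 1
        elem.clear()  # drop the children of entries already processed
    return counts


# Tiny made-up sample in the assumed layout, used only to exercise the code.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <logitem><type>delete</type><action>delete</action><logtitle>Foo</logtitle></logitem>
  <logitem><type>protect</type><action>protect</action><logtitle>Foo</logtitle></logitem>
  <logitem><type>block</type><action>block</action><logtitle>User:Bar</logtitle></logitem>
  <logitem><type>move</type><action>move</action><logtitle>Baz</logtitle></logitem>
</mediawiki>"""

# Titles ordered from most to fewest matching log entries.
ranking = count_actions(io.BytesIO(SAMPLE)).most_common()
```

For the real dump, pass gzip.open(path, "rb") as the stream instead of the in-memory sample.
The raw count is only a crude proxy for controversy; it is a starting point for deciding
which titles to look at in Luke, not a ranking method in itself.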


Cheers,
Mark




----- Original Message ----
From: Sahi <sahilkaushik@hotmail.com>
To: java-user@lucene.apache.org
Sent: Wednesday, 2 September, 2009 5:09:15
Subject: Deletion of words in articles of Wikipedia


Hi,

I'm new to this site. My question is:

Articles in Wikipedia can be edited by anyone and may or may not be
accurate. If one contributor writes an article and another contributor
then deletes some of its content, that would indicate that the article is
controversial.
I need to start off with this project where we can find the ranking of
controversial articles. Could anyone kindly help me how to start?

Thanks
-- 
View this message in context: http://www.nabble.com/Deletion-of-words-in-articles-of-Wikipedia-tp25251378p25251378.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


      


