incubator-connectors-user mailing list archives

From "Wunderlich, Tobias" <tobias.wunderl...@igd-r.fraunhofer.de>
Subject Re: Indexing Wikipedia/MediaWiki
Date Mon, 19 Sep 2011 09:09:56 GMT
Hey Karl,

I did some research and the MediaWiki API looks promising:

- There needs to be some notion of an overall list of pages:
	- http://www.mediawiki.org/wiki/API:Allpages
	- Example: http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Kre&aplimit=5

- Metadata information (author and pub date) also needs to be separated out in some way:
	- http://www.mediawiki.org/wiki/API:Properties#Revisions:_Example
	- Example: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main%20Page&rvprop=timestamp|user|comment|content
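
As a rough sketch of how a connector might build these two queries, assuming the English Wikipedia endpoint from the examples above: the helper names (allpages_url, revisions_url, page_titles) and the format=json parameter are my own additions and not part of the example URLs, and the response shape read by page_titles (query.allpages[].title) is an assumption about the JSON output.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs above; other wikis expose their own api.php.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def allpages_url(start_from, limit=5):
    """Build a list=allpages query that enumerates pages starting at `start_from`."""
    params = {
        "action": "query",
        "list": "allpages",
        "apfrom": start_from,
        "aplimit": limit,
        "format": "json",  # assumption: the example URLs do not set a format
    }
    return API_ENDPOINT + "?" + urlencode(params)


def revisions_url(titles, props=("timestamp", "user", "comment", "content")):
    """Build a prop=revisions query returning the selected fields for `titles`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "|".join(titles),  # the API separates multiple values with '|'
        "rvprop": "|".join(props),
        "format": "json",  # assumption, as above
    }
    return API_ENDPOINT + "?" + urlencode(params)


def page_titles(allpages_response):
    """Extract titles from a decoded allpages response.

    Assumed shape: {"query": {"allpages": [{"title": ...}, ...]}}.
    """
    pages = allpages_response.get("query", {}).get("allpages", [])
    return [page["title"] for page in pages]
```

Calling allpages_url("Kre") and revisions_url(["API", "Main Page"]) reproduces the two example queries with the extra format parameter; urlencode percent-encodes the '|' separators and writes the space in "Main Page" as '+'.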

What do you think?

Tobias



-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, 16 September 2011 16:11
To: Sumana Harihareswara
Cc: Wunderlich, Tobias
Subject: Re: MediaWiki & Lucene development

The lucene-search extension may or may not be appropriate for Tobias.
But my interest would extend towards getting wiki content into whatever target ManifoldCF
sets up, not just Solr/Lucene.  To do this, the following need to be addressed:

- There needs to be some notion of an overall list of pages, preferably queryable by date
and time of last change;
- We'd need access, per page, to authorization information;
- Metadata information (author and pub date) also needs to be separated out in some way.

The plugin that Tobias mentioned seems to do the last item fine, but not the first two.  Do
you have a solution for those?

Thanks,
Karl

On Fri, Sep 16, 2011 at 9:40 AM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
> Hi.  I happened to see you both discussing MediaWiki and 
> search/indexing in a mailing list recently.
>
> You might be interested in asking your question to the 
> MediaWiki/Wikimedia developers' list
>
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> and I'd also welcome any assistance in improving our Lucene search 
> extension, which is used on Wikipedia:
>
> http://www.mediawiki.org/wiki/Extension:Lucene-search
>
> Thanks!
>
> --
> Sumana Harihareswara
> Volunteer Development Coordinator
> Wikimedia Foundation
>
