incubator-connectors-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Indexing Wikipedia/MediaWiki
Date Mon, 19 Sep 2011 09:35:24 GMT
The API seems to be built around using titles as document keys, and
yet there is also a page ID, which would probably be better for looking
up page data.  So I have some new questions:

(1) How do you form a URL that would take a user to a document?  Does
it use the title, or does it use the page ID?
(2) If the URL includes the page ID, is there any way to get metadata
information about the document using the page ID directly?  It
probably wouldn't be the query feature that would do this, btw.
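A minimal sketch bearing on both questions, assuming a standard MediaWiki install: `index.php` accepts a `curid=<page ID>` parameter for a page-ID-based view URL, and `api.php` accepts `pageids=` in place of `titles=`, with `prop=info&inprop=url` returning each page's canonical URL. The host is taken from the examples later in this thread; the helper names are hypothetical.

```python
from urllib.parse import urlencode
from typing import Iterable

# Hypothetical endpoints; the host matches the examples in this thread.
API_URL = "http://en.wikipedia.org/w/api.php"
INDEX_URL = "http://en.wikipedia.org/w/index.php"


def view_url_by_page_id(page_id: int) -> str:
    """Page view URL keyed by numeric page ID (the 'curid' parameter)."""
    return INDEX_URL + "?" + urlencode({"curid": page_id})


def info_query_by_page_ids(page_ids: Iterable[int]) -> str:
    """API query for page info, including the canonical 'fullurl',
    keyed by page IDs instead of titles; values are joined with '|'."""
    params = {
        "action": "query",
        "prop": "info",
        "inprop": "url",
        "pageids": "|".join(str(p) for p in page_ids),
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```

Note that `urlencode` escapes the `|` separator as `%7C`, which the API decodes normally.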

Thanks,
Karl


On Mon, Sep 19, 2011 at 5:09 AM, Wunderlich, Tobias
<tobias.wunderlich@igd-r.fraunhofer.de> wrote:
> Hey Karl,
>
> I did some research and the MediaWiki API looks promising:
>
> - There needs to be some notion of an overall list of pages:
>        - http://www.mediawiki.org/wiki/API:Allpages
>        - Example: http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Kre&aplimit=5
>
> - Metadata information (author and pub date) also needs to be separated out in some way:
>        - http://www.mediawiki.org/wiki/API:Properties#Revisions:_Example
>        - Example:  http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main%20Page&rvprop=timestamp|user|comment|content
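A hedged sketch of the two calls above: it rebuilds the `list=allpages` and `prop=revisions` queries and pulls author and date out of a revisions response. It requests only `timestamp|user|comment` (dropping `content`) so the metadata stays separate from the page text. The JSON layout assumed by `extract_metadata` (the legacy format, where `query.pages` is keyed by page ID) and all helper names are assumptions, not confirmed in this thread; continuation of long listings is not handled.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # host from the examples above


def allpages_url(apfrom: str, aplimit: int = 5) -> str:
    """List page titles starting at `apfrom` (list=allpages)."""
    return API_URL + "?" + urlencode({
        "action": "query", "list": "allpages",
        "apfrom": apfrom, "aplimit": aplimit, "format": "json",
    })


def revisions_url(titles: list) -> str:
    """Latest revision metadata (timestamp, author, comment) for the titles."""
    return API_URL + "?" + urlencode({
        "action": "query", "prop": "revisions",
        "titles": "|".join(titles),
        "rvprop": "timestamp|user|comment",
        "format": "json",
    })


def extract_metadata(response: dict) -> dict:
    """Map title -> (user, timestamp) from a prop=revisions JSON response.
    Assumes the legacy layout in which query.pages is keyed by page ID."""
    result = {}
    for page in response.get("query", {}).get("pages", {}).values():
        revs = page.get("revisions")
        if revs:  # pages reported as missing carry no revisions
            result[page["title"]] = (revs[0].get("user"), revs[0].get("timestamp"))
    return result
```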
>
> What do you think?
>
> Tobias
>
>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: Friday, 16 September 2011 16:11
> To: Sumana Harihareswara
> Cc: Wunderlich, Tobias
> Subject: Re: MediaWiki & Lucene development
>
> The lucene-search extension may or may not be appropriate for Tobias.
> But my interest would extend towards getting wiki content into whatever target a
> ManifoldCF deployment is set up for, not just Solr/Lucene.  In order to do this, the
> following needs to be addressed:
>
> - There needs to be some notion of an overall list of pages, preferably queryable by
>   date and time of last change;
> - We'd need access, per page, to authorization information
> - Metadata information (author and pub date) also needs to be separated out in some way
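For the first requirement, one possible approach (an assumption, not something proposed in this thread) is MediaWiki's `list=recentchanges`, which can be filtered by timestamp. The sketch below builds such a query, oldest change first (`rcdir=newer`), starting from a given time. Recent-changes data is kept only for a limited, wiki-configured retention window, so this would complement a full `allpages` crawl rather than replace it; the helper name is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # hypothetical target wiki


def changed_since_url(since: datetime, limit: int = 50) -> str:
    """Query for pages changed at or after `since`, oldest first.
    `since` should be timezone-aware; it is converted to UTC."""
    ts = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return API_URL + "?" + urlencode({
        "action": "query", "list": "recentchanges",
        "rcstart": ts, "rcdir": "newer",  # with 'newer', rcstart is the earliest time
        "rcprop": "title|ids|timestamp|user",
        "rclimit": limit, "format": "json",
    })
```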
>
> The plugin that Tobias mentioned seems to do the last item fine, but not the first two.
> Do you have a solution for those?
>
> Thanks,
> Karl
>
> On Fri, Sep 16, 2011 at 9:40 AM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
>> Hi.  I happened to see you both discussing MediaWiki and
>> search/indexing in a mailing list recently.
>>
>> You might be interested in asking your question to the
>> MediaWiki/Wikimedia developers' list
>>
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>> and I'd also welcome any assistance in improving our Lucene search
>> extension, which is used on Wikipedia:
>>
>> http://www.mediawiki.org/wiki/Extension:Lucene-search
>>
>> Thanks!
>>
>> --
>> Sumana Harihareswara
>> Volunteer Development Coordinator
>> Wikimedia Foundation
>>
>
