incubator-connectors-user mailing list archives

From "Wunderlich, Tobias" <tobias.wunderl...@igd-r.fraunhofer.de>
Subject Re: Indexing Wikipedia/MediaWiki
Date Mon, 19 Sep 2011 10:07:02 GMT
> (1) How do you form a URL that would take a user to a document?  Does it use the title, or
> does it use the page ID?

I guess one way would be to simply append the title to the main URL, e.g. http://en.wikipedia.org/wiki/<title>.
However, I haven't yet found out how to build a URL to a document from its page ID.
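A minimal sketch of the title-based approach in Python, assuming the English Wikipedia article path http://en.wikipedia.org/wiki/ from the example above and MediaWiki's convention of writing spaces in titles as underscores in article URLs. The helper name page_url_from_title is hypothetical.

```python
from urllib.parse import quote

# Assumption: article path taken from the example URL in the message.
WIKI_BASE = "http://en.wikipedia.org/wiki/"


def page_url_from_title(title: str, base: str = WIKI_BASE) -> str:
    """Build a user-facing article URL from a page title (hypothetical helper)."""
    # MediaWiki shows spaces in titles as underscores in article URLs.
    path = title.replace(" ", "_")
    # Percent-encode everything else (UTF-8 for non-ASCII), but keep '/' and ':'
    # readable because titles such as "AC/DC" or "Help:Contents" contain them.
    return base + quote(path, safe="/:")


if __name__ == "__main__":
    print(page_url_from_title("Main Page"))  # http://en.wikipedia.org/wiki/Main_Page
```

Non-ASCII titles come out as UTF-8 percent escapes (for example "Zürich" becomes Z%C3%BCrich), which browsers and MediaWiki both accept.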


> (2) If the URL includes the page ID, is there any way to get metadata information about the
> document using the page ID directly?  It probably wouldn't be the query feature that would
> do this, btw.

It is possible to get a document's metadata directly via its page ID (instead of its title):
Title  -> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API&rvprop=timestamp|user|comment|content
PageID -> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&pageids=27697087&rvprop=timestamp|user|comment|content
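The two query forms above differ only in whether the "titles" or "pageids" parameter is sent. The Python sketch below builds either form and pulls the author ("user") and publication date ("timestamp") out of a decoded reply. It assumes "format=json" is added to the request and that the reply uses the legacy JSON layout, where query.pages is an object keyed by page ID; the helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint used in the examples above.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def revisions_query_url(*, title=None, pageid=None,
                        rvprop="timestamp|user|comment|content"):
    """Build a prop=revisions query keyed by exactly one of title or page ID."""
    if (title is None) == (pageid is None):
        raise ValueError("pass exactly one of title or pageid")
    params = {"action": "query", "prop": "revisions",
              "rvprop": rvprop, "format": "json"}  # format=json is an added assumption
    if title is not None:
        params["titles"] = title
    else:
        params["pageids"] = str(pageid)
    # urlencode percent-encodes '|' and spaces; the API decodes them as usual.
    return API_ENDPOINT + "?" + urlencode(params)


def revision_metadata(response: dict) -> dict:
    """Map page ID -> (author, UTC timestamp) for each page in a decoded reply."""
    result = {}
    for page_id, page in response["query"]["pages"].items():
        if "revisions" not in page:
            continue  # missing or invalid pages come back without revisions
        rev = page["revisions"][0]
        ts = datetime.strptime(rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        result[int(page_id)] = (rev["user"], ts.replace(tzinfo=timezone.utc))
    return result
```

When several titles or IDs are joined with '|', query.pages holds one entry per page, which is why the helper returns a dictionary keyed by page ID.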


Tobias


-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Monday, 19 September 2011 11:35
To: connectors-user@incubator.apache.org
Subject: Re: Indexing Wikipedia/MediaWiki

The API seems to be built around using titles as document keys, and yet there is also a page ID,
which would probably be better for looking up page data.  So I have some new questions:

(1) How do you form a URL that would take a user to a document?  Does it use the title, or
does it use the page ID?
(2) If the URL includes the page ID, is there any way to get metadata information about the
document using the page ID directly?  It probably wouldn't be the query feature that would
do this, btw.

Thanks,
Karl


On Mon, Sep 19, 2011 at 5:09 AM, Wunderlich, Tobias <tobias.wunderlich@igd-r.fraunhofer.de> wrote:
> Hey Karl,
>
> I did some research, and the MediaWiki API looks promising:
>
> - There needs to be some notion of an overall list of pages:
>        - http://www.mediawiki.org/wiki/API:Allpages
>        - Example: http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Kre&aplimit=5
>
> - Metadata information (author and pub date) also needs to be separated out in some way:
>        - http://www.mediawiki.org/wiki/API:Properties#Revisions:_Example
>        - Example: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main%20Page&rvprop=timestamp|user|comment|content
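A sketch of walking the overall page list with list=allpages in Python, following the API's continuation until the list is exhausted. It assumes "format=json" and the "continue" object returned by newer MediaWiki releases (the 2011-era API reported a "query-continue" element instead). The names http_fetch and iter_all_pages and the User-Agent string are hypothetical; passing the fetch function in keeps the paging logic testable without network access.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia API endpoint used in the examples above.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"

# A fetch function takes the request parameters and returns the decoded JSON.
Fetch = Callable[[Dict[str, str]], dict]


def http_fetch(params: Dict[str, str]) -> dict:
    """Perform one GET against the API (the User-Agent value is a placeholder)."""
    req = Request(API_ENDPOINT + "?" + urlencode(params),
                  headers={"User-Agent": "example-crawler/0.1 (contact@example.org)"})
    with urlopen(req) as resp:
        return json.load(resp)


def iter_all_pages(fetch: Fetch, apfrom: str = "", aplimit: int = 50) -> Iterator[dict]:
    """Yield the entries of list=allpages, following the API's continuation."""
    params = {"action": "query", "list": "allpages", "format": "json",
              "aplimit": str(aplimit)}
    if apfrom:
        params["apfrom"] = apfrom
    while True:
        data = fetch(params)
        # Each entry carries pageid, ns and title.
        yield from data["query"]["allpages"]
        if "continue" not in data:
            break
        # Merge the server-supplied continuation values into the next request.
        params = {**params, **data["continue"]}
```

On English Wikipedia the loop would run through millions of titles; when only a sample is wanted, stop early, for example with itertools.islice(iter_all_pages(http_fetch, apfrom="Kre", aplimit=5), 5).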
>
> What do you think?
>
> Tobias
>
>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: Friday, 16 September 2011 16:11
> To: Sumana Harihareswara
> Cc: Wunderlich, Tobias
> Subject: Re: MediaWiki & Lucene development
>
> The lucene-search extension may or may not be appropriate for Tobias.
> But my interest would extend toward getting wiki content into whatever target a ManifoldCF
> user sets up, not just Solr/Lucene.  In order to do this, the following needs to be addressed:
>
> - There needs to be some notion of an overall list of pages, preferably queryable by date and time of last change;
> - We'd need access, per page, to authorization information;
> - Metadata information (author and pub date) also needs to be separated out in some way.
>
> The plugin that Tobias mentioned seems to handle the last item fine, but not the first two.
> Do you have a solution for those?
>
> Thanks,
> Karl
>
> On Fri, Sep 16, 2011 at 9:40 AM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
>> Hi.  I happened to see you both discussing MediaWiki and 
>> search/indexing in a mailing list recently.
>>
>> You might be interested in asking your question to the 
>> MediaWiki/Wikimedia developers' list
>>
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>> and I'd also welcome any assistance in improving our Lucene search 
>> extension, which is used on Wikipedia:
>>
>> http://www.mediawiki.org/wiki/Extension:Lucene-search
>>
>> Thanks!
>>
>> --
>> Sumana Harihareswara
>> Volunteer Development Coordinator
>> Wikimedia Foundation
>>
>
