stanbol-dev mailing list archives

From Olivier Grisel
Subject Re: Categorization
Date Wed, 16 Mar 2011 11:20:31 GMT
2011/3/14 Alex Lopez:
> Hi Stanbol devs,
> we are working on a semantic app. that will/should do content categorization
> and other stuff.
> So I have some code that takes detected entities as input (in form of
> dbpedia urls) and looks up in dbpedia both YAGO categories and wikipedia
> categories (the ones that used to be skos:subject and now are dct:subject).
> But of course this is tied to particular namespaces/service, I would like to
> expand it and abstract the particulars while retaining the functionality,
> something like a wrapper:
> input: resource/resources/text (raw content)
> output: categories/topics
> Keep in mind I'm not looking for categories of the sort
> Person/Organisation... more of the sort Science/Jazz Musicians etc
> So I've been following Stanbol's devs list for some time, and I'm excited
> about the possibilities, maybe I can use some of it for this particular
> requirement. Right now, I see it as a collection of services, for example I
> can see zemanta doing what I want with DMOZ topics, included as an engine.
> (are there other engines doing similar?)
> But I wonder, is there a "central" place with methods/services doing this
> for all implementations? or perhaps I misunderstood what the project is
> about...
> kind of a getTopicsForResource(){
>    getDMOZ();
>    getWikipediaCategories();
>    getFreebaseCategories();
>    ...
> }
> If not, what is a good place to look for this kind of functionality so I can
> include it in my method?
> Now I understand this is an incubating project so perhaps this is a planned
> feature, do you have any roadmap? Any expected date for a "first release" of
> stanbol?

Yes this is a planned feature. The existing
RelatedTopicEnhancementEngine is to be reimplemented to use the entity
hub index and to build predefined topic indexes out of the dbpedia
skos hierarchy and the fulltext of the related articles (to be able to
perform similarity queries using the MoreLikeThis feature of Solr).
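
To make the MoreLikeThis idea concrete, here is a rough sketch of how a
similarity request against such a topic index could be built. It is not
Stanbol code: the class name, the core URL and the "text" field are
hypothetical, and it assumes a MoreLikeThisHandler is registered at /mlt in
solrconfig.xml and that passing raw text via stream.body is allowed on the
Solr instance (depending on the Solr version this may need to be enabled).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a Solr MoreLikeThis request URL for raw text. */
final class MoreLikeThisQuery {

    /**
     * @param solrCoreUrl base URL of the (assumed) topic index core
     * @param text        raw content to find similar topic documents for
     * @param field       indexed fulltext field used for similarity (assumed name)
     * @param rows        maximum number of similar documents to return
     */
    static String build(String solrCoreUrl, String text, String field, int rows) {
        try {
            return solrCoreUrl + "/mlt"
                    + "?stream.body=" + URLEncoder.encode(text, "UTF-8")
                    + "&mlt.fl=" + URLEncoder.encode(field, "UTF-8")
                    // Keep rare terms: a short input text has low term frequencies.
                    + "&mlt.mintf=1&mlt.mindf=1"
                    + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The documents returned by such a query would be the topics whose related
articles are closest to the submitted text.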

We also need to extend the Stanbol vocabulary to handle topics that
are not entities.

In the meantime you can use OpenCalais / Zemanta / SalsaDev directly.
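
Until then, the getTopicsForResource() wrapper from the question could be
sketched on the application side roughly as follows. This is only an
illustration under assumptions: TopicProvider and TopicAggregator are
hypothetical names, not part of Stanbol, and each provider would wrap one
backend (DMOZ topics via Zemanta, Wikipedia categories via dbpedia, and so
on).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical abstraction: one implementation per topic source. */
interface TopicProvider {
    /** Returns topic labels or URIs for the given resource URI; may be empty. */
    List<String> topicsFor(String resourceUri);
}

/** Fans a request out to every registered provider and merges the results. */
final class TopicAggregator {
    private final List<TopicProvider> providers = new ArrayList<>();

    TopicAggregator register(TopicProvider provider) {
        providers.add(provider);
        return this;
    }

    /** Collects topics from all providers, dropping duplicates but keeping first-seen order. */
    Set<String> getTopicsForResource(String resourceUri) {
        Set<String> topics = new LinkedHashSet<>();
        for (TopicProvider provider : providers) {
            topics.addAll(provider.topicsFor(resourceUri));
        }
        return topics;
    }
}
```

Keeping the namespace- and service-specific code inside each provider means
the caller only ever sees a resource going in and a set of topics coming out.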

Olivier -
