stanbol-dev mailing list archives

From "Andreas Kuckartz" <>
Subject Re: CV Mining (Early adopter program)
Date Thu, 01 Mar 2012 19:52:28 GMT
Hi Luca,

Which CMS do you intend to use for the project?


On 01.03.2012 15:44, Luca Dini wrote:
> Dear All,
> Please let me introduce a new early adopter project in which we will
> be involved. I look forward to a stimulating and intellectually
> inspiring exchange with you all.
> Kind regards,
> Luca
> The project (run by CELI under the umbrella of the IKS early adopter
> program) aims to integrate Stanbol technology into a specific context
> of use, i.e. CV management via CMS and semantic technologies. The
> crucial challenge of this integration is the parametrization of
> Stanbol to deal with information which has been automatically
> extracted from CVs. Besides the direct integration results, which will
> be distributed under the same conditions as the Stanbol software, the
> early adopter project will produce two additional by-products:
>     The provision to Stanbol of classes allowing the connection with
> Linguagrid and possibly LanguageGrid.
>     The verification of the extensibility of Stanbol to languages
> other than English (the project will concern CVs written in French).
> We envisage two prototypical use cases, described below:
> Use-Case 1: Human Resources Department
> The context is that of the Human Resources department of a large
> company, or of any recruitment company. The basic goal is to provide
> them with an open source document management system able to deal
> intelligently with unstructured CVs (or "resumes"), i.e. CVs which
> come in Microsoft Word, PDF, Open Office etc. Each time a new CV
> arrives, it is inserted in the document base. Behind the scenes this
> is not just adding a document but passing it to a Stanbol server
> which enhances it with structured information.
> This structured information might represent:
>     experiences of the candidate
>     skills of the candidate
>     education level
>     reference data (name, address etc.)
>     contact data
> Some of these data might be slightly more structured than just named
> entities, but they are definitely within the representational power
> of RDF. Some of them could be even more semantically enriched, by
> providing external information on companies, places, specific
> technologies etc.
> As a result, personnel at the HR department would be able to
> formulate queries such as the following (these are only examples):
>     All CVs of people living in Paris who are older than 27 years
>     All CVs of people with skills in SQL Server and Java
>     All people who have worked in a high-tech company since November
> 2011.
> ....
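The three example queries above can be sketched over plain in-memory records. This is only an illustration, not Stanbol code: the `CV` and `Experience` classes, their field names, the sample data and the reading of "since November 2011" as "an experience that was still running on or after that date" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Experience:
    company: str
    sector: str                 # hypothetical classification, e.g. "high tech"
    start: date
    end: Optional[date] = None  # None means the job is still ongoing


@dataclass
class CV:
    name: str
    city: str
    birth_date: date
    skills: List[str] = field(default_factory=list)
    experiences: List[Experience] = field(default_factory=list)


def age_on(cv: CV, today: date) -> int:
    """Age in full years on the given day."""
    had_birthday = (today.month, today.day) >= (cv.birth_date.month, cv.birth_date.day)
    return today.year - cv.birth_date.year - (0 if had_birthday else 1)


def living_in_and_older_than(cvs, city, min_age, today):
    return [cv for cv in cvs if cv.city == city and age_on(cv, today) > min_age]


def with_all_skills(cvs, required):
    wanted = {s.lower() for s in required}
    return [cv for cv in cvs if wanted <= {s.lower() for s in cv.skills}]


def worked_in_sector_since(cvs, sector, since):
    # Assumption: an experience matches if it was still running on or after `since`.
    def matches(exp):
        return exp.sector == sector and (exp.end is None or exp.end >= since)
    return [cv for cv in cvs if any(matches(e) for e in cv.experiences)]


# Made-up sample data, for illustration only.
cvs = [
    CV("Alice", "Paris", date(1980, 5, 1), ["Java", "SQL Server"],
       [Experience("ExampleTech", "high tech", date(2010, 1, 1))]),
    CV("Bob", "Lyon", date(1990, 1, 1), ["Java"],
       [Experience("ExampleBank", "finance", date(2009, 1, 1), date(2011, 6, 30))]),
]
today = date(2012, 3, 1)

paris_over_27 = living_in_and_older_than(cvs, "Paris", 27, today)
sql_and_java = with_all_skills(cvs, ["sql server", "java"])
high_tech_recent = worked_in_sector_since(cvs, "high tech", date(2011, 11, 1))
```

In a real deployment these filters would more likely be expressed as queries over the RDF produced by the enhancement step; the point here is only which fields each query needs.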
> In terms of GUI, the user will be presented with a system that allows
> easy search and easy population of CV data.
> Use-Case 2: Employment Administration
> In the second use case we take into account the needs of public
> agencies with the institutional role of re-integrating into the labor
> market people who have lost their job or who are looking for their
> first job. In particular we are considering institutions such as the
> French Pôle emploi. This institution is in charge of matching supply
> and demand on the labor market, in particular by directing candidates
> to the right potential employer, by suggesting possible training, by
> shaping their skills, etc. In many cases these agencies are managed at
> a local rather than a national level, as the labor market is affected
> by regional constraints. In this use case the parametrized CMS has a
> twofold goal:
>     As in the previous case, to allow the fast and intelligent
> retrieval of CVs from the document base in order to answer potential
> employers' needs.
>     To be able to perform Business Intelligence-like tasks over the
> structured information provided by the mass of analyzed CVs. Of course
> performing BI analysis is outside the scope of this proposal, but the
> structuring of CV information into ontology-based classes is
> definitely the first step in this direction.
> Challenges
> From a technical point of view the most interesting challenge consists
> in integrating the set of Stanbol enhancers with the semantic web
> services provided at [...]. In principle this integration should not
> differ from what has already been done for the OpenCalais and Zemanta
> web services. However there are at least three major challenges:
>     Multilinguality. The extraction will consider French documents
> rather than English ones. Moreover, in a second phase (not covered by
> the present project), the whole system could be extended to Italian
> and French.
>     Ontological extension. While CVs typically contain quite a lot of
> named entities which are already covered by Stanbol (e.g. geographical
> names, time expressions, company names, person names), there are
> entities, such as skills and education, which will need some ontology
> extension.
>     Structural complexity. In a CV, instances of entities are linked
> to each other in a structurally complex way. For instance, places are
> not just a flat list of geographical entities: they are likely to be
> connected with periods, with job types, with companies, etc. Handling
> this structural complexity represents an important challenge.
