Date: 1 Mar 2012 20:52:28 +0100
Message-ID: <4F4FD37C.10707@ping.de>
From: "Andreas Kuckartz"
To: stanbol-dev@incubator.apache.org
Subject: Re: CV Mining (Early adopter program)
In-Reply-To: <4F4F8B4B.3090303@celi-france.com>

Hi Luca,

which CMS do you intend to use for the project?

Cheers,
Andreas

---

On 01.03.2012 15:44, Luca Dini wrote:
> Dear All,
> Please let me introduce a new early adopter project in which we will
> be involved. I hope for a great and intellectually inspiring
> communication with you all.
> Kind regards,
> Luca
>
> The project (run by CELI under the umbrella of the IKS early adopter
> program) aims to integrate Stanbol technology with a specific context
> of use, i.e. CV management via CMS and semantic technologies. The
> crucial challenge of this integration is the parametrization of
> Stanbol to deal with information which has been automatically
> extracted from CVs. Besides the direct integration results, which will
> be distributed under the same conditions as the Stanbol software, the
> early adoption project will produce two additional by-products:
>
>   * The provision to Stanbol of classes allowing the connection with
>     Linguagrid (www.linguagrid.org) and possibly LanguageGrid
>     (http://langrid.org/en/index.html).
>   * The verification of the extensibility of Stanbol to languages
>     other than English (the project will concern CVs written in French).
>
> We envisage two prototypical use cases, which are described in the
> following:
>
> Use-Case 1: Human Resources Department
>
> The context is that of the Human Resources department of a big company
> or of any recruitment company. The basic goal is to provide them with an
> open source document management system able to deal in an intelligent
> way with unstructured CVs (or "resumes"), i.e. CVs which come in
> Microsoft Word, PDF, OpenOffice, etc. Each time a new CV arrives it is
> inserted into the document base. Behind the scenes this does not just
> add a document: the document is passed to a Stanbol server, which
> enhances it with structured information.
>
> This might represent:
>
>   * experiences of the candidate
>   * skills of the candidate
>   * education level
>   * reference data (name, address, etc.)
>   * contact data
>
> Some of these data might be slightly more structured than just named
> entities, but they are definitely within the representational power of
> RDF. Some of them could be even more semantically enriched by providing
> external information on companies, places, specific technologies, etc.
>
> As a result, personnel at the HR department would be able to
> formulate queries such as (just as an illustration):
>
>   * All CVs of people living in Paris older than 27 years
>   * All CVs of people with skills in SQL Server and Java
>   * All people who have worked in a high-tech company since November
>     2011
>
> ....
>
> In terms of GUI, the user will be presented with a system that allows
> easy search and easy population of CV data.
>
>
> Use-Case 2: Employment Administration
>
> In the second use case we take into account the needs of public
> agencies with the institutional role of reintegrating into the labor
> market persons who have lost their job or who are looking for their
> first job. In particular we are considering institutions such as the
> French Pôle emploi (http://www.pole-emploi.fr/accueil/ ,
> http://fr.wikipedia.org/wiki/P%C3%B4le_emploi). This institution is in
> charge of matching supply and demand on the labor market, in
> particular by directing candidates to the right potential employer,
> suggesting possible educational training, shaping their skills,
> etc. In many cases these agencies are managed at a local rather than a
> national level, as the labor market is affected by regional
> constraints. In this use case the parametrized CMS has a double goal:
>
>   * Much like in the previous case, to allow the fast and intelligent
>     retrieval of CVs out of the document base in order to answer
>     potential employer needs.
>   * To be able to perform Business Intelligence-like tasks over the
>     structured information provided by the mass of analyzed CVs.
>     Of course, performing BI analysis is out of the scope of this
>     proposal, but the structuring of CV information into
>     ontology-based classes is definitely the first step in this
>     direction.
>
>
> Challenges
>
> From a technical point of view the most interesting challenge consists
> in integrating the set of Stanbol enhancers with the semantic web
> services provided at www.linguagrid.org. In principle the integration
> should not differ from what has already been done with
> OpenCalais WS and Zemanta WS. However, there are at least two major
> challenges:
>
>   * Multilinguality. The extraction will consider French documents
>     rather than English ones. Moreover, in a second phase (not covered
>     by the present project), the whole system could be extended to
>     Italian and French.
>   * Ontological extension. While CVs typically contain quite a lot of
>     named entities which are already covered by Stanbol (e.g.
>     geographical names, time expressions, company names, person names),
>     there are entities which will need some ontology extension, such as
>     skills and education.
>   * Structural Complexity. In a CV, instances of entities are linked
>     to each other in a structurally complex way. For instance, places
>     are not just a flat list of geographical entities; they are likely
>     to be connected with periods, with job types, with companies, etc.
>     Handling this structural complexity represents an important
>     challenge.
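
As an illustration of the example queries and of the "Structural Complexity" point in the quoted proposal, here is a minimal Python sketch. It assumes the enhancement step has already produced structured records. All class, field and function names, as well as the sample people and companies, are hypothetical and are not part of Stanbol or Linguagrid; in the project itself the data would presumably be RDF rather than Python objects.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Experience:
    # One job entry: place, period, job type and company stay linked
    # together instead of living in separate flat lists of entities.
    company: str
    job_title: str
    place: str
    start: date
    end: Optional[date] = None  # None means the job is still ongoing


@dataclass
class CV:
    name: str
    city: str
    birth_date: date
    skills: Set[str] = field(default_factory=set)
    experiences: List[Experience] = field(default_factory=list)

    def age_on(self, day: date) -> int:
        # Full years elapsed between birth_date and the given day.
        had_birthday = (day.month, day.day) >= (
            self.birth_date.month, self.birth_date.day)
        return day.year - self.birth_date.year - (0 if had_birthday else 1)


def living_in_older_than(cvs, city, min_age, today):
    """CVs of people living in `city` who are older than `min_age` years."""
    return [cv for cv in cvs
            if cv.city == city and cv.age_on(today) > min_age]


def with_all_skills(cvs, required):
    """CVs that list every skill in `required`."""
    return [cv for cv in cvs if set(required) <= cv.skills]


def worked_at_since(cvs, companies, since):
    """CVs with a job at one of `companies` still running on or after `since`.

    Which companies count as "high tech" is assumed to be known already,
    for example from external information linked to the company entity.
    """
    return [cv for cv in cvs
            if any(e.company in companies
                   and (e.end is None or e.end >= since)
                   for e in cv.experiences)]


# Hypothetical sample data, invented purely for this sketch.
cvs = [
    CV("Alice Martin", "Paris", date(1980, 5, 17), {"Java", "SQL Server"},
       [Experience("ExampleTech", "Developer", "Paris", date(2010, 1, 1))]),
    CV("Bruno Petit", "Lyon", date(1990, 3, 2), {"Java"},
       [Experience("ExampleBank", "Analyst", "Lyon",
                   date(2008, 9, 1), date(2011, 6, 30))]),
]
```

With records of this shape, the three example queries become calls such as living_in_older_than(cvs, "Paris", 27, date(2012, 3, 1)), with_all_skills(cvs, ["SQL Server", "Java"]) and worked_at_since(cvs, {"ExampleTech"}, date(2011, 11, 1)). The design choice worth noting is that each Experience keeps company, place and period together, which is what a flat list of named entities cannot express.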