stanbol-dev mailing list archives

From Abhimanyu S <abhi.ma...@me.com.INVALID>
Subject Request for help with Stanbol implementation
Date Sat, 23 Feb 2019 10:30:25 GMT
Hello Stanbol Developers,

Background
I’m a data engineering manager with an avid interest in NLP implementation, and my partner,
Kit Blake (cc), is a serial entrepreneur who’s done extensive work in building and implementing
CMS systems (he is also more of a quasi-tech product manager). I’m based in Hong Kong and
he’s in Rotterdam.

I’ve been using Stanbol for the last few years. I’m also part of the developer mailing
list but haven’t contributed code as I’m not a developer. 

Overview
Recently we came across a challenge sponsored by the UN <https://uniteideas.spigit.com/unga-resolutions/Page/Home>
for extracting information from General Assembly Resolutions based on certain ontologies.

Objective
The objective of the challenge is to carry out automatic entity extraction and content analysis
to identify the following elements in UN General Assembly resolutions:

Structures:

Title, proponent authority, identification numbers, date of approval;
Preamble (one or more paragraphs stating purpose, aims, and justification of a resolution);
Operative paragraphs (one or more paragraphs detailing the resolution);
Closing formula;
Annexes.

Entities: e.g. persons, roles, countries, places, deadlines, references to concepts relevant
to the “United Nations Bibliographic Information System” (UNBIS) or “Sustainable Development
Goals Interface Ontology” (SDGIO) of UN Environment.

Content analysis:
Preambular paragraphs: references, citations, mentions etc.
Operative paragraphs: identify who invites/asks/requires/demands what (actions, requests,
recommendations, etc.) and organize this into machine-understandable data structures.
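To make the Structure and Content analysis goals a bit more concrete, here is a rough rule-based sketch in Python. It is not a Stanbol component and not the challenge's reference method. It assumes the usual drafting conventions of General Assembly resolutions: preambular paragraphs are unnumbered and open with a word such as "Recalling", while operative paragraphs are numbered ("1.", "2.", ...) and open with a verb such as "Requests". The `Paragraph` type and `classify` function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: operative paragraphs start with a number and a full stop, e.g. "3. Urges ...".
OPERATIVE = re.compile(r"^(\d+)\.\s+(.*)$")


@dataclass
class Paragraph:
    kind: str              # "preambular" or "operative"
    number: Optional[int]  # operative paragraph number; None for preambular paragraphs
    action: str            # opening word, e.g. "Recalling" or "Requests"
    text: str              # rest of the paragraph after the opening word


def classify(paragraphs: List[str]) -> List[Paragraph]:
    """Label each paragraph as preambular or operative and split off its opening word."""
    result = []
    for raw in paragraphs:
        para = " ".join(raw.split())  # collapse line breaks and repeated spaces
        if not para:
            continue  # skip blank paragraphs
        match = OPERATIVE.match(para)
        if match:
            kind, number, body = "operative", int(match.group(1)), match.group(2)
        else:
            kind, number, body = "preambular", None, para
        action, _, rest = body.partition(" ")
        result.append(Paragraph(kind, number, action, rest))
    return result


if __name__ == "__main__":
    sample = [
        "Recalling its previous resolutions on the subject,",
        "1. Requests the Secretary-General to submit a report;",
    ]
    for p in classify(sample):
        print(p.kind, p.number, p.action, "|", p.text)
```

A real solution would still need a list of multi-word openers (e.g. "Calls upon", "Takes note of") and would pass the paragraph text to Stanbol for entity linking.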

I think Stanbol would be the perfect tool for this purpose. The ‘Structure’ and ‘Content
Analysis’ parts could be handled by indexing their main UNDO Ontology <https://github.com/UNSCEB-HLCM/undo/tree/master/ontology/current>,
and the ‘Entities’ could be extracted using DBpedia as well as the other ontologies they’ve
mentioned.

Development Needs
We’ve entered the challenge intending to submit a Stanbol-based solution, but are now realising
that we need help developing it, primarily with two tasks.

1. Adding their ontology (undo.owl from here <https://github.com/UNSCEB-HLCM/undo/tree/master/ontology/current>)
into Stanbol, to be used alongside DBpedia. I’ve managed to follow the instructions on these
two pages, https://blog.zagwozdka.com/stanbol-getting-started-c047558856ec
and https://stanbol.apache.org/docs/trunk/customvocabulary.html, and to create an index,
but I’m unable to initialise it. Once I achieve this, I’ll probably also try to add the
other two ontologies.
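For reference, my understanding of the final steps on the customvocabulary page is: the indexing tool writes a `{name}.solrindex.zip` archive and an `org.apache.stanbol.data.site.{name}-*.jar` bundle into `indexing/dist`; the zip goes into Stanbol's `datafiles` folder and the bundle is installed through the OSGi console, after which the site should answer under `/entityhub/site/{name}`. The Python sketch below automates the copy and builds a test query. The folder layout (`stanbol/datafiles` under the launch directory), the site name "undo", and the helper names `stage_index` and `find_request` are assumptions to adjust, not verified Stanbol behaviour.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def stage_index(indexing_dir: Path, stanbol_home: Path, name: str) -> Path:
    """Copy <name>.solrindex.zip into Stanbol's datafiles folder (assumed layout).

    Returns the site bundle jar, which still has to be installed by hand,
    e.g. through the OSGi web console.
    """
    dist = indexing_dir / "indexing" / "dist"
    archive = dist / f"{name}.solrindex.zip"
    if not archive.is_file():
        raise FileNotFoundError(f"index archive not found: {archive}")
    datafiles = stanbol_home / "stanbol" / "datafiles"  # assumption: default launcher layout
    datafiles.mkdir(parents=True, exist_ok=True)
    shutil.copy2(archive, datafiles / archive.name)
    bundles = sorted(dist.glob(f"org.apache.stanbol.data.site.{name}-*.jar"))
    if not bundles:
        raise FileNotFoundError(f"no site bundle for '{name}' in {dist}")
    return bundles[0]


def find_request(base_url: str, site: str, label: str) -> urllib.request.Request:
    """Build an Entityhub 'find' query to check that the referenced site answers."""
    url = f"{base_url.rstrip('/')}/entityhub/site/{urllib.parse.quote(site)}/find"
    data = urllib.parse.urlencode({"name": label}).encode("utf-8")
    return urllib.request.Request(url, data=data,
                                  headers={"Accept": "application/json"}, method="POST")
```

Once the bundle is active, `urllib.request.urlopen(find_request("http://localhost:8080", "undo", "Security Council"))` should return matching entities as JSON; if the call fails, the site is probably not active yet.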

2. Using the REST interface to submit all their documents to our Stanbol instance, receive
the results back, and display them. I’m guessing this might have been easier with the CMS Adapter
and ContentHub, but since those components are not part of the latest Stanbol version, I understand
that we need to use the REST interface.
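For task 2, a minimal sketch of a REST client using only Python's standard library, assuming a Stanbol instance at http://localhost:8080 and the Enhancer endpoint `/enhancer` (or `/enhancer/chain/{name}` for a named chain). It asks for RDF/JSON output (`application/rdf+json`), which I believe the Enhancer can serialise, and reads the `fise:EntityAnnotation` resources from the FISE enhancement structure. The function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# FISE enhancement-structure namespace used in Stanbol Enhancer results.
FISE = "http://fise.iks-project.eu/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def enhance_request(text: str, base_url: str = "http://localhost:8080",
                    chain: Optional[str] = None) -> urllib.request.Request:
    """Build a POST of plain text to /enhancer (or /enhancer/chain/<chain>)."""
    path = "/enhancer" if chain is None else f"/enhancer/chain/{chain}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8",
                 "Accept": "application/rdf+json"},  # RDF/JSON; assumed to be supported
        method="POST",
    )


def _first(props: dict, prop: str) -> Optional[str]:
    """Return the first value of a property of an RDF/JSON subject, or None."""
    values = props.get(prop)
    return values[0].get("value") if values else None


def entity_annotations(graph: dict) -> List[Tuple[Optional[str], Optional[str], float]]:
    """List (label, entity URI, confidence) for every fise:EntityAnnotation, best first."""
    found = []
    for props in graph.values():
        types = {v.get("value") for v in props.get(RDF_TYPE, [])}
        if FISE + "EntityAnnotation" not in types:
            continue  # skip TextAnnotations and other enhancements
        confidence = _first(props, FISE + "confidence")
        found.append((_first(props, FISE + "entity-label"),
                      _first(props, FISE + "entity-reference"),
                      float(confidence) if confidence is not None else 0.0))
    return sorted(found, key=lambda item: item[2], reverse=True)


def enhance(text: str, base_url: str = "http://localhost:8080",
            chain: Optional[str] = None) -> List[Tuple[Optional[str], Optional[str], float]]:
    """Send one document to Stanbol and return its linked entities (needs a running server)."""
    with urllib.request.urlopen(enhance_request(text, base_url, chain)) as response:
        return entity_annotations(json.load(response))
```

Looping `enhance()` over the resolution texts and storing the tuples would give a simple table to display; richer output (text positions, preambular vs. operative context) would come from the `fise:TextAnnotation` resources in the same graph.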

Request
We’d love to hear from anyone who might be interested in contributing. As you can see, there
is no monetary benefit but we sure get bragging rights. And the GATE team is also submitting
an entry so it could be kind of a face-off between GATE and Stanbol - I’m not trying to
instigate any skirmishes - just hinting at friendly and healthy competition. :)

Alternatively, if someone can point me to a more lucid explanation for solving the two above
problems (especially the first one), I’ll do the implementation on my own. Of course, I’ll
be forever grateful for this help and we'll mention the contribution in our submission.

The deadline for submissions is April 12th, so we’d greatly appreciate responses sooner rather
than later. Also, please feel free to let me know if anything above is unclear.

Thank you,
-Abhi

PS: On a separate note, if any of you have suggestions on how quasi-tech folks like me can
contribute to the development, I’ll be more than happy to help. I’m very comfortable with
SQL, can code a bit in Python, and am fairly conversant with OO concepts.

