any23-dev mailing list archives

From Lewis John Mcgibbney <>
Subject Extraction of structure from non-XML based formats
Date Mon, 01 Jul 2013 02:14:21 GMT
Hi All,
A while ago I lodged ANY23-134 [0] with the intention of extending the
Any23 paradigm to document formats other than subsets of XML.
For example, I would like to read in PDF documents such as this one [1]
or this one [2].
The idea would be to use Any23 (within a pipeline) to extract the
specification data as triples. I could then build a triples representation
of the document and use it for highly domain-specific inferences.
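To make the idea a little more concrete, here is a rough sketch of the last
step of such a pipeline. It is plain Python, not Any23 code, and it assumes
the PDF text has already been extracted by some other tool into simple
"key: value" lines, which real specification documents may well not follow.
The namespace http://example.org/spec/ and the function names are
hypothetical, chosen only for illustration.

```python
import re
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/spec/"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def text_to_ntriples(subject_id: str, text: str) -> list[str]:
    """Turn 'key: value' lines of already-extracted text into N-Triples.

    Lines that do not look like 'key: value' are skipped.
    """
    subject = f"<{BASE}{quote(subject_id)}>"
    triples = []
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if not match:
            continue
        key, value = match.groups()
        # Derive a predicate IRI from the key, e.g. "Max Voltage" -> max_voltage
        predicate = f"<{BASE}prop/{quote(key.lower().replace(' ', '_'))}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    sample = "Max Voltage: 5 V\nnot a pair\nMass: 2 kg"
    for triple in text_to_ntriples("doc1", sample):
        print(triple)
```

The hard part, getting structured fields out of the PDF in the first place,
is exactly what ANY23-134 would need to cover; the sketch only shows what
the resulting triples might look like.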
Is ANY23-134 the correct way to go about this? Or should I be looking at
some other existing tool we have within Any23? XPath immediately springs to
mind, but I am not sure, and I would really appreciate a comment or two from
anyone out there!

Thank you very much.


