any23-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Extraction of structure from non-XML based formats
Date Mon, 01 Jul 2013 02:14:21 GMT
Hi All,
A while ago I lodged ANY23-134 [0] with the intention of extending the
Any23 paradigm to document formats other than subsets of XML.
For example, I would like to read in PDF documents such as this one [1]
or this one [2].
The idea would be to use Any23 (within a pipeline) to extract the
specification data as triples. I could then build a triples representation
of each document to support highly domain-specific inferences.
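
To make the target output concrete, here is a minimal sketch in Python of
turning specification name/value pairs into N-Triples lines. It is not Any23
code and does not parse PDFs. The namespace, the function name
spec_to_ntriples and the sample values are all hypothetical placeholders, and
the sketch assumes the values have already been pulled out of the datasheet by
some earlier step in the pipeline.

```python
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/robot/"


def spec_to_ntriples(subject_id, specs):
    """Return one N-Triples line per (name, value) pair in specs."""
    subject = f"<{BASE}{quote(subject_id)}>"
    lines = []
    for name, value in specs.items():
        # Build a predicate IRI from the specification name.
        local = quote(name.strip().lower().replace(" ", "_"))
        predicate = f"<{BASE}spec/{local}>"
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


if __name__ == "__main__":
    # Placeholder values, not taken from any real datasheet.
    for line in spec_to_ntriples("example-robot", {"Axes": "6"}):
        print(line)
```

Each printed line is one subject/predicate/object statement, which is the
form a later inference step would consume.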
Is ANY23-134 the correct way to go about this? Or should I be looking at
some other existing tool we have within Any23? XPath immediately springs to
mind, but I am not sure, and I would really appreciate a comment or two from
anyone out there!

Thank you very much.
Lewis

[0] https://issues.apache.org/jira/browse/ANY23-134
[1] http://www.fanucrobotics.com/cmsmedia/datasheets/ARC%20Mate%20100iC%20Series_7.pdf
[2] http://www.fanucrobotics.com/cmsmedia/datasheets/ARC%20Mate%200iA_170.pdf

-- 
*Lewis*
