lucene-java-user mailing list archives

From adasal <>
Subject Re: Ontologies in Lucene???
Date Fri, 02 Jun 2006 13:54:16 GMT
Do you subscribe to the SemWeb and ONTAC mailing lists? Reading these lists,
and from other sources, it is apparent that there is a lot of interest in
building out ontologies and ontological usages from the small and
immediate. There is much discussion of folksonomies and their impact. What
I see is that there is a slowly evolving set of tools and skills in how to
solve small scale problems that range from using RDF and OWL efficiently to
how to include an RDF graph in code in the most transparent way for the
developer (c.f. Henry Storey circ. 5th April and 30th May respectively for
There are issues such as mapping between conjoint ontologies (how to do this
automatically) and many others.
I actually see Lucene more as a tool that would be used in solving these
issues, although it can work the other way round, where faceted searches are
facilitated by relevant ontologies.
With respect to the former, I think the approach is to view Lucene as acting
on nodes where there is data uncertainty.
But there is an interesting feedback mechanism that could be established
here, where RDF would act on code where there is type uncertainty. This,
anyway, would be the ideal virtuous circle.
Let me give an example.
I am working on a large project where there is a need to gather local data
from various sources, identified as being within the same business domain and
sphere of operation. At present this data is coded and stored with a
shifting, centrally administered set of codes that have been allowed to be
interpreted differently locally. The solution is to provide a single XML
version and a mapping tool to map local data through local XML versions into
the single version.
Lucene is used to search through XPath nodes, using stemming and so forth,
to try to find a match between local and central term descriptions.
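
As a rough illustration of that matching step, here is a minimal sketch in
plain Python. It is not Lucene: the suffix stripper, the Jaccard overlap score
and the codes C100 to C300 are all assumptions invented for the example, and
they stand in for a real analyzer and index.

```python
import re


def stem(token):
    """Very crude suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def terms(text):
    """Lower-case the text, split on non-letters, and stem each token."""
    return {stem(t) for t in re.findall(r"[a-z]+", text.lower())}


def rank_candidates(local_description, central_descriptions):
    """Rank central codes by Jaccard overlap of stemmed terms, best first."""
    local_terms = terms(local_description)
    scored = []
    for code, description in central_descriptions.items():
        central_terms = terms(description)
        union = local_terms | central_terms
        score = len(local_terms & central_terms) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, code))
    # Highest score first; ties broken by code so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored]


# Hypothetical central code list, invented for this example.
central = {
    "C100": "Housing repairs and maintenance",
    "C200": "Street lighting",
    "C300": "Waste collection services",
}
candidates = rank_candidates("repair of housing", central)
```

The point is only that the search produces a short list of candidates for a
person to confirm, rather than a final answer.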
This is something that could have been implemented (costs and skills aside)
by creating a local and a central ontology. Both of these could be evolving
ontologies, with the user able to map between the two and identify where
central codes need to be modified or extended, i.e. where not doing so would
cause loss of meaning and corresponding data loss **either in storage or
retrieval**.
RDF/OWL has concepts such as Class/SubClass, isDefinedBy, isSameAs,
Conjoint/Disjoint, Necessary and Sufficient Condition and so on. These can
usually be determined by thought and inspection, and this is very much the
antidote to the ad hoc categorization that goes on when people just have to
place their data somewhere but cannot find a ready category. In short, such
interfaces would make point-of-entry data capture a far more rational
process, which is part of the historical problem we are trying to overcome.
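
To make the Class/SubClass and equivalence idea concrete, here is a small
sketch that holds a few statements as plain tuples and walks them. The class
names are hypothetical, and I have assumed rdfs:subClassOf and
owl:equivalentClass (the class-level counterpart of sameAs) as the predicates.
A real system would use an RDF library and a reasoner instead.

```python
SUBCLASS_OF = "rdfs:subClassOf"
EQUIVALENT = "owl:equivalentClass"

# Hypothetical local and central classes, invented for this example.
triples = {
    ("local:CouncilFlatRepair", SUBCLASS_OF, "local:HousingRepair"),
    ("local:HousingRepair", EQUIVALENT, "central:HousingMaintenance"),
    ("central:HousingMaintenance", SUBCLASS_OF, "central:Housing"),
}


def equivalents(term, triples):
    """Return term plus every class linked to it by equivalence, either way."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != EQUIVALENT:
                continue
            other = o if s == current else s if o == current else None
            if other is not None and other not in found:
                found.add(other)
                frontier.append(other)
    return found


def ancestors(term, triples):
    """All superclasses of term, following subclass and equivalence links."""
    seen = set()
    frontier = list(equivalents(term, triples))
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF:
                for candidate in equivalents(o, triples):
                    if candidate not in seen:
                        seen.add(candidate)
                        frontier.append(candidate)
    return seen


local_ancestors = ancestors("local:CouncilFlatRepair", triples)
```

Here the local class ends up under central:Housing without anyone having
recoded it, which is the kind of mapping the user would confirm or correct.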

In short, we should be trying to undo the situation that existed, where a
series of codes were handed out with some guidelines on their usage but no
means to enforce them. The central base of codes has changed and the local
usage has drifted in different ways from the original guidance.

But there is an interesting point here: the local usage represents local
knowledge, and it might be dangerous to lose it by straitjacketing this
information into new, redefined codes.
So there is particular value in contemplating building an ontology in the
way I have outlined.
Now comes the most interesting bit.
As the tool is interactive, the code should be non-deterministic. So a way to
use Lucene would be to help to narrow down candidate terms, not, as before,
expressed in XML, but expressed in OWL as classes and instances.
There are a number of things going on in a tool like this. In terms of XML
element recognition, the issue is to ensure the actual XPaths are mapped
through. So there are two separate issues: whether there is an appropriate
central equivalent, and actually mapping how the data is expressed locally
into the central repository once the equivalent has been identified.
Nevertheless, the idea would be to have code that reads in RDF and, being
given a local equivalent, would then be able to map it through. I think in
this regard, code whose type behaviour depends on the parsed RDF that is
generated in the user interface with the help of Lucene would be very
useful. It wouldn't eliminate all of the other issues (those I haven't gone
into), but it would greatly simplify the code base.
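
A minimal sketch of code driven by mapping statements rather than by
hard-coded rules might look like this. The map:mapsTo predicate, the element
names and the sample record are all hypothetical, and the statements are held
as plain tuples instead of being parsed from real RDF.

```python
import xml.etree.ElementTree as ET

MAPS_TO = "map:mapsTo"  # hypothetical predicate, not from any vocabulary

# Mapping statements as they might come out of the interactive tool.
mapping_triples = [
    ("/job/category", MAPS_TO, "/request/serviceCode"),
    ("/job/notes", MAPS_TO, "/request/description"),
]


def leaf_values(element, prefix=""):
    """Yield (path, text) for every leaf element under the given element."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from leaf_values(child, path)


def map_record(local_xml, triples):
    """Map a local record to central paths; also report unmapped paths."""
    rules = {s: o for s, p, o in triples if p == MAPS_TO}
    central, unmapped = {}, []
    for path, value in leaf_values(ET.fromstring(local_xml)):
        if path in rules:
            central[rules[path]] = value
        else:
            # Keep track of local data with no central equivalent yet.
            unmapped.append(path)
    return central, unmapped


local_record = (
    "<job><category>HR-7</category>"
    "<notes>leaking roof</notes><ward>North</ward></job>"
)
central_record, unmapped_paths = map_record(local_record, mapping_triples)
```

Returning the unmapped paths alongside the mapped record keeps the local
knowledge visible instead of silently dropping it.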
So this is an example of how I envisage Lucene will hit RDF/OWL.
What are others' thoughts?

On 01/06/06, gobe hobona <> wrote:
> Are there any plans to incorporate new advances in ontology-based
> information retrieval into Lucene?
> For example, Hobona et al. (2006) present a 3D portal that uses the
> WordNet linguistic ontology to search Z39.50 servers. Another successful
> implementation of ontology-supported search is presented by Bucher et al.
> (2006).
> Perhaps some of the ontology-based reasoning in WordNet, Cyc or OpenCyc
> could be leveraged in Lucene.
> References:
> Hobona, G., James, P. and Fairbairn, D. (2006) Multi-dimensional
> Visualisation of Degrees of Relevance of Geographic Data. International
> Journal of Geographic Information Science, 20(5), 469-490.
> Bucher, B., Clough, P., Joho, H., Purves, R. and Syed, A. K. (2005)
> Geographic IR Systems: Requirements and Evaluation. Proceedings of the 22nd
> International Cartographic Conference. A Coruna, Spain, ICA.