lucene-java-user mailing list archives

From "Alex Murzaku" <>
Subject RE: Very large queries?
Date Fri, 28 Mar 2003 12:34:03 GMT
How about this:
assuming that your taxonomies are tree-like structures, you could expand
every term in the documents to be indexed with the path where it
belongs in the tree (i.e. all hypernyms and hyponyms); for this you can use
the same technique as when using thesauri. This will allow you to enter
only one node of the taxonomy in the query, from any level, and get
back all the records/documents that contain it...
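A minimal sketch of the index-time expansion, in plain Java rather than the Lucene API. The taxonomy is a hypothetical child-to-parent map and a toy in-memory inverted index stands in for Lucene. This sketch adds only the ancestors (hypernyms) of each term, which is what lets a query on a higher node reach documents that mention its descendants:

```java
import java.util.*;

public class TaxonomyExpansion {
    // Hypothetical taxonomy: each term maps to its parent (hypernym); roots have no entry.
    static final Map<String, String> PARENT = Map.of(
        "aspirin", "nsaid",
        "ibuprofen", "nsaid",
        "nsaid", "analgesic",
        "morphine", "opioid",
        "opioid", "analgesic");

    // Returns the term followed by every ancestor up to the root of its tree.
    static List<String> expand(String term) {
        List<String> path = new ArrayList<>();
        for (String t = term; t != null; t = PARENT.get(t)) {
            path.add(t);
        }
        return path;
    }

    // Toy inverted index: indexed term -> sorted set of document ids.
    static final Map<String, Set<Integer>> INDEX = new HashMap<>();

    static void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            // Index the token plus all of its hypernyms, as a thesaurus expansion would.
            for (String t : expand(token)) {
                INDEX.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A single-node query: no query-time expansion is needed.
    static Set<Integer> search(String node) {
        return INDEX.getOrDefault(node, Collections.emptySet());
    }

    public static void main(String[] args) {
        addDocument(1, "Patient was given aspirin.");
        addDocument(2, "Morphine dosage was increased.");
        addDocument(3, "No medication recorded.");
        System.out.println(search("analgesic")); // [1, 2]
        System.out.println(search("nsaid"));     // [1]
        System.out.println(search("opioid"));    // [2]
    }
}
```

The trade-off is a larger index in exchange for short queries: every occurrence is posted once per level of the tree, but a query never has to enumerate the taxonomy below the node it names.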

Alex Murzaku

-----Original Message-----
From: [] 
Sent: Friday, March 28, 2003 6:49 AM
To: Lucene Users List
Subject: Re: Very large queries?

Thanks for these suggestions.  The idea of adding taxonomy-related
terms to the documents is an interesting one and bears some thought.
However, if I have to pre-process the corpus to determine which terms to
add, and then to add them, it would seem that I've already accomplished
my primary goal and don't need an indexer and search engine.  Remember:
this is not really an information retrieval application (with
document-level granularity) that is being contemplated here, but an
information extraction and text/data
mining application (with "fact-level" granularity). My hope was to
leverage a search engine, guided by taxonomies, to accomplish this at
least as a first cut.

I do find Morus's suggestion to do an "inverse expansion" of terms in
the index at indexing time to be very intriguing as well.  Perhaps it is
also what was meant by Ype's suggestion about adding stuff to the
document (meaning adding stuff to the index).

It appears I will also need to handle my own identification of matched
terms.  Verity, too, supports term highlighting -- but I am not at all
certain they return information concerning the exact string that
triggered the highlighted match.  Perhaps if the "inverse expansion"
approach can be made to work, it would eliminate this need.  And it
might also eliminate the need for the very large queries.  The details
are unclear at this point, but the possibilities are interesting.
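If the expansion is done at index time, the exact string that triggered a match can be kept alongside each expanded posting, so a hit on an upper taxonomy node still reports the original text. A minimal sketch in plain Java, assuming a hypothetical child-to-parent taxonomy map; the `Posting` record and its fields are illustrative only, not part of Lucene or Verity:

```java
import java.util.*;

public class MatchProvenance {
    // Hypothetical taxonomy: child term -> parent (hypernym).
    static final Map<String, String> PARENT = Map.of(
        "aspirin", "nsaid",
        "nsaid", "analgesic");

    // One hit: which document, at which token position, and the surface string actually seen.
    record Posting(int docId, int position, String surface) {}

    static final Map<String, List<Posting>> INDEX = new HashMap<>();

    static void addDocument(int docId, String text) {
        int position = 0;
        for (String surface : text.split("\\W+")) {
            if (surface.isEmpty()) continue;
            String term = surface.toLowerCase();
            // Post the original surface form under the term and under every ancestor.
            for (String t = term; t != null; t = PARENT.get(t)) {
                INDEX.computeIfAbsent(t, k -> new ArrayList<>())
                     .add(new Posting(docId, position, surface));
            }
            position++;
        }
    }

    public static void main(String[] args) {
        addDocument(7, "Aspirin relieved the headache.");
        // Each hit for "analgesic" reports the exact string that satisfied it.
        for (Posting p : INDEX.getOrDefault("analgesic", List.of())) {
            System.out.println(p.docId() + " @" + p.position() + ": " + p.surface());
        }
        // prints "7 @0: Aspirin"
    }
}
```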

The suggestion of Jython is also appreciated and I was considering it
already.  I have not used Jython yet, but have developed all of my
ontology/taxonomy/dictionary/thesaurus translation tools in Python (and
yes, I do know the differences among all of these).  I've even started
to develop some of my interface stuff in Tkinter, but if I'm going to go
the Java route I'll probably abandon that in favor of Swing.

Well, I can see that I have a bit of work to do.  I do have an
undergraduate and a graduate student here at NC State working with me,
and perhaps I can squeeze some of this work out of them :-).

Gary H. Merrill
Director and Principal Scientist, New Applications
Data Exploration Sciences
GlaxoSmithKline Inc.
(919) 483-8456

To unsubscribe, e-mail:
For additional commands, e-mail: