lucene-java-user mailing list archives

From "Wu, Stephen T., Ph.D." <Wu.Step...@mayo.edu>
Subject Semi-structured queries
Date Fri, 07 Dec 2012 21:47:09 GMT
I’ve been trying to do semi-structured queries & query parsing.  In other words, you
could have XML snippets mixed in with plain terms, e.g. a query like:

      christmas tree <store loc="abc" close_hour="2200">

where you’re looking for a document with the terms “christmas” “tree” but also some
structured data about where (practically) you could buy the tree.   Additionally, I’d like
to be able to write functions relating multiple items, sort of like predicate logic or database-like
queries:

      christmas tree NEARBY( <store close_hour="2200">, <restaurant close_hour="2400"> )

which would only find you places to buy a christmas tree that had stores and restaurants in
close proximity to each other.  Finally, we would eventually be interested in doing something
similar to org.apache.lucene.queries.CustomScoreQuery, where you can put in several different
criteria and weight them separately per document.
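To make the first example concrete, here is a minimal sketch in plain Java (no Lucene classes) of splitting such a query string into bare terms and pseudo-XML elements with their attribute="value" pairs. The class and method names (SemiStructuredQuerySketch, Element, Parsed, parse) are hypothetical and not part of Lucene, and the sketch assumes straight ASCII double quotes around attribute values. It does not handle functions such as NEARBY(...); those would need a real grammar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Lucene class): splits a mixed query into terms and elements. */
final class SemiStructuredQuerySketch {

    /** One pseudo-XML element such as <store loc="abc" close_hour="2200">. */
    static final class Element {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        Element(String name) { this.name = name; }
    }

    /** Result of splitting a query string. */
    static final class Parsed {
        final List<String> terms = new ArrayList<>();
        final List<Element> elements = new ArrayList<>();
    }

    // Either an element "<name attrs>" (groups 1 and 2) or a bare term (group 3).
    private static final Pattern TOKEN =
        Pattern.compile("<\\s*(\\w+)([^>]*)>|([^\\s<>]+)");

    // attribute="value" pairs inside an element (straight quotes assumed).
    private static final Pattern ATTRIBUTE =
        Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    static Parsed parse(String query) {
        Parsed parsed = new Parsed();
        Matcher token = TOKEN.matcher(query);
        while (token.find()) {
            if (token.group(1) != null) {
                // Structured part: collect the element name and its attributes.
                Element element = new Element(token.group(1));
                Matcher attribute = ATTRIBUTE.matcher(token.group(2));
                while (attribute.find()) {
                    element.attributes.put(attribute.group(1), attribute.group(2));
                }
                parsed.elements.add(element);
            } else {
                // Unstructured part: a plain search term.
                parsed.terms.add(token.group(3));
            }
        }
        return parsed;
    }
}
```

For the first example query, parse returns the terms "christmas" and "tree" plus one element named "store" with attributes loc=abc and close_hour=2200. Turning that intermediate result into actual Lucene Query objects is the part the question is about and is left open here.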

I’ve been poking around in a lot of places and would appreciate some help with where I
should extend, or a pointer to an existing walkthrough or example.  Here’s what I’ve been considering:

  *   org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.java: modifying
      this to add another group-like QueryNode, modifying the processor pipeline to include it,
      and modifying the definition of a TERM so it can deal with attribute="value" pairs in
      pseudo-XML.  I read through the QueryParser documentation but quickly got lost in the
      implementation.
  *   org/apache/lucene/queryparser/xml/CorePlusExtensionsParser.java: this seems to cover
      a lot of what I want, but I can’t tell.  I hadn’t originally thought of the query
      coming in as an XML stream.  I think I would still need to define some new Query types...
      Perhaps a lot?  One for each type of thing (“store”, in the above) I’d search for?
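On the "one Query type per element?" question, one possible alternative is sketched below in plain Java: flatten each element into generic field/value constraints, so that a single generic query type keyed by field name could serve "store", "restaurant", and so on. This is only an illustration under a loud assumption, namely that the index stores each attribute under a field named element.attribute (e.g. store.close_hour). Nothing in the message says the index is laid out that way, and ElementFieldMappingSketch and toFieldConstraints are hypothetical names, not Lucene APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): maps an element to field/value constraints. */
final class ElementFieldMappingSketch {

    /**
     * Flattens one element into constraints keyed by "element.attribute".
     * The "element.attribute" field naming is an assumed index convention.
     */
    static Map<String, String> toFieldConstraints(String elementName,
                                                  Map<String, String> attributes) {
        Map<String, String> constraints = new LinkedHashMap<>();
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            constraints.put(elementName + "." + attribute.getKey(), attribute.getValue());
        }
        return constraints;
    }
}
```

Under that assumption, each resulting pair could be handed to whatever existing query type matches a value in a named field, instead of writing a dedicated class per element name.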

Thanks!

stephen
