lucene-java-user mailing list archives

From Tatu Saloranta <t...@hypermall.net>
Subject Re: Hierarchical document
Date Tue, 21 Oct 2003 00:24:29 GMT
On Monday 20 October 2003 10:31, Erik Hatcher wrote:
> On Monday, October 20, 2003, at 11:06  AM, Tom Howe wrote:
> There is not a more "lucene" way to do this - it's really up to you to
> be creative with this.  I'm sure there are folks that have implemented
> something along these lines on top of Lucene.  In fact, I have a
> particular interest in doing so at some point myself.  This is very
> similar to the object-relational issues surrounding relational
> databases - turning a pretty flat structure into an object graph.
> There are several ideas that could be explored by playing tricks with
> fields, such as giving them a hierarchical naming structure and
> querying at the level you like (think Field.Keyword and PrefixQuery,
> for example), and using a field to indicate type and narrowing queries
> to documents of the desired type.
>
> I'm interested to see what others have done in this area, or what ideas
> emerge about how to accomplish this.

I'm planning to do something similar. In my case the problem is a bit simpler:
documents have associated products, and the products form a hierarchy.
Searches should match not only directly (searching for a product finds
articles associated with that product), but also indirectly via group
membership (searching for a product group finds articles whose product
belongs to that group). The product hierarchy also has variable depth.

To support searches on non-leaf hierarchy items (groups), every product
item or group associated with a document is expanded to its full id when
indexing (i.e. the path from the root down to and including the node,
with each node on the path having its own unique id).
Thus, when searching for an intermediate node (a product grouping), a
match occurs because that node's id is part of the path to every product
in the group (whether a direct member or a member of a sub-group).
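
As a rough sketch of that expansion, in plain Java with no Lucene calls: the
class name, the "#"-prefixed ids and the child-to-parent map below are made up
for illustration, and the whitespace-joined string only stands in for whatever
field value would actually be handed to the indexer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expands a hierarchy node into its root-to-node path.
final class ProductPaths {

    // Walks parent links upward and returns the ids from the root down to,
    // and including, nodeId. A root is any id with no entry in parentOf.
    static List<String> pathFromRoot(Map<String, String> parentOf, String nodeId) {
        Deque<String> path = new ArrayDeque<>();
        for (String id = nodeId; id != null; id = parentOf.get(id)) {
            if (path.contains(id)) {
                throw new IllegalStateException("cycle in hierarchy at " + id);
            }
            path.addFirst(id);
        }
        return new ArrayList<>(path);
    }

    // One token per path component, separated by spaces, so that each
    // node id can be matched on its own.
    static String fieldValue(Map<String, String> parentOf, String nodeId) {
        return String.join(" ", pathFromRoot(parentOf, nodeId));
    }

    // Stand-in for an exact term match against the indexed tokens.
    static boolean matches(String fieldValue, String queryId) {
        return Arrays.asList(fieldValue.split(" ")).contains(queryId);
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("#125", "#12"); // product #125 is in group #12
        parentOf.put("#12", "#1");   // group #12 is in top-level group #1

        String value = fieldValue(parentOf, "#125");
        System.out.println(value);                 // #1 #12 #125
        System.out.println(matches(value, "#12")); // true: group search finds it
        System.out.println(matches(value, "#2"));  // false
    }
}
```

Because matching is on whole tokens, a search for "#1" finds the document via
its top-level group but is not confused by "#12" or "#125".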

Since no such path is stored (directly) in the database, this also lets me run
queries that would be impossible in the database as it stands (I could of
course add similar path/full id fields there for search purposes). Thus, the
Lucene index is optimized for searching, and the database structure for
editing and retrieval of data.

Another thing to keep in mind is that, at least for metadata, it may make sense
to use a specialized analyzer that emits each id as its own token, instead
of some standard plain-text analyzer. That way ids can be kept separate
from textual words (by using prefixes, for example "@1253" or "#13945"),
which allows accurate matching on the identity of the associated metadata
selections.
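
To illustrate the tokenizing rule only, here is a minimal sketch in plain Java.
It is not a Lucene Analyzer; in a real setup the same rule would live inside a
custom analyzer. The class name and the regular expression are assumptions made
for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: keeps prefixed ids whole, lowercases plain words.
final class IdAwareTokens {

    // The id alternative comes first, so "@1253" or "#13945" is taken as one
    // token; anything else falls back to a run of word characters.
    private static final Pattern TOKEN = Pattern.compile("[@#]\\d+|\\w+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [manual, for, @1253, and, #13945, rev, 1253]
        System.out.println(tokenize("Manual for @1253 and #13945, rev 1253"));
    }
}
```

Note that the id "@1253" and the bare number "1253" come out as different
tokens, so a search on the id does not hit the number in running text.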
 
-+ Tatu +-

