Return-Path: Delivered-To: apmail-jakarta-lucene-user-archive@www.apache.org Received: (qmail 88213 invoked from network); 21 Oct 2003 15:34:56 -0000 Received: from daedalus.apache.org (HELO mail.apache.org) (208.185.179.12) by minotaur-2.apache.org with SMTP; 21 Oct 2003 15:34:56 -0000 Received: (qmail 46789 invoked by uid 500); 21 Oct 2003 15:34:45 -0000 Delivered-To: apmail-jakarta-lucene-user-archive@jakarta.apache.org Received: (qmail 46759 invoked by uid 500); 21 Oct 2003 15:34:45 -0000 Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm Precedence: bulk List-Unsubscribe: List-Subscribe: List-Help: List-Post: List-Id: "Lucene Users List" Reply-To: "Lucene Users List" Delivered-To: mailing list lucene-user@jakarta.apache.org Received: (qmail 46746 invoked from network); 21 Oct 2003 15:34:45 -0000 Received: from unknown (HELO mxsf17.cluster1.charter.net) (209.225.28.217) by daedalus.apache.org with SMTP; 21 Oct 2003 15:34:45 -0000 Received: from peter (cpe-66-189-44-169.ma.charter.com [66.189.44.169]) by mxsf17.cluster1.charter.net (8.12.9/8.12.8) with SMTP id h9LFYYxF011453 for ; Tue, 21 Oct 2003 11:34:34 -0400 (EDT) (envelope-from peter.keegan@charter.net) Message-ID: <008e01c397e8$cf49e840$02a8a8c0@peter> From: "Peter Keegan" To: "Lucene Users List" References: <200310201824.29368.tatu@hypermall.net> Subject: Re: Hierarchical document Date: Tue, 21 Oct 2003 11:34:20 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N X-Spam-Rating: minotaur-2.apache.org 1.6.2 0/1000/N One way to implement hierarchical documents is through the use of predefined phrases. Consider the 2 hierarchies: 1. Kids_and_Teens/Computers/Software/Games 2. Computers/Software/Freeware When indexing a document belonging to (1), add these terms in consecutive order (autoincrement=1): "dir:Top dir:Kids_and_Teens dir:Computers dir:Software dir:Games dir:Bottom" For documents belonging to (2), add: "dir:Top dir:Computers dir:Software dir:Bottom" The terms "dir:Top" and "dir:Bottom" can be used to anchor a query to a specific portion of the hierachy. Thus, a query containing the phrase: "dir:Computers dir:Software" would match documents in both (1) and (2) (and perhaps others), but a query for: "dir:Top dir:Kids_and_Teens dir:Computers dir:Software" would target only 'Computer/Software' documents from the 'Kids_and_Teens' top level directory. (The QueryPhrase 'slop factor' would be set to 0). Peter ----- Original Message ----- From: "Tatu Saloranta" To: "Lucene Users List" Sent: Monday, October 20, 2003 8:24 PM Subject: Re: Hierarchical document > On Monday 20 October 2003 10:31, Erik Hatcher wrote: > > On Monday, October 20, 2003, at 11:06 AM, Tom Howe wrote: > > There is not a more "lucene" way to do this - its really up to you to > > be creative with this. I'm sure there are folks that have implemented > > something along these lines on top of Lucene. In fact, I have a > > particular interest in doing so at some point myself. This is very > > similar to the object-relational issues surrounding relational > > databases - turning a pretty flat structure into an object graph. > > There are several ideas that could be explored by playing tricks with > > fields, such as giving them a hierarchical naming structure and > > querying at the level you like (think Field.Keyword and PrefixQuery, > > for example), and using a field to indicate type and narrowing queries > > to documents of the desired type. > > > > I'm interested to see what others have done in this area, or what ideas > > emerge about how to accomplish this. > > I'm planning to do something similar. In my case problem is bit simpler; > documents have associated products, and products form a hierarchy. > Searches should be able to match not only direct matches (searching > product, article associated with product), but also indirect ones via > membership (product member of a product group, matching group). > Product hierarchy also has variable depth. > > To do searches using non-leaf hierarchy items (groups), all actual product > items/groups associated with docs are expanded to full ids when > indexing (ie. they contain path from root, up to and including node, > each node component having its own unique id). > Thus, when searching for an intermediate node (product grouping), > match occurs since that node id is part of path to products that are in > the group (either directly or as members of sub-groups). > > Since no such path is stored (directly) in database, this also allows me to do > queries that would be impossible to do in database (I could add similar > path/full id fields for search purposes of course). Thus, Lucene index is > optimized for searching purposes, and database structure for editing > and retrieval of data. > > Another thing to keep in mind is that at least for metadata it may make sense > to use specialized analyzer, one that allows tokenizing using specific ids > to store ids as separate tokens; instead of using some standard plain text > analyzer. This way it is possible to separate ids from textual words (by > using prefixes, for example, "@1253" or "#13945"); this allows for accurate > matching based on identity of associated metadata selections. > > -+ Tatu +- > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > For additional commands, e-mail: lucene-user-help@jakarta.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org