lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul.Illingwo...@saaconsultants.com
Subject Re: Hierarchical Documents
Date Tue, 23 Aug 2005 10:36:04 GMT





I have been struggling with this sort of problem for some time and still
haven't got an ideal solution.

Initially I was going to go for the approach Erik has suggested for similar
reasons - it allowed me to search within categories and within sub
categories of those categories very simply. Unfortunately in my system the
categorisation is dynamic. If a user wants to rename a category then all
the documents within those categories or any of the subcategories would
have to be reindexed. Similarly, if the user moves a category from one
place to another then again this can result in lots of reindexing.

The reindexing problem can be avoided by using identifiers for the
categories. I simply index a list of the ids for the categories the
document belongs to when indexing the document. This allows me to
dynamically change and restructure the category hierarchy without impacting
the index. The downside is that for searching I need to map the category
path to the category id. This I do using a database look-up prior to
searching Lucene. To carry out sub category searching I will be looking up
all the sub categories in the database using a wildcard and then probably
building either Boolean expression (for small numbers of subcategories) or
a Lucene filter combined with the ConstantScoreQuery (from the sandbox) for
large number of subcategories - the user having to understand that
searching lots of subcategories may take some time. In my case the
subcategory searching is not the norm so this isn't too much of a problem.

There's some useful articles here
http://www.intelligententerprise.com/001020/celko1_1.jhtml?_requestid=36367
and here http://www.dbazine.com/oracle/or-articles/tropashko4 that discuss
tree structures in SQL that I found useful.

Regards

PaulI.

Erik Hatcher <erik@ehatchersolutions.com> wrote on 23/08/2005 10:34:16:

>
> On Aug 22, 2005, at 2:27 AM, Rohit Lodha wrote:
> > Currently, Documents cannot contain other documents. I have a Graph of
> > Objects (Documents) to search in.
> >
> > I could flatten them and search but...
> >
> > Is there any nice way to do it?
>
> I have used a technique of encoding a hierarchical path (like "/
> category/subcategory/sub-subcategory") into an indexed untokenized
> field.  This allows for flat or hierarchical queries.  A hierarchical
> PrefixQuery of "/category" returns all documents from "/category"
> downward, for example.
>
> I suspect others have done more interesting techniques for this sort
> of thing - I'd love to hear about them.
>
>      Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message