lucene-java-user mailing list archives

From gtkesh <>
Subject Help with document design for indexing/searching
Date Wed, 03 Jul 2013 16:52:11 GMT
Hi everyone! This is my first post here and I'm new to Lucene, so I would
appreciate your feedback on the Lucene document design I came up with.

*What is my goal*

I'm trying to index a collection of XML documents that all share the same
structure. Each <section> element can contain a <sections> element, which in
turn contains further <section> elements, and so on. The maximum depth is 3.

So I decided to use these separate fields:

"pageTitle" - doc/title
"sectionTitle" - doc/sections/section/title
"sectionText" - doc/sections/section/text
"subSectionTitle" - doc/sections/section/sections/section/title
"subSectionText" - doc/sections/section/sections/section/text
"subSubSectionTitle" - ...
"subSubSectionText" - ...
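Each field above is tied to an XML path, so the mapping can be checked with the JDK's own XPath support before Lucene is involved at all. The following is a minimal sketch that uses only javax.xml. The class name FieldPaths and the sample page are hypothetical. The two depth-3 fields are left out because their paths are elided ("...") in the list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FieldPaths {
    // Field name -> XPath relative to the document node, copied from the list above.
    // The depth-3 fields are omitted because their paths are elided in the post.
    static final Map<String, String> PATHS = new LinkedHashMap<>();
    static {
        PATHS.put("pageTitle", "doc/title");
        PATHS.put("sectionTitle", "doc/sections/section/title");
        PATHS.put("sectionText", "doc/sections/section/text");
        PATHS.put("subSectionTitle", "doc/sections/section/sections/section/title");
        PATHS.put("subSectionText", "doc/sections/section/sections/section/text");
    }

    /** Returns, for each field, the trimmed text of every node its path selects. */
    static Map<String, List<String>> extract(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, List<String>> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : PATHS.entrySet()) {
                NodeList nodes = (NodeList) xpath.evaluate(e.getValue(), dom, XPathConstants.NODESET);
                List<String> values = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    values.add(nodes.item(i).getTextContent().trim());
                }
                fields.put(e.getKey(), values);
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not extract fields", ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample page; its layout is inferred from the paths above.
        String xml = "<doc><title>Influenza</title><sections>"
                + "<section><title>Treatment</title><text>Rest and fluids.</text>"
                + "<sections><section><title>Antivirals</title><text>Oseltamivir.</text></section></sections>"
                + "</section></sections></doc>";
        extract(xml).forEach((field, values) -> System.out.println(field + " = " + values));
    }
}
```

Note that these paths return one list of values per field for the whole page. On their own, they do not record which title belongs to which text.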

Currently, as I index, each Lucene document is a separate sectionText,
sectionTitle, or sub-section field, but of course they all share the same
pageTitle field. Is this a good way to index the data for searching? Below I
describe *how I'm going to search*.
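One reading of this layout is one Lucene document per section, with the page title copied onto each document. The following is a minimal, JDK-only sketch of that flattening step. It assumes each record holds a section's title and text together; the post could also mean one document per field. SectionFlattener and its plain Map records are hypothetical stand-ins for building Lucene Document objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SectionFlattener {
    // Field-name prefixes by nesting depth 1..3, as named in the post.
    private static final String[] PREFIX = {"section", "subSection", "subSubSection"};

    /** Returns one map per <section>; each map stands in for one Lucene document. */
    static List<Map<String, String>> flatten(String xml) {
        try {
            Element doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            String pageTitle = childText(doc, "title");
            List<Map<String, String>> out = new ArrayList<>();
            walk(child(doc, "sections"), 0, pageTitle, out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse page", e);
        }
    }

    // Visits each <section> under <sections>, recursing into nested <sections>.
    // Sections deeper than level 3 are ignored, matching the stated maximum depth.
    private static void walk(Element sections, int depth, String pageTitle,
                             List<Map<String, String>> out) {
        if (sections == null || depth >= PREFIX.length) return;
        for (Node n = sections.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !n.getNodeName().equals("section")) continue;
            Element section = (Element) n;
            Map<String, String> record = new LinkedHashMap<>();
            record.put("pageTitle", pageTitle); // copied onto every section record
            record.put(PREFIX[depth] + "Title", childText(section, "title"));
            record.put(PREFIX[depth] + "Text", childText(section, "text"));
            out.add(record);
            walk(child(section, "sections"), depth + 1, pageTitle, out);
        }
    }

    // Returns the first direct child element with the given name, or null.
    private static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) return (Element) n;
        }
        return null;
    }

    // Returns the trimmed text of the named child element, or "" if it is absent.
    private static String childText(Element parent, String name) {
        Element e = child(parent, name);
        return e == null ? "" : e.getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Influenza</title><sections>"
                + "<section><title>Treatment</title><text>Rest and fluids.</text>"
                + "<sections><section><title>Antivirals</title><text>Oseltamivir.</text></section></sections>"
                + "</section></sections></doc>";
        flatten(xml).forEach(System.out::println);
    }
}
```

Each map could then become a Lucene document with one field per entry. Because pageTitle is repeated on every section document, a single query can combine the disease name with a section title.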

The real page structure is as follows: pageTitle is the disease name, and
sectionTitle can be, e.g., "Definition", "Treatment", or something similar.
So when a user asks a question like 'What are the treatments for "x"
disease?', I classify the question as the "treatment" type. I would then like
to search the Lucene index for the disease name, but specifically retrieve
the section whose title is "treatment".
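Without scoring, the retrieval described above amounts to a filter. It picks the section records of the page named after the disease whose title equals the question type. The following sketch states that intent in plain Java over hypothetical Map records like the ones a per-section index would hold. It does exact, case-insensitive matching only. A Lucene search would instead match analyzed terms and rank the hits.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SectionLookup {
    /**
     * Returns the records whose pageTitle equals the disease and whose section-level
     * title field (sectionTitle, subSectionTitle, ...) equals the question type,
     * both compared case-insensitively.
     */
    static List<Map<String, String>> find(List<Map<String, String>> records,
                                          String disease, String questionType) {
        return records.stream()
                .filter(r -> disease.equalsIgnoreCase(r.get("pageTitle")))
                .filter(r -> r.entrySet().stream()
                        .anyMatch(e -> e.getKey().endsWith("Title")
                                && !e.getKey().equals("pageTitle")
                                && questionType.equalsIgnoreCase(e.getValue())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical section records, one per indexed section.
        List<Map<String, String>> records = List.of(
                Map.of("pageTitle", "Influenza", "sectionTitle", "Definition", "sectionText", "A viral infection."),
                Map.of("pageTitle", "Influenza", "sectionTitle", "Treatment", "sectionText", "Rest and fluids."),
                Map.of("pageTitle", "Measles", "sectionTitle", "Treatment", "sectionText", "Supportive care."));
        // Question about Influenza, classified as the "treatment" type.
        find(records, "Influenza", "treatment").forEach(r -> System.out.println(r.get("sectionText")));
    }
}
```

Exact title matching is brittle: a section titled "Treatments" would be missed. This is one reason to let Lucene's analyzers and scoring handle the match instead.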

Is this a good indexing approach? Also, how would you recommend constructing
the search query? I want to give the disease name more importance and the
question type ("treatment") relatively less.
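For the weighting question, one common option is a query-time boost. The following sketch builds a query string in Lucene's classic QueryParser syntax, where ^ sets a clause boost. Both clauses are optional (SHOULD), so a match on the disease name typically contributes more to the score than a match on the type. The boost of 3.0 is an arbitrary assumption to be tuned, and BoostedQuery is a hypothetical helper.

```java
public class BoostedQuery {
    /** Wraps a value in double quotes, escaping backslashes and quotes for the query parser. */
    static String phrase(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds a classic-QueryParser query with two optional clauses: the disease name
     * on pageTitle, boosted by diseaseBoost, and the question type on sectionTitle, unboosted.
     */
    static String build(String disease, String questionType, float diseaseBoost) {
        return "pageTitle:" + phrase(disease) + "^" + diseaseBoost
                + " sectionTitle:" + phrase(questionType);
    }

    public static void main(String[] args) {
        // Question 'What are the treatments for "x" disease?' classified as "treatment".
        System.out.println(build("x", "treatment", 3.0f));
        // prints: pageTitle:"x"^3.0 sectionTitle:"treatment"
    }
}
```

Prefixing the pageTitle clause with + would make the disease name mandatory rather than just heavier. Clauses on subSectionTitle could be added the same way. The same query can also be built programmatically as a BooleanQuery instead of being parsed from a string.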

Thanks in advance!
