lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tom Barrett" <>
Subject Searching a hierarchical data set
Date Wed, 06 Feb 2002 21:12:59 GMT

I have an implementation type question for the group that is causing me some
headaches. The records I need indexed are of the following form: there are a
couple header fields, and then an other field with an arbitrary number of
sub fields. Something like a notebook...
		Name: Tom Barrett
		Date: 2/6/2002
			Note1: This is the first note
			Note2: This is my second note
			NoteN: This is my last note

Now I want to do searches on these documents where you match things in the
header fields, and match in the notes field, also returning which note field
you match in. Something like...
		name:"Tom" notes:"second note"
i want to return the document above, and also let me know that the match
occured in the Note2 field.

Originally, I thought I would just concatinate all of the NoteN fields into
one field Notes, so that a document would be <Name, Date, Notes>. But this
makes it impossible to know which note field a match occurs on, and also if
I do a search for
		name:"Tom" notes:"note"
I want a match for each note, since all of the NoteN fields would match this

The only way I can think to do this is to have document consist of <Name,
Date, Note> where each note is one of the NoteN fields. But the problem I
see is that there are about 2 million of these Note fields so the index
would be huge if I duplicate the Name and Date field for every document
(normalization problem). And I want a search for
to only return the above document (not the above document N times).

I think the right answer is to have two indexes and normalize that
with just a document just being a <name, date, docID> and a second index
being a <docID, Note> with each note field being it's own document. Then, a
search for
		name:"Tom" notes:"second note"
would search the <name, date, docID> index to get the hits on "Tom", and
then search the <docID, Note> index to get hits on "second note" and then do
a docID matching type thing.

Anyone have any other thoughts on how to do this? I'm kind of stuck...

Thanks for the help,


Do You Yahoo!?
Get your free address at

To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

View raw message