lucene-java-user mailing list archives

From Will Murnane <will.murn...@gmail.com>
Subject Split single string into several fields?
Date Tue, 27 Oct 2009 22:50:44 GMT
Hello list,
  I have some semi-structured text containing markup elements, and I
want to index the contents of those elements in a separate field so I
can search on them.  For example (using HTML syntax):
---- 8< ---- document
<h1>Section title</h1>
Body content
---- >8 ----
I can already work out that "Section" and "title" are inside the <h1>,
and that "Body" and "content" are outside it.  I want to create two
fields for this document:
insideh1 -> "Section", "title"
alltext -> "Section", "title", "Body", "content"

What's the best way to approach this?  My initial thought is to make
some kind of MultiAnalyzer that consumes the text and produces several
token streams, which are added to the document one at a time.  Is that
a reasonable strategy?
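
To make the splitting step concrete, here is a rough sketch in plain
Java of the kind of single pass described above.  It makes no Lucene
calls at all: the class name FieldSplitter and its split() method are
hypothetical, it assumes only <h1> tags matter, and it treats any run
of word characters as a token.  How each resulting list is then turned
into a field on the document is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: one pass over the text,
// routing each token to "alltext" and, when inside <h1>, to "insideh1".
class FieldSplitter {
    // Matches either an <h1> / </h1> tag or a run of word characters.
    // Assumption: no other tags occur; their names would be seen as tokens.
    private static final Pattern PIECE =
            Pattern.compile("<(/?)h1>|(\\w+)", Pattern.CASE_INSENSITIVE);

    static Map<String, List<String>> split(String text) {
        List<String> insideH1 = new ArrayList<>();
        List<String> allText = new ArrayList<>();
        boolean inH1 = false;

        Matcher m = PIECE.matcher(text);
        while (m.find()) {
            if (m.group(2) == null) {
                // A tag matched: group(1) is "/" for </h1>, empty for <h1>.
                inH1 = m.group(1).isEmpty();
            } else {
                String token = m.group(2);
                allText.add(token);          // every token goes to alltext
                if (inH1) {
                    insideH1.add(token);     // heading tokens also go here
                }
            }
        }

        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("insideh1", insideH1);
        fields.put("alltext", allText);
        return fields;
    }

    public static void main(String[] args) {
        // The example document from this message.
        System.out.println(split("<h1>Section title</h1>\nBody content"));
    }
}
```

Tracking a single "am I inside the element" flag keeps this to one pass
over the input, which is the same property a MultiAnalyzer producing
several token streams would be after.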

Thanks!
Will


