lucene-general mailing list archives

From "Walid \"jo\" Gedeon" <wged...@gmail.com>
Subject RFO - Indexing 'meaningful' XML
Date Sat, 02 Aug 2008 11:37:09 GMT
Hello!

This is a Request for Opinion aimed at the Lucene experts out there :-)

I'm trying to get to know Lucene a bit better. After playing with the
'getting started' demo, I moved on to indexing XML files.

The simple (?) project would be to index chat sessions, each session stored
in a file and containing many entries of the form:

<message type="incoming_privateMessage" timestamp="200808021312"
to="someone%40domain1%2Ecom"
from="someoneelse%40domain2%2Ecom"><body>Hello</body></message>

(it's the Jabber client protocol, with an added timestamp)
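Whatever the document granularity ends up being, each `<message>` first has to be turned into field values. Note that the `to` and `from` attributes are percent-encoded (`%40` is `@`, `%2E` is `.`), so they are worth decoding before indexing, otherwise a query like `from:someoneelse@domain2.com` would never match. Below is a minimal sketch using only the JDK's DOM parser and `java.net.URLDecoder`; the class name `ChatMessageParser` and the field names are hypothetical, and the resulting map is just what you would copy into a Lucene `Document`, not Lucene API itself.

```java
import java.io.ByteArrayInputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class ChatMessageParser {

    // Turns one <message> element into field name -> value pairs.
    // The address attributes are percent-encoded ("%40" = '@', "%2E" = '.'),
    // so they are decoded here before they would be indexed.
    public static Map<String, String> parse(String xml) throws Exception {
        Element msg = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("type", msg.getAttribute("type"));
        fields.put("timestamp", msg.getAttribute("timestamp"));
        fields.put("to", URLDecoder.decode(msg.getAttribute("to"), "UTF-8"));
        fields.put("from", URLDecoder.decode(msg.getAttribute("from"), "UTF-8"));
        // Text content of the <body> child, destined for the full-text field.
        fields.put("body", msg.getElementsByTagName("body").item(0).getTextContent());
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<message type=\"incoming_privateMessage\" timestamp=\"200808021312\""
                + " to=\"someone%40domain1%2Ecom\""
                + " from=\"someoneelse%40domain2%2Ecom\"><body>Hello</body></message>";
        // prints {type=incoming_privateMessage, timestamp=200808021312,
        //         to=someone@domain1.com, from=someoneelse@domain2.com, body=Hello}
        System.out.println(parse(xml));
    }
}
```

For a whole session file you would loop over `getElementsByTagName("message")` on the root element instead of reading a single element.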

In addition to the full text search, I'd like to be able to perform searches
such as:
 - list sessions from:xxx timestamp:200808*
 - list sessions (from:xxx OR from:yyy)
 - etc

Would it be better to store each message as a separate document with its own
fields, adding the filename (session identifier) as an extra field? Or is
there a better way, such as making each session file a single document?
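To make the message-per-document option concrete, here is a minimal in-memory stand-in, assuming each message becomes one "document" with exact-match `from` and `timestamp` fields plus the session filename as a `session` field. The class `SessionSearchSketch`, its method, and the sample data are hypothetical and use only the JDK. In Lucene itself the matching step would roughly be a `BooleanQuery` combining a `TermQuery` on `from` with a `PrefixQuery` on `timestamp` (both fields indexed untokenized), followed by collecting the stored `session` value from the hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SessionSearchSketch {

    // Hypothetical stand-in for an indexed document: one per chat message,
    // carrying the session file name as an extra "session" field.
    public static final class MessageDoc {
        final String session, from, timestamp, body;

        public MessageDoc(String session, String from, String timestamp, String body) {
            this.session = session;
            this.from = from;
            this.timestamp = timestamp;
            this.body = body; // would be the tokenized full-text field; unused here
        }
    }

    // "list sessions (from:X OR from:Y) timestamp:PREFIX*": keep messages whose
    // sender is one of the given addresses and whose timestamp starts with the
    // prefix, then collapse the hits to distinct session names (sorted).
    public static Set<String> sessionsFrom(List<MessageDoc> docs, String tsPrefix, String... senders) {
        List<String> wanted = Arrays.asList(senders);
        Set<String> sessions = new TreeSet<>();
        for (MessageDoc d : docs) {
            if (wanted.contains(d.from) && d.timestamp.startsWith(tsPrefix)) {
                sessions.add(d.session);
            }
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<MessageDoc> docs = new ArrayList<>();
        docs.add(new MessageDoc("session-001.xml", "alice@domain2.com", "200808021312", "Hello"));
        docs.add(new MessageDoc("session-001.xml", "bob@domain1.com", "200808021313", "Hi"));
        docs.add(new MessageDoc("session-002.xml", "alice@domain2.com", "200807151000", "Hey"));
        docs.add(new MessageDoc("session-003.xml", "carol@domain3.com", "200808051200", "Yo"));

        // from:alice timestamp:200808*  -> [session-001.xml]
        System.out.println(sessionsFrom(docs, "200808", "alice@domain2.com"));
        // (from:alice OR from:bob), any time -> [session-001.xml, session-002.xml]
        System.out.println(sessionsFrom(docs, "", "alice@domain2.com", "bob@domain1.com"));
    }
}
```

One consideration in favour of this layout: because `from` and `timestamp` live in the same small document, both conditions are checked against the same message. If a whole session were one document with multi-valued `from` and `timestamp` fields, `from:alice timestamp:200808*` could also match a session where alice only wrote in July and someone else wrote in August. The fixed-width `yyyyMMddHHmm` timestamp format is also what makes the prefix match behave like a date range.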

All comments appreciated, thanks! :-)

PS: Of course, the actual goal isn't to index chat history (there are many
chat search tools available) but to use this to learn the API ;-)
