lucene-general mailing list archives

From "Walid \"jo\" Gedeon" <wged...@gmail.com>
Subject Re: RFO- Indexing 'meaningfull' xml
Date Thu, 07 Aug 2008 13:39:02 GMT
Hello Hoss,
  Thanks for your reply :-)
I believe I'm in the first case: "to be able to search for 'foo' and get
back a list of all sessions where the word 'foo' was used". However, I want
to be able to separate free text search from field-based search.

I have indexed both the sessions and the messages as documents: the session
documents for free-text search and the message documents for field-based
search. The algorithm that I've ended up using since I posted the initial
message is:
  o execute the search on both the message and the session documents
  o from all hits, construct a list of the 'filename's that match and show
    the 10 newest results.
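
A rough sketch of those two steps, in plain Python rather than the Lucene
API. The Hit class, its fields, and the newest_sessions helper are
hypothetical stand-ins for whatever the search returns, and a sortable
timestamp string is assumed:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    # Hypothetical stand-in for one search hit (not a Lucene class).
    filename: str   # session identifier stored on every document
    timestamp: str  # assumed to sort chronologically, e.g. "20080807"


def newest_sessions(hits, limit=10):
    """Collapse hits to one entry per session file, newest first."""
    latest = {}
    for hit in hits:
        # Keep the most recent timestamp seen for each session file.
        if hit.filename not in latest or hit.timestamp > latest[hit.filename]:
            latest[hit.filename] = hit.timestamp
    ranked = sorted(latest.items(), key=lambda kv: kv[1], reverse=True)
    return [filename for filename, _ in ranked[:limit]]


hits = [
    Hit("a.xml", "20080801"),
    Hit("b.xml", "20080805"),
    Hit("a.xml", "20080806"),
]
print(newest_sessions(hits))
```

Note that this walks every hit before it can pick the newest ten, so the
work grows with the number of matching messages.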

This works, but I'm afraid it is not going to perform well once I end up
indexing all sessions. There must be a way to get the right hit-set from a
search.

In any case, I'm looking at Solr for potential answers, thanks for
mentioning it :-)

Ta.
Jo

On Thu, Aug 7, 2008 at 12:59 AM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> : In addition to the full text search, I'd like to be able to perform
> : searches such as:
> :  - list sessions from:xxx timestamp:200808*
> :  - list sessions (from:xxx OR from:yyy)
> :  - etc
> :
> : Would it be better to store each message as a separate document with its
> : fields, adding the 'filename' (session identifier) as an extra field? Or
> : is there maybe a better way of doing it, making the session file a
> : document?
>
> As a general rule of thumb, you make 1 document for each result you want
> to get back when you execute a search ... if you want to be able to search
> for "foo" and get back a list of all sessions where the word "foo" was
> used, then each session should be a document.  If you also want to be able
> to search for "foo" and get back a list of each message that contained the
> word "foo", then each message can also be a document -- either in another
> index, or even in the same index (there's no rule that says all documents
> must have the same fields).
>
> BTW: If you are planning on experimenting with the Java API, I would
> suggest sending any specific followup questions to the java-user@lucene
> list.  But you may also want to consider checking out Solr, and the
> solr-user list.  Depends on what level of abstraction you want to deal
> with (Solr provides a config-based, web-service-style front end for dealing
> with Lucene indexes, but also has a Java API both for indexing and for
> hooking in custom functionality when executing searches).
>
>
> -Hoss
>
>
