lucene-java-user mailing list archives

From <oh...@cox.net>
Subject Re: Possible to invoke same Lucene query on a String?
Date Fri, 21 Aug 2009 02:44:50 GMT

---- Paul Cowan <cowan@aconex.com> wrote: 
> ohaya@cox.net wrote:
> > Document1                 subdoc1 term1 term2
> >                           subdoc2 term1a term2a
> >                           subdoc3 term1b term2b
> >
> > However, I've now been asked to implement the ability to query the sub-documents.
> >
> > In other words, rather than the web app displaying what I showed above, they want
> > it to return something like just:
> >
> > Document1                 subdoc2 term1a term2a
> 
> Just checking here... you only want to match where the terms are in 
> specific sub-documents? That is, if someone searches for 'term1a AND 
> term2b', what do you want to see? Nothing (because no sub-document 
> matches both terms)? Or subdoc2 and subdoc3, because they're both part 
> of the reason that Document1 matched?
> 
> If the former, then just indexing each sub-doc as a separate document 
> (duplicating the document-level information) may be the simplest option.
> 
> Cheers,
> 
> Paul
>


Hi Paul,

Hah!

Yes, it's the former I think...

The "Hah!" was because I was googling, and just ran across this:

http://javatechniques.com/blog/lucene-in-memory-text-search-example/

which, I think, creates an in-memory index, then searches it.

I was reading through that when I saw your message.

As I read through it, though, I started wondering: wouldn't this create an awful lot of
overhead?

In other words:

- I'd have to create a (very small) index for each sub-document, calling Document.add()
  with just the (for example) two terms, then
- run a query against that 1-entry index, which
- would give me either a "yes" or a "no" (for that sub-document).
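
The three steps above can be sketched in plain Java. This is only an illustration of the
per-sub-document yes/no check, not Lucene code: a set of terms stands in for the 1-entry
index, and the class and method names (SubDocMatchSketch, termsOf, matchesAll) are
hypothetical. It assumes whitespace-separated terms and an AND query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not a Lucene API: a term set stands in for the 1-entry index.
final class SubDocMatchSketch {

    // Step 1: "index" one sub-document by splitting it into lower-cased terms.
    static Set<String> termsOf(String subDocText) {
        Set<String> terms = new HashSet<>();
        for (String term : subDocText.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Steps 2 and 3: run an AND query against that one sub-document, answer yes or no.
    static boolean matchesAll(String subDocText, List<String> requiredTerms) {
        Set<String> terms = termsOf(subDocText);
        for (String required : requiredTerms) {
            if (!terms.contains(required.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("term1a", "term2a");
        String[] subDocs = { "term1 term2", "term1a term2a", "term1b term2b" };
        // One throwaway term set is built per sub-document, which is the overhead in question.
        for (int i = 0; i < subDocs.length; i++) {
            System.out.println("subdoc" + (i + 1) + ": " + matchesAll(subDocs[i], query));
        }
    }
}
```

Note that the throwaway structure is rebuilt for every sub-document on every query, so the
cost grows with the number of sub-documents.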

As I said, I'm concerned about overhead.  Some of the documents are quite large, containing
more than 20K sub-documents.  That means that, for such a document, I'd have to create more
than 20K indexes.

Is there really no other way to do this?  In my mind, I keep thinking about somehow
"redirecting" Lucene to run the search against a single String object (that's just a kind
of metaphor).
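
For comparison, here is a rough plain-Java sketch of the alternative from the quoted reply:
one index in which every sub-document is its own entry and carries the parent document's
identifier. It is not Lucene code. A map from term to sub-document keys stands in for the
index, and the names (FlatSubDocIndexSketch, add, searchAll) and the "parent/subdoc" key
format are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a single index with one entry per sub-document.
final class FlatSubDocIndexSketch {

    // term -> keys of the sub-documents containing it, e.g. "Document1/subdoc2"
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one sub-document, duplicating the parent identifier into its key.
    void add(String parentId, String subDocId, String text) {
        String key = parentId + "/" + subDocId;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(key);
            }
        }
    }

    // AND query: keep only sub-documents that contain every requested term.
    Set<String> searchAll(String... terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> hits = postings.getOrDefault(
                    term.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    public static void main(String[] args) {
        FlatSubDocIndexSketch index = new FlatSubDocIndexSketch();
        index.add("Document1", "subdoc1", "term1 term2");
        index.add("Document1", "subdoc2", "term1a term2a");
        index.add("Document1", "subdoc3", "term1b term2b");
        System.out.println(index.searchAll("term1a", "term2a"));
        System.out.println(index.searchAll("term1a", "term2b"));
    }
}
```

Because each entry is a single sub-document, a query such as 'term1a AND term2b' returns
nothing here, which is the "former" behaviour discussed above, and the index is built once
rather than once per sub-document.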

Comments?

Thanks for your response!

Jim



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org