lucene-java-user mailing list archives

From <oh...@cox.net>
Subject Possible to invoke same Lucene query on a String?
Date Thu, 20 Aug 2009 21:35:47 GMT
Hi,

This question is going to be a little complicated to explain, but let me try.

I have implemented an indexer app based on the demo IndexFiles app, and a web app based on
the luceneweb web app for the searching.

In my case, the "Documents" that I'm indexing are a proprietary file type, and each document
contains what I'll call "sub-documents".  In my indexer, I parse each of the sub-documents and,
for a given "Document", build one long string containing the terms that I extracted from all of
its sub-documents. Then I do:

doc.add(new Field("contents", longstring, Field.Store.YES, Field.Index.ANALYZED));

I also add the longstring to another non-indexed field, summary:

doc.add(new Field("summary", longstring, Field.Store.YES, Field.Index.NO));

The modified luceneweb web app that I use is pretty vanilla. Originally, I was asked only to
search for a Document, i.e., given a query like "X AND Y" (a document containing both term=X
and term=Y), return the file path and name of the document.  I was also displaying the terms
associated with each sub-document by parsing the 'summary' string.

So, for example, if "Document1" contained 3 sub-documents (which contained (term1, term2),
(term1a, term2a), and (term1b, term2b), respectively), and if I queried for "term1a AND term2a",
the web app would display something like:

Document1                 subdoc1 term1 term2
                          subdoc2 term1a term2a
                          subdoc3 term1b term2b

However, I've now been asked to implement the ability to query the sub-documents. 

In other words, rather than the web app displaying what I showed above, they want it to return
something like just:

Document1                 subdoc2 term1a term2a

Right now, the web app gets the 'summary' (again, as one long string) and breaks it into
subdoc1, subdoc2, and subdoc3 lines purely for display purposes. To do what I've been asked,
I need to query the 3 sub-strings from the 'summary', i.e., run the "term1a AND term2a" query
against each of the following strings:

subdoc1 term1 term2
subdoc2 term1a term2a
subdoc3 term1b term2b

I guess that I can write a method to do that, but I want to make sure that the sub-document/string
query "duplicates" the behavior of the Lucene query.
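
A minimal sketch of such a hand-written method, in plain Java with no Lucene calls. It rests on several assumptions that are not stated above: the 'summary' holds one sub-document per line, the query is a pure AND of single terms, and splitting on whitespace plus lowercasing is close enough to what the analyzer did at index time. The class and method names (SubDocFilter, matchingSubDocs, containsAll) are hypothetical. It does not reproduce Lucene's query parsing, stemming, stop words, or scoring, which is exactly the gap the question is about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: keeps only the sub-document lines
// of a stored 'summary' string that contain every required term.
final class SubDocFilter {

    private SubDocFilter() {}

    // Assumption: one sub-document per line, e.g. "subdoc2 term1a term2a".
    static List<String> matchingSubDocs(String summary, List<String> requiredTerms) {
        List<String> hits = new ArrayList<>();
        for (String line : summary.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && containsAll(trimmed, requiredTerms)) {
                hits.add(trimmed);
            }
        }
        return hits;
    }

    // True when every required term appears as a whole token on the line.
    // Tokens are split on whitespace and lowercased: a crude stand-in for an
    // analyzer, so results can differ from a real Lucene query. The leading
    // label (e.g. "subdoc2") is treated as a token like any other.
    static boolean containsAll(String line, List<String> requiredTerms) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            tokens.add(token.toLowerCase(Locale.ROOT));
        }
        for (String term : requiredTerms) {
            if (!tokens.contains(term.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

Called with the three example lines and the terms term1a and term2a, matchingSubDocs returns only the subdoc2 line. Because the matching is by whole token, a query term of term1 does not match term1a. For behavior that follows the real analyzer and query parser, Lucene's contrib MemoryIndex class, which indexes a single piece of text in memory and runs a Query against it, looks like the more faithful route.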

It seems like there should be a way to duplicate the Lucene query logic by using methods in
Lucene itself?

I've been reviewing the Javadocs, but I'm still fairly new to Lucene, so I was hoping that
someone could point me in the right direction?

My apologies for the longish post, but I hope that I've been able to explain clearly :)!!

Thanks,
Jim


