lucene-java-user mailing list archives

From Morus Walter <morus.wal...@tanto.de>
Subject Re: Retrieve all terms
Date Thu, 19 May 2005 06:53:17 GMT
Bill Tschumy writes:
> 
> On May 18, 2005, at 9:54 AM, Albert Vila wrote:
> 
> > Hi all,
> >
> > I need to retrieve all terms from a specified field, filtered by
> > another field. For example,
> >
> >  Document 1 -> <contents, " document 1 content">
> >                          <language, en>
> >
> >  Document 2 -> <contents, " document 2 content">
> >                          <language, fr>
> >
> >  Document 3 -> <contents, " document 3 content">
> >                          <language, fr>
> >
> >  Document 4 -> <contents, " document 4 content">
> >                          <language, en>
> >
> > Then, I want to retrieve all terms from the contents field, but
> > only those from the documents matching language=en.
> >
> > Is it possible with Lucene?
> > Thanks
> 
> Unless I'm misunderstanding your request, not only is it possible,  
> this is what Lucene is designed for.  Just search for all documents  
> with language=en and then iterate over the hits extracting the  
> contents of the desired field.
> 
I think he doesn't want the contents themselves but a term list for those
contents. Something like
1         1
4         1
content   2
document  2
for his sample, where the number is the frequency of the term.

I don't think you can easily get that from a single Lucene index.
The easiest way to get a term listing for one field of one document is
to use the term vector support. But for a document collection that would
still mean merging the term vectors of all matched documents.
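
To make the merging step concrete, here is a minimal sketch in plain Java.
It makes no Lucene calls: each map is a hypothetical stand-in for the term
vector (term -> frequency) of one matched document, and the class and method
names are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermVectorMerge {
    // Sums per-document term frequencies into one sorted term -> frequency map.
    // Each input map stands in for the term vector of one matched document.
    public static Map<String, Integer> merge(List<Map<String, Integer>> termVectors) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> vector : termVectors) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed term vectors for documents 1 and 4 (the language=en matches).
        Map<String, Integer> doc1 = Map.of("document", 1, "1", 1, "content", 1);
        Map<String, Integer> doc4 = Map.of("document", 1, "4", 1, "content", 1);
        merge(List.of(doc1, doc4))
            .forEach((term, freq) -> System.out.println(term + "\t" + freq));
    }
}
```

The cost grows with the number of matched documents, which is why this is
workable for small result sets but not an easy answer for large ones.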

I would suggest indexing the different collections in separate indexes.
Then you can simply loop over all terms of the index you are interested in.
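
The idea of partitioning can be sketched the same way, again in plain Java
rather than with the Lucene API. Each per-language table below stands in for
a separate index; the class name, the whitespace tokenization, and the use of
occurrence counts as the frequency are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class PerLanguageTerms {
    // One term -> frequency table per language; each table stands in
    // for a separate index.
    private final Map<String, Map<String, Integer>> tables = new HashMap<>();

    // Adds a document's contents to the table for its language.
    // Whitespace splitting is an assumed stand-in for a real analyzer.
    public void add(String language, String contents) {
        Map<String, Integer> table =
            tables.computeIfAbsent(language, k -> new TreeMap<>());
        for (String term : contents.trim().toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                table.merge(term, 1, Integer::sum);
            }
        }
    }

    // Returns all terms for one language: a single lookup, with no
    // merging across documents at query time.
    public Map<String, Integer> terms(String language) {
        return tables.getOrDefault(language, new TreeMap<>());
    }
}
```

The work of grouping by language moves to indexing time, so listing the
terms for one language needs no filtering afterwards.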

Morus


