lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: Getting terms from unstored fields, doc-wise
Date Fri, 27 Jul 2012 13:15:02 GMT
On 26/07/2012 22:04, Phanindra R wrote:
> Thanks for the reply Abdul.
> I was exploring the API and I think we can retrieve all those words by
> using a brute-force approach.
> 1) Get all the terms using indexReader.terms()
> 2) Process the term only if it belongs to the target field.
> 3) Get all the docs using indexReader.termDocs(term);
> 4) So, we have the term-doc pairs at this point.

This procedure is implemented in Luke, in the "Reconstruct & Edit"
function. For larger indexes it is indeed a time-consuming procedure.
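
The four quoted steps can be sketched roughly as follows. This is a toy
model only: it stands in for the index with plain Java collections
(field -> term -> document ids) rather than calling the Lucene API, and
the class and method names (UninvertSketch, termsByDoc) are made up for
illustration. It shows why the cost grows with the index: every term of
the field and every entry in its posting list has to be visited.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: field -> term -> ids of the documents containing the term. */
final class UninvertSketch {

    /** Rebuilds docId -> terms for one field by walking every posting list. */
    static SortedMap<Integer, SortedSet<String>> termsByDoc(
            Map<String, Map<String, List<Integer>>> index, String targetField) {
        SortedMap<Integer, SortedSet<String>> result = new TreeMap<>();
        // Step 1: enumerate all terms, field by field.
        for (Map.Entry<String, Map<String, List<Integer>>> field : index.entrySet()) {
            // Step 2: skip terms that do not belong to the target field.
            if (!field.getKey().equals(targetField)) {
                continue;
            }
            for (Map.Entry<String, List<Integer>> posting : field.getValue().entrySet()) {
                // Step 3: visit every document listed for this term.
                for (int docId : posting.getValue()) {
                    // Step 4: record the (document, term) pair.
                    result.computeIfAbsent(docId, k -> new TreeSet<>())
                          .add(posting.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<Integer>>> index = Map.of(
            "body", Map.of(
                "lucene", List.of(0, 2),
                "index", List.of(0, 1),
                "search", List.of(1, 2)),
            "title", Map.of("hello", List.of(0)));
        System.out.println(termsByDoc(index, "body"));
    }
}
```

Note that the result is an unordered set of terms per document, not the
original text.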

> Is there any better approach other than the above forever-taking procedure?

No. Indexing is usually a lossy process - some data is irretrievably 
lost - and the resulting data structure is not optimized for 
re-assembling the original content. If you need to retrieve the original 
content you have to store it, either using stored fields or in an 
external system.
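
To illustrate what "lossy" means here, the sketch below uses a
deliberately simplified, hypothetical analyzer (lowercasing plus a tiny
stop-word list; it is not a real Lucene Analyzer). Two different inputs
end up with the same set of indexed terms, so the terms alone cannot
tell you which text was originally indexed.

```java
import java.util.Locale;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy analysis chain showing that distinct texts can yield identical terms. */
final class LossyAnalysisSketch {

    // Hypothetical stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "is", "of");

    /** Returns the distinct terms kept for a text: lowercased, stop words dropped. */
    static SortedSet<String> indexedTerms(String text) {
        SortedSet<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String first = "The Index is a Map of Terms.";
        String second = "terms map index";
        // Case, punctuation, stop words and word order are gone in both cases.
        System.out.println(indexedTerms(first));
        System.out.println(indexedTerms(first).equals(indexedTerms(second)));
    }
}
```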

Best regards,
Andrzej Bialecki
  ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com
