lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Getting terms from unstored fields, doc-wise
Date Fri, 27 Jul 2012 13:15:02 GMT
On 26/07/2012 22:04, Phanindra R wrote:
> Thanks for the reply Abdul.
>
> I was exploring the API and I think we can retrieve all those words by
> using a brute-force approach.
>
> 1) Get all the terms using indexReader.terms()
>
> 2) Process the term only if it belongs to the target field.
>
> 3) Get all the docs using indexReader.termDocs(term);
>
> 4) So, we have the term-doc pairs at this point.

Luke (http://code.google.com/p/luke) implements this procedure in its
"Reconstruct & Edit" function. On larger indexes it is indeed
time-consuming.
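
For illustration, the quoted steps amount to "un-inverting" the index: walk the term -> documents postings and regroup them as document -> terms. The sketch below uses plain Java collections as a stand-in for the index rather than the Lucene API itself. The class name Uninvert, the method termsByDoc, and the nested-map layout are assumptions made for this example only; they are not part of Lucene or Luke.

```java
import java.util.*;

class Uninvert {
    /**
     * Regroups postings (field -> term -> doc ids) into doc id -> terms
     * for a single target field. The nested map is a toy stand-in for an
     * inverted index, not a Lucene data structure.
     */
    static Map<Integer, SortedSet<String>> termsByDoc(
            Map<String, Map<String, int[]>> postings, String targetField) {
        Map<Integer, SortedSet<String>> result = new TreeMap<>();
        // Steps 1 and 2: enumerate all terms, keeping only the target field.
        for (Map.Entry<String, Map<String, int[]>> fieldEntry : postings.entrySet()) {
            if (!fieldEntry.getKey().equals(targetField)) {
                continue;
            }
            for (Map.Entry<String, int[]> termEntry : fieldEntry.getValue().entrySet()) {
                // Step 3: visit every document that contains this term.
                for (int docId : termEntry.getValue()) {
                    // Step 4: record the term-doc pair, grouped by document.
                    result.computeIfAbsent(docId, k -> new TreeSet<>())
                          .add(termEntry.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> body = new HashMap<>();
        body.put("apple", new int[] {0, 2});
        body.put("banana", new int[] {0});
        Map<String, int[]> title = new HashMap<>();
        title.put("cherry", new int[] {1});

        Map<String, Map<String, int[]>> postings = new HashMap<>();
        postings.put("body", body);
        postings.put("title", title);

        // Prints the terms of the "body" field grouped by document id.
        System.out.println(termsByDoc(postings, "body"));
    }
}
```

The cost is visible in the loop structure: every term of the field and every posting for each term is visited once, which is why the procedure becomes slow on large indexes.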

>
> Is there any better approach other than the above forever-taking procedure?

No. Indexing is usually a lossy process - some data is irretrievably 
lost - and the resulting data structure is not optimized for 
re-assembling the original content. If you need to retrieve the original 
content you have to store it, either using stored fields or in an 
external system.
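
A toy example of why the original text cannot be rebuilt from terms alone: the sketch below runs two different sentences through a deliberately simple analyzer (lowercasing, splitting on non-alphanumerics, dropping stop words) and ends up with the same set of terms for both. The class LossyIndexing, the analyze method, and the stop-word list are hypothetical and far simpler than any real Lucene analyzer. A real index may also keep term frequencies or positions depending on how the field is configured, so the degree of loss varies.

```java
import java.util.*;

class LossyIndexing {
    // Hypothetical stop-word list, chosen only for this example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "is", "of"));

    /** Reduces text to the sorted set of terms a toy index would keep. */
    static SortedSet<String> analyze(String text) {
        SortedSet<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // Case, punctuation, word order, repeats and stop words are all dropped.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String first = "The cat chased the dog.";
        String second = "A DOG chased a cat!";
        System.out.println(analyze(first));
        System.out.println(analyze(second));
        // Both sentences yield the same terms, so neither can be recovered.
        System.out.println(analyze(first).equals(analyze(second)));
    }
}
```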


-- 
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
  ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com

