lucene-java-user mailing list archives

From Phanindra R <phani...@gmail.com>
Subject Re: Getting terms from unstored fields, doc-wise
Date Fri, 27 Jul 2012 22:15:28 GMT
Thanks a lot, Aditya and Andrzej. Your responses were really helpful.

On Fri, Jul 27, 2012 at 6:15 AM, Andrzej Bialecki <ab@getopt.org> wrote:

> On 26/07/2012 22:04, Phanindra R wrote:
>
>> Thanks for the reply Abdul.
>>
>> I was exploring the API and I think we can retrieve all those words by
>> using a brute-force approach.
>>
>> 1) Get all the terms using indexReader.terms()
>>
>> 2) Process the term only if it belongs to the target field.
>>
>> 3) Get all the docs using indexReader.termDocs(term);
>>
>> 4) So, we have the term-doc pairs at this point.
>>
>
> This procedure is implemented in Luke (http://code.google.com/p/luke)
> in the "Reconstruct & Edit" function. In case of larger indexes it's indeed
> a time-consuming procedure.
>
>
>
>> Is there a better approach than the procedure above, which takes
>> forever?
>>
>
> No. Indexing is usually a lossy process - some data is irretrievably lost
> - and the resulting data structure is not optimized for re-assembling the
> original content. If you need to retrieve the original content you have to
> store it, either using stored fields or in an external system.
>
>
> --
> Best regards,
> Andrzej Bialecki
> http://www.sigram.com, blog http://www.sigram.com/blog
>  ___.,___,___,___,_._. __________________<><_____________________
> [___||.__|__/|__||\/|: Information Retrieval, System Integration
> ___|||__||..\|..||..|: Contact: info at sigram dot com
>
>
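The four brute-force steps quoted above can be sketched without any Lucene dependency. This is only an illustrative model, not the Lucene API: the class `TermDocInverter`, the `Posting` type, and the method `termsByDoc` are hypothetical names standing in for what `indexReader.terms()` and `indexReader.termDocs(term)` would supply from a real index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class TermDocInverter {

    /** Hypothetical stand-in for one term dictionary entry plus its posting list. */
    static final class Posting {
        final String field;   // field the term belongs to
        final String text;    // term text
        final int[] docIds;   // ids of the documents that contain the term

        Posting(String field, String text, int... docIds) {
            this.field = field;
            this.text = text;
            this.docIds = docIds;
        }
    }

    /** Turns term -> docs postings into a doc -> terms map for a single field. */
    static SortedMap<Integer, SortedSet<String>> termsByDoc(List<Posting> postings,
                                                            String targetField) {
        SortedMap<Integer, SortedSet<String>> result = new TreeMap<>();
        for (Posting p : postings) {                    // step 1: walk every term
            if (!p.field.equals(targetField)) {         // step 2: keep the target field only
                continue;
            }
            for (int docId : p.docIds) {                // step 3: walk the docs for this term
                // step 4: record the term-doc pair, grouped by document
                result.computeIfAbsent(docId, k -> new TreeSet<>()).add(p.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Posting> postings = new ArrayList<>();
        postings.add(new Posting("body", "lucene", 0, 2));
        postings.add(new Posting("body", "index", 0, 1));
        postings.add(new Posting("title", "hello", 1));
        postings.add(new Posting("body", "search", 1, 2));

        for (Map.Entry<Integer, SortedSet<String>> e
                : termsByDoc(postings, "body").entrySet()) {
            System.out.println("doc " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each document comes back only as a sorted set of terms. The original word order, repeated words, and anything dropped at analysis time are not recoverable this way, which matches the point about indexing being lossy. The cost is one pass over every term in the field and every posting for it, so it grows with the size of the index.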
