lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Field collapsing memory usage
Date Thu, 22 Jan 2015 21:52:30 GMT
Toke:

What do you think about folding this into the Solr (or Lucene?) code
base? Or is it too specialized?

Not sure one way or the other, just askin'....

Erick

On Thu, Jan 22, 2015 at 3:47 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> Norgorn [lsunnydayl@mail.ru] wrote:
>> Is there any way to make 'docValues="true"' without reindexing?
>
> Depends on how brave you are :-)
>
> We recently had the same need and made https://github.com/netarchivesuite/dvenabler
> To my knowledge that is the only existing tool for that task, and as we are the only ones
> to have used it, robustness is not guaranteed. Warnings aside, it works without problems in
> our tests as well as on the few real corpuses we have tested. It does use a fairly
> memory-hungry structure during the conversion. If the number of _unique_ values in your grouping
> field approaches 1 billion, I loosely guess that you will need 40GB+ of heap. Do read
> https://github.com/netarchivesuite/dvenabler/issues/14 if you want to try it.
>
> - Toke Eskildsen
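
For a rough feel of what the heap guess quoted above implies for smaller indexes, here is a minimal back-of-the-envelope sketch. It takes the single data point from the message (about 1 billion unique values, 40GB+ of heap) and scales it linearly. The linear scaling and the function name `estimate_heap_gb` are assumptions made for illustration only; they are not a documented property of dvenabler, and the original figure was itself described as a loose guess.

```python
# Back-of-the-envelope heap estimate for the docValues conversion.
# ASSUMPTION: memory use scales linearly with the number of unique values.
# The only data point is the loose guess from the quoted message:
# ~1 billion unique values -> 40GB+ of heap (about 40 bytes per unique value).

REFERENCE_UNIQUE_VALUES = 1_000_000_000  # from the quoted message
REFERENCE_HEAP_GB = 40.0                 # lower bound ("40GB+") from the quoted message


def estimate_heap_gb(unique_values: int) -> float:
    """Return a rough lower-bound heap estimate in GB (hypothetical helper)."""
    if unique_values < 0:
        raise ValueError("unique_values must be non-negative")
    return REFERENCE_HEAP_GB * unique_values / REFERENCE_UNIQUE_VALUES


if __name__ == "__main__":
    for n in (10_000_000, 100_000_000, 1_000_000_000):
        print(f"{n:>13,} unique values -> roughly {estimate_heap_gb(n):.1f} GB heap")
```

Treat the result as an order-of-magnitude figure at best; measuring on a copy of the real index is the only reliable check.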
