lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] 32 bit CentOS Indexing Question
Date Wed, 29 Jan 2014 01:59:58 GMT
On Tue, Jan 28, 2014 at 11:26 AM, Nick D. <ndwyer@globaldataguard.com> wrote:
> Why do I get a memory allocation error on a 32 bit OS and not a 64 bit OS?

The likely cause is a known architectural flaw in SortWriter that makes it
consume too much RAM.

> Are there any 32 bit limitations of Lucy?

In theory, there should not be.  We have expended considerable effort to
provide compatibility with 32-bit systems, though our optimization target
remains 64-bit.

> Why does the index file grow so large and then shrinks after commit is done?

There is a lot of temporary data produced during indexing.  Before you can
search a large amount of material, you have to sort it.  That takes a lot of
space.

> Should I commit more often?

If you are only generating this index in a single shot, committing more often
should be an adequate workaround for the SortWriter problem.  However, you
must also override IndexManager#recycle to return an empty arrayref, so that
each commit does not merge the existing segments back in.  See
Lucy::Docs::Cookbook::FastUpdates.

> Would committing often slow down the indexing process?

I don't think the difference would be unreasonable.

> Would committing often make the over growth of the index go away?

If you override IndexManager#recycle, yes.

This assumes you don't need to modify the index later, which is my guess
based on the script that you supplied.
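
The workaround described above (commit in batches, with an IndexManager
subclass whose recycle returns an empty arrayref) might look roughly like the
sketch below.  It follows the pattern in Lucy::Docs::Cookbook::FastUpdates.
The index path, the batch size, and the index_in_batches helper are
hypothetical names for illustration, and the sketch assumes the index has
already been created with its schema.  Because commit invalidates an Indexer,
each batch gets a fresh one.

```perl
use strict;
use warnings;

package NoMergeManager;
use base qw( Lucy::Index::IndexManager );

# Returning an empty arrayref means no existing segments are recycled
# (merged) into the segment written by this commit.
sub recycle { [] }

package main;
use Lucy::Index::Indexer;

my $path_to_index = '/path/to/index';    # hypothetical location
my $batch_size    = 10_000;              # hypothetical; tune to available RAM

# $docs is an arrayref of hashrefs whose keys match the schema's fields.
# Note that this helper empties the array as it goes.
sub index_in_batches {
    my ($docs) = @_;
    while ( my @batch = splice( @$docs, 0, $batch_size ) ) {
        # A committed Indexer cannot be reused, so open a new one per batch.
        my $indexer = Lucy::Index::Indexer->new(
            index   => $path_to_index,
            manager => NoMergeManager->new,
        );
        $indexer->add_doc($_) for @batch;
        $indexer->commit;
    }
}
```

Each commit then sorts only its own batch, at the cost of leaving one more
segment in the index per commit.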

Marvin Humphrey