lucenenet-user mailing list archives

From Simon Svensson <si...@devhost.se>
Subject Re: Why is my index so large?
Date Wed, 12 Dec 2012 08:25:01 GMT
Hi,

That 20-30% figure sounds like a general estimate, and you may have
specific data that does not conform to it. But it sounds really odd to
get an index that is 187% of the size of the original data.

Could you show us the code that generates the large index?

// Simon

On 2012-12-10 09:27, Omri Suissa wrote:
> Hi all,
>
> I'm trying to index some files on a file server. I built a crawler that
> runs over the folders and extracts the text (using IFilters) from Office
> and PDF files.
>
> The size of the files is ~150MB.
>
> I do not store the content.
>
> I store some additional fields per file.
>
> I'm using SnowballAnalyzer (English).
>
> As far as I know, a Lucene index should be around 20-30% of the size of
> the text.
>
> When I index the files without indexing the content (only the additional
> fields) the index size (after optimization) is ~10MB (this is my overhead).
>
> When I index the files including the content (but not stored) the index
> size (after optimization) is ~280MB instead of ~55MB (150*0.3 + 10).
>
> Why? :)
>
>
>
> Thanks,
>
> Omri
>
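
For reference, the arithmetic in this thread can be reproduced with a short sketch. It uses only the figures quoted above (~150MB of files, ~10MB overhead, ~280MB index, and the upper end of the 20-30% rule of thumb); the function and variable names are illustrative and do not come from Lucene or from the poster's code.

```python
# Figures taken from the quoted message; all names here are illustrative.
SOURCE_SIZE_MB = 150     # approximate size of the crawled files
OVERHEAD_MB = 10         # index size with only the additional fields
ACTUAL_INDEX_MB = 280    # index size with the content indexed (not stored)
RULE_OF_THUMB = 0.30     # upper end of the quoted 20-30% estimate


def expected_index_mb(source_mb, ratio, overhead_mb):
    """Expected index size: a fraction of the source plus fixed overhead."""
    return source_mb * ratio + overhead_mb


def percent_of_source(index_mb, source_mb):
    """Index size expressed as a percentage of the source size."""
    return 100.0 * index_mb / source_mb


expected = expected_index_mb(SOURCE_SIZE_MB, RULE_OF_THUMB, OVERHEAD_MB)
actual_pct = percent_of_source(ACTUAL_INDEX_MB, SOURCE_SIZE_MB)
content_pct = percent_of_source(ACTUAL_INDEX_MB - OVERHEAD_MB, SOURCE_SIZE_MB)

print(f"expected index size: ~{expected:.0f} MB")
print(f"actual index as share of source: {actual_pct:.0f}%")
print(f"content portion as share of source: {content_pct:.0f}%")
```

Even with the ~10MB of field overhead subtracted, the content portion of the index is still well above the 20-30% estimate, which is the gap the question is about.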

