lucene-dev mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: Huge cfs files
Date Tue, 25 Apr 2017 07:18:50 GMT
Sorry, you are right. It is the other way round: smaller segments should be CFS.
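
A minimal sketch of the kind of size-based rule being discussed, in which a merged segment is packed into a compound file only when it is small relative to the whole index. This is a simplified model, not Lucene's or Solr's actual code: the class `CfsPolicy`, its field names, and the 10% default ratio are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CfsPolicy:
    """Hypothetical model of a size-based compound-file (CFS) decision."""

    # Assumed threshold: a segment larger than this fraction of the total
    # index size is written as separate files rather than as one .cfs file.
    no_cfs_ratio: float = 0.1
    # Assumed absolute cap: segments larger than this are never compound.
    max_cfs_segment_bytes: float = float("inf")

    def use_compound_file(self, merged_bytes: int, total_index_bytes: int) -> bool:
        """Return True if the merged segment would be written as CFS."""
        if self.no_cfs_ratio == 0.0:
            return False  # compound files disabled entirely
        if merged_bytes > self.max_cfs_segment_bytes:
            return False  # segment exceeds the absolute cap
        if self.no_cfs_ratio >= 1.0:
            return True  # every segment under the cap is compound
        return merged_bytes <= self.no_cfs_ratio * total_index_bytes


if __name__ == "__main__":
    GB = 1024 ** 3
    policy = CfsPolicy()
    for size_gb in (5, 100):
        compound = policy.use_compound_file(size_gb * GB, 240 * GB)
        print(f"{size_gb} GB segment in a 240 GB index -> compound: {compound}")
```

Under these assumed settings, a 240GB index like the one in the original question would get compound files only for segments of up to about 24GB. Very large .cfs files would then point to a different ratio or cap being in effect.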

On 25 April 2017 03:22:32 GMT+01:00, "Otis Gospodnetić" <otis.gospodnetic@gmail.com> wrote:
>Hi Uwe,
>
>> For larger segments it will automatically create CFS files
>
>I was under the impression Lucene packed only smaller segments into CFS
>files, based on this three-year-old comment from Mike:
>https://github.com/elastic/elasticsearch/issues/8919 . Maybe that comment
>is out of date now?
>
>Thanks,
>Otis
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>On Sun, Apr 23, 2017 at 11:40 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi Avi,
>>
>> There is nothing wrong with CFS files. They are just like zip files,
>> containing multiple other index files. Sometimes, when you add only a few
>> documents, IndexWriter starts to merge several older segments into a new
>> one. For larger segments it will automatically create CFS files, as those
>> segments are unlikely to change. During merging it needs additional disk
>> space. At the end of merging it deletes the old segments, unless they are
>> still used by older commit points or index searchers are still referring
>> to them. For indexes that change, you should have at least 2 or 3 times
>> the original index size available as spare disk space. Keep in mind that
>> on platforms such as Windows, where files in use cannot be deleted, you
>> may see older segments for a long time.
>> As far as I know, depending on the merge policy, Solr no longer defaults
>> to never using CFS files. For large segments CFS files are better as they
>> use fewer file handles. Smaller segments still do not use compound files.
>> So by default it is a matter of segment size, as in Lucene.
>>
>> Uwe
>>
>>
>> On 23 April 2017 11:50:17 CEST, Avi Steiner <asteiner@varonis.com> wrote:
>>>
>>> Hi,
>>>
>>> We have a customer with Solr 5.3.1.
>>>
>>> The index contains fewer than 3.5 million docs, and the index folder size
>>> is about 240GB.
>>>
>>> I found that the huge files are .cfs files (compound files) that were
>>> created lately although only a few documents were added.
>>>
>>> The useCompoundFile parameter is commented out in solrconfig.xml.
>>>
>>> As far as I understand, the default in Solr is false and in Lucene is
>>> true, which means this feature should be disabled.
>>>
>>> I would like to understand why those files were created and why they are
>>> so huge.
>>>
>>> Regards,
>>>
>>> Avi
>>>
>>
>>
>> --
>> Uwe Schindler
>> Achterdiek 19, 28357 Bremen
>> https://www.thetaphi.de
>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de