couchdb-user mailing list archives

From Robert Samuel Newson <rnew...@apache.org>
Subject Re: _cleanup_view is deleting all views
Date Mon, 03 Feb 2014 15:11:13 GMT
1. It is designed behavior (true for couchdb also).

2. Again, designed behavior. They’ll be closed if the database closes, which happens when
the number of open databases exceeds the size of the LRU.

3. Oooh. That’s a mistake on your part, and it explains it all. You must run _view_cleanup
on :5984. The clustering code has to be invoked because the design document could be on a
remote node, and that code holds the logic to calculate which shard filenames to retain, etc.

B.
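Point 2 can be pictured with a toy model. The following is a hypothetical sketch, not couch_server's actual code: a bounded LRU of open database names, where the module name lru_sketch and the capacity argument are made up for illustration (in CouchDB the real bound comes from the max_dbs_open setting). Touching a name moves it to the front; once more names are in use than the capacity allows, the least recently used ones fall off the end, which corresponds to the point where a database, and with it its view files, would be closed.

```erlang
-module(lru_sketch).
-export([new/1, touch/2]).

%% Hypothetical bounded LRU of open database names (illustration only).
%% State is {Capacity, Names}, most recently used name first.
new(Capacity) when is_integer(Capacity), Capacity > 0 ->
    {Capacity, []}.

%% Record a use of Name. Returns {NewState, Evicted}, where Evicted lists
%% the names pushed past capacity, i.e. the databases that would be closed.
touch({Capacity, Names}, Name) ->
    Names1 = [Name | lists:delete(Name, Names)],
    case length(Names1) > Capacity of
        true ->
            {Keep, Evicted} = lists:split(Capacity, Names1),
            {{Capacity, Keep}, Evicted};
        false ->
            {{Capacity, Names1}, []}
    end.
```

In this model a database that stays among the most recently used names is never evicted, so its handles stay open indefinitely.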

On 3 Feb 2014, at 14:39, Vladimir Ralev <vladimir.ralev@gmail.com> wrote:

> Hello again.
> 
> I did some more testing and here are some observations. I am analyzing this
> from the perspective of running 1000 databases, with 30 views each.
> Bigcouch will partition the databases into smaller DBs, about 10000
> databases in total per machine, each with 30 views. That is 300K+
> files in the directory structure per machine.
> 
> What I see in a smaller-scale test is the following:
> 1. Initially the views are not generated. Only when you access the view
> (http://host:5984/aea8b710ab5f0/desgn/etc..) is the view file built
> from scratch.
> 2. Once you access the view file this way, the beam.smp process keeps its
> file handles to it open forever; they are never closed until bigcouch is
> restarted. The couchjs process terminates and releases the
> handle while indexing.
> 3. If you run
> http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup
> the views are always deleted.
> 4. If you run http://host:5984/aea8b710ab5f0/_view_cleanup the views are
> NOT deleted. I guess that's the correct cleanup I should use.
> 5. If you restart bigcouch to force the file handles to close and then make
> no read requests to that view (which would open the file handle), bigcouch
> still slowly starts to open files and never closes them again until the
> next restart.
> 6. When you delete files with
> http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup,
> Erlang's file:delete is used, which doesn't care about open file handles; it
> just deletes by name. The deleted files therefore remain referenced, and the
> handles can still be seen in lsof. The cycle of deleting and
> rebuilding these files never stops, and the descriptors leak.
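Observation 6 rests on POSIX unlink semantics: deleting a file by name removes the directory entry, but a descriptor that is already open keeps the inode (and its disk space) alive until the last handle is closed, which is why lsof lists such files as "(deleted)". The sketch below demonstrates this with file:delete/1 on a throwaway file; the module name unlink_demo and the temp-file naming are made up, and it assumes Linux or another Unix, as on the Debian/ext4 machines in this thread.

```erlang
-module(unlink_demo).
-export([run/0]).

%% Assumes a Unix-like OS: deleting a file by name does not invalidate
%% a descriptor that is already open on it.
run() ->
    Name = "unlink_demo_" ++
           integer_to_list(erlang:unique_integer([positive])) ++ ".view",
    Path = filename:join(tmp_dir(), Name),
    ok = file:write_file(Path, <<"index data">>),
    {ok, Fd} = file:open(Path, [read, binary]),
    ok = file:delete(Path),              %% removes the name only
    false = filelib:is_file(Path),       %% the directory entry is gone...
    {ok, Data} = file:read(Fd, 1024),    %% ...but the open handle still reads
    ok = file:close(Fd),                 %% inode is freed once the last handle closes
    Data.

%% Use $TMPDIR if set, otherwise /tmp.
tmp_dir() ->
    case os:getenv("TMPDIR") of
        false -> "/tmp";
        Dir -> Dir
    end.
```

Calling unlink_demo:run() returns <<"index data">> even though the file no longer exists by name when it is read.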
> 
> Do these observations make sense?
> 
> I think 300K+ handles is manageable as long as it doesn't recycle
> constantly, but I need to understand the correct _view_cleanup REST API to
> use. Is http://host:5984 sufficient?
> 
> I added some logging on file close and so on, and it's mostly called on db
> files. I couldn't trace any code path that releases a view file handle; if
> you can point me to the code which may release it, I can check.
> 
> Thanks a lot for any feedback.
> 
> 
> On Sat, Feb 1, 2014 at 6:41 AM, Vladimir Ralev <vladimir.ralev@gmail.com> wrote:
> 
>> Not sure at all. I don't know how to check precisely whether a live design
>> doc is pointing to a particular file. I was basing my statement on the fact
>> that I have my views declared and they were available pre-indexed before
>> compaction (they were not held open as file handles by couch, only opened
>> on demand). Once I finish my current script, I will test everything again
>> and will spend some time tracing the code.
>> 
>> 
>> On Fri, Jan 31, 2014 at 6:52 PM, Robert Samuel Newson <rnewson@apache.org> wrote:
>> 
>>> 
>>> Ownership is interesting. Would the bigcouch user have the right to
>>> delete the file but not open it for reading?
>>> 
>>> There's definitely an issue in bigcouch (fixed long since in couchdb)
>>> where any failure to open a view file makes us delete it.
>>> 
>>> OS/fs all check out fine. You see the filename that should be retained in
>>> that log output? You're 100% sure? You do have a live design doc pointing
>>> to it?
>>> 
>>> B.
>>> 
>>> On 31 Jan 2014, at 16:39, Vladimir Ralev <vladimir.ralev@gmail.com> wrote:
>>> 
>>>> Thanks a lot. The database was moved from older machines, so some other
>>>> file system metadata might be scrambled. But I don't see what could cause
>>>> a problem like this.
>>>> 
>>>> Yes, the debug output "deleting unused view index files:" is printed, and
>>>> it deletes every view in every database; there is little doubt about it.
>>>> It doesn't delete the fresh views that are fully regenerated afterwards,
>>>> though. I think the original views somehow got corrupted, but I need to
>>>> figure out why and maybe fix it manually with a script.
>>>> 
>>>> OS is 64-bit Debian, file system is ext4. The file ownership is a little
>>>> scrambled: some directories are owned by the old bigcouch user, others by
>>>> root, so that's one thing I am investigating. I reset the ownership, but
>>>> will have to repeat it for my next tests.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Jan 31, 2014 at 6:21 PM, Robert Samuel Newson <rnewson@apache.org> wrote:
>>>> 
>>>>> and details of OS, filesystem, anything you think might be relevant.
>>>>> 
>>>>> B.
>>>>> 
>>>>> On 31 Jan 2014, at 16:20, Robert Samuel Newson <rnewson@apache.org> wrote:
>>>>> 
>>>>>> First thing to note is that bigcouch development is over, but we can at
>>>>>> least confirm this:
>>>>>> 
>>>>>> This function fetches all the design docs of the database, grabs all the
>>>>>> signatures from each (you'll have noticed view filenames look uuid-like
>>>>>> and random; that's a 'sig'), and then sweeps the dir where all views for
>>>>>> the given database should be and deletes those not in the 'keep' list.
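The sweep described above can be tried in isolation. The sketch below is a hypothetical pure-function version of the filtering step of the quoted cleanup_index_files/1 (the module name view_cleanup_sketch and the function files_to_delete/2 are made up; no BigCouch code is called). It assumes, as the filenames suggest, that signatures are plain hex strings, so joining them with "|" needs no regex escaping. It also makes the empty-list branch explicit: if the Db handle sees no live design docs, for example a single shard opened on :5986 that does not hold the design doc (one plausible reading of the :5984 advice at the top of this message), nothing is kept and every index file is deleted.

```erlang
-module(view_cleanup_sketch).
-export([files_to_delete/2]).

%% Hypothetical, self-contained version of the filter step in
%% cleanup_index_files/1. Sigs are the signature strings of the live
%% (non-deleted) design docs; FileList is the list of index file paths
%% found on disk. Returns the paths that would be deleted.
files_to_delete([], FileList) ->
    %% No design docs visible to this Db handle: the 'keep' list is
    %% empty, so every index file is treated as unused.
    FileList;
files_to_delete(Sigs, FileList) ->
    %% One alternation such as "(sigA|sigB)". It is not anchored, so a
    %% file is kept if its path contains any live signature anywhere.
    RegExp = "(" ++ string:join(Sigs, "|") ++ ")",
    [Path || Path <- FileList,
             re:run(Path, RegExp, [{capture, none}]) =:= nomatch].
```

For example, with Sigs = ["3f2a"] and two made-up paths ending in 3f2a.view and 9c1e.view, only the 9c1e.view path is returned; with Sigs = [], both are.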
>>>>>> 
>>>>>> Can you enable debug-level logging (curl
>>>>>> localhost:5984/_config/log/level -X PUT -d '"debug"' on *all* bigcouch
>>>>>> nodes) and tell us if
>>>>>> 
>>>>>> ?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),
>>>>>> 
>>>>>> actually gets printed?
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> On 31 Jan 2014, at 16:09, Vladimir Ralev <vladimir.ralev@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi guys,
>>>>>>> 
>>>>>>> bigcouch 0.4.2 has the following code that handles view cleanup:
>>>>>>> 
>>>>>>> cleanup_index_files(Db) ->
>>>>>>>     % load all ddocs
>>>>>>>     {ok, DesignDocs} = couch_db:get_design_docs(Db),
>>>>>>> 
>>>>>>>     % make unique list of group sigs
>>>>>>>     Sigs = lists:map(fun(#doc{id = GroupId}) ->
>>>>>>>         {ok, Info} = get_group_info(Db, GroupId),
>>>>>>>         ?b2l(couch_util:get_value(signature, Info))
>>>>>>>     end, [DD||DD <- DesignDocs, DD#doc.deleted == false]),
>>>>>>> 
>>>>>>>     FileList = list_index_files(Db),
>>>>>>> 
>>>>>>>     DeleteFiles =
>>>>>>>     if length(Sigs) =:= 0 ->
>>>>>>>         FileList;
>>>>>>>     true ->
>>>>>>>         % regex that matches all ddocs
>>>>>>>         RegExp = "("++ string:join(Sigs, "|") ++")",
>>>>>>> 
>>>>>>>         % filter out the ones in use
>>>>>>>         [FilePath || FilePath <- FileList,
>>>>>>>             re:run(FilePath, RegExp, [{capture, none}]) =:= nomatch]
>>>>>>>     end,
>>>>>>> 
>>>>>>>     % delete unused files
>>>>>>>     ?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),
>>>>>>>     RootDir = couch_config:get("couchdb", "view_index_dir"),
>>>>>>>     [couch_file:delete(RootDir,File,false)||File <- DeleteFiles],
>>>>>>>     ok.
>>>>>>> 
>>>>>>> 
>>>>>>> From here:
>>>>>>> https://github.com/cloudant/bigcouch/blob/master/apps/couch/src/couch_view.erl#L84
>>>>>>> 
>>>>>>> It's supposed to delete only unused views, but in my case it deletes
>>>>>>> everything and then starts building from scratch. Can you help me
>>>>>>> understand the condition used here to filter the files that are
>>>>>>> currently in use? How is the regex supposed to work?
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 

