couchdb-dev mailing list archives

From Victor Nicollet <vnicol...@runorg.com>
Subject Re: Corrupted database example file
Date Thu, 18 Apr 2013 22:41:40 GMT
I searched the logs for any signs of error. The operations performed on the
prod-folder database in the two hours before the first crash were:

https://gist.github.com/VictorNicollet/878d0176960cc71d9ac1

The compact at 10:54:08 finished without a hitch.
The compact at 11:54:07 finished with:

https://gist.github.com/VictorNicollet/4d6ccd60bec2ae922a32
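The reasoning behind the time window can be written down as a small sketch. It assumes both compactions ran on Wed, 17 Apr 2013 (the date comes from the earlier message quoted below), and that a compaction which completes cleanly means the file was still intact at that moment. The function name corruption_window is purely illustrative.

```python
from datetime import datetime, timezone

# Timestamps from this thread; the date (Wed, 17 Apr 2013, GMT) is an
# assumption carried over from the quoted message below.
last_good_compact = datetime(2013, 4, 17, 10, 54, 8, tzinfo=timezone.utc)
first_bad_compact = datetime(2013, 4, 17, 11, 54, 7, tzinfo=timezone.utc)


def corruption_window(last_good, first_bad):
    """Return (start, end, length) of the interval in which the damage
    must have appeared, assuming a clean compaction implies an intact file."""
    if first_bad <= last_good:
        raise ValueError("the failed compaction must come after the good one")
    return last_good, first_bad, first_bad - last_good


start, end, length = corruption_window(last_good_compact, first_bad_compact)
print("search the logs between", start.isoformat(), "and", end.isoformat())
print("window length:", length)
```

With hourly compactions the window is just under one hour, which is the period worth grepping in the logs.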



On 19 April 2013 00:17, Victor Nicollet <vnicollet@runorg.com> wrote:

> It had happened once on a critical production database (the user
> database...) so I wrote some code to repair it. And I never throw away any
> code.
>
> If you're interested (though I doubt it: it's pretty useless), I could share
> the repair code.
>
> More info on the logs: apparently, the first compact-related crash
> happened Wed, 17 Apr 2013 11:54:08 GMT. Since I have hourly compacts, it
> means the corruption happened Wed, 17 Apr 2013 10:54:08 GMT at the
> earliest. Sifting through that period right now...
>
>
> On 19 April 2013 00:13, Robert Newson <rnewson@apache.org> wrote:
>
>> You say this happens often? Clearly often enough that you have a
>> routine to repair it.
>>
>> B.
>>
>> On 18 April 2013 23:12, Robert Newson <rnewson@apache.org> wrote:
>> > Hi Victor,
>> >
>> > Thanks for the information, we appreciate it.
>> >
>> > B.
>> >
>> > On 18 April 2013 23:07, Victor Nicollet <vnicollet@runorg.com> wrote:
>> >> Replying to my own mail, hoping it will end up in the same thread (I was
>> >> not fully subscribed when I posted this, but I still read the archives).
>> >>
>> >> Answers to the questions you asked:
>> >>
>> >>  - I have no idea when the issue happened. I will try to track it down in
>> >> the logs. I'm afraid I don't have time to filter out all customer
>> >> information from the logs and provide them to you, though I can certainly
>> >> grep for error dumps if you want me to. I have never seen disk-related
>> >> errors in the log.
>> >>  - I am running Debian x86_64 GNU/Linux, with Erlang 1:15.b.1-d
>> >>  - There are no unusual CouchDB configuration options; the only change I
>> >> performed was to disable reduce_limit. A perhaps notable usage aspect: all
>> >> the databases are compacted hourly.
>> >>  - It's not NFS. From /etc/fstab:
>> >>
>> >> /dev/sda1       /       ext4    errors=remount-ro       0       1
>> >> /dev/sda2       /home   ext4    defaults                0       2
>> >>
>> >> The dual-partition setup is a silly default from OVH (my dedicated server
>> >> host), so I have /var/lib/couchdb as a symlink to /home/couchdb/lib, from
>> >> sda1 to sda2.
>> >>
>> >> - I can't rule out a disk issue, because I don't have a lot of experience
>> >> with those... any obvious diagnostic command you would like me to run? I am
>> >> certain that I have not run out of disk space, though (still around 1TB
>> >> free on that drive).
>> >>
>> >> Thank you for your patience.
>> >>
>> >> On 18 April 2013 14:17, Victor Nicollet <vnicollet@runorg.com> wrote:
>> >>
>> >>> Hello,
>> >>>
>> >>> The @CouchDB twitter account thought you might find this information
>> >>> helpful.
>> >>>
>> >>> My SaaS start-up uses CouchDB as its primary database. Lately, I have been
>> >>> having database corruption issues with version 1.2.0: every few weeks, one
>> >>> of our databases becomes corrupted, which has several negative consequences
>> >>> (among others):
>> >>>
>> >>>    - Replication of that database fails (it does not even start).
>> >>>    - Compaction of that database fails and *freezes* the server.
>> >>>    - Several documents in the database become inaccessible through either
>> >>>    direct access or through _all_docs.
>> >>>
>> >>>  The latest affected database does not contain any information about our
>> >>> customers, so I am allowed to release it publicly:
>> >>>
>> >>> http://nicollet.net/public/2013-04-18.couchdb/prod-folder.couch
>> >>>
>> >>> This database contains 325 irretrievable documents between identifiers
>> >>> 2xFEY0pU2Eb and 3Fn6l04G6Oa.
>> >>> I hope this helps,
>> >>>
>> >>> --
>> >>> Victor Nicollet, CTO, www.runorg.com
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Victor Nicollet, Directeur Technique, www.runorg.com
>>
>
>
>
> --
> Victor Nicollet, Directeur Technique, www.runorg.com
>



-- 
Victor Nicollet, Directeur Technique, www.runorg.com
