incubator-couchdb-user mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: badmatch on big binary
Date Mon, 23 Jan 2012 08:18:49 GMT
On Tue, Jan 17, 2012 at 01:35, Peta Bogdan <bog495@gmail.com> wrote:
> Hello,
>
> I have a small database around 120 MB with approximately 16,000 documents.
>
> However, it happens (also from futon) that I get this error:
>
> [Tue, 17 Jan 2012 07:22:01 GMT] [error] [<0.185.0>] {error_report,<0.30.0>,
>                     {<0.185.0>,crash_report,
>                      [[{initial_call,{couch_file,init,['Argument__1']}},
>                        {pid,<0.185.0>},
>                        {registered_name,[]},
>                        {error_info,
>                         {exit,
>                          {{badmatch,
>                            {ok,
>                             9_MEGABYTES_BINARY}},
>                           [{couch_file,read_raw_iolist_int,3},
>                            {couch_file,maybe_read_more_iolist,4},
>                            {couch_file,handle_call,3},
>                            {gen_server,handle_msg,5},
>                            {proc_lib,init_p_do_apply,3}]},
>                          [{gen_server,terminate,6},
>                           {proc_lib,init_p_do_apply,3}]}},
>                        {ancestors,[<0.184.0>]},
>                        {messages,
>                         [{'$gen_call',
>                           {<0.10840.18>,#Ref<0.0.3.20907>},
>                           bytes}]},
>                        {links,[<0.190.0>]},
>                        {dictionary,[]},
>                        {trap_exit,true},
>                        {status,running},
>                        {heap_size,1597},
>                        {stack_size,24},
>                        {reductions,65666}],
>                       [{neighbour,
>                         [{pid,<0.190.0>},
>                          {registered_name,[]},
>                          {initial_call,
>                           {couch_ref_counter,init,['Argument__1']}},
>                          {current_function,{gen_server,loop,6}},
>                          {ancestors,[<0.188.0>,<0.187.0>,<0.184.0>]},
>                          {messages,[]},
>                          {links,[<0.185.0>]},
>                          {dictionary,[]},
>                          {trap_exit,false},
>                          {status,waiting},
>                          {heap_size,610},
>                          {stack_size,9},
>                          {reductions,362}]}]]}}
>
> If this error occurs too frequently, it causes couch_server to reach its
> max restart frequency, which shuts down the entire supervision tree, and
> hence the database server instance disappears.
>
> The function couch_file:read_raw_iolist_int/3 calls file:pread, which
> returns {ok, Binary}. This Binary is almost 9 megabytes in size, which is
> very strange.
> I think this means that the function file:pread/3 is instructed to read
> from the wrong position.

I suspect you're right. One probable reason for the mismatch is that
file:pread is reading off the end of the file because of an erroneously
large TotalBytes value, so it returns fewer bytes than were requested.
It's not clear why things got to this state. It may be a classic case
of data corruption. It's not anything I've seen reported before.
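
To illustrate the failure mode (not CouchDB's actual code path), here is a
minimal sketch that reproduces the same kind of badmatch. It assumes a
hypothetical scratch file name and a deliberately oversized TotalBytes; the
module and function names are made up for the example. file:pread/3 returns
only the bytes that exist, so the size-constrained binary match fails.

```erlang
-module(pread_demo).
-export([short_read/0]).

%% Writes a 10-byte file, then asks file:pread/3 for 100 bytes at offset 0.
%% pread returns only the bytes that are actually in the file, so a
%% size-constrained binary match on the result fails with badmatch.
short_read() ->
    Path = "pread_demo.tmp",              % hypothetical scratch file
    ok = file:write_file(Path, <<"0123456789">>),
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    TotalBytes = 100,                     % deliberately larger than the file
    {ok, Bin} = file:pread(Fd, 0, TotalBytes),
    Result =
        try
            %% Same shape as the match quoted from couch_file.
            {ok, <<RawBin:TotalBytes/binary>>} = {ok, Bin},
            {matched, byte_size(RawBin)}
        catch
            error:{badmatch, {ok, Short}} ->
                {badmatch, byte_size(Short)}
        end,
    ok = file:close(Fd),
    ok = file:delete(Path),
    Result.
```

Calling pread_demo:short_read() returns {badmatch, 10}: the read came back
with 10 bytes while the pattern demanded exactly 100. In the report above the
returned binary is about 9 MB, which would fit a TotalBytes larger than what
remained in the file after Pos.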

If you have the inclination to dig through the Erlang terms and find
anything interesting, please let us know.
Alternatively, if you can share the database file, someone else might
be able to take a look. If necessary, it may be possible to send
the data to a committer privately if it contains sensitive
information.

If the problem is corruption, truncating the file before the corrupted
data should allow the database to function again (at the cost of some
data loss).
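
As a rough sketch of what such a truncation could look like, the following
works on a copy so the original file stays untouched. The module name,
function name, and the choice of Offset are all assumptions for
illustration; finding the right offset (somewhere before the corrupted
region) is the hard part and is not shown here.

```erlang
-module(truncate_demo).
-export([truncate_copy/3]).

%% Copies Src to Dst, then cuts Dst off at Offset bytes.
%% The original file (Src) is never modified.
%% Offset is assumed to be no larger than the size of Src; a larger value
%% may extend the copy instead of shortening it.
truncate_copy(Src, Dst, Offset) when is_integer(Offset), Offset >= 0 ->
    {ok, _Copied} = file:copy(Src, Dst),
    %% read + write opens the existing file without emptying it.
    {ok, Fd} = file:open(Dst, [read, write, raw, binary]),
    {ok, Offset} = file:position(Fd, Offset),   % move to the cut point
    ok = file:truncate(Fd),                     % drop everything after it
    ok = file:close(Fd),
    {ok, filelib:file_size(Dst)}.
```

For example, truncate_demo:truncate_copy("db.couch", "db_copy.couch", 4096)
would leave a 4096-byte copy, assuming db.couch (a placeholder name) is at
least that large. Stop CouchDB before copying, and only swap the truncated
copy in once it opens cleanly.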

-Randall

>
> The only reason I can think of is that the value of 'TotalBytes' from line
> (1) doesn't match the value of 'TotalBytes' from line (2)
>
> (1) TotalBytes = calculate_total_read_len(BlockOffset, Len),
> (2) {ok, <<RawBin:TotalBytes/binary>>} = file:pread(Fd, Pos, TotalBytes),
>
> The possible answer would be that in certain conditions the function
> calculate_total_read_len/2 doesn't return the expected value.
>
> Server: CouchDB/1.1.1 (Erlang OTP/R14B04)
> OS: OpenBSD 5.0 GENERIC.MP#63 amd64
>
> Now, the trouble is how to circumvent this situation.
>
> Thank you in advance,
>
> Bogdan