activemq-users mailing list archives

From Gary Tully <gary.tu...@gmail.com>
Subject Re: 12G datastore - there should be virtually nothing
Date Mon, 09 May 2011 14:51:53 GMT
great, that is what the gc was telling us: topic 'dest:1:accounts' was
retaining messages for some durable sub that had not yet
received/acked them, so the related data files could not be gc'ed.
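
For illustration, the pattern that gets you here is roughly the following: once a
client has registered a durable subscription and gone away without unsubscribing,
the broker has to hold on to every message subsequently published to the topic on
its behalf. This is only a sketch; the broker URL, clientID and subscription name
are made-up examples:

  import javax.jms.Connection;
  import javax.jms.Session;
  import javax.jms.Topic;
  import org.apache.activemq.ActiveMQConnectionFactory;

  public class AbandonedDurableSub {
      public static void main(String[] args) throws Exception {
          ActiveMQConnectionFactory factory =
              new ActiveMQConnectionFactory("tcp://localhost:61616");
          Connection connection = factory.createConnection();
          connection.setClientID("production-box-7");  // example clientID
          connection.start();
          Session session =
              connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
          Topic accounts = session.createTopic("accounts");
          // registers a durable subscription; the broker keeps every message
          // published to the topic for it until the subscriber reconnects and
          // acks them, or the subscription is removed
          session.createDurableSubscriber(accounts, "accountsSub");
          // if the client now dies without ever calling
          // session.unsubscribe("accountsSub"), those retained messages pin
          // the KahaDB data files they live in
          connection.close();
      }
  }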

On 9 May 2011 15:12, James Green <james.mk.green@gmail.com> wrote:
> Fixed it.
>
> Went to the subscribers pane of the web console and spotted an old machine
> that had an accounts durable subscription in "NC" mode - the client was long
> dead.
>
> Thankfully I was able to delete it, and after a pause quite a few of the data
> files were GC'd. I'm going to repeat this for the other 'NC' clients and hope
> to see further cleanup happen.
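>
> For the remaining 'NC' entries it might be easier to script the removal over
> JMX instead of clicking through the console. A rough sketch, assuming the
> default JMX connector on port 1099 and the BrokerViewMBean's
> destroyDurableSubscriber operation (the clientId and subscription name below
> are placeholders):
>
>   import javax.management.MBeanServerConnection;
>   import javax.management.ObjectName;
>   import javax.management.remote.JMXConnector;
>   import javax.management.remote.JMXConnectorFactory;
>   import javax.management.remote.JMXServiceURL;
>
>   public class RemoveOfflineDurableSub {
>       public static void main(String[] args) throws Exception {
>           // default ActiveMQ JMX URL; adjust host/port to your broker
>           JMXServiceURL url = new JMXServiceURL(
>               "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
>           JMXConnector connector = JMXConnectorFactory.connect(url);
>           try {
>               MBeanServerConnection mbs = connector.getMBeanServerConnection();
>               // broker MBean name as used by 5.x; adjust BrokerName to match
>               ObjectName broker = new ObjectName(
>                   "org.apache.activemq:BrokerName=localhost,Type=Broker");
>               // removing the offline durable subscription frees the retained
>               // messages so their data files become gc candidates
>               mbs.invoke(broker, "destroyDurableSubscriber",
>                   new Object[] { "old-production-box", "accountsSub" },
>                   new String[] { String.class.getName(),
>                                  String.class.getName() });
>           } finally {
>               connector.close();
>           }
>       }
>   }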
>
> James
>
> On 9 May 2011 14:43, James Green <james.mk.green@gmail.com> wrote:
>
>> Gary,
>>
>> Let me check I understand you correctly.
>>
>> If I see the GC kick in and list lots of candidates as each channel is
>> considered, and the list suddenly drops by a large amount after a particular
>> channel, then that channel is likely to be holding considerable references
>> (even though they are likely dead)?
>>
>> This is what I see:
>> 2011-05-09 14:29:27,367 [eckpoint Worker] TRACE
>> MessageDatabase                - gc candidates after
>> dest:1:Requests.DeliveryNotificationsRebuild, [113, 114, 117, 118, 121, 122,
>> 123, 134, 135, 136, 138, 139, 140, 143, 144, 148, 149, 152, 153, 165, 166,
>> 167, 169, 170, 171, 174, 175, 178, 179, 180, 183, 184, 196, 197, 200, 201,
>> 202, 205, 206, 209, 210, 211, 214, 215, 217, 218, 219, 222, 223, 226, 227,
>> 228, 231, 232, 245, 246, 249, 250, 254, 255, 258, 259, 263, 264, 276, 277,
>> 280, 281, 282, 285, 286, 287, 289, 290, 291, 294, 295, 296, 308, 309, 312,
>> 313, 314, 317, 318, 319, 322, 323, 327, 328, 340, 341, 344, 345, 346, 349,
>> 350, 351, 354, 355, 356, 358, 359, 360, 372, 373, 374, 377, 378, 379, 381,
>> 382, 383, 386, 387, 391, 392, 414, 419, 423, 424, 441, 442, 445, 446, 447,
>> 450, 451, 455, 457, 469, 470, 474, 475, 476, 479, 480, 481, 485, 486, 487,
>> 490, 491, 492, 507, 508, 512, 513, 514, 519, 520, 521, 524, 525, 526, 529,
>> 530, 531, 545, 546, 547, 551, 552, 553, 556, 557, 558, 563, 564, 567, 568,
>> 569, 582, 583, 584, 585, 589, 590, 591, 595, 596, 597, 600, 601, 602, 603,
>> 606, 622, 623, 624, 628, 629, 630, 634, 636, 640, 641, 642, 645, 646, 647,
>> 661, 662, 663, 666, 667, 668, 669, 672, 673, 674, 678, 681, 684, 685, 686,
>> 699, 700, 701, 702, 705, 706, 707, 710, 711, 712, 713, 719, 722, 723, 724,
>> 737, 738, 739, 740, 743, 744, 745, 749, 750, 751, 752, 755, 756, 757, 760,
>> 761, 762, 775, 776, 777, 778, 781, 782, 783, 784, 787, 788, 789, 790, 794,
>> 795, 796, 799, 800, 801, 812, 813, 814, 815, 819, 821, 824, 825, 826, 832,
>> 833, 834, 837, 838, 839, 852, 853, 854, 855, 859, 860, 865, 866, 867, 870,
>> 871, 872, 876, 891, 892, 893, 897, 898, 899, 902, 903, 904, 908, 909, 910,
>> 911, 914, 915, 916, 930, 932, 936, 937, 938, 942, 943, 947, 948, 949, 968,
>> 969, 973, 974, 975, 979, 980, 985, 986, 987, 990, 991, 992, 1006, 1007,
>> 1008, 1012, 1013, 1017, 1019, 1022, 1024, 1053, 1054, 1058, 1060, 1082,
>> 1083, 1084, 1085, 1088, 1090, 1091, 1094, 1095, 1096, 1099, 1100, 1101,
>> 1102]
>> 2011-05-09 14:29:27,367 [eckpoint Worker] TRACE
>> MessageDatabase                - gc candidates after dest:1:accounts, [553,
>> 556, 1102]
>>
>> accounts is a topic, consumed by each of our production boxes. I am not
>> aware of any problems in this respect. Each consumer is durable, and each
>> message is processed and then ACKed.
>>
>> Can you suggest any reason this situation is occurring?
>>
>> Is there a way to list the contents of these data files in a more
>> meaningful way? For example, to list references to other data files, as you
>> suggest?
>>
>> Thanks,
>>
>> James
>>
>>
>> On 6 May 2011 17:57, Gary Tully <gary.tully@gmail.com> wrote:
>>
>>> reading the trace output is a little unintuitive as it follows the code
>>> logic... it starts with the entire set of data files and considers them all
>>> gc candidates. It then asks each destination in turn whether it still has
>>> pending references and, if so, removes those files from the gc candidate
>>> set.
>>>
>>> The list should get smaller as destinations grab data files.
>>>
>>> In the case below, it looks like after asking
>>> dest:0:Outbound.Account.22312, there are still
>>> lots of data files that will be ok to gc.
>>>
>>> The second step is to determine whether any candidate data files contain
>>> acks for still-referenced data files. Deleting those would mean that after a
>>> failure/recovery restart the acks would be gone and the messages in the
>>> referenced data files would be replayed in error.
>>>
>>> That further reduces the candidate list.
>>>
>>> So you need to look for the channel that pulls the most from the gc
>>> candidate list.
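>>>
>>> In rough pseudo-Java, the shape of that cleanup pass looks something like
>>> the sketch below. This is illustrative only, not the actual KahaDB code,
>>> and the names are made up:
>>>
>>>   import java.util.Arrays;
>>>   import java.util.LinkedHashMap;
>>>   import java.util.Map;
>>>   import java.util.Set;
>>>   import java.util.TreeSet;
>>>
>>>   class CleanupSketch {
>>>       // stillNeeded: destination name -> data files it still references
>>>       // (unacked messages, plus files holding acks that protect messages
>>>       // in other files that must be kept)
>>>       static Set<Integer> gcCandidates(Set<Integer> allDataFiles,
>>>                                        Map<String, Set<Integer>> stillNeeded) {
>>>           // start by treating every data file as a gc candidate
>>>           Set<Integer> candidates = new TreeSet<Integer>(allDataFiles);
>>>           for (Map.Entry<String, Set<Integer>> dest : stillNeeded.entrySet()) {
>>>               // each destination pulls the files it still needs out of the set
>>>               candidates.removeAll(dest.getValue());
>>>               System.out.println("gc candidates after " + dest.getKey()
>>>                       + ", " + candidates);
>>>           }
>>>           return candidates; // whatever survives every destination is deletable
>>>       }
>>>
>>>       public static void main(String[] args) {
>>>           Set<Integer> all = new TreeSet<Integer>(
>>>                   Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
>>>           Map<String, Set<Integer>> needed =
>>>                   new LinkedHashMap<String, Set<Integer>>();
>>>           needed.put("dest:0:Outbound.Account.22312", new TreeSet<Integer>());
>>>           // a destination holding on to files 3 and 7 keeps them from gc
>>>           needed.put("dest:1:accounts",
>>>                   new TreeSet<Integer>(Arrays.asList(3, 7)));
>>>           System.out.println("deletable: " + gcCandidates(all, needed));
>>>       }
>>>   }
>>>
>>> A big drop in the printed list right after one destination is what points
>>> the finger at that destination.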
>>>
>>>
>>> On 6 May 2011 17:18, James Green <james.mk.green@gmail.com> wrote:
>>> > OK, to take just one channel:
>>> > 2011-05-06 17:16:25,154 [eckpoint Worker] TRACE
>>> > MessageDatabase                - gc candidates after
>>> > dest:0:Outbound.Account.22312, [113, 114, 117, 118, 121, 122, 123, 134,
>>> 135,
>>> > 136, 138, 139, 140, 143, 144, 148, 149, 152, 153, 165, 166, 167, 169,
>>> 170,
>>> > 171, 174, 175, 178, 179, 180, 183, 184, 196, 197, 200, 201, 202, 205,
>>> 206,
>>> > 209, 210, 211, 214, 215, 217, 218, 219, 222, 223, 226, 227, 228, 231,
>>> 232,
>>> > 245, 246, 249, 250, 254, 255, 258, 259, 263, 264, 276, 277, 280, 281,
>>> 282,
>>> > 285, 286, 287, 289, 290, 291, 294, 295, 296, 308, 309, 312, 313, 314,
>>> 317,
>>> > 318, 319, 322, 323, 327, 328, 340, 341, 344, 345, 346, 349, 350, 351,
>>> 354,
>>> > 355, 356, 358, 359, 360, 372, 373, 374, 377, 378, 379, 381, 382, 383,
>>> 386,
>>> > 387, 391, 392, 414, 419, 423, 424, 441, 442, 445, 446, 447, 450, 451,
>>> 455,
>>> > 457, 469, 470, 474, 475, 476, 479, 480, 481, 485, 486, 487, 490, 491,
>>> 492,
>>> > 507, 508, 512, 513, 514, 519, 520, 521, 524, 525, 526, 529, 530, 531,
>>> 545,
>>> > 546, 547, 551, 552, 553, 556, 557, 558, 563, 564, 567, 568, 569, 582,
>>> 583,
>>> > 584, 585, 589, 590, 591, 595, 596, 597, 600, 601, 602, 603, 606, 622,
>>> 623,
>>> > 624, 628, 629, 630, 634, 636, 640, 641, 642, 645, 646, 647, 661, 662,
>>> 663,
>>> > 666, 667, 668, 669, 672, 673, 674, 678, 681, 684, 685, 686, 699, 700,
>>> 701,
>>> > 702, 705, 706, 707, 710, 711, 712, 713, 719, 722, 723, 724, 737, 738,
>>> 739,
>>> > 740, 743, 744, 745, 749, 750, 751, 752, 755, 756, 757, 760, 761, 762,
>>> 775,
>>> > 776, 777, 778, 781, 782, 783, 784, 787, 788, 789, 790, 794, 795, 796,
>>> 799,
>>> > 800, 801, 812, 813, 814, 815, 819, 821, 824, 825, 826, 832, 833, 834,
>>> 837,
>>> > 838, 839, 852, 853, 854, 855, 859, 860, 865, 866, 867, 870, 871, 872,
>>> 876,
>>> > 891, 892, 893, 897, 898, 899, 902, 903, 904, 908, 909, 910, 911, 914,
>>> 915,
>>> > 916, 930, 932, 936, 937, 938, 942, 943, 947, 948, 949, 968, 969, 973,
>>> 974,
>>> > 975, 979, 980, 985, 986, 987, 990, 991, 992, 1006, 1007, 1008, 1012,
>>> 1013,
>>> > 1017, 1019, 1022, 1024, 1053, 1054, 1058, 1060, 1082, 1083, 1084, 1085,
>>> > 1088, 1090, 1091, 1094, 1095, 1096, 1099, 1100]
>>> >
>>> > That channel would only ever have messages sent/received on remote
>>> > machines. The messages would never go over the network.
>>> >
>>> > Clearly that's a lot of references which should not be there. Any ideas?
>>> >
>>> > James
>>> >
>>> > On 6 May 2011 15:01, Gary Tully <gary.tully@gmail.com> wrote:
>>> >
>>> >> on the broker, enable TRACE level logging for:
>>> >> org.apache.activemq.store.kahadb.MessageDatabase
>>> >>
>>> >> and the cleanup processing will tell you which destination has a
>>> >> reference to those data files.
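>>> >>
>>> >> With the stock conf/log4j.properties that ships with the broker (assuming
>>> >> you have not customised logging), that should just be a matter of adding
>>> >> a line such as:
>>> >>
>>> >>   log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE
>>> >>
>>> >> and restarting the broker.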
>>> >>
>>> >> On 6 May 2011 09:26, James Green <james.mk.green@gmail.com> wrote:
>>> >> > Ubuntu Linux running AMQ 5.5.0, previously running 5.4.x releases.
>>> >> >
>>> >> > I have just noticed our "hub" machine has 12% store used. df -h inside
>>> >> > the kahadb dir shows 357 .log files consuming 12G of space. They begin
>>> >> > Oct 2010 - there are no obvious large gaps over time but some files are
>>> >> > clearly gone.
>>> >> >
>>> >> > Looking at lsof, only three are currently open. The hub receives
>>> >> > messages on queues and publishes messages on topics.
>>> >> >
>>> >> > Can anyone advise on how to investigate this, please?
>>> >> >
>>> >> > James
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> http://blog.garytully.com
>>> >> http://fusesource.com
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> http://blog.garytully.com
>>> http://fusesource.com
>>>
>>
>>
>



-- 
http://blog.garytully.com
http://fusesource.com
