rocketmq-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From 刘春龙 <631521...@qq.com>
Subject 关于RocketMQ消息存储的几点问题
Date Tue, 04 Dec 2018 13:16:00 GMT
各位RocketMQ社区朋友好,

最近将RocketMQ的底层存储源码逐行看了一遍,发现了如下几个问题。希望各位能抽出时间逐个解答一下。如果是BUG的话,还请修复该问题。谢谢💕。

github的issue地址:https://github.com/apache/rocketmq/issues/574 <https://github.com/apache/rocketmq/issues/574>

————————————————————————————————————————————————————————

下面我把内容贴出来:

1.关于锁synchronized的使用问题;

rocketmq/store/src/main/java/org/apache/rocketmq/store/CommitLog.java <https://github.com/apache/rocketmq/blob/1bedba8cbcef32554d2dc80f2a340ea686d68ba9/store/src/main/java/org/apache/rocketmq/store/CommitLog.java#L1069>
Line 1069 in 1bedba8 <https://github.com/apache/rocketmq/commit/1bedba8cbcef32554d2dc80f2a340ea686d68ba9>
 public synchronized void putRequest(final GroupCommitRequest request) { 
rocketmq/store/src/main/java/org/apache/rocketmq/store/CommitLog.java <https://github.com/apache/rocketmq/blob/1bedba8cbcef32554d2dc80f2a340ea686d68ba9/store/src/main/java/org/apache/rocketmq/store/CommitLog.java#L1136>
Line 1136 in 1bedba8 <https://github.com/apache/rocketmq/commit/1bedba8cbcef32554d2dc80f2a340ea686d68ba9>
 synchronized (this) { 

我认为上述两处的synchronized是完全没有必要的。减少没必要的锁的使用会增加性能😄。


2. 睡眠时间不合理,可能会造成正在处理的消息响应失败;

rocketmq/store/src/main/java/org/apache/rocketmq/store/CommitLog.java <https://github.com/apache/rocketmq/blob/1bedba8cbcef32554d2dc80f2a340ea686d68ba9/store/src/main/java/org/apache/rocketmq/store/CommitLog.java#L1131>
Line 1131 in 1bedba8 <https://github.com/apache/rocketmq/commit/1bedba8cbcef32554d2dc80f2a340ea686d68ba9>
 Thread.sleep(10); 
GroupCommitService此处的睡眠时间是否合理?

因为是同步刷盘策略,设置时间过短,在shutdown时,可能会造成正在处理的消息(eg,正在执行appendMessages的消息)来不及刷盘,进而响应失败(FLUSH_DISK_TIMEOUT)。

是否可以采用其它方式?或者适当延长下时间?10毫秒是不是太短了。


3. 可能会造成消息丢失;

rocketmq/store/src/main/java/org/apache/rocketmq/store/CommitLog.java <https://github.com/apache/rocketmq/blob/1bedba8cbcef32554d2dc80f2a340ea686d68ba9/store/src/main/java/org/apache/rocketmq/store/CommitLog.java#L943-L947>
Lines 943 to 947 in 1bedba8 <https://github.com/apache/rocketmq/commit/1bedba8cbcef32554d2dc80f2a340ea686d68ba9>
 boolean result = false; 
 for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) { 
     result = CommitLog.this.mappedFileQueue.commit(0); 
     CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + "
times " + (result ? "OK" : "Not OK")); 
 } 
CommitRealTimeService此处没有第2点中提到的sleep机制。这里是采用尝试10次commit。

举个简单的场景:

A 在执行appendMessages;
B 在执行appendMessages;
在B执行完appendMessages,还未来得及通知CommitRealTimeService,与此同时shutdown;
注意,A还未执行完appendMessages;
按照上述代码逻辑,会尝试10次commit,在第1次时就能将B消息commit,并结束循环;
A 执行完appendMessages;
问题就在于A消息不会被commit了,A消息丢失。但给客户端返回的响应就是OK💢。

相同的,FlushRealTimeService异步刷盘也有类似问题。


4. sleep(0)的问题

rocketmq/store/src/main/java/org/apache/rocketmq/store/MappedFile.java <https://github.com/apache/rocketmq/blob/1bedba8cbcef32554d2dc80f2a340ea686d68ba9/store/src/main/java/org/apache/rocketmq/store/MappedFile.java#L507>
Line 507 in 1bedba8 <https://github.com/apache/rocketmq/commit/1bedba8cbcef32554d2dc80f2a340ea686d68ba9>
 Thread.sleep(0); 
-XX:+ConvertSleepToYield 这个参数默认就是 true,如果 sleep 0 的话会转换成
yield

但是 ConvertSleepToYield 设为 false 时,会进行 sleep,且 sleep 的最小时间是
1 ms,也就是说 ConvertSleepToYield 为 false 时,哪怕是 sleep 0 实际上跟 sleep
1 是一样的。这样的话,并不会让出线程的时间片。

我个人认为 RocketMQ 应改为 Thread.yield 处理。或许sleep(0)有其它的意图?

5.commitLog写满时,消息的处理问题

rocketmq/store/src/main/java/org/apache/rocketmq/store/CommitLog.java <https://github.com/apache/rocketmq/blob/1bedba8cbcef32554d2dc80f2a340ea686d68ba9/store/src/main/java/org/apache/rocketmq/store/CommitLog.java#L1091-L1096>
Lines 1091 to 1096 in 1bedba8 <https://github.com/apache/rocketmq/commit/1bedba8cbcef32554d2dc80f2a340ea686d68ba9>
 for (int i = 0; i < 2 && !flushOK; i++) { 
     flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();

  
     if (!flushOK) { 
         CommitLog.this.mappedFileQueue.flush(0); 
     } 
同步刷盘策略下,正常消息刷盘时,依赖第二次循环将flushOK置为true。但是如果当消息A写入时发现commitLog已经写满,则消息A会被写入下一个commitLog。当前待刷盘的commitLog会有两个。

A消息通知刷盘服务执行刷盘,第一次循环刷盘(刷入第一个commitLog的末尾数据)时flushOK
= CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();肯定为false,第二次刷盘(刷入消息A)时flushOK
= CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();也肯定是false,需要第三次循环来将flushOK置为true。

所以上述场景下,flushOK为false。但实际消息A已经刷盘成功了。
Mime
View raw message