lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: can anybody give some suggest about this elasticsearch shard failed problem? thanks
Date Tue, 05 Jun 2018 15:53:57 GMT
The fact that it happens when writing compound files is very suspicious
since there should be little time between when the original files are
written and when they are merged into a compound file. Is it a remote
filesystem? Do you have cron jobs that run every 6 hours?
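The cron question follows from the failure times reported in the quoted message below (00:02, 06:02, 12:02, 18:02). As a minimal sketch, not part of the original exchange, the following checks whether a set of times of day all fall at the same offset within a repeating period. The function names `offset_in_period` and `same_schedule` are hypothetical, chosen for illustration only.

```python
def offset_in_period(hhmm, period_hours=6):
    """Return the offset in minutes of an HH:MM time within a repeating period."""
    hours, minutes = map(int, hhmm.split(":"))
    return (hours * 60 + minutes) % (period_hours * 60)


def same_schedule(times, period_hours=6):
    """True if every time falls at the same minute offset within the period."""
    return len({offset_in_period(t, period_hours) for t in times}) == 1


# Times of day taken from the quoted report.
reported = ["00:02", "06:02", "12:02", "18:02"]

print(sorted({offset_in_period(t) for t in reported}))
print(same_schedule(reported))
```

All four times land two minutes into a 6-hour period, which is the pattern a scheduled job would produce; a crontab schedule of `2 */6 * * *` fires at exactly these times. The check only shows that the pattern is regular, not which process is responsible.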

On Tue, Jun 5, 2018 at 16:43, 喜之郎 <251922566@qq.com> wrote:

> The Lucene version is 6.3.0.
> The filesystem is XFS.
> This always happens at 00:02, 06:02, 12:02, and 18:02,
> which is very strange.
>
>
>
>
> ------------------ Original Message ------------------
> From: "251922566"<251922566@qq.com>;
> Sent: Tuesday, June 5, 2018, 6:50 PM
> To: "java-user"<java-user@lucene.apache.org>;
>
> Subject: can anybody give some suggest about this elasticsearch shard failed
> problem? thanks
>
>
>
>
> Elasticsearch version (bin/elasticsearch --version): 5.1.1
>
> Plugins installed: [] no
>
> JVM version (java -version): 1.8.0_77
>
> OS version (uname -a if on a Unix-like system): CentOS Linux release
> 7.2.1511 (Core)
>
> Description of the problem including expected versus actual behavior:
> When using the update API under high concurrency, both the primary shard
> and the replica shard failed.
> This has happened many times on 2 machines, so I think this is a bug.
>
> Provide logs (if relevant):
>
> [2018-04-27T12:02:22,797][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2]
> [analytics_profile_12014][7] received shard failed for shard id
> [[analytics_profile_12014][7]], allocation id [xIEoF3JaTLWQz6X2KxMWRA],
> primary term [0], message [shard failure, reason [refresh failed]], failure
> [EOFException[read past EOF:
> MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")]]
> java.io.EOFException: read past EOF:
> MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")
> Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status
> indeterminate: remaining=0, please run checkindex for more details
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")))
> org.elasticsearch.action.FailedNodeException: Failed node
> [BbfFMNRpRvW5p8LDs3rquQ]
> at
> org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219)
> ~[elasticsearch-5.1.1.jar:5.1.1]
> at
> org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:984)
> ~[elasticsearch-5.1.1.jar:5.1.1]
> at
> org.elasticsearch.transport.TcpTransport.lambda$handleException$17(TcpTransport.java:1314)
> ~[elasticsearch-5.1.1.jar:5.1.1]
> at
> org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1312)
> [elasticsearch-5.1.1.jar:5.1.1]
> Caused by: org.elasticsearch.transport.RemoteTransportException:
> [172.20.3.2_1][172.20.3.2:9301
> ][internal:cluster/nodes/indices/shard/store[n]]
> Caused by: org.elasticsearch.ElasticsearchException: Failed to list store
> metadata for shard [[analytics_action_12014_201804][15]]
> Caused by: org.apache.lucene.index.CorruptIndexException: failed engine
> (reason: [corrupt file (source: [index])]) (resource=preexisting_corruption)
> Caused by: java.io.IOException: failed engine (reason: [corrupt file
> (source: [index])])
> Caused by: org.apache.lucene.index.CorruptIndexException: compound
> sub-files must have a valid codec header and footer: file is too small (0
> bytes)
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata1/nodes/0/indices/cBECbko7SMKP3oXsTGi_kg/15/index/_2kqi.fdx")))
> [2018-04-27T12:02:22,800][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2]
> [analytics_profile_12014][18] received shard failed for shard id
> [[analytics_profile_12014][18]], allocation id [7TieFxLRRZ-28uOsPFr1yQ],
> primary term [0], message [shard failure, reason [refresh failed]], failure
> [EOFException[read past EOF:
> MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")]]
> java.io.EOFException: read past EOF:
> MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")
> Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status
> indeterminate: remaining=0, please run checkindex for more details
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")))
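
One of the errors in the log above reports a sub-file that is "too small (0 bytes)". As a rough sketch, assuming read access to the data directory, the following lists zero-byte regular files under a directory so they can be compared against the files named in the log. The helper name `find_empty_files` is hypothetical and is not part of Lucene or Elasticsearch.

```python
from pathlib import Path


def find_empty_files(index_dir):
    """Return the sorted paths of zero-byte regular files under index_dir."""
    root = Path(index_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

It could be called on a shard's index directory, for example one of the `.../index` paths shown in the log. An empty file found this way does not show which process truncated it; it only narrows down where to look.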
