1.4 Common Elasticsearch Errors

Problem 1: unable to install syscall filter

[2016-11-06T16:27:21,712][WARN ][o.e.b.JNANatives ] unable to install syscall filter:

java.lang.UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in

at org.elasticsearch.bootstrap.Seccomp.linuxImpl(Seccomp.java:349) ~[elasticsearch-5.0.0.jar:5.0.0]

at org.elasticsearch.bootstrap.Seccomp.init(Seccomp.java:630) ~[elasticsearch-5.0.0.jar:5.0.0]

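Cause: this is only a warning. The kernel is too old: seccomp system call filtering requires Linux kernel 3.5 or later, built with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER.

Solution: 1) reinstall a newer Linux system (with a 3.5+ kernel); 2) the warning does not prevent Elasticsearch from running, so it can be ignored.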
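Before choosing between the two options, it can help to check which side of the 3.5 threshold a machine is on. Below is a minimal Python sketch, assuming the release string looks like the output of `uname -r` (CentOS 6, for instance, ships a 2.6.32 kernel). It compares only the version number and cannot tell whether CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER were actually compiled in. `kernel_version` and `kernel_supports_seccomp` are hypothetical helper names.

```python
import platform
import re

# Threshold named in the warning: seccomp needs kernel 3.5+.
MIN_SECCOMP_KERNEL = (3, 5)


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string like '2.6.32-431.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_seccomp(release: str) -> bool:
    """True if the kernel is at least 3.5.

    Only the version is checked; the CONFIG_SECCOMP and
    CONFIG_SECCOMP_FILTER build options are not visible here.
    """
    return kernel_version(release) >= MIN_SECCOMP_KERNEL


if __name__ == "__main__":
    # On Linux, platform.release() returns the same string as `uname -r`.
    release = platform.release()
    if kernel_supports_seccomp(release):
        print(f"{release}: new enough for seccomp")
    else:
        print(f"{release}: older than 3.5, the warning is expected")
```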

Problem 2: maximum number of open files is too low

ERROR: bootstrap checks failed

max file descriptors [4096] for elasticsearch process likely too low, increase to at least [65536]

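Cause: the maximum number of files the user may open is too small, so Elasticsearch cannot create the local files it needs.

Solution:

Switch to the root user and open the limits.conf configuration file:

vi /etc/security/limits.conf

Add the following lines: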

* soft nofile 65536

* hard nofile 131072

* soft nproc 2048

* hard nproc 4096

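Note: * matches every Linux user; replace it with a specific user name (for example hadoop) to apply the limits to that user only.

Save the file, exit, and log in again for the changes to take effect.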
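After editing the file, you can double-check which values a given user will receive. Below is a minimal Python sketch, assuming only simple `<domain> <type> <item> <value>` lines like the ones above. It handles exact user names, the `*` wildcard and the `-` type (soft and hard at once), lets a user-specific line override a `*` line, and ignores `@group` entries and files under /etc/security/limits.d. `parse_limits` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def parse_limits(text: str, user: str) -> dict:
    """Return {(type, item): value} for the limits.conf lines that apply to `user`.

    Simplifications (assumptions of this sketch): only exact user names and
    the "*" wildcard are recognised; @group lines and limits.d files are ignored.
    """
    wildcard, specific = {}, {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) != 4:
            continue  # not a "<domain> <type> <item> <value>" line
        domain, kind, item, value = fields
        if domain == user:
            target = specific
        elif domain == "*":
            target = wildcard
        else:
            continue
        # "-" sets the soft and the hard limit at the same time.
        kinds = ("soft", "hard") if kind == "-" else (kind,)
        for k in kinds:
            target[(k, item)] = value
    # A user-specific entry overrides the wildcard entry for the same key.
    return {**wildcard, **specific}


if __name__ == "__main__":
    conf = Path("/etc/security/limits.conf")
    if conf.exists():
        limits = parse_limits(conf.read_text(), "hadoop")
        print("nofile:", limits.get(("soft", "nofile")), "/", limits.get(("hard", "nofile")))
```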

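Problem 3: maximum number of threads is too low

max number of threads [1024] for user [es] likely too low, increase to at least [2048]

Cause: the maximum number of threads the user may create is too small, so Elasticsearch cannot create the native threads it needs.

Solution: switch to the root user, go to the limits.d directory, and edit the 90-nproc.conf configuration file: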

vi /etc/security/limits.d/90-nproc.conf

Find the following line:

* soft nproc 1024

# change it to

* soft nproc 2048
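Because the new limit only applies to new login sessions, it is worth confirming it from a fresh session of the Elasticsearch user. Below is a minimal Python sketch using the standard `resource` module (Unix only). On Linux, RLIMIT_NPROC counts threads as well as processes, which is why Elasticsearch's thread pools run into it. The 2048 threshold comes from the error message above; `meets` and `check_nproc` are hypothetical helper names.

```python
import resource

REQUIRED_NPROC = 2048  # minimum quoted in the bootstrap-check message


def meets(value: int, required: int) -> bool:
    """An rlimit passes if it is unlimited or at least `required`."""
    return value == resource.RLIM_INFINITY or value >= required


def check_nproc(required: int = REQUIRED_NPROC) -> bool:
    """Compare the current session's soft RLIMIT_NPROC with `required`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print(f"nproc: soft={soft} hard={hard}, need >= {required}")
    return meets(soft, required)


if __name__ == "__main__":
    # Run this as the Elasticsearch user (e.g. after `su - es`) so the
    # limits reported are the ones that user's sessions actually get.
    print("OK" if check_nproc() else "too low: raise nproc as described above")
```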

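Problem 4: vm.max_map_count is too low

max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]

Cause: the kernel limit on the number of virtual memory areas (memory maps) a process may use is too low. This setting counts mappings; it is not a limit on the amount of virtual memory.

Solution: switch to the root user and edit the sysctl.conf configuration file: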

vi /etc/sysctl.conf

Add the following setting:

vm.max_map_count=655360

Then apply it by running:

sysctl -p

Restart Elasticsearch and it should now start successfully.
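To confirm the new value is active (for example after `sysctl -p`), you can read it back from /proc. Below is a minimal Python sketch, assuming a Linux host where /proc/sys/vm/max_map_count exists. 262144 is the minimum from the error message, and `read_max_map_count` and `map_count_ok` are hypothetical helper names.

```python
from pathlib import Path

REQUIRED_MAP_COUNT = 262144  # minimum named by the bootstrap check
PROC_PATH = Path("/proc/sys/vm/max_map_count")  # Linux only


def read_max_map_count(path: Path = PROC_PATH) -> int:
    """Read the live kernel value (what `sysctl vm.max_map_count` shows)."""
    return int(path.read_text().strip())


def map_count_ok(value: int, required: int = REQUIRED_MAP_COUNT) -> bool:
    """True if the current value satisfies the bootstrap check."""
    return value >= required


if __name__ == "__main__":
    if PROC_PATH.exists():
        current = read_max_map_count()
        if map_count_ok(current):
            print(f"vm.max_map_count={current}: OK")
        else:
            print(f"vm.max_map_count={current}: too low, need >= {REQUIRED_MAP_COUNT}")
```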

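Problem 5: Elasticsearch cannot find the host or route on startup

Cause: the Elasticsearch unicast discovery configuration is wrong.

Solution:

Check the Elasticsearch configuration file:

vi config/elasticsearch.yml

Find the following setting: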

discovery.zen.ping.unicast.hosts: ["172.16.31.220", "172.16.31.221","172.16.31.224"]

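In most cases the problem is in this setting; check its syntax carefully (square brackets, double quotes, and commas between entries).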
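Because a single stray character is enough to break this setting, a quick syntax check can help. Below is a minimal Python sketch, assuming the value is written as a one-line flow list like the one above and that every entry is an IPv4 address or hostname with an optional :port (IPv6 is not handled). It also rejects a full-width comma and curly quotes, which are easy to type by accident with a CJK input method. `parse_hosts` and `valid_entry` are hypothetical helpers, and passing this check does not mean the hosts are reachable.

```python
import ipaddress
import re

NUMERIC = re.compile(r"[0-9.]+")
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
HOSTNAME = re.compile(rf"{LABEL}(?:\.{LABEL})*")
# Full-width comma and curly double quotes, which resemble , and " in many fonts.
LOOKALIKES = "\uff0c\u201c\u201d"


def parse_hosts(value: str) -> list:
    """Split a flow list such as ["a", "b:9300"] into bare entries."""
    value = value.strip()
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError("the list must be wrapped in [ and ]")
    if any(ch in value for ch in LOOKALIKES):
        raise ValueError("full-width comma or curly quote found")
    items = (item.strip().strip("\"'") for item in value[1:-1].split(","))
    return [item for item in items if item]


def valid_entry(entry: str) -> bool:
    """Accept an IPv4 address or hostname, optionally followed by :port."""
    host, sep, port = entry.partition(":")
    if sep and not (port.isdigit() and 0 < int(port) < 65536):
        return False
    if NUMERIC.fullmatch(host):
        # Looks like an IP address, so it must be a valid one (catches typos).
        try:
            ipaddress.IPv4Address(host)
        except ValueError:
            return False
        return True
    return HOSTNAME.fullmatch(host) is not None


if __name__ == "__main__":
    line = '["172.16.31.220", "172.16.31.221","172.16.31.224"]'
    for entry in parse_hosts(line):
        print(entry, "ok" if valid_entry(entry) else "INVALID")
```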

Problem 6: Failed to deserialize exception response from stream

org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream

Cause: the nodes in the Elasticsearch cluster run different JDK versions.

Solution: run the same JDK version on every node of the cluster.
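To spot the odd node out, you can compare the JVM version that each node reports. Below is a minimal Python sketch, assuming you have saved the response of the nodes info API, e.g. `curl -s 'http://localhost:9200/_nodes/jvm' > nodes_jvm.json` (host, port and file name are placeholders). It reads each node's `jvm.version` field and groups node names by version. `jvm_versions` and `is_uniform` are hypothetical helper names.

```python
import json
import sys
from collections import defaultdict


def jvm_versions(nodes_info: dict) -> dict:
    """Map each JVM version to the names of the nodes running it."""
    by_version = defaultdict(list)
    for node in nodes_info["nodes"].values():
        by_version[node["jvm"]["version"]].append(node["name"])
    return dict(by_version)


def is_uniform(nodes_info: dict) -> bool:
    """True when every node reports the same JVM version."""
    return len(jvm_versions(nodes_info)) <= 1


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        info = json.load(f)
    for version, names in sorted(jvm_versions(info).items()):
        print(version, names)
    if not is_uniform(info):
        print("JDK versions differ: align them on every node")
```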

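Problem 7: Unsupported major.minor version 52.0

Cause: the JDK is too old. Class file version 52.0 is the Java 8 format, so an older JVM cannot load Elasticsearch's classes.

Solution: switch to a newer JDK; Elasticsearch 5.0.0 requires JDK 1.8.0 (Java 8).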
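The number in the error is the class-file format version. Every .class file starts with the 4-byte magic number 0xCAFEBABE, followed by a 2-byte minor and a 2-byte major version; from Java 5 onward the major version equals the Java release plus 44, so 52 means Java 8. Below is a minimal Python sketch that reads that header from the first class inside a jar passed on the command line. `class_file_version` and `java_release` are hypothetical helper names.

```python
import struct
import sys
import zipfile


def class_file_version(header: bytes) -> tuple:
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major: int) -> str:
    """Translate a class-file major version into a Java release name."""
    # 45-48 correspond to Java 1.0/1.1 through 1.4; from 49 (Java 5) on,
    # the release number is major - 44.
    if major < 49:
        return f"1.{major - 44}"
    return str(major - 44)


if __name__ == "__main__" and len(sys.argv) > 1:
    with zipfile.ZipFile(sys.argv[1]) as jar:
        first = next(n for n in jar.namelist() if n.endswith(".class"))
        with jar.open(first) as cls:
            major, minor = class_file_version(cls.read(8))
    print(f"{first}: {major}.{minor} -> needs Java {java_release(major)}+")
```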

Problem 8: Unknown plugin license

bin/elasticsearch-plugin install license

ERROR: Unknown plugin license

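Cause: the plugins changed in Elasticsearch 5.0.0. The standalone license plugin no longer exists; it was bundled into X-Pack together with Shield, Watcher and Marvel.

Solution: install X-Pack, which contains these plugins, with the new plugin command: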

bin/elasticsearch-plugin install x-pack

Problem 9: bootstrap checks failed

Startup error: ERROR: bootstrap checks failed

system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

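Cause: CentOS 6 does not support seccomp, but ES 5.2.1 runs the system call filter check because bootstrap.system_call_filter defaults to true. The check fails, and the failure prevents ES from starting. See https://github.com/elastic/elasticsearch/issues/22899

Solution: set bootstrap.system_call_filter to false in elasticsearch.yml, below the Memory settings: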

bootstrap.memory_lock: false

bootstrap.system_call_filter: false

Problem 10: Failed to send join request to master

Failed to send join request to master

[{node-1}{WbcP0pC_T32jWpYvu5is1A}{2_LCVHx1QEaBZYZ7XQEkMg}{10.10.11.200}{10.10.11.200:9300}], reason [RemoteTransportException[[node-1][10.10.11.200:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {node-2}{WbcP0pC_T32jWpYvu5is1A}{p-HCgFLvSFaTynjKSeqXyA}{10.10.11.200}{10.10.11.200:9301}, found existing node {node-1}{WbcP0pC_T32jWpYvu5is1A}{2_LCVHx1QEaBZYZ7XQEkMg}{10.10.11.200}{10.10.11.200:9300} with the same id but is a different node instance]; ]

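Cause: if the elasticsearch directory was copied from one node when deploying the others, the copies may contain the original node's data files. The copied node then reuses the same node ID (WbcP0pC_T32jWpYvu5is1A for both node-1 and node-2 in the log above) and cannot join. Solution: empty the data directory on the copied node.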

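Problem 11: java.lang.RuntimeException: can not run elasticsearch as root

Solution: run Elasticsearch as a regular (non-root) user.

Problem 12: the maximum number of files the process can open at the same time is too low; it must be at least 65536

Solution (elk is the user name):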

# echo "elk soft nofile 65536" >> /etc/security/limits.conf

# echo "elk hard nofile 65536" >> /etc/security/limits.conf

# su - elk

$ ulimit -n

65536

Problem 13: memory locking failed; by default the system lets a process lock at most 64 KB of memory

Solution:

# echo "elk soft memlock unlimited" >> /etc/security/limits.conf

# echo "elk hard memlock unlimited" >> /etc/security/limits.conf
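This limit only matters when memory locking is requested (bootstrap.memory_lock: true in elasticsearch.yml). To check what a fresh session of the elk user actually gets, here is a minimal Python sketch using the standard `resource` module (Unix only). `format_limit` and `memlock_unlimited` are hypothetical helper names.

```python
import resource


def format_limit(value: int) -> str:
    """Render a byte limit as 'unlimited' or in KiB (ulimit -l also reports KiB)."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def memlock_unlimited() -> bool:
    """True if both the soft and hard RLIMIT_MEMLOCK are unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == hard == resource.RLIM_INFINITY


if __name__ == "__main__":
    # Run as the elk user after logging in again, so the new limits apply.
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print(f"memlock soft={format_limit(soft)} hard={format_limit(hard)}")
    print("OK" if memlock_unlimited() else "not unlimited: add the limits.conf lines above")
```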

Problem 14: the memory map limit (vm.max_map_count) is too low for the elk user; it must be at least 262144

Solution:

# echo vm.max_map_count=262144 >> /etc/sysctl.conf

# sysctl -p

vm.max_map_count = 262144

Production troubleshooting example: a 10-node ES cluster whose health status is red:

{
  "cluster_name": "elasticsearch_zach",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 8,
  "number_of_data_nodes": 8,
  "active_primary_shards": 90,
  "active_shards": 180,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 20
}

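The cluster is red, which means at least one primary shard is unassigned: we are missing data (primary shards together with their replicas).

We know the cluster originally had 10 nodes, but this health report lists only 8 data nodes.

Two data nodes have disappeared, and there are 20 unassigned shards.

That is all the information we can gather here. Where the missing shards belong is still a mystery:

Are we missing 20 indices, each short 1 primary shard?

Or 20 primary shards of a single index?

Or 1 primary and 1 replica shard from each of 10 indices?

Which index is affected?

To answer these questions, we use the level parameter to make cluster health return more detail: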

GET _cluster/health?level=indices

{
  "cluster_name": "elasticsearch_zach",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 8,
  "number_of_data_nodes": 8,
  "active_primary_shards": 90,
  "active_shards": 180,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 20,
  "indices": {
    "v1": {
      "status": "green",
      "number_of_shards": 10,
      "number_of_replicas": 1,
      "active_primary_shards": 10,
      "active_shards": 20,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0
    },
    "v2": {
      "status": "red",
      "number_of_shards": 10,
      "number_of_replicas": 1,
      "active_primary_shards": 0,
      "active_shards": 0,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 20
    },
    "v3": {
      "status": "green",
      "number_of_shards": 10,
      "number_of_replicas": 1,
      "active_primary_shards": 10,
      "active_shards": 20,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0
    },
    ...
  }
}

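We can see that v2 is the index that turned the cluster red.

This confirms that all 20 missing shards belong to this index.

We can also see that this index had 10 primary shards and 1 replica, i.e. 10 × (1 + 1) = 20 shards, and now all 20 of them are gone.

We can infer that these 20 shards were located on the two nodes that have disappeared from the cluster.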
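The reasoning above can be automated. Below is a minimal Python sketch, assuming the `GET _cluster/health?level=indices` response has already been parsed into a dict. The sample uses the v1/v2/v3 figures from the example (other indices omitted), and `expected_shards`, `red_indices` and `fully_lost` are hypothetical helper names. An index whose unassigned shards equal number_of_shards × (1 + number_of_replicas) has lost every copy, which is consistent with whole nodes disappearing rather than individual shards failing.

```python
# Figures taken from the example response above (remaining indices omitted).
SAMPLE = {
    "status": "red",
    "unassigned_shards": 20,
    "indices": {
        "v1": {"status": "green", "number_of_shards": 10, "number_of_replicas": 1,
               "active_shards": 20, "unassigned_shards": 0},
        "v2": {"status": "red", "number_of_shards": 10, "number_of_replicas": 1,
               "active_shards": 0, "unassigned_shards": 20},
        "v3": {"status": "green", "number_of_shards": 10, "number_of_replicas": 1,
               "active_shards": 20, "unassigned_shards": 0},
    },
}


def expected_shards(index: dict) -> int:
    """Total shard copies an index should have: primaries * (1 + replicas)."""
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


def red_indices(health: dict) -> dict:
    """Map each red index to its number of unassigned shards."""
    return {
        name: info["unassigned_shards"]
        for name, info in health.get("indices", {}).items()
        if info["status"] == "red"
    }


def fully_lost(index: dict) -> bool:
    """True if no shard copy is active and every expected copy is unassigned."""
    return index["active_shards"] == 0 and index["unassigned_shards"] == expected_shards(index)


if __name__ == "__main__":
    for name, unassigned in red_indices(SAMPLE).items():
        info = SAMPLE["indices"][name]
        print(f"{name}: {unassigned}/{expected_shards(info)} shards unassigned, "
              f"fully lost={fully_lost(info)}")
```

For the sample data it reports only v2, with 20 of 20 shards unassigned.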