一、分布式系统中的一致性模型
在分布式系统中,一致性模型定义了多个副本之间的数据同步程度。主要有两种模型:
- 强一致性:所有副本在任何时间点上都保证一致的状态。
- 最终一致性:允许短期的不一致,但在没有新的更新操作后,系统最终会达到一致的状态。
二、 Envoy的服务发现和健康检查
Envoy的服务发现机制是基于最终一致性模型的。这意味着在某一时刻,Envoy实例可能会对上游服务的成员列表有不同的看法,但随着时间推移,这些视图会趋于一致。
服务发现的最终一致性
Envoy通过xDS(如CDS、EDS)协议与控制平面(如Istio、Consul)通信来获取服务的最新信息。在实际操作中:
- 订阅和推送:Envoy实例订阅控制平面发布的服务信息更新。当服务实例加入或离开网格时,控制平面会将这些变化推送给订阅的Envoy实例。
- 传播延迟:由于网络延迟、处理时间等因素,不同Envoy实例接收到更新的时间点可能不同。因此,在短期内,Envoy实例之间的服务信息可能不一致。
- 最终一致性:随着控制平面不断推送更新,并且所有Envoy实例定期刷新服务信息,所有实例最终会达到一致的视图。
主动健康检查
为了确保服务的可靠性,Envoy结合了主动健康检查机制来判定集群的健康状态:
- 健康检查类型:Envoy支持多种健康检查方式,包括HTTP、TCP和GRPC健康检查。
- 周期性检查:Envoy定期向上游服务实例发送健康检查请求,以确定它们是否健康。
- 状态报告:根据健康检查的结果,Envoy可以动态调整负载均衡策略,例如只将流量发送到健康的实例,避免不健康的实例。
具体场景理解
假设在一个服务网格中,有多个Envoy实例和一个控制平面。控制平面管理着一个名为 example_service
的服务,该服务有多个实例。
- 实例加入网格:当新的服务实例
example_service_3
加入网格时,控制平面会更新其服务信息并推送给所有订阅的Envoy实例。 - 传播和更新:由于传播延迟,不同Envoy实例接收到该更新的时间点不同。一段时间内,某些Envoy实例可能还不知道
example_service_3
的存在。 - 最终一致:随着时间推移,所有Envoy实例都会收到该更新并更新其内部状态,最终达到一致。
- 健康检查:在这一过程中,Envoy实例会持续对
example_service
的所有实例进行健康检查。如果某个实例(如example_service_2
)变得不健康,Envoy会将其标记为不健康,并停止将流量路由到该实例,直到其恢复健康状态。
Envoy的服务发现采用最终一致性模型,而不是强一致性模型。这意味着它允许短期的不一致,但最终会达到一致的状态。通过结合主动健康检查机制,Envoy能够确保尽可能多地将流量路由到健康的上游服务实例,从而提高整个系统的可靠性和稳定性。这种设计既保证了系统的灵活性和扩展性,又通过健康检查机制维护了服务的可用性。
三、主动健康检测类型及示例
在Envoy中,主动健康检查(Active Health Checking)是一种机制,用于定期向上游服务实例发送健康检查请求,以确定它们是否可以正常处理请求。通过主动健康检查,Envoy可以确保仅将流量路由到健康的实例,从而提高服务的可靠性和可用性。
3.1 健康检查类型
Envoy支持多种健康检查类型,包括:
HTTP/HTTPS 健康检查
:发送HTTP/HTTPS请求并检查响应状态码。TCP 健康检查
:通过TCP连接建立是否成功来判断健康状态。gRPC 健康检查
:发送gRPC请求并检查响应状态。
3.2 健康检查配置示例
以下是如何在Envoy中配置主动健康检查的示例,包括HTTP、TCP和gRPC健康检查。
3.2.1 HTTP 健康检查
static_resources:
clusters:
- name: http_service_cluster
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: http_service_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: http_service.example.com
port_value: 80
health_checks:
- timeout: 1s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
http_health_check:
path: /health
expected_statuses:
- start: 200
end: 200
在这个示例中,Envoy会每10秒向 http_service.example.com
发送一个HTTP请求,请求路径为 /health
。如果连续2次返回200状态码,Envoy会将该实例标记为健康;如果连续3次没有返回200状态码,则标记为不健康。
3.2.2 TCP 健康检查
static_resources:
clusters:
- name: tcp_service_cluster
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: tcp_service_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: tcp_service.example.com
port_value: 9000
health_checks:
- timeout: 1s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
tcp_health_check: {}
在这个示例中,Envoy会每10秒尝试与 tcp_service.example.com
的9000端口建立TCP连接。如果连续2次连接成功,则标记为健康;如果连续3次连接失败,则标记为不健康。
3.2.3 gRPC 健康检查
static_resources:
clusters:
- name: grpc_service_cluster
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: grpc_service_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: grpc_service.example.com
port_value: 50051
health_checks:
- timeout: 1s
interval: 10s
unhealthy_threshold: 3
healthy_threshold: 2
grpc_health_check:
service_name: "my_service"
在这个示例中,Envoy会每10秒向 grpc_service.example.com
的50051端口发送一个gRPC健康检查请求,服务名为 my_service
。如果连续2次检查成功,则标记为健康;如果连续3次检查失败,则标记为不健康。
关键配置项解释
timeout
: 每次健康检查的超时时间。interval
: 健康检查的时间间隔。unhealthy_threshold
: 连续检查失败次数超过该值时,实例被标记为不健康。healthy_threshold
: 连续检查成功次数超过该值时,实例被标记为健康。http_health_check
: HTTP健康检查的具体配置,包括检查路径和期望的状态码范围。tcp_health_check
: TCP健康检查的配置,通常为空。grpc_health_check
: gRPC健康检查的具体配置,包括服务名。
3.2.4 监控和调试
Envoy提供了丰富的监控和调试工具,可以通过/admin接口查看健康检查的状态和结果。例如,访问 http://localhost:9901/stats
可以查看健康检查的统计信息。
通过主动健康检查,Envoy可以动态监控上游服务实例的健康状态,并根据检查结果调整流量路由。这种机制有助于提高服务的可靠性,确保只有健康的实例接收请求,避免因实例故障导致的服务不可用。
四、 主动健康检测案例
4.1 基于http协议主动健康检测
[root@dockerhost-envoy ~]# mkdir envoy_cluster_health_checks
[root@dockerhost-envoy ~]# cd envoy_cluster_health_checks
# cat docker-compose.yaml
services:
envoy:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./front-envoy.yaml:/etc/envoy/envoy.yaml
networks:
envoymesh:
ipv4_address: 172.29.1.2
aliases:
- front-proxy
depends_on:
- webserver01-sidecar
- webserver02-sidecar
webserver01-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: blue
networks:
envoymesh:
ipv4_address: 172.29.1.3
aliases:
- myservice
webserver01:
image: docker.17ker.top/envoy/demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver01-sidecar"
depends_on:
- webserver01-sidecar
webserver02-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: yellow
networks:
envoymesh:
ipv4_address: 172.29.1.4
aliases:
- myservice
webserver02:
image: docker.17ker.top/envoy/demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver02-sidecar"
depends_on:
- webserver02-sidecar
networks:
envoymesh:
driver: bridge
ipam:
config:
- subnet: 172.29.1.0/24
# cat front-envoy.yaml
admin:
profile_path: /tmp/envoy.prof # 指定Envoy性能分析数据的保存路径。
access_log_path: /tmp/admin_access.log # 指定Envoy管理接口的访问日志保存路径。
address: # 配置管理接口的监听地址和端口。这里使用`0.0.0.0`表示监听所有网络接口,`port_value: 9901`是管理接口的端口。
socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources: # 定义了不会在运行时更改的资源,比如监听器(listeners)和集群(clusters)。
listeners:
- name: listener_0 # 一个监听器配置,监听所有接口的80端口,用于HTTP流量。
address:
socket_address: { address: 0.0.0.0, port_value: 80 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager # 这是一个Envoy的网络过滤器,用于管理HTTP连接和路由。
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http # 流量统计前缀。
codec_type: AUTO # HTTP编解码器类型,`AUTO`表示自动选择。
route_config: # 路由配置,包括虚拟主机和路由规则。
name: local_route
virtual_hosts:
- name: webservice # 在此配置中,所有域(`domains: ["*"]`)的根路径(`prefix: "/"`)都会被路由到名为`web_cluster_01`的集群。
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: web_cluster_01 }
http_filters: # HTTP过滤器链,这里使用了基本的路由器过滤器来处理路由决策。
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: web_cluster_01 # 一个集群配置,指定如何连接到服务。
connect_timeout: 0.25s # 连接超时设置。
type: STRICT_DNS # 解析策略,`STRICT_DNS`表示基于DNS严格解析。
lb_policy: ROUND_ROBIN # 负载均衡策略,这里使用`ROUND_ROBIN`表示轮询。
load_assignment: # 指定集群的负载分配和端点。这里端点通过DNS名`myservice`在80端口上进行连接。
cluster_name: web_cluster_01
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: myservice, port_value: 80 }
health_checks: # 健康检查配置,定期检查服务的健康状态,这里使用HTTP健康检查。
- timeout: 5s
interval: 10s
unhealthy_threshold: 2
healthy_threshold: 2
http_health_check:
path: /livez
expected_statuses:
start: 200
end: 399
# cat envoy-sidecar-proxy.yaml 这个YAML文件是Envoy代理的配置文件,定义了Envoy如何管理和路由网络流量。该文件包含两个主要部分:`admin`和`static_resources`。
admin:
profile_path: /tmp/envoy.prof # 指定性能分析文件的存储路径,用于记录性能相关的数据。
access_log_path: /tmp/admin_access.log # 指定管理接口访问日志的存储路径,记录对管理接口的所有访问。
address: # 定义管理接口的监听地址。`0.0.0.0`表示监听所有网络接口,而`port_value: 9901`是监听的端口,使得管理接口可以从任何地址访问。
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners: # 配置一个名为`listener_0`的监听器,监听所有接口的80端口。
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 80 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager # 是一个网络过滤器,负责管理HTTP连接和路由。
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http # 指定统计数据的前缀。
codec_type: AUTO # 设置HTTP编解码器,`AUTO`自动选择编解码器。
route_config: # 定义路由配置,其中包括虚拟主机和路由规则。此处路由配置对所有域(`"*"`)的根路径(`"/"`)的请求路由到名为`local_cluster`的集群。
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: local_cluster }
http_filters: # 定义HTTP过滤器链,这里包括一个路由器过滤器,负责执行路由决策。
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: local_cluster
connect_timeout: 0.25s # 连接超时时间设置为0.25秒。
type: STATIC # 集群类型为`STATIC`,表示集群的服务节点是静态配置的。
lb_policy: ROUND_ROBIN # 负载均衡策略为`ROUND_ROBIN`,即轮询方式。
load_assignment: # 指定集群服务节点的分配,这里配置的节点是本地的`127.0.0.1`地址,端口`8080`。
cluster_name: local_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: 127.0.0.1, port_value: 8080 }
环境说明:
五个Service:
- envoy:Front Proxy,地址为172.29.1.2
- webserver01:第一个后端服务
- webserver01-sidecar:第一个后端服务的Sidecar Proxy,地址为172.29.1.3
- webserver02:第二个后端服务
- webserver02-sidecar:第二个后端服务的Sidecar Proxy,地址为172.29.1.4
运行和测试:
- 创建
docker-compose up -d
- 测试
# 持续请求服务上的特定路径/livez
while true; do curl 172.29.1.2; sleep 1; done
# 等服务调度就绪后,另启一个终端,修改其中任何一个服务的/livez响应为非"OK"值,例如,修改第一个后端端点;
curl -X POST -d 'livez=FAIL' http://172.29.1.3/livez
# 通过请求的响应结果即可观测服务调度及响应的记录
# 请求中,可以看出第一个端点因主动健康状态检测失败,因而会被自动移出集群,直到其再次转为健康为止;
# 我们可使用类似如下命令修改为正常响应结果;
curl -X POST -d 'livez=OK' http://172.29.1.3/livez
- 停止后清理
docker-compose down
执行输出:
# docker-compose up -d
[+] Running 6/6
✔ Network envoy_cluster_health_checks_envoymesh Created 0.1s
✔ Container envoy_cluster_health_checks-webserver01-sidecar-1 Created 0.0s
✔ Container envoy_cluster_health_checks-webserver02-sidecar-1 Created 0.0s
✔ Container envoy_cluster_health_checks-webserver02-1 Created 0.0s
✔ Container envoy_cluster_health_checks-webserver01-1 Created 0.0s
✔ Container envoy_cluster_health_checks-envoy-1 Created 0.0s
Attaching to envoy-1, webserver01-1, webserver01-sidecar-1, webserver02-1, webserver02-sidecar-1
重新打开一个终端进行访问
# curl http://172.29.1.2
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
# curl http://172.29.1.2
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
查看listeners
# curl http://172.29.1.2:9901/listeners
listener_0::0.0.0.0:80
查看clusters
# curl http://172.29.1.2:9901/clusters
web_cluster_01::observability_name::web_cluster_01
web_cluster_01::default_priority::max_connections::1024
web_cluster_01::default_priority::max_pending_requests::1024
web_cluster_01::default_priority::max_requests::1024
web_cluster_01::default_priority::max_retries::3
web_cluster_01::high_priority::max_connections::1024
web_cluster_01::high_priority::max_pending_requests::1024
web_cluster_01::high_priority::max_requests::1024
web_cluster_01::high_priority::max_retries::3
web_cluster_01::added_via_api::false
web_cluster_01::172.29.1.3:80::cx_active::1
web_cluster_01::172.29.1.3:80::cx_connect_fail::0
web_cluster_01::172.29.1.3:80::cx_total::1
web_cluster_01::172.29.1.3:80::rq_active::0
web_cluster_01::172.29.1.3:80::rq_error::0
web_cluster_01::172.29.1.3:80::rq_success::1
web_cluster_01::172.29.1.3:80::rq_timeout::0
web_cluster_01::172.29.1.3:80::rq_total::1
web_cluster_01::172.29.1.3:80::hostname::myservice
web_cluster_01::172.29.1.3:80::health_flags::healthy
web_cluster_01::172.29.1.3:80::weight::1
web_cluster_01::172.29.1.3:80::region::
web_cluster_01::172.29.1.3:80::zone::
web_cluster_01::172.29.1.3:80::sub_zone::
web_cluster_01::172.29.1.3:80::canary::false
web_cluster_01::172.29.1.3:80::priority::0
web_cluster_01::172.29.1.3:80::success_rate::-1
web_cluster_01::172.29.1.3:80::local_origin_success_rate::-1
web_cluster_01::172.29.1.4:80::cx_active::1
web_cluster_01::172.29.1.4:80::cx_connect_fail::0
web_cluster_01::172.29.1.4:80::cx_total::1
web_cluster_01::172.29.1.4:80::rq_active::0
web_cluster_01::172.29.1.4:80::rq_error::0
web_cluster_01::172.29.1.4:80::rq_success::1
web_cluster_01::172.29.1.4:80::rq_timeout::0
web_cluster_01::172.29.1.4:80::rq_total::1
web_cluster_01::172.29.1.4:80::hostname::myservice
web_cluster_01::172.29.1.4:80::health_flags::healthy
web_cluster_01::172.29.1.4:80::weight::1
web_cluster_01::172.29.1.4:80::region::
web_cluster_01::172.29.1.4:80::zone::
web_cluster_01::172.29.1.4:80::sub_zone::
web_cluster_01::172.29.1.4:80::canary::false
web_cluster_01::172.29.1.4:80::priority::0
web_cluster_01::172.29.1.4:80::success_rate::-1
web_cluster_01::172.29.1.4:80::local_origin_success_rate::-1
访问livez,确认状态为OK
# curl http://172.29.1.2/livez
OK
使用while循环多次访问查看状态,所有的上游主机都在
# while true; do curl 172.29.1.2; sleep 1; done
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
为指定主机设置livez=FAIL后再访问
# curl -X POST -d 'livez=FAIL' http://172.29.1.3/livez
使用while循环多次访问,可以看到上述指定主机已不存在:
# while true; do curl 172.29.1.2; sleep 1; done
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
恢复该主机
# curl -X POST -d 'livez=OK' http://172.29.1.3/livez
# while true; do curl 172.29.1.2; sleep 1; done
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
上述web服务代码实现
[root@blue /usr/local/bin]# cat demo.py
#!/usr/bin/python3
#
from flask import Flask, request, abort, Response, jsonify as flask_jsonify, make_response
import argparse
import sys, os, getopt, socket, json, time
app = Flask(__name__)
@app.route('/')
def index():
return ('demoapp v1.0 !! ClientIP: {}, ServerName: {}, '
'ServerIP: {}!\n'.format(request.remote_addr, socket.gethostname(),
socket.gethostbyname(socket.gethostname())))
@app.route('/hostname')
def hostname():
return ('ServerName: {}\n'.format(socket.gethostname()))
health_status = {'livez': 'OK', 'readyz': 'OK'}
probe_count = {'livez': 0, 'readyz': 0}
@app.route('/livez', methods=['GET','POST'])
def livez():
if request.method == 'POST':
status = request.form['livez']
health_status['livez'] = status
return ''
else:
if probe_count['livez'] == 0:
time.sleep(5)
probe_count['livez'] += 1
if health_status['livez'] == 'OK':
return make_response((health_status['livez']), 200)
else:
return make_response((health_status['livez']), 506)
@app.route('/readyz', methods=['GET','POST'])
def readyz():
if request.method == 'POST':
status = request.form['readyz']
health_status['readyz'] = status
return ''
else:
if probe_count['readyz'] == 0:
time.sleep(15)
probe_count['readyz'] += 1
if health_status['readyz'] == 'OK':
return make_response((health_status['readyz']), 200)
else:
return make_response((health_status['readyz']), 507)
@app.route('/configs')
def configs():
return ('DEPLOYENV: {}\nRELEASE: {}\n'.format(os.environ.get('DEPLOYENV'), os.environ.get('RELEASE')))
@app.route("/user-agent")
def view_user_agent():
# user_agent=request.headers.get('User-Agent')
return('User-Agent: {}\n'.format(request.headers.get('user-agent')))
def main(argv):
port = 80
host = '0.0.0.0'
debug = False
if os.environ.get('PORT') is not None:
port = os.environ.get('PORT')
if os.environ.get('HOST') is not None:
host = os.environ.get('HOST')
try:
opts, args = getopt.getopt(argv,"vh:p:",["verbose","host=","port="])
except getopt.GetoptError:
print('server.py -p <portnumber>')
sys.exit(2)
for opt, arg in opts:
if opt in ("-p", "--port"):
port = arg
elif opt in ("-h", "--host"):
host = arg
elif opt in ("-v", "--verbose"):
debug = True
app.run(host=str(host), port=int(port), debug=bool(debug))
if __name__ == "__main__":
main(sys.argv[1:])
4.2 基于tcp协议主动健康检测
由于web应用前端有envoy代理,所以本案例验证时选择直接关闭envoy代理。
# mkdir envoy_cluster_health_checks_tcp
# cd envoy_cluster_health_checks_tcp
# cat docker-compose.yaml
services:
envoy:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./front-envoy-with-tcp-check.yaml:/etc/envoy/envoy.yaml
networks:
envoymesh:
ipv4_address: 172.30.1.2
aliases:
- front-proxy
depends_on:
- webserver01-sidecar
- webserver02-sidecar
webserver01-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: blue
networks:
envoymesh:
ipv4_address: 172.30.1.3
aliases:
- myservice
webserver01:
image: docker.17ker.top/envoy/demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver01-sidecar"
depends_on:
- webserver01-sidecar
webserver02-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: yellow
networks:
envoymesh:
ipv4_address: 172.30.1.4
aliases:
- myservice
webserver02:
image: docker.17ker.top/envoy/demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver02-sidecar"
depends_on:
- webserver02-sidecar
networks:
envoymesh:
driver: bridge
ipam:
config:
- subnet: 172.30.1.0/24
# cat front-envoy-with-tcp-check.yaml
admin:
profile_path: /tmp/envoy.prof
access_log_path: /tmp/admin_access.log
address:
socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
listeners:
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 80 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: local_route
virtual_hosts:
- name: webservice
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: web_cluster_01 }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: web_cluster_01
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: web_cluster_01
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: myservice, port_value: 80 }
health_checks:
- timeout: 5s
interval: 10s
unhealthy_threshold: 2
healthy_threshold: 2
tcp_health_check: {}
# cat envoy-sidecar-proxy.yaml
admin:
profile_path: /tmp/envoy.prof
access_log_path: /tmp/admin_access.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners:
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 80 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: local_cluster }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: local_cluster
connect_timeout: 0.25s
type: STATIC
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: local_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: 127.0.0.1, port_value: 8080 }
在终端1中运行
# docker-compose up -d
在终端2中查看状态
# curl http://172.30.1.2:9901/stats | grep health_check
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 20281 0 20281 0 0 17.0M 0 --:--:-- --:--:-- --:--:-- 19.3M
cluster.web_cluster_01.health_check.attempt: 8
cluster.web_cluster_01.health_check.degraded: 0
cluster.web_cluster_01.health_check.failure: 0
cluster.web_cluster_01.health_check.healthy: 2
cluster.web_cluster_01.health_check.network_failure: 0
cluster.web_cluster_01.health_check.passive_failure: 0
cluster.web_cluster_01.health_check.success: 8
cluster.web_cluster_01.health_check.verify_cluster: 0
http.ingress_http.tracing.health_check: 0
在终端2中执行
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a1f2d190db5d envoyproxy/envoy:v1.30.1 "/docker-entrypoint.…" 3 minutes ago Up 3 minutes 10000/tcp envoy_cluster_health_checks_tcp-envoy-1
eba5e2d21e26 docker.17ker.top/envoy/demoapp:v1.0 "/bin/sh -c 'python3…" 3 minutes ago Up 3 minutes envoy_cluster_health_checks_tcp-webserver01-1
a14fac3a0265 docker.17ker.top/envoy/demoapp:v1.0 "/bin/sh -c 'python3…" 3 minutes ago Up 3 minutes envoy_cluster_health_checks_tcp-webserver02-1
0cd68453fa48 envoyproxy/envoy:v1.30.1 "/docker-entrypoint.…" 3 minutes ago Up 3 minutes 10000/tcp envoy_cluster_health_checks_tcp-webserver02-sidecar-1
cad933da773d envoyproxy/envoy:v1.30.1 "/docker-entrypoint.…" 3 minutes ago Up 3 minutes 10000/tcp envoy_cluster_health_checks_tcp-webserver01-sidecar-1
在终端2中执行
# docker stop envoy_cluster_health_checks_tcp-webserver01-sidecar-1
envoy_cluster_health_checks_tcp-webserver01-sidecar-1
在终端2中执行
# curl http://172.30.1.2:9901/stats | grep health_check
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 20302 0 20302 0 0 20.8M 0 --:--:-- --:--:-- --:--:-- 19.3M
cluster.web_cluster_01.health_check.attempt: 12
cluster.web_cluster_01.health_check.degraded: 0
cluster.web_cluster_01.health_check.failure: 1
cluster.web_cluster_01.health_check.healthy: 2
cluster.web_cluster_01.health_check.network_failure: 1
cluster.web_cluster_01.health_check.passive_failure: 0
cluster.web_cluster_01.health_check.success: 11
cluster.web_cluster_01.health_check.verify_cluster: 0
http.ingress_http.tracing.health_check:
评论区