1. Introduction to Circuit Breaking
1.1 What Is Circuit Breaking
In Envoy, circuit breaking is a protection mechanism that prevents upstream services from being overloaded, preserving the overall stability and availability of the system. When a circuit breaker detects abnormal load or an excessive failure rate for an upstream service, it automatically limits or cuts off requests to that service so that failures do not cascade and resources are not exhausted.
1.2 How Circuit Breaking Works
A circuit breaker works much like a fuse in an electrical circuit: when an abnormal condition is detected (for example, a high request failure rate or long response times), the breaker "trips" and temporarily stops sending requests to the failing upstream service. After some time it lets traffic through again; if the service has recovered, normal traffic resumes, otherwise the breaker stays open. In Envoy this behavior comes from two complementary mechanisms: threshold-based circuit breakers, which immediately reject excess connections, requests, and retries, and outlier detection, which ejects misbehaving hosts and re-admits them after an ejection period.
1.3 Configuring Circuit Breakers
In Envoy, circuit breakers are configured at the cluster level. The options include the maximum number of connections, the maximum number of pending requests, the maximum number of concurrent requests, the maximum number of retries, and so on. The following example configures a circuit breaker:
Example configuration
static_resources:
clusters:
- name: example_service
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: example_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: example.com
port_value: 80
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 100
max_pending_requests: 1000
max_requests: 5000
max_retries: 3
- priority: HIGH
max_connections: 200
max_pending_requests: 2000
max_requests: 10000
max_retries: 5
Configuration options explained
- priority: The priority the threshold applies to. Envoy supports two priorities, DEFAULT and HIGH, and each can be given its own circuit-breaking thresholds.
  - DEFAULT: applies to ordinary traffic and keeps the service stable under normal load.
  - HIGH: applies to critical or real-time traffic; it is given higher limits so that these requests can still be served under heavy load.
- max_connections: The maximum number of connections allowed. Once this limit is reached, new connection attempts are rejected.
- max_pending_requests: The maximum number of pending (queued) requests allowed. Once this limit is reached, new requests are rejected.
- max_requests: The maximum number of concurrent requests allowed. Once this limit is reached, new requests are rejected.
- max_retries: The maximum number of concurrent retries allowed. Once this limit is reached, additional failed requests are not retried.
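Each threshold can also opt into extra observability. The fragment below is a minimal illustrative sketch (not part of the example above): enabling track_remaining makes Envoy publish remaining_* gauges showing how much headroom is left before a limit trips.
# Illustrative fragment: expose "remaining" gauges for a threshold
circuit_breakers:
  thresholds:
  - priority: DEFAULT
    max_connections: 100
    max_pending_requests: 1000
    max_requests: 5000
    max_retries: 3
    track_remaining: true  # publishes circuit_breakers.default.remaining_cx / remaining_rq / ... gauges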
1.4 Benefits of Circuit Breaking
- Prevents service overload: limiting requests and connections keeps upstream services from collapsing under excessive load.
- Improves system stability: failures are contained instead of cascading, so other services keep running normally.
- Automatic recovery: after a while the breaker lets requests through again, so traffic is restored once the upstream service is healthy.
1.5 Monitoring and Debugging
Envoy exposes a rich set of metrics, and the state and statistics of the circuit breakers can be inspected through the admin interface. For example:
- cluster.<cluster_name>.circuit_breakers.default.cx_open: whether the connection circuit breaker is open (1) or closed (0).
- cluster.<cluster_name>.circuit_breakers.default.rq_pending_open: whether the pending-request circuit breaker is open (1) or closed (0).
- cluster.<cluster_name>.circuit_breakers.default.rq_open: whether the request circuit breaker is open (1) or closed (0).
- cluster.<cluster_name>.circuit_breakers.default.rq_retry_open: whether the retry circuit breaker is open (1) or closed (0).
These statistics are available from Envoy's admin interface, for example at http://localhost:9901/stats.
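For instance, assuming the admin interface listens on localhost:9901 as above, the circuit breaker gauges and the related overflow counters (which count rejected connections, requests, and retries) can be pulled with a one-liner such as:
# Dump circuit breaker gauges and overflow counters for all clusters
curl -s http://localhost:9901/stats | grep -E 'circuit_breakers|overflow'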
Circuit breaking is one of Envoy's key protection mechanisms: by capping the connections and requests sent to upstream services, it prevents overload and keeps failures from spreading. Configuring circuit breakers improves the stability and reliability of the system, so that it keeps functioning even when an upstream service misbehaves.
1.6 Connection Pools
In Envoy, a cluster connection pool is the component that manages connections to upstream services. Connection pools reuse existing connections to avoid the overhead of repeatedly establishing and tearing down connections, which improves performance and efficiency. Pools can be configured for different protocols, such as HTTP/1, HTTP/2, and TCP.
1.6.1 What Connection Pools Do
- Reduce connection overhead: reusing connections lowers the frequency of connection setup and teardown, cutting latency and resource consumption.
- Increase throughput: a pool maintains multiple connections at once, improving concurrency.
- Optimize resource usage: capping the number of connections prevents resource exhaustion and service overload.
1.6.2 Configuring Connection Pools
Connection pool settings are normally specified in the cluster configuration. Below are typical examples for HTTP/1, HTTP/2, and TCP connection pools.
HTTP/1 connection pool configuration
static_resources:
clusters:
- name: http1_service
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: http1_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: http1-service.example.com
port_value: 80
circuit_breakers:
  thresholds:
  - max_connections: 100          # maximum upstream connections
    max_pending_requests: 1000    # maximum queued requests
max_requests_per_connection: 50   # recycle a connection after 50 requests
common_http_protocol_options:
  idle_timeout: 1s                # close idle connections after 1s
HTTP/2 connection pool configuration
static_resources:
clusters:
- name: http2_service
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: http2_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: http2-service.example.com
port_value: 80
http2_protocol_options: {} # enable HTTP/2
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 100
max_pending_requests: 1000
max_requests: 5000
TCP connection pool configuration
static_resources:
clusters:
- name: tcp_service
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: tcp_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: tcp-service.example.com
port_value: 9000
upstream_connection_options:
tcp_keepalive:
keepalive_time: 300
keepalive_interval: 60
keepalive_probes: 5
Configuration options explained
- max_connections: The maximum number of connections the pool may establish to the upstream cluster (enforced as a circuit-breaker threshold).
- max_pending_requests: The maximum number of requests that may be queued waiting for a connection (also a circuit-breaker threshold).
- max_requests_per_connection: The maximum number of requests a single connection may serve before it is recycled; mainly relevant for HTTP/1 pools.
- idle_timeout: How long a connection may sit idle before it is closed.
- http2_protocol_options: Enables HTTP/2 for the cluster's connection pool.
- tcp_keepalive: TCP keep-alive options for upstream connections, including the keep-alive time, interval, and probe count.
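Note that setting http2_protocol_options and max_requests_per_connection directly on the cluster is deprecated in recent Envoy releases in favor of the typed extension envoy.extensions.upstreams.http.v3.HttpProtocolOptions. The fragment below is a minimal sketch of that form (field names follow the v3 API; verify against the documentation for your Envoy version):
# Illustrative fragment: per-cluster HTTP options via typed_extension_protocol_options
typed_extension_protocol_options:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    common_http_protocol_options:
      idle_timeout: 1s                 # close idle upstream connections after 1s
      max_requests_per_connection: 50  # recycle a connection after 50 requests
    explicit_http_config:
      http2_protocol_options: {}       # speak HTTP/2 to the upstream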
2. Circuit Breaking Example
Environment
Eleven services in total (a front proxy plus five backend services, each paired with a sidecar proxy):
- envoy: Front Proxy, address 172.31.9.2
- webserver01: the first backend service
- webserver01-sidecar: Sidecar Proxy for the first backend service, address 172.31.9.3, aliases red and webservice1
- webserver02: the second backend service
- webserver02-sidecar: Sidecar Proxy for the second backend service, address 172.31.9.4, aliases blue and webservice1
- webserver03: the third backend service
- webserver03-sidecar: Sidecar Proxy for the third backend service, address 172.31.9.5, aliases green and webservice1
- webserver04: the fourth backend service
- webserver04-sidecar: Sidecar Proxy for the fourth backend service, address 172.31.9.6, aliases gray and webservice2
- webserver05: the fifth backend service
- webserver05-sidecar: Sidecar Proxy for the fifth backend service, address 172.31.9.7, aliases black and webservice2
2.1 Startup Configuration and Test Script
# cat docker-compose.yaml
services:
front-envoy:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./front-envoy.yaml:/etc/envoy/envoy.yaml
networks:
envoymesh:
ipv4_address: 172.31.9.2
aliases:
- front-proxy
expose:
# Expose ports 80 (for general traffic) and 9901 (for the admin server)
- "80"
- "9901"
webserver01-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: red
networks:
envoymesh:
ipv4_address: 172.31.9.3
aliases:
- webservice1
- red
webserver01:
image: demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver01-sidecar"
depends_on:
- webserver01-sidecar
webserver02-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: blue
networks:
envoymesh:
ipv4_address: 172.31.9.4
aliases:
- webservice1
- blue
webserver02:
image: demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver02-sidecar"
depends_on:
- webserver02-sidecar
webserver03-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: green
networks:
envoymesh:
ipv4_address: 172.31.9.5
aliases:
- webservice1
- green
webserver03:
image: demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver03-sidecar"
depends_on:
- webserver03-sidecar
webserver04-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: gray
networks:
envoymesh:
ipv4_address: 172.31.9.6
aliases:
- webservice2
- gray
webserver04:
image: demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver04-sidecar"
depends_on:
- webserver04-sidecar
webserver05-sidecar:
image: envoyproxy/envoy:v1.30.1
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
hostname: black
networks:
envoymesh:
ipv4_address: 172.31.9.7
aliases:
- webservice2
- black
webserver05:
image: demoapp:v1.0
environment:
- ENVOY_UID=0
- ENVOY_GID=0
- PORT=8080
- HOST=127.0.0.1
network_mode: "service:webserver05-sidecar"
depends_on:
- webserver05-sidecar
networks:
envoymesh:
driver: bridge
ipam:
config:
- subnet: 172.31.9.0/24
# cat envoy-sidecar-proxy.yaml
admin:
profile_path: /tmp/envoy.prof
access_log_path: /tmp/admin_access.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901
static_resources:
listeners:
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 80 }
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match: { prefix: "/" }
route: { cluster: local_cluster }
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: local_cluster
connect_timeout: 0.25s
type: STATIC
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: local_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address: { address: 127.0.0.1, port_value: 8080 }
circuit_breakers:
thresholds:
- max_connections: 1
  max_pending_requests: 1
  max_retries: 2
# cat front-envoy.yaml
admin:
access_log_path: "/dev/null"
address:
socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
listeners:
- address:
socket_address: { address: 0.0.0.0, port_value: 80 }
name: listener_http
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
codec_type: auto
stat_prefix: ingress_http
route_config:
name: local_route
virtual_hosts:
- name: backend
domains:
- "*"
routes:
- match:
prefix: "/livez" # 匹配前缀`/livez`,转发到`webcluster2`
route:
cluster: webcluster2
- match:
prefix: "/" # 匹配前缀`/`,转发到`webcluster1`
route:
cluster: webcluster1
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: webcluster1
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: webcluster1
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: webservice1
port_value: 80
circuit_breakers: # circuit breaker configuration
  thresholds: # threshold list
  - max_connections: 1 # at most 1 upstream connection
    max_pending_requests: 1 # at most 1 pending request
    max_retries: 3 # at most 3 concurrent retries
- name: webcluster2
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: webcluster2
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: webservice2
port_value: 80
outlier_detection: # outlier detection configuration
interval: "1s" # run detection every 1s
consecutive_5xx: "3" # eject a host after 3 consecutive 5xx responses
consecutive_gateway_failure: "3" # eject a host after 3 consecutive gateway failures
base_ejection_time: "10s" # base ejection time of 10s
enforcing_consecutive_gateway_failure: "100" # enforce gateway-failure ejection 100% of the time
max_ejection_percent: "30" # eject at most 30% of the hosts
success_rate_minimum_hosts: "2" # at least 2 hosts required for success-rate detection
# cat send-requests.sh
#!/bin/bash
#
if [ $# -ne 2 ]
then
echo "USAGE: $0 <URL> <COUNT>"
exit 1;
fi
URL=$1
COUNT=$2
c=1
#interval="0.2"
while [[ ${c} -le ${COUNT} ]];
do
#echo "Sending GET request: ${URL}"
curl -o /dev/null -w '%{http_code}\n' -s ${URL} &
(( c++ ))
# sleep $interval
done
wait
2.2 Running and Testing
1. Create the environment
docker-compose up -d
2. Test
# Send requests to webcluster1 with the send-requests.sh script; some of them return a 5xx response code, which is the result of being rejected by the circuit breaker;
./send-requests.sh http://172.31.9.2/ 300
3. Tear down the environment when finished
docker-compose down
# ./send-requests.sh http://172.31.9.2/ 500
200
200
200
503 # rejected by the circuit breaker
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
503
200
# curl -s http://172.31.9.2:9901/clusters | grep rq_
webcluster1::172.31.9.4:80::rq_active::0
webcluster1::172.31.9.4:80::rq_error::4 # requests rejected by the circuit breaker
webcluster1::172.31.9.4:80::rq_success::263
webcluster1::172.31.9.4:80::rq_timeout::0
webcluster1::172.31.9.4:80::rq_total::267
webcluster1::172.31.9.5:80::rq_active::0
webcluster1::172.31.9.5:80::rq_error::5
webcluster1::172.31.9.5:80::rq_success::259
webcluster1::172.31.9.5:80::rq_timeout::0
webcluster1::172.31.9.5:80::rq_total::264
webcluster1::172.31.9.3:80::rq_active::0
webcluster1::172.31.9.3:80::rq_error::5
webcluster1::172.31.9.3:80::rq_success::262
webcluster1::172.31.9.3:80::rq_timeout::0
webcluster1::172.31.9.3:80::rq_total::267
webcluster2::172.31.9.6:80::rq_active::0
webcluster2::172.31.9.6:80::rq_error::0
webcluster2::172.31.9.6:80::rq_success::0
webcluster2::172.31.9.6:80::rq_timeout::0
webcluster2::172.31.9.6:80::rq_total::0
webcluster2::172.31.9.7:80::rq_active::0
webcluster2::172.31.9.7:80::rq_error::0
webcluster2::172.31.9.7:80::rq_success::0
webcluster2::172.31.9.7:80::rq_timeout::0
webcluster2::172.31.9.7:80::rq_total::0
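The rq_error counts above match the 503s seen by the client. To confirm that they were produced by the circuit breaker rather than by the backends themselves, the overflow counters can be checked as well (assuming the front proxy's admin interface at 172.31.9.2:9901 is reachable from the host, as in this lab):
# Counters that increment each time a circuit breaker threshold rejects work
curl -s http://172.31.9.2:9901/stats | grep 'webcluster1.*overflow'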