
Envoy Cluster Management: Upstream Cluster Health Checking (Active Health Checks)


1. Consistency Models in Distributed Systems

In a distributed system, a consistency model defines how closely the data held by multiple replicas is kept in sync. There are two main models (a toy simulation follows the list):

  1. Strong consistency: all replicas are guaranteed to be in an identical state at every point in time.
  2. Eventual consistency: short-lived inconsistency is allowed, but once no new updates arrive, the system eventually converges to a consistent state.
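As a toy illustration of eventual consistency (pure Python, not Envoy code; the replica names and delays are made up), the sketch below pushes one update to three "replicas" with different propagation delays, so reads disagree briefly and then converge:

import time

# three replicas that all start with the same view
replicas = {"envoy-a": "v1", "envoy-b": "v1", "envoy-c": "v1"}
# per-replica propagation delay in seconds (illustrative values)
delays = {"envoy-a": 0.0, "envoy-b": 0.1, "envoy-c": 0.3}

def push_update(new_value):
    """Apply new_value to each replica once its propagation delay has elapsed."""
    start = time.time()
    pending = dict(delays)
    while pending:
        for name in [n for n, d in pending.items() if time.time() - start >= d]:
            replicas[name] = new_value
            del pending[name]
        print(sorted(replicas.items()))   # views disagree until the last replica catches up
        time.sleep(0.05)

push_update("v2")   # eventually every replica reads "v2"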

2. Envoy Service Discovery and Health Checking

Envoy's service discovery is built on the eventual consistency model. At any given moment, different Envoy instances may hold different views of an upstream service's membership, but over time those views converge.
Eventual consistency in service discovery
Envoy obtains up-to-date service information by talking to a control plane (such as Istio or Consul) over the xDS protocols (e.g., CDS, EDS). In practice:

  1. Subscribe and push: Envoy instances subscribe to service-information updates published by the control plane. When service instances join or leave the mesh, the control plane pushes those changes to the subscribed Envoy instances.
  2. Propagation delay: because of network latency, processing time, and similar factors, different Envoy instances may receive an update at different moments, so in the short term their views of the service can disagree.
  3. Eventual consistency: as the control plane keeps pushing updates and every Envoy instance periodically refreshes its service information, all instances eventually converge on the same view. (A minimal bootstrap sketch of such a subscription follows the list.)
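As a rough sketch of what subscribing to a control plane looks like in an Envoy v3 bootstrap (the cluster name xds_cluster and the control-plane address are assumptions, not from this article), CDS, and through it EDS, can be delivered over a single aggregated (ADS) gRPC stream:

dynamic_resources:
  ads_config:                              # one aggregated xDS (ADS) gRPC stream to the control plane
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc:
        cluster_name: xds_cluster
  cds_config:                              # clusters (and, via them, endpoints) come over ADS
    resource_api_version: V3
    ads: {}

static_resources:
  clusters:
  - name: xds_cluster                      # statically defined so Envoy can reach the control plane
    type: STRICT_DNS
    connect_timeout: 1s
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}       # xDS uses gRPC, which requires HTTP/2
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: controlplane.example.com, port_value: 18000 }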

Active health checking
To keep services reliable, Envoy combines discovery with an active health checking mechanism to determine cluster health:

  1. Check types: Envoy supports several kinds of health check, including HTTP, TCP, and gRPC.
  2. Periodic probing: Envoy periodically sends health-check requests to upstream service instances to determine whether they are healthy.
  3. Status reporting: based on the results, Envoy dynamically adjusts load balancing, for example sending traffic only to healthy instances and steering it away from unhealthy ones.

A concrete scenario
Suppose a service mesh contains several Envoy instances and one control plane, and the control plane manages a service named example_service with multiple instances.

  1. An instance joins the mesh: when a new instance example_service_3 joins, the control plane updates its service information and pushes it to all subscribed Envoy instances.
  2. Propagation and updates: because of propagation delay, different Envoy instances receive the update at different times; for a while, some of them may not yet know that example_service_3 exists.
  3. Eventual consistency: as time passes, every Envoy instance receives the update, refreshes its internal state, and the views converge.
  4. Health checking: throughout this process, each Envoy instance keeps health-checking every instance of example_service. If one of them (say example_service_2) becomes unhealthy, Envoy marks it as such and stops routing traffic to it until it recovers.

Envoy's service discovery thus uses the eventual consistency model rather than strong consistency: it tolerates short-lived inconsistency but always converges. Combined with active health checking, Envoy routes as much traffic as possible to healthy upstream instances, improving the reliability and stability of the whole system. The design preserves flexibility and scalability while the health checks maintain availability.

3. Active Health Check Types and Examples

In Envoy, active health checking is a mechanism that periodically sends health-check requests to upstream service instances to determine whether they can handle requests normally. With it, Envoy can ensure that traffic is routed only to healthy instances, improving the service's reliability and availability.

3.1 Health check types

Envoy supports several health check types, including:

  1. HTTP/HTTPS health check: send an HTTP/HTTPS request and inspect the response status code.
  2. TCP health check: judge health by whether a TCP connection can be established.
  3. gRPC health check: send a gRPC request and inspect the response status.

3.2 Health check configuration examples

The examples below show how to configure active health checking in Envoy for HTTP, TCP, and gRPC.

3.2.1 HTTP health check

static_resources:
  clusters:
  - name: http_service_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: http_service_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: http_service.example.com
                port_value: 80
    health_checks:
    - timeout: 1s
      interval: 10s
      unhealthy_threshold: 3
      healthy_threshold: 2
      http_health_check:
        path: /health
        expected_statuses:          # half-open range [start, end): [200, 201) matches only status 200
          - start: 200
            end: 201

In this example, Envoy sends an HTTP request for path /health to http_service.example.com every 10 seconds. After 2 consecutive checks returning a status in the expected range (here, exactly 200), the instance is marked healthy; after 3 consecutive checks that fail to do so, it is marked unhealthy. (A minimal upstream endpoint that would satisfy this check is sketched below.)
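As a sketch only (using Flask, like the demo app later in this article; the path and port are illustrative), an upstream endpoint that passes this check could look like:

# minimal /health endpoint: any 200 response satisfies the check above
from flask import Flask

app = Flask(__name__)

@app.route('/health')
def health():
    # return 200 while healthy; any status outside expected_statuses fails the probe
    return 'OK', 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)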

3.2.2 TCP health check

static_resources:
  clusters:
  - name: tcp_service_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: tcp_service_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: tcp_service.example.com
                port_value: 9000
    health_checks:
    - timeout: 1s
      interval: 10s
      unhealthy_threshold: 3
      healthy_threshold: 2
      tcp_health_check: {}

In this example, Envoy attempts a TCP connection to port 9000 of tcp_service.example.com every 10 seconds. After 2 consecutive successful connections the instance is marked healthy; after 3 consecutive failed connections it is marked unhealthy. (The small probe below sketches the same idea.)
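Conceptually, an empty tcp_health_check ({}) passes if and only if the TCP handshake completes within the timeout (Envoy can additionally send and expect byte payloads via the send/receive fields). A hypothetical Python equivalent of the bare connect test:

import socket

def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection can be established within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(tcp_probe("tcp_service.example.com", 9000))   # hypothetical endpoint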

3.2.3 gRPC health check

static_resources:
  clusters:
  - name: grpc_service_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: grpc_service_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: grpc_service.example.com
                port_value: 50051
    health_checks:
    - timeout: 1s
      interval: 10s
      unhealthy_threshold: 3
      healthy_threshold: 2
      grpc_health_check:
        service_name: "my_service"

In this example, Envoy sends a gRPC health-check request to port 50051 of grpc_service.example.com every 10 seconds, with service name my_service. After 2 consecutive successful checks the instance is marked healthy; after 3 consecutive failures it is marked unhealthy. The upstream must implement the standard grpc.health.v1.Health service for this to work; a server-side sketch follows.
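Envoy's gRPC check calls the standard grpc.health.v1.Health/Check RPC. A minimal Python server sketch using the grpcio-health-checking package (the port and service name mirror the config above; this server is an assumption about the upstream, not part of the original article):

from concurrent import futures

import grpc
from grpc_health.v1 import health, health_pb2, health_pb2_grpc

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))

# register the standard health service and mark "my_service" as SERVING,
# matching grpc_health_check.service_name in the Envoy config
health_servicer = health.HealthServicer()
health_servicer.set("my_service", health_pb2.HealthCheckResponse.SERVING)
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)

server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()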

Key configuration fields

  • timeout: how long each health-check attempt may take before it counts as failed.
  • interval: how often health checks are sent.
  • unhealthy_threshold: the number of consecutive failed checks after which an instance is marked unhealthy.
  • healthy_threshold: the number of consecutive successful checks after which an instance is marked healthy.
  • http_health_check: HTTP-specific settings, including the check path and the expected status-code ranges.
  • tcp_health_check: TCP-specific settings; usually empty (a bare connection test).
  • grpc_health_check: gRPC-specific settings, including the service name.

Beyond these, a few optional tuning fields are sketched after this list.
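A hedged sketch of optional HealthCheck fields that often matter in practice (values are illustrative):

health_checks:
- timeout: 1s
  interval: 10s
  unhealthy_threshold: 3
  healthy_threshold: 2
  no_traffic_interval: 60s    # probe interval used while the cluster receives no traffic
  interval_jitter: 1s         # random jitter added to each interval to avoid probe bursts
  reuse_connection: true      # keep the probe connection open between checks (the default)
  http_health_check:
    path: /health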

3.2.4 Monitoring and debugging

Envoy provides rich monitoring and debugging facilities; health-check state and results are exposed through the /admin interface. For example, http://localhost:9901/stats shows health-check statistics. (Two typical queries are shown below.)
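Assuming the admin listener from the examples above (port 9901):

# per-cluster health-check counters
curl -s http://localhost:9901/stats | grep health_check

# per-endpoint health flags (e.g. healthy, /failed_active_hc)
curl -s http://localhost:9901/clusters | grep health_flags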
Through active health checking, Envoy dynamically monitors the health of upstream instances and adjusts traffic routing based on the results. This improves reliability: only healthy instances receive requests, and a single failed instance does not make the service unavailable.

4. Active Health Check Case Studies

4.1 Active health checking over HTTP


[root@dockerhost-envoy ~]# mkdir envoy_cluster_health_checks
[root@dockerhost-envoy ~]# cd envoy_cluster_health_checks
# cat docker-compose.yaml
 services:
   envoy:
     image: envoyproxy/envoy:v1.30.1
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
     volumes:
     - ./front-envoy.yaml:/etc/envoy/envoy.yaml
     networks:
       envoymesh:
         ipv4_address: 172.29.1.2
         aliases:
         - front-proxy
     depends_on:
     - webserver01-sidecar
     - webserver02-sidecar
 
   webserver01-sidecar:
     image: envoyproxy/envoy:v1.30.1
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
     volumes:
     - ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
     hostname: blue
     networks:
       envoymesh:
         ipv4_address: 172.29.1.3
         aliases:
         - myservice
 
   webserver01:
     image: docker.17ker.top/envoy/demoapp:v1.0
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
       - PORT=8080
       - HOST=127.0.0.1
     network_mode: "service:webserver01-sidecar"
     depends_on:
     - webserver01-sidecar
 
   webserver02-sidecar:
     image: envoyproxy/envoy:v1.30.1
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
     volumes:
     - ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
     hostname: yellow
     networks:
       envoymesh:
         ipv4_address: 172.29.1.4
         aliases:
         - myservice
 
   webserver02:
     image: docker.17ker.top/envoy/demoapp:v1.0
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
       - PORT=8080
       - HOST=127.0.0.1
     network_mode: "service:webserver02-sidecar"
     depends_on:
     - webserver02-sidecar
 
 networks:
   envoymesh:
     driver: bridge
     ipam:
       config:
         - subnet: 172.29.1.0/24
# cat front-envoy.yaml
admin:
  profile_path: /tmp/envoy.prof                                        # where Envoy writes profiling data
  access_log_path: /tmp/admin_access.log                               # access log for the admin interface
  address:                                                             # admin listener; 0.0.0.0 binds all interfaces, port_value 9901 is the admin port
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:                                                      # resources that never change at runtime, i.e. listeners and clusters
  listeners:
  - name: listener_0                                                   # a listener on port 80 of all interfaces for HTTP traffic
    address:
      socket_address: { address: 0.0.0.0, port_value: 80 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager            # network filter that manages HTTP connections and routing
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http                                    # prefix for traffic statistics
          codec_type: AUTO                                             # HTTP codec; AUTO selects automatically
          route_config:                                                # routing config: virtual hosts and route rules
            name: local_route
            virtual_hosts:
            - name: webservice                  # every domain (domains: ["*"]) and the root prefix ("/") route to cluster web_cluster_01
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: web_cluster_01 }
          http_filters:                         # HTTP filter chain; the router filter makes the routing decision
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    
  clusters:
  - name: web_cluster_01                        # cluster config: how to connect to the service
    connect_timeout: 0.25s                      # connection timeout
    type: STRICT_DNS                            # resolve endpoints strictly via DNS
    lb_policy: ROUND_ROBIN                      # round-robin load balancing
    load_assignment:                            # endpoints: the DNS name "myservice" on port 80
      cluster_name: web_cluster_01
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: myservice, port_value: 80 }
    health_checks:                              # periodic HTTP health checks against the service
    - timeout: 5s
      interval: 10s
      unhealthy_threshold: 2
      healthy_threshold: 2
      http_health_check:
        path: /livez
        expected_statuses:                      # must be a list of ranges; half-open [200, 400) covers statuses 200-399
        - start: 200
          end: 400
# cat envoy-sidecar-proxy.yaml        This YAML file configures the sidecar Envoy proxy and defines how it manages and routes traffic; it has two main parts: `admin` and `static_resources`.
admin:
  profile_path: /tmp/envoy.prof                   # where profiling data is stored
  access_log_path: /tmp/admin_access.log          # access log recording all requests to the admin interface
  address:                                        # admin listener; 0.0.0.0 binds all interfaces, making the admin interface reachable on port 9901 from any address
    socket_address:
       address: 0.0.0.0
       port_value: 9901

static_resources:
  listeners:                                      # a listener named `listener_0` on port 80 of all interfaces
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 80 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager     # network filter that manages HTTP connections and routing
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http                             # prefix for statistics
          codec_type: AUTO                                      # HTTP codec; AUTO selects automatically
          route_config:                                         # route requests for every domain ("*") and the root prefix ("/") to cluster local_cluster
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: local_cluster }
          http_filters:                                         # HTTP filter chain; the router filter executes the routing decision
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
  - name: local_cluster
    connect_timeout: 0.25s                          # connection timeout of 0.25 seconds
    type: STATIC                                    # STATIC: endpoints are configured statically
    lb_policy: ROUND_ROBIN                          # round-robin load balancing
    load_assignment:                                # the only endpoint is the local app at 127.0.0.1:8080
      cluster_name: local_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 8080 }

Environment

Five services:

  • envoy: the front proxy, at 172.29.1.2
  • webserver01: the first backend service
  • webserver01-sidecar: sidecar proxy for the first backend, at 172.29.1.3
  • webserver02: the second backend service
  • webserver02-sidecar: sidecar proxy for the second backend, at 172.29.1.4
Run and test

  1. Bring the environment up:
    docker-compose up -d
  2. Test:
# keep requesting the service
while true; do curl 172.29.1.2; sleep 1; done

# once the services are ready, open another terminal and set /livez on one
# backend to a non-"OK" value, e.g. on the first endpoint:
curl -X POST -d 'livez=FAIL' http://172.29.1.3/livez

# the responses then show how requests are scheduled: the first endpoint fails
# the active health check and is removed from the cluster until it turns
# healthy again; restore a healthy response with:
curl -X POST -d 'livez=OK' http://172.29.1.3/livez
  3. Tear down when done:
     docker-compose down

Output:

# docker-compose up -d
 [+] Running 6/6
  ✔ Network envoy_cluster_health_checks_envoymesh                Created                     0.1s
  ✔ Container envoy_cluster_health_checks-webserver01-sidecar-1  Created                     0.0s
  ✔ Container envoy_cluster_health_checks-webserver02-sidecar-1  Created                     0.0s
  ✔ Container envoy_cluster_health_checks-webserver02-1          Created                     0.0s
  ✔ Container envoy_cluster_health_checks-webserver01-1          Created                     0.0s
  ✔ Container envoy_cluster_health_checks-envoy-1                Created                     0.0s
 Attaching to envoy-1, webserver01-1, webserver01-sidecar-1, webserver02-1, webserver02-sidecar-1
Open another terminal and send requests:
 
 # curl http://172.29.1.2
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 
 # curl http://172.29.1.2
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!

Inspect the listeners:
 # curl http://172.29.1.2:9901/listeners
 listener_0::0.0.0.0:80

Inspect the clusters:
 # curl http://172.29.1.2:9901/clusters
 web_cluster_01::observability_name::web_cluster_01
 web_cluster_01::default_priority::max_connections::1024
 web_cluster_01::default_priority::max_pending_requests::1024
 web_cluster_01::default_priority::max_requests::1024
 web_cluster_01::default_priority::max_retries::3
 web_cluster_01::high_priority::max_connections::1024
 web_cluster_01::high_priority::max_pending_requests::1024
 web_cluster_01::high_priority::max_requests::1024
 web_cluster_01::high_priority::max_retries::3
 web_cluster_01::added_via_api::false
 web_cluster_01::172.29.1.3:80::cx_active::1
 web_cluster_01::172.29.1.3:80::cx_connect_fail::0
 web_cluster_01::172.29.1.3:80::cx_total::1
 web_cluster_01::172.29.1.3:80::rq_active::0
 web_cluster_01::172.29.1.3:80::rq_error::0
 web_cluster_01::172.29.1.3:80::rq_success::1
 web_cluster_01::172.29.1.3:80::rq_timeout::0
 web_cluster_01::172.29.1.3:80::rq_total::1
 web_cluster_01::172.29.1.3:80::hostname::myservice
 web_cluster_01::172.29.1.3:80::health_flags::healthy
 web_cluster_01::172.29.1.3:80::weight::1
 web_cluster_01::172.29.1.3:80::region::
 web_cluster_01::172.29.1.3:80::zone::
 web_cluster_01::172.29.1.3:80::sub_zone::
 web_cluster_01::172.29.1.3:80::canary::false
 web_cluster_01::172.29.1.3:80::priority::0
 web_cluster_01::172.29.1.3:80::success_rate::-1
 web_cluster_01::172.29.1.3:80::local_origin_success_rate::-1
 web_cluster_01::172.29.1.4:80::cx_active::1
 web_cluster_01::172.29.1.4:80::cx_connect_fail::0
 web_cluster_01::172.29.1.4:80::cx_total::1
 web_cluster_01::172.29.1.4:80::rq_active::0
 web_cluster_01::172.29.1.4:80::rq_error::0
 web_cluster_01::172.29.1.4:80::rq_success::1
 web_cluster_01::172.29.1.4:80::rq_timeout::0
 web_cluster_01::172.29.1.4:80::rq_total::1
 web_cluster_01::172.29.1.4:80::hostname::myservice
 web_cluster_01::172.29.1.4:80::health_flags::healthy
 web_cluster_01::172.29.1.4:80::weight::1
 web_cluster_01::172.29.1.4:80::region::
 web_cluster_01::172.29.1.4:80::zone::
 web_cluster_01::172.29.1.4:80::sub_zone::
 web_cluster_01::172.29.1.4:80::canary::false
 web_cluster_01::172.29.1.4:80::priority::0
 web_cluster_01::172.29.1.4:80::success_rate::-1
 web_cluster_01::172.29.1.4:80::local_origin_success_rate::-1
 Request /livez to confirm the status is OK:
 # curl http://172.29.1.2/livez
 OK

 Loop over the service with while; both upstream hosts are serving:
 # while true; do curl 172.29.1.2; sleep 1; done
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
Set livez=FAIL on one host, then request again:
 # curl -X POST -d 'livez=FAIL' http://172.29.1.3/livez
Looping again, that host no longer appears:
 # while true; do curl 172.29.1.2; sleep 1; done
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!

Restore the host; it rejoins after healthy_threshold (2) consecutive successful checks:
# curl -X POST -d 'livez=OK' http://172.29.1.3/livez
# while true; do curl 172.29.1.2; sleep 1; done
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: yellow, ServerIP: 172.29.1.4!
 demoapp v1.0 !! ClientIP: 127.0.0.1, ServerName: blue, ServerIP: 172.29.1.3!

Implementation of the web service used above. Note that when livez is set to a non-"OK" value, /livez returns status 506, which falls outside the health check's expected 200-399 range, so the active check fails.

[root@blue /usr/local/bin]# cat demo.py
 #!/usr/bin/python3
 #
 from flask import Flask, request, abort, Response, jsonify as flask_jsonify, make_response
 import argparse
 import sys, os, getopt, socket, json, time
 
 app = Flask(__name__)
 
 @app.route('/')
 def index():
     return ('demoapp v1.0 !! ClientIP: {}, ServerName: {}, '
           'ServerIP: {}!\n'.format(request.remote_addr, socket.gethostname(),
                                   socket.gethostbyname(socket.gethostname())))
 
 @app.route('/hostname')
 def hostname():
     return ('ServerName: {}\n'.format(socket.gethostname()))
 
 health_status = {'livez': 'OK', 'readyz': 'OK'}   # current state, togglable via POST
 probe_count = {'livez': 0, 'readyz': 0}           # counts probes so the first one can be delayed
 
 @app.route('/livez', methods=['GET','POST'])
 def livez():
     if request.method == 'POST':
         # flip the liveness state, e.g. curl -X POST -d 'livez=FAIL' .../livez
         status = request.form['livez']
         health_status['livez'] = status
         return ''
 
     else:
         if probe_count['livez'] == 0:
             time.sleep(5)                         # delay only the first probe, simulating slow startup
         probe_count['livez'] += 1
         if health_status['livez'] == 'OK':
             return make_response((health_status['livez']), 200)
         else:
             # 506 is outside the expected 200-399 range, so the active check fails
             return make_response((health_status['livez']), 506)
 
 @app.route('/readyz', methods=['GET','POST'])
 def readyz():
     if request.method == 'POST':
         status = request.form['readyz']
         health_status['readyz'] = status
         return ''
 
     else:
         if probe_count['readyz'] == 0:
             time.sleep(15)
         probe_count['readyz'] += 1
         if health_status['readyz'] == 'OK':
             return make_response((health_status['readyz']), 200)
         else:
             return make_response((health_status['readyz']), 507)
 
 @app.route('/configs')
 def configs():
     return ('DEPLOYENV: {}\nRELEASE: {}\n'.format(os.environ.get('DEPLOYENV'), os.environ.get('RELEASE')))
 
 @app.route("/user-agent")
 def view_user_agent():
     # user_agent=request.headers.get('User-Agent')
     return('User-Agent: {}\n'.format(request.headers.get('user-agent')))
 
 def main(argv):
     port = 80
     host = '0.0.0.0'
     debug = False
 
     if os.environ.get('PORT') is not None:
         port = os.environ.get('PORT')
 
     if os.environ.get('HOST') is not None:
         host = os.environ.get('HOST')
 
     try:
         opts, args = getopt.getopt(argv,"vh:p:",["verbose","host=","port="])
     except getopt.GetoptError:
         print('server.py -p <portnumber>')
         sys.exit(2)
     for opt, arg in opts:
         if opt in ("-p", "--port"):
             port = arg
         elif opt in ("-h", "--host"):
             host = arg
         elif opt in ("-v", "--verbose"):
             debug = True
 
     app.run(host=str(host), port=int(port), debug=bool(debug))
 
 
 if __name__ == "__main__":
     main(sys.argv[1:])

4.2 Active health checking over TCP

Because each web application sits behind a sidecar Envoy proxy, this case is verified by stopping a sidecar proxy directly, so that TCP connections to it fail.


# mkdir envoy_cluster_health_checks_tcp
# cd envoy_cluster_health_checks_tcp
# cat docker-compose.yaml

 services:
   envoy:
     image: envoyproxy/envoy:v1.30.1
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
     volumes:
     - ./front-envoy-with-tcp-check.yaml:/etc/envoy/envoy.yaml
     networks:
       envoymesh:
         ipv4_address: 172.30.1.2
         aliases:
         - front-proxy
     depends_on:
     - webserver01-sidecar
     - webserver02-sidecar
 
   webserver01-sidecar:
     image: envoyproxy/envoy:v1.30.1
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
     volumes:
     - ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
     hostname: blue
     networks:
       envoymesh:
         ipv4_address: 172.30.1.3
         aliases:
         - myservice
 
   webserver01:
     image: docker.17ker.top/envoy/demoapp:v1.0
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
       - PORT=8080
       - HOST=127.0.0.1
     network_mode: "service:webserver01-sidecar"
     depends_on:
     - webserver01-sidecar
 
   webserver02-sidecar:
     image: envoyproxy/envoy:v1.30.1
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
     volumes:
     - ./envoy-sidecar-proxy.yaml:/etc/envoy/envoy.yaml
     hostname: yellow
     networks:
       envoymesh:
         ipv4_address: 172.30.1.4
         aliases:
         - myservice
 
   webserver02:
     image: docker.17ker.top/envoy/demoapp:v1.0
     environment:
       - ENVOY_UID=0
       - ENVOY_GID=0
       - PORT=8080
       - HOST=127.0.0.1
     network_mode: "service:webserver02-sidecar"
     depends_on:
     - webserver02-sidecar
 
 networks:
   envoymesh:
     driver: bridge
     ipam:
       config:
         - subnet: 172.30.1.0/24
# cat front-envoy-with-tcp-check.yaml
 admin:
   profile_path: /tmp/envoy.prof
   access_log_path: /tmp/admin_access.log
   address:
     socket_address: { address: 0.0.0.0, port_value: 9901 }
 
 static_resources:
   listeners:
   - name: listener_0
     address:
       socket_address: { address: 0.0.0.0, port_value: 80 }
     filter_chains:
     - filters:
       - name: envoy.filters.network.http_connection_manager
         typed_config:
           "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
           stat_prefix: ingress_http
           codec_type: AUTO
           route_config:
             name: local_route
             virtual_hosts:
             - name: webservice
               domains: ["*"]
               routes:
               - match: { prefix: "/" }
                 route: { cluster: web_cluster_01 }
           http_filters:
           - name: envoy.filters.http.router
             typed_config:
               "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
   clusters:
   - name: web_cluster_01
     connect_timeout: 0.25s
     type: STRICT_DNS
     lb_policy: ROUND_ROBIN
     load_assignment:
       cluster_name: web_cluster_01
       endpoints:
       - lb_endpoints:
         - endpoint:
             address:
               socket_address: { address: myservice, port_value: 80 }
     health_checks:
     - timeout: 5s
       interval: 10s
       unhealthy_threshold: 2
       healthy_threshold: 2
       tcp_health_check: {}
# cat envoy-sidecar-proxy.yaml
 admin:
   profile_path: /tmp/envoy.prof
   access_log_path: /tmp/admin_access.log
   address:
     socket_address:
        address: 0.0.0.0
        port_value: 9901
 
 static_resources:
   listeners:
   - name: listener_0
     address:
       socket_address: { address: 0.0.0.0, port_value: 80 }
     filter_chains:
     - filters:
       - name: envoy.filters.network.http_connection_manager
         typed_config:
           "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
           stat_prefix: ingress_http
           codec_type: AUTO
           route_config:
             name: local_route
             virtual_hosts:
             - name: local_service
               domains: ["*"]
               routes:
               - match: { prefix: "/" }
                 route: { cluster: local_cluster }
           http_filters:
           - name: envoy.filters.http.router
             typed_config:
               "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
 
   clusters:
   - name: local_cluster
     connect_timeout: 0.25s
     type: STATIC
     lb_policy: ROUND_ROBIN
     load_assignment:
       cluster_name: local_cluster
       endpoints:
       - lb_endpoints:
         - endpoint:
             address:
               socket_address: { address: 127.0.0.1, port_value: 8080 }
Run in terminal 1:
 # docker-compose up -d
Check the health-check stats in terminal 2:
 # curl http://172.30.1.2:9901/stats | grep health_check
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100 20281    0 20281    0     0  17.0M      0 --:--:-- --:--:-- --:--:-- 19.3M
 cluster.web_cluster_01.health_check.attempt: 8
 cluster.web_cluster_01.health_check.degraded: 0
 cluster.web_cluster_01.health_check.failure: 0
 cluster.web_cluster_01.health_check.healthy: 2
 cluster.web_cluster_01.health_check.network_failure: 0
 cluster.web_cluster_01.health_check.passive_failure: 0
 cluster.web_cluster_01.health_check.success: 8
 cluster.web_cluster_01.health_check.verify_cluster: 0
 http.ingress_http.tracing.health_check: 0

 List the containers in terminal 2:
 # docker ps
 CONTAINER ID   IMAGE                                COMMAND                   CREATED         STATUS         PORTS       NAMES
 a1f2d190db5d   envoyproxy/envoy:v1.30.1             "/docker-entrypoint.…"   3 minutes ago   Up 3 minutes   10000/tcp   envoy_cluster_health_checks_tcp-envoy-1
 eba5e2d21e26   docker.17ker.top/envoy/demoapp:v1.0   "/bin/sh -c 'python3…"   3 minutes ago   Up 3 minutes               envoy_cluster_health_checks_tcp-webserver01-1
 a14fac3a0265   docker.17ker.top/envoy/demoapp:v1.0   "/bin/sh -c 'python3…"   3 minutes ago   Up 3 minutes               envoy_cluster_health_checks_tcp-webserver02-1
 0cd68453fa48   envoyproxy/envoy:v1.30.1             "/docker-entrypoint.…"   3 minutes ago   Up 3 minutes   10000/tcp   envoy_cluster_health_checks_tcp-webserver02-sidecar-1
 cad933da773d   envoyproxy/envoy:v1.30.1             "/docker-entrypoint.…"   3 minutes ago   Up 3 minutes   10000/tcp   envoy_cluster_health_checks_tcp-webserver01-sidecar-1

Stop one sidecar in terminal 2:
 # docker stop envoy_cluster_health_checks_tcp-webserver01-sidecar-1
 envoy_cluster_health_checks_tcp-webserver01-sidecar-1

Query the stats again in terminal 2; the failure and network_failure counters now increase because the TCP probe to the stopped sidecar fails:
 # curl http://172.30.1.2:9901/stats | grep health_check
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100 20302    0 20302    0     0  20.8M      0 --:--:-- --:--:-- --:--:-- 19.3M
 cluster.web_cluster_01.health_check.attempt: 12
 cluster.web_cluster_01.health_check.degraded: 0
 cluster.web_cluster_01.health_check.failure: 1
 cluster.web_cluster_01.health_check.healthy: 2
 cluster.web_cluster_01.health_check.network_failure: 1
 cluster.web_cluster_01.health_check.passive_failure: 0
 cluster.web_cluster_01.health_check.success: 11
 cluster.web_cluster_01.health_check.verify_cluster: 0
 http.ingress_http.tracing.health_check: 0
