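Lately, troubleshooting with the ingress logs has felt awkward. We already use OpenTelemetry for distributed tracing, but a trace only covers the full path of a single request.
Different requests from the same session are still hard to query together, so I decided to rework what gets logged.
In the end I chose to log, for logged-in users only, the header that carries the JWT. The catch is that this value is very long. How long? See for yourself: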
eyJhbGciOini982xMiJ9.eyJzdWIiOiJzZWxsZXJfMTA1NzE2NDc3MTUxMjQwX3BjXzE3MjA0Mjc2OTQ2MzkiLCJvcyI6InBjIiwic2NvcGVzIjpbXSwiaW5kdX89aclUeXBlIjoyLCJsb2dpblR5cGUiOjEsImp0aSI6ImVjMWQwODhkLWI1Y2Etabcd0987YmFlLTU4Y2MzZjc0NWJiNSIsImlhdCI6MTcyMDQyNzY5NCwiZXhwIjabcdzMDE5Njk0fQ.xzVh7UgKlnm85sigabcdYooUDtMeRSmGuR15abcdefghoKNrcAOD0cuXSx8KQ2lWjz3ztWO3Upw2_3Y98J4-Dw
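A value this long is clearly unsuitable as a query key, so it has to be compressed (in practice, hashed down to a fixed length).
We also already run an ELK pipeline of apisix -> logfile -> filebeat -> kafka -> logstash -> elasticsearch, and I wanted to change as little of it as possible.
So the plan is to add the header variable to log_format and, once the value is in the log, have logstash compute the MD5 of that field before storing it in ES.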
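As a rough illustration of why an MD5 digest works as a key here, the sketch below hashes a long placeholder token down to a fixed-length hex string. The helper name `session_key` and the token value are made up for this example; only `Digest::MD5.hexdigest` comes from the Ruby standard library.

```ruby
require 'digest/md5'

# Hypothetical helper: map an arbitrarily long header value to a fixed-length key.
def session_key(header_value)
  Digest::MD5.hexdigest(header_value)
end

# Placeholder token, not a real JWT.
token = 'header.' + ('x' * 300) + '.signature'
key = session_key(token)

puts "token length: #{token.length}"
puts "key length:   #{key.length}"   # an MD5 hex digest is always 32 characters
```

The same input always yields the same digest, so every request carrying the same token ends up with the same key, which is what makes it usable for grouping requests from one session.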
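Define log_format to record the header
The ingress here is Apache APISIX. Because it is built on top of Nginx, changing log_format is familiar territory:
Go to the corresponding namespace in k8s, find the configmap, and add $http_authorization to log_format so that the Authorization header is logged.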
$ kubectl -n ingress-apisix get configmap apisix -o yaml
...
http:
  enable_access_log: true
  access_log: "/dev/stdout"
  access_log_format: "$http_x_forwarded_for $remote_addr \"-\" $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" \"$upstream_addr\" $upstream_response_time $request_time \"$http_x_request_tag\" \"$http_Traceparent\" \"$http_authorization\""
...
Checking the logs confirms that the header is now recorded:
[01/Aug/2024:06:40:10 +0000] test1-buyer.zk8s.com "POST /api/v1/buyer/xxxxxx HTTP/2.0" 200 237 "http://192.168.1.45:8082/h5/pages/cardpack/cardpack?vid=105711976539080&index=4" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36" "172.17.34.31:8080" 0.037 0.037 "-" "00-fb1fc1307440abbc0b16796bad5ad883-51f8586156dfc645-01" "eyJhbGciOini982xMiJ9.eyJzdWIiOiJzZWxsZXJfMTA1NzE2NDc3MTUxMjQwX3BjXzE3MjA0Mjc2OTQ2MzkiLCJvcyI6InBjIiwic2NvcGVzIjpbXSwiaW5kdX89aclUeXBlIjoyLCJsb2dpblR5cGUiOjEsImp0aSI6ImVjMWQwODhkLWI1Y2Etabcd0987YmFlLTU4Y2MzZjc0NWJiNSIsImlhdCI6MTcyMDQyNzY5NCwiZXhwIjabcdzMDE5Njk0fQ.xzVh7UgKlnm85sigabcdYooUDtMeRSmGuR15abcdefghoKNrcAOD0cuXSx8KQ2lWjz3ztWO3Upw2_3Y98J4-Dw"
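To make the layout of such a line easier to see, here is a minimal sketch that pulls the quoted fields out of an access-log line with a plain regular expression. It is not the grok pattern used in the pipeline; the helper name `quoted_fields` and the shortened sample line are assumptions for illustration, and it assumes field values contain no literal double quotes.

```ruby
# Hypothetical helper: return the double-quoted fields of an access-log line, in order.
def quoted_fields(line)
  line.scan(/"([^"]*)"/).flatten
end

# Shortened sample in the same layout as the log line above; values are placeholders.
line = '[01/Aug/2024:06:40:10 +0000] example.com "POST /api/v1/demo HTTP/2.0" 200 237 ' \
       '"-" "Mozilla/5.0" "172.17.34.31:8080" 0.037 0.037 "-" ' \
       '"00-fb1fc1307440abbc0b16796bad5ad883-51f8586156dfc645-01" "aaa.bbb.ccc"'

fields = quoted_fields(line)
traceparent   = fields[-2]  # second-to-last quoted field in this log_format
authorization = fields[-1]  # last quoted field, added by $http_authorization

puts traceparent
puts authorization
```

Because $http_authorization was appended at the end of the format, the token is always the last quoted field, which keeps the change backwards-compatible with parsers that only read the earlier fields.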
Configure logstash to compress the field
Compute the MD5 of the logged Authorization header and write the result to ES.
Logstash configuration:
...
filter {
  grok {
    # Define the grok expression; the Authorization value is captured as the field http.header.authorization
    match => [ "message", "(?:%{NGINX_ADDRESS_LIST:nginx.access.remote_ip_list}) \"-\" (-|%{DATA:user.name}) \[%{HTTPDATE:nginx.access.time}\] (%{NGINX_HOST})? \"%{DATA:nginx.access.info}\" %{NUMBER:http.response.status_code:long} %{NUMBER:http.response.body.bytes:long} \"(-|%{DATA:http.request.referrer})\" \"(-|%{DATA:user_agent.original})\" \"(-|%{HOSTPORT:nginx.upstream.address})\" (-|%{NUMBER:nginx.upstream.response.time:float}) (-|%{NUMBER:nginx.request.time:float}) \"(-|%{DATA:http.header.x_request_tag})\"( \"00-%{DATA:Trace.Id}-%{DATA:Trace.SpanId}-01\")( \"(-|%{DATA:http.header.authorization})\")?" ]
  }
  ...
  # Compute the MD5 of http.header.authorization, assign it to the new field user_agent.logged_id, then remove http.header.authorization
  if [http.header.authorization] {
    ruby {
      code => "
        require 'digest/md5'
        event.set('user_agent.logged_id', Digest::MD5.hexdigest(event.get('http.header.authorization')))
        event.remove('http.header.authorization')
      "
    }
  }
  ...
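Storage and query results in ES
Pick the user_agent.logged_id field of any one request and use it as the query condition to find the other requests from the same session.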
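For readers who query ES directly rather than through a UI, the lookup described above could be expressed as a term query. The sketch below only builds the request body; the helper name `session_query`, the sample digest, the `@timestamp` sort field, and the assumption that user_agent.logged_id is mapped as a keyword are all mine and not taken from the original setup.

```ruby
require 'json'

# Hypothetical helper: build a search body matching every request that shares one logged_id.
# Assumes user_agent.logged_id is indexed as a keyword and that events carry @timestamp.
def session_query(logged_id, size: 100)
  {
    query: { term: { 'user_agent.logged_id' => logged_id } },
    sort: [{ '@timestamp' => { order: 'asc' } }],
    size: size
  }
end

# Placeholder digest, not a value from the real logs.
body = session_query('0123456789abcdef0123456789abcdef')
puts JSON.pretty_generate(body)
```

The resulting JSON can be sent to the search endpoint of whichever index the logstash output writes to; sorting by time puts the session's requests in the order they happened.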


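The drawbacks of this approach are that the logs take up a lot of space and logstash carries a heavy load. Also, because the MD5 is computed outside APISIX, it cannot be used as a key for API rate limiting. I'll keep thinking about a better way...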