First of all I want to start this post/tutorial by clarifying I am not an nginx expert, my experience is mainly based on using nginx as an ingress controller in the kubernetes space but recently I was asked if the logs in nginx are possible to adjust to meet an specific configuration and I was very curious and took the challenge to investigate and see what I was able to do, this article is the reflection of what I was trying to do, how did I found the solution and the end result, hope you enjoy the ride!
The request
You can imagine that any company no matter how small or big it is, does have all kind of logging enabled in their servers, that is a fact (or at least it should be) but when you have an heterogeneous environment with multiple different platforms you also end up having different kinds of logs, so when you need to do a troubleshooting you end up with multiple different formats for those log files.
This makes the task of diagnosing/triaging even more complicated, so the request was simple, let's make our logs as heterogeneous as possible, let's set the same flags and let's have a consolidated log format.
This makes a lot of sense, so I took the nginx part to try to match it with other systems.
How does an nginx log is configured?
As I said before, I am not an expert so this was going to be an investigation job to figure it out how to accomplish this task.
By reading a bit over nginx I found out the configuration for it is stored in a file where you set all kind of flags to adjust nginx to your needs. I was particularly interested in this piece
log_format main
'remote_addr:$remote_addr\t'
'time_local:$time_local\t'
'request_size:$request_length\t'
'http_method:$request_method\t'
'url:$uri\t'
'uri_path:$request_uri\t'
'http_version:$request\t'
'http_status:$status\t'
'user_agent:$http_user_agent\t'
'referrer_header:$http_referer\t'
'request_id:$request_id\t'
'my_remoteaddress:$myremoteaddress\t'
'my_uri:$myuri\t';
access_log /var/log/nginx/access.log main;
This block in here tells nginx which columns (name and values) to store in the log, while the access_log
one tells where to store that configuration.
Note that after log_format
you have main
, this is an identifier, you can have different formats defined and store them in different files, for example all related to a certain path /private
could have a log_format and be stored in a different file than all the other endpoints.
Now, you might be thinking in what are all those $variables in there and how do I know which ones are available .... well I was in that place as well and I found this.
Turns out that there are a lot of variables defined in nginx that we can use to adjust our logs, sweet!
Special request
Up to this point all was done as I was able to find a way to select exactly what the other logs were showing and making nginx matching the defined structure.... but wait, there is more, if you are trying to trace a client doing some requests, you need to make your application log (in some way) some kind of session id
that is consistent between calls.
You have two ways of doing this:
- You adjust your application to track the client and the endpoint being accessed and log that
- You try to make that outside of your app
So I started investigating what was available in nginx to do this and found it actually can execute javascript!! (a small version of javascript but powerful enough for what I needed).
After some investigation and reading this and this I was able to make a small function that calculated an md5 of the client_ip
and url
, keep in mind this is a POC, you can use whatever variable that makes more sense to you.
By doing this, every request from a client to the same url, will have a md5 hash associated.
This is my code as an example
export default { md5calculation }
function md5calculation (r) {
var myhash = require('crypto')
return myhash .createHash('md5').update(r.remoteAddress).update(r.uri).digest('base64');
}
What's next?
Now that I had a javascript function to calculate the md5 and the config ready for nginx, the next steps is testing this. For this (and for many other things) I always have a kubernetes cluster running somewhere, and I deploy all my POCs in them as I found that to be the easiest way to do quick tests and POCs.
Here are the two files I created for this
First a config map to store the javascript function, the nginx.conf and a virtualhost file.
apiVersion: v
kind: ConfigMap
metadata:
name: nginx-conf
namespace: nginx
data:
functions.js : |
export default { md5calculation }
function md5calculation (r) {
var myhash = require('crypto')
return myhash .createHash('md5').update(r.remoteAddress).update(r.uri).digest('base64');
}
nginx.conf: |
user nginx;
load_module /usr/lib/nginx/modules/ngx_http_js_module.so;
worker_processes 3;
error_log /var/log/nginx/error.log;
events {
worker_connections 10240;
}
http {
js_import /etc/nginx/functions.js;
js_set $md5calculation functions.md5calculation;
log_format main
'remote_addr:$remote_addr\t'
'time_local:$time_local\t'
'request_size:$request_length\t'
'url:$uri\t'
'http_version:$request\t'
'http_status:$status\t'
'user_agent:$http_user_agent\t'
'request_id:$request_id\t'
'md5session:$md5calculation\t';
access_log /var/log/nginx/access.log main;
server {
listen 80;
server_name _;
location / {
root html;
index index.html index.htm;
}
}
include /etc/nginx/virtualhost/virtualhost.conf;
}
virtualhost.conf: |
upstream app {
server localhost:8080;
keepalive 1024;
}
server {
listen 80 default_server;
root /usr/local/app;
access_log /var/log/nginx/app.access_log main;
error_log /var/log/nginx/app.error_log;
location / {
proxy_pass http://app/;
proxy_http_version 1.1;
}
}
And this another file that is my deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx
namespace: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
volumeMounts:
- mountPath: /etc/nginx # mounting nginx-conf vol to /etc/nginx
readOnly: true
name: nginx-conf
- mountPath: /var/log/nginx
name: log
volumes:
- name: nginx-conf
configMap:
name: nginx-conf
items:
- key: nginx.conf
path: nginx.conf
- key: virtualhost.conf
path: virtualhost/virtualhost.conf # dig directory
- key: functions.js
path: functions.js
- name: log
emptyDir: {}
With this two files deployed into my kubernetes cluster, it was only matter of:
- deploy the files with
kubectl apply -f file.yaml
- Jump into the pod with
kubectl exec -it podname -- /bin/bash
- Once in the pod tail the log with
tail -f /var/log/nginx/app.access_log
- In another terminal port-forward the pod to my machine
kubectl port-forward podname 30000:80
- Open a browser and navigate to
localhost:3000
- Do some more paths multiple times like
localhost:3000/testpath
andlocalhost:30000/anothertestpath
- Watch the logs in the terminal where the
tail
is running
There you will see something like this
remote_addr:127.0.0.1 time_local:26/Aug/2022:15:55:04 +0000 request_size:725 url:/ http_version:GET / HTTP/1.1 http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 request_id:555781345d037c960edf4120c29955e6 md5session_id:jr5H1UccgZv/oEQiT6IfMg==
remote_addr:127.0.0.1 time_local:26/Aug/2022:15:55:06 +0000 request_size:725 url:/ http_version:GET / HTTP/1.1 http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 request_id:dee1482fc88a681901459b4dbf2b299b md5session_id:jr5H1UccgZv/oEQiT6IfMg==
remote_addr:127.0.0.1 time_local:26/Aug/2022:15:55:18 +0000 request_size:707 url:/testpath http_version:GET /testpath HTTP/1.1 http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 request_id:7dfcbfc8dd521e0c6695bce28d349a2d md5session_id:tACXcd3YlC4DePpGiCiw9Q==
remote_addr:127.0.0.1 time_local:26/Aug/2022:15:55:26 +0000 request_size:714 url:/anothertestpath http_version:GET /anothertestpath HTTP/1.1 http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 request_id:dce581c5fa5d0d60c8881040e8f866b6 md5session_id:8cuKr5JSGv9vm5xryLGTzg==
remote_addr:127.0.0.1 time_local:26/Aug/2022:15:57:25 +0000 request_size:740 url:/anothertestpath http_version:GET /anothertestpath HTTP/1.1 http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 request_id:bd5d62bdeabf431a1a93119645eae056 md5session_id:8cuKr5JSGv9vm5xryLGTzg==
Here you can see in the logs, the column mdsession
contains the hash calculated by the function and you can see that when calling the same path, the hash is the same.
Remember you can put whatever you need in that function to calculate the md5, this is just an example on how it works
Wrapping up
So by the end of this (fun) exercise I learned:
- How nginx logs work
- That nginx can run basic javascript in the engine
- That I can fully configure my nginx from "outside" by using configmaps
- How well documented nginx is
I hope this article helps someone else out there that might be looking for this same information as I needed to read several different places and do a good amount of work to find out the right configuration.
If you liked this article, please consider letting me know by liking this article, leaving a comment or sharing this article with others, you can also follow me in my networks.
Thank you for reading!
Top comments (0)