Javier Marasco

Posted on Aug 26, 2022 • Edited on Sep 14, 2022

Making your nginx logs look the way you want them

#nginx #kubernetes #devops #tutorials

First of all I want to start this post/tutorial by clarifying I am not an nginx expert, my experience is mainly based on using nginx as an ingress controller in the kubernetes space but recently I was asked if the logs in nginx are possible to adjust to meet an specific configuration and I was very curious and took the challenge to investigate and see what I was able to do, this article is the reflection of what I was trying to do, how did I found the solution and the end result, hope you enjoy the ride!

The request

You can imagine that any company no matter how small or big it is, does have all kind of logging enabled in their servers, that is a fact (or at least it should be) but when you have an heterogeneous environment with multiple different platforms you also end up having different kinds of logs, so when you need to do a troubleshooting you end up with multiple different formats for those log files.
This makes the task of diagnosing/triaging even more complicated, so the request was simple, let's make our logs as heterogeneous as possible, let's set the same flags and let's have a consolidated log format.
This makes a lot of sense, so I took the nginx part to try to match it with other systems.

How does an nginx log is configured?

As I said before, I am not an expert so this was going to be an investigation job to figure it out how to accomplish this task.

By reading a bit over nginx I found out the configuration for it is stored in a file where you set all kind of flags to adjust nginx to your needs. I was particularly interested in this piece

log_format  main
              'remote_addr:$remote_addr\t'
              'time_local:$time_local\t'
              'request_size:$request_length\t'
              'http_method:$request_method\t'
              'url:$uri\t'
              'uri_path:$request_uri\t'
              'http_version:$request\t'
              'http_status:$status\t'
              'user_agent:$http_user_agent\t'
              'referrer_header:$http_referer\t'
              'request_id:$request_id\t'
              'my_remoteaddress:$myremoteaddress\t'
              'my_uri:$myuri\t';


access_log  /var/log/nginx/access.log main;

This block in here tells nginx which columns (name and values) to store in the log, while the access_log one tells where to store that configuration.

Note that after log_format you have main, this is an identifier, you can have different formats defined and store them in different files, for example all related to a certain path /private could have a log_format and be stored in a different file than all the other endpoints.

How to setup logs in nginx

Now, you might be thinking in what are all those $variables in there and how do I know which ones are available .... well I was in that place as well and I found this.
Turns out that there are a lot of variables defined in nginx that we can use to adjust our logs, sweet!

Special request

Up to this point all was done as I was able to find a way to select exactly what the other logs were showing and making nginx matching the defined structure.... but wait, there is more, if you are trying to trace a client doing some requests, you need to make your application log (in some way) some kind of session id that is consistent between calls.
You have two ways of doing this:

You adjust your application to track the client and the endpoint being accessed and log that
You try to make that outside of your app

So I started investigating what was available in nginx to do this and found it actually can execute javascript!! (a small version of javascript but powerful enough for what I needed).

After some investigation and reading this and this I was able to make a small function that calculated an md5 of the client_ip and url, keep in mind this is a POC, you can use whatever variable that makes more sense to you.
By doing this, every request from a client to the same url, will have a md5 hash associated.

This is my code as an example

export default { md5calculation }

    function md5calculation (r) {
      var myhash = require('crypto')
      return myhash .createHash('md5').update(r.remoteAddress).update(r.uri).digest('base64');
    }

What's next?

Now that I had a javascript function to calculate the md5 and the config ready for nginx, the next steps is testing this. For this (and for many other things) I always have a kubernetes cluster running somewhere, and I deploy all my POCs in them as I found that to be the easiest way to do quick tests and POCs.

Here are the two files I created for this

First a config map to store the javascript function, the nginx.conf and a virtualhost file.

apiVersion: v
kind: ConfigMap
metadata:
  name: nginx-conf
  namespace: nginx
data:
  functions.js : |
    export default { md5calculation }
    function md5calculation (r) {
      var myhash = require('crypto')
      return myhash .createHash('md5').update(r.remoteAddress).update(r.uri).digest('base64');
    }
  nginx.conf: |
    user nginx;
    load_module /usr/lib/nginx/modules/ngx_http_js_module.so;
    worker_processes  3;
    error_log  /var/log/nginx/error.log;
    events {
      worker_connections  10240;
    }
    http {
      js_import /etc/nginx/functions.js;
      js_set $md5calculation functions.md5calculation;
      log_format  main
              'remote_addr:$remote_addr\t'
              'time_local:$time_local\t'
              'request_size:$request_length\t'
              'url:$uri\t'
              'http_version:$request\t'
              'http_status:$status\t'
              'user_agent:$http_user_agent\t'
              'request_id:$request_id\t'
              'md5session:$md5calculation\t';

      access_log  /var/log/nginx/access.log main;

      server {
          listen       80;
          server_name  _;
          location / {
              root   html;
              index  index.html index.htm;
          }
      }
      include /etc/nginx/virtualhost/virtualhost.conf;
    }
  virtualhost.conf: |
    upstream app {
      server localhost:8080;
      keepalive 1024;
    }

    server {
      listen 80 default_server;
      root /usr/local/app;
      access_log /var/log/nginx/app.access_log main;
      error_log /var/log/nginx/app.error_log;
      location / {
        proxy_pass http://app/;
        proxy_http_version 1.1;
      }
    }

And this another file that is my deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/nginx # mounting nginx-conf vol to /etc/nginx
          readOnly: true
          name: nginx-conf
        - mountPath: /var/log/nginx
          name: log
      volumes:
      - name: nginx-conf
        configMap:
          name: nginx-conf 
          items:
            - key: nginx.conf
              path: nginx.conf
            - key: virtualhost.conf
              path: virtualhost/virtualhost.conf # dig directory
            - key: functions.js
              path: functions.js
      - name: log
        emptyDir: {}

With this two files deployed into my kubernetes cluster, it was only matter of:

deploy the files with kubectl apply -f file.yaml
Jump into the pod with kubectl exec -it podname -- /bin/bash
Once in the pod tail the log with tail -f /var/log/nginx/app.access_log
In another terminal port-forward the pod to my machine kubectl port-forward podname 30000:80
Open a browser and navigate to localhost:3000
Do some more paths multiple times like localhost:3000/testpath and localhost:30000/anothertestpath
Watch the logs in the terminal where the tail is running

There you will see something like this

remote_addr:127.0.0.1   time_local:26/Aug/2022:15:55:04 +0000   request_size:725        url:/   http_version:GET / HTTP/1.1     http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36       request_id:555781345d037c960edf4120c29955e6     md5session_id:jr5H1UccgZv/oEQiT6IfMg== 

remote_addr:127.0.0.1   time_local:26/Aug/2022:15:55:06 +0000   request_size:725        url:/   http_version:GET / HTTP/1.1     http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36       request_id:dee1482fc88a681901459b4dbf2b299b     md5session_id:jr5H1UccgZv/oEQiT6IfMg==  

remote_addr:127.0.0.1   time_local:26/Aug/2022:15:55:18 +0000   request_size:707        url:/testpath   http_version:GET /testpath HTTP/1.1     http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36       request_id:7dfcbfc8dd521e0c6695bce28d349a2d     md5session_id:tACXcd3YlC4DePpGiCiw9Q==  

remote_addr:127.0.0.1   time_local:26/Aug/2022:15:55:26 +0000   request_size:714        url:/anothertestpath    http_version:GET /anothertestpath HTTP/1.1      http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36       request_id:dce581c5fa5d0d60c8881040e8f866b6     md5session_id:8cuKr5JSGv9vm5xryLGTzg==  

remote_addr:127.0.0.1   time_local:26/Aug/2022:15:57:25 +0000   request_size:740        url:/anothertestpath    http_version:GET /anothertestpath HTTP/1.1      http_status:502 user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36       request_id:bd5d62bdeabf431a1a93119645eae056     md5session_id:8cuKr5JSGv9vm5xryLGTzg==

Here you can see in the logs, the column mdsession contains the hash calculated by the function and you can see that when calling the same path, the hash is the same.

Remember you can put whatever you need in that function to calculate the md5, this is just an example on how it works

Wrapping up

So by the end of this (fun) exercise I learned:

How nginx logs work
That nginx can run basic javascript in the engine
That I can fully configure my nginx from "outside" by using configmaps
How well documented nginx is

I hope this article helps someone else out there that might be looking for this same information as I needed to read several different places and do a good amount of work to find out the right configuration.

If you liked this article, please consider letting me know by liking this article, leaving a comment or sharing this article with others, you can also follow me in my networks.

Thank you for reading!

The Ops Community ⚙️