Expose Prometheus metrics #431

**Is your feature request related to a problem? Please describe.**
Recently we had issues with our FSx volumes mounting in our application pods. We could not `ls` the mounted directory.

It was very unclear what the issue was: in the AWS Console, the FSx volume was not at capacity and showed no issues.

Within the CSI driver DaemonSet pod, there were these logs:
```
E0514 19:15:20.815598       1 driver.go:104] "GRPC error" err=<
	rpc error: code = Internal desc = Could not mount "fs-<id>.fsx.us-west-2.amazonaws.com@tcp:/xmym3bev" at "/var/lib/kubelet/pods/b95c1daf-c469-4177-9113-0c73bab808b3/volumes/kubernetes.io~csi/<fsxname>/mount": mount failed: exit status 5
	Mounting command: mount
	Mounting arguments: -t lustre fs-<id>.fsx.us-west-2.amazonaws.com@tcp:/xmym3bev /var/lib/kubelet/pods/b95c1daf-c469-4177-9113-0c73bab808b3/volumes/kubernetes.io~csi/<fsxname>/mount
	Output: mount.lustre: mount fs-<id>.fsx.us-west-2.amazonaws.com@tcp:/xmym3bev at /var/lib/kubelet/pods/b95c1daf-c469-4177-9113-0c73bab808b3/volumes/kubernetes.io~csi/<fsxname>/mount failed: Input/output error
	Is the MGS running?
 >
```

It would be great if the pod exposed a metric indicating that there were mounting issues. With that metric, I could fire an alert to our SREs!

Eventually we rolled out the DaemonSet and Deployment, and that resolved the issue. But even that wasn't in your troubleshooting guide.

**Describe the solution you'd like in detail**
Ideally, expose metrics that show health or problems through a Prometheus endpoint. We would like to build PromQL queries and alerts that show whether the FSx CSI driver is healthy.

**Describe alternatives you've considered**
I could also build a solution around parsing logs, but I would prefer to just have metrics. Prometheus metrics seem to be an industry standard.
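
For comparison, the log-parsing workaround might look roughly like the Go sketch below. It counts log lines containing the `Could not mount` text seen in the error above. The `countMountErrors` helper is hypothetical, and matching on message text is an assumption that would break if the driver's log wording changed, which is part of why I would prefer metrics.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countMountErrors returns the number of log lines that contain
// the "Could not mount" message. Matching on free-form text is fragile.
func countMountErrors(logs string) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "Could not mount") {
			count++
		}
	}
	return count
}

func main() {
	// Shortened sample modeled on the log excerpt in this issue.
	sample := "E0514 19:15:20.815598       1 driver.go:104] \"GRPC error\" err=<\n" +
		"\trpc error: code = Internal desc = Could not mount \"fs-<id>.fsx.us-west-2.amazonaws.com@tcp:/xmym3bev\"\n" +
		"\tIs the MGS running?\n"
	fmt.Println("mount errors:", countMountErrors(sample))
}
```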

**Additional context**
N/A

