Expose prometheus metrics #431

Open
@domenicbove

Description

Is your feature request related to a problem? Please describe.
Recently we had issues with our FSx volumes mounting in our application pods: we could not ls the mounted directory.

The cause was very unclear, because the AWS Console showed the FSx volume was not at capacity and reported no issues.

The CSI driver daemonset pod logged the following:

E0514 19:15:20.815598       1 driver.go:104] "GRPC error" err=<
	rpc error: code = Internal desc = Could not mount "fs-<id>.fsx.us-west-2.amazonaws.com@tcp:/xmym3bev" at "/var/lib/kubelet/pods/b95c1daf-c469-4177-9113-0c73bab808b3/volumes/kubernetes.io~csi/<fsxname>/mount": mount failed: exit status 5
	Mounting command: mount
	Mounting arguments: -t lustre fs-<id>.fsx.us-west-2.amazonaws.com@tcp:/xmym3bev /var/lib/kubelet/pods/b95c1daf-c469-4177-9113-0c73bab808b3/volumes/kubernetes.io~csi/<fsxname>/mount
	Output: mount.lustre: mount fs-<id>.fsx.us-west-2.amazonaws.com@tcp:/xmym3bev at /var/lib/kubelet/pods/b95c1daf-c469-4177-9113-0c73bab808b3/volumes/kubernetes.io~csi/<fsxname>/mount failed: Input/output error
	Is the MGS running?
 >

It would be great if the pod exposed a metric indicating that there were mounting issues. With that metric, I could fire an alert to our SREs!

Eventually we rolled out the daemonset and deployment, and that resolved the issue. But even that wasn't in your troubleshooting guide.

Describe the solution you'd like in detail
Ideally, expose metrics that show health or problems through a Prometheus endpoint. We would like to build Prometheus queries and alerts that show whether the FSx CSI driver is healthy.
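
To make the request concrete, here is a rough sketch of what such an endpoint could return. It is not how the driver works today, and it is written in Python with only the standard library purely for illustration. The metric name fsx_csi_mount_failures_total, the reason label, and the names MountFailureCounter, MetricsHandler and make_server are all hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric name; the driver does not define it today.
METRIC = "fsx_csi_mount_failures_total"


class MountFailureCounter:
    """Thread-safe counter of failed mount attempts, keyed by error reason."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def inc(self, reason):
        # Assumes simple reasons with no quotes, backslashes, or newlines,
        # so no label-value escaping is done in this sketch.
        with self._lock:
            self._counts[reason] = self._counts.get(reason, 0) + 1

    def render(self):
        """Return the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {METRIC} Failed volume mount attempts.",
            f"# TYPE {METRIC} counter",
        ]
        with self._lock:
            for reason, count in sorted(self._counts.items()):
                lines.append(f'{METRIC}{{reason="{reason}"}} {count}')
        return "\n".join(lines) + "\n"


COUNTER = MountFailureCounter()


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = COUNTER.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Build (but do not start) an HTTP server; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), MetricsHandler)
```

With something like this, the mount error path would call COUNTER.inc("io_error"), and an alert could fire whenever the counter increases over a short window.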

Describe alternatives you've considered
I could also build a solution around parsing logs, but I would prefer to just have metrics. Prometheus metrics seem to be an industry standard.
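
For reference, the log-parsing workaround would look roughly like the sketch below. It assumes the driver keeps logging mount failures in the same shape as the excerpt above ("Could not mount "..."" followed by "mount failed: exit status N"); other driver versions may format these lines differently. The function name count_mount_failures is hypothetical.

```python
import re

# Patterns taken from the log excerpt in this issue; this is an assumption
# about the log format, not a documented contract of the driver.
MOUNT_FAILED = re.compile(r'Could not mount "(?P<source>[^"]+)"')
EXIT_STATUS = re.compile(r"mount failed: exit status (?P<code>\d+)")


def count_mount_failures(log_lines):
    """Count failed mounts per (source, exit status) in driver log lines."""
    counts = {}
    for line in log_lines:
        m = MOUNT_FAILED.search(line)
        if m is None:
            continue  # not a mount-failure line
        status = EXIT_STATUS.search(line)
        code = status.group("code") if status else "unknown"
        key = (m.group("source"), code)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

The input could be the lines from kubectl logs on the daemonset pod. This is brittle, because any change to the log wording silently breaks the alert, which is why a real metric would be preferable.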

Additional context
N/A
