Error on server spawn: Multi-Attach error for volume "pvc-xxx..."
Currently, several users cannot start their servers because of the following error:
Error on server spawn: Multi-Attach error for volume "pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" Volume is already exclusively attached to one node and can't be attached to another
This occurs because, for some reason, the volume is not detached when the user's server stops. An attachment is bound to a node: if the user's server is scheduled on the same node the next time it starts, no error occurs, but if it is scheduled on a different node, the above error occurs. If we manually delete the orphaned attachment, the user can start their server again. This can be done as follows:
We can find the PV (persistent volume) of the affected user ($USER) with:
kubectl get -n jupyterhub pvc claim-$USER
The PV name is shown in the VOLUME column and has the form "pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx". With this PV name we can find the attachment:
kubectl get volumeattachments | grep pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
The attachment name has the form "csi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx". The attachment can be deleted with:
kubectl delete volumeattachment csi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
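The manual lookup above could be sketched in Python roughly as follows. This is only a sketch under assumptions: it shells out to kubectl with -o json, the helper names (kubectl_json, pv_name_of_claim, attachments_for_pv, show_attachments) are made up for illustration, and the namespace and the claim-$USER naming are taken from the commands above. It only prints the delete command and does not delete anything itself.

```python
import json
import subprocess


def kubectl_json(*args):
    """Run kubectl with '-o json' and return the parsed output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def pv_name_of_claim(pvc):
    """Return the name of the PV a PVC object is bound to (spec.volumeName)."""
    return pvc["spec"]["volumeName"]


def attachments_for_pv(attachment_list, pv_name):
    """Return (attachment name, node name) pairs that reference pv_name."""
    return [
        (item["metadata"]["name"], item["spec"]["nodeName"])
        for item in attachment_list["items"]
        if item["spec"]["source"].get("persistentVolumeName") == pv_name
    ]


def show_attachments(user, namespace="jupyterhub"):
    """Print every attachment of the user's volume and how to delete it."""
    pvc = kubectl_json("get", "-n", namespace, "pvc", f"claim-{user}")
    pv_name = pv_name_of_claim(pvc)
    attachments = kubectl_json("get", "volumeattachments")
    for name, node in attachments_for_pv(attachments, pv_name):
        print(f"{name} on {node}: kubectl delete volumeattachment {name}")
```

Calling show_attachments("someuser") against the cluster would list the attachments for that user's volume together with the node each one is bound to, which makes it easier to see which one is stale before deleting it.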
A few observations:
- A test PVC mounted by a test pod, using the same storage class and access mode, does not leave an attachment behind when the pod finishes or is deleted before it finishes. So far, the problem seems to occur only with jupyter-$USER pods.
- During the upgrade of democratic-nfs-client we got an error saying that spec.attachedRequired cannot be changed from false to true. This could be related. Here is the complete error message:
  Error: UPGRADE FAILED: cannot patch "org.democratic-csi.nfs-client" with kind CSIDriver: CSIDriver.storage.k8s.io "org.democratic-csi.nfs-client" is invalid: spec.attachedRequired: Invalid value: true: field is immutable
Possible workarounds:
- A Python script that finds and deletes all orphaned attachments, i.e. attachments on nodes where the corresponding user server is not running. It could be run manually or automatically via cron.
- The access mode of the user persistent volumes could be changed from RWO (ReadWriteOnce) to RWX (ReadWriteMany). This probably only works for new users that do not have a persistent volume yet.
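The first workaround could look roughly like the sketch below. It is not a tested solution, only an illustration under several assumptions: the function names are hypothetical, the hub namespace is jupyterhub as in the commands above, and an attachment counts as orphaned when no pod in that namespace mounting the corresponding claim is scheduled on the attachment's node. Pods in any phase count as "in use" to stay on the safe side, and deletion is off by default (dry_run=True).

```python
import json
import subprocess


def kubectl_json(*args):
    """Run kubectl with '-o json' and return the parsed output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def find_orphaned_attachments(attachments, pvs, pods, namespace="jupyterhub"):
    """Return names of attachments whose volume is not used on that node."""
    # Map PV name -> claim name, only for PVs bound to claims in the hub namespace.
    claim_of_pv = {}
    for pv in pvs["items"]:
        ref = pv["spec"].get("claimRef") or {}
        if ref.get("namespace") == namespace:
            claim_of_pv[pv["metadata"]["name"]] = ref["name"]

    # Collect (claim name, node name) pairs currently referenced by a pod.
    in_use = set()
    for pod in pods["items"]:
        node = pod["spec"].get("nodeName")
        for volume in pod["spec"].get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim and node:
                in_use.add((claim["claimName"], node))

    orphaned = []
    for att in attachments["items"]:
        pv_name = att["spec"]["source"].get("persistentVolumeName")
        claim = claim_of_pv.get(pv_name)
        if claim is None:
            continue  # not a volume of the hub namespace, leave it alone
        if (claim, att["spec"]["nodeName"]) not in in_use:
            orphaned.append(att["metadata"]["name"])
    return orphaned


def main(dry_run=True, namespace="jupyterhub"):
    """List orphaned attachments and delete them unless dry_run is set."""
    orphaned = find_orphaned_attachments(
        kubectl_json("get", "volumeattachments"),
        kubectl_json("get", "pv"),
        kubectl_json("get", "-n", namespace, "pods"),
        namespace,
    )
    for name in orphaned:
        print(("would delete " if dry_run else "deleting ") + name)
        if not dry_run:
            subprocess.run(
                ["kubectl", "delete", "volumeattachment", name], check=True
            )
```

Running main() only prints what would be deleted; main(dry_run=False) actually deletes. Because the three kubectl queries are not atomic, a server that starts between them could in principle be misjudged, so a cron job based on this would need some care.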