New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop SYS_RESOURCE capability #5558
Conversation
Skipping CI for Draft Pull Request. |
/test pull-kubevirt-e2e-kind-1.17-sriov |
168076a
to
adf265c
Compare
/test pull-kubevirt-e2e-kind-1.17-sriov |
adf265c
to
c9016f0
Compare
/test pull-kubevirt-e2e-kind-1.17-sriov |
/cc @EdDev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gracefully shutdown in case of virt-launcher crash [1][2].
The links are not permanent, so these will become incorrect when the code change.
_, _, errno := unix.RawSyscall6(unix.SYS_PRLIMIT64, | ||
uintptr(pid), | ||
limit, | ||
uintptr(unsafe.Pointer(rlimit)), // #nosec used in unix RawSyscall6 | ||
uintptr(unix.RLIMIT_MEMLOCK), // #nosec used in unix RawSyscall6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the comment of #nosec
moved to a different command? Are you sure it is not needed now for both now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#nosec
comments are added where unsafe pointers are used #4446
This comment no longer needed, I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we still need this comment for line 145, I will just move it
@@ -98,13 +99,19 @@ func (s *socketBasedIsolationDetector) DetectForSocket(vm *v1.VirtualMachineInst | |||
|
|||
} | |||
|
|||
if process, err := ps.FindProcess(pid); err != nil { | |||
return nil, err | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is common in the code not to do this. How about:
process, err := ps.FindProcess(pid)
if err != nil {
return ...
}
ppid := process.PPid()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
< With Libvirt 7 (Coper)
I do not understand how the Coper
part relevant here? That is the source of the package, we mainly care about the version. If you want to be specific, give a more detailed version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit post migration: Adjust QEMU and libvirt processes memlock limits
:
The change in this commit includes both logic addition and refactoring of the existing code. Please split the refactoring (and naming changes) to a separate commit so it will be easier to see what logic changed.
if err != nil { | ||
return err | ||
func filterVirtLaunchersLibvirtProcesses(isolationResult IsolationResult, processes []ps.Process) []ps.Process { | ||
var filteredProcesses []ps.Process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? On line 208 you can just use :=
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may look redundant here because there are only two assignments.
I think its easier to track where a variable was declared when there are more then one assignment (the first is always :=
).
func filterVirtLaunchersLibvirtProcesses(isolationResult IsolationResult, processes []ps.Process) []ps.Process { | ||
var filteredProcesses []ps.Process | ||
|
||
// consider all processes that are virt-launcher children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain it all as a doctext? Having inline comments is hard to keep up-to-date and read the code.
|
||
filteredProcesses = filterProcessesByParentPID(processes, []int{launcherPid, launcherParentPid}) | ||
// libvirtd process sets the memory lock limit before fork/exec-ing into qemu | ||
// Its necessary to change QMEU process limits if its already exists (e.g: post migration just before hostdev) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we are in a reconcile? Are you sure we can reach this only on post-migration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will reach this also from reconcile, although adjust resources is called only when VM is not running
https://github.com/kubevirt/kubevirt/blob/master/pkg/virt-handler/vm.go#L2420
https://github.com/kubevirt/kubevirt/blob/master/pkg/virt-handler/vm.go#L2446
Do we want to restrict changing QEMU process limits only on post migration?
If so I can create AdjsutResourcesOnPostMigration
function
that will be called on finalizeMigration
(instead of regular AdjustResources
)
@EdDev What you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is less risky if the change is done per the specific need known to us. The qemu process memlock limit needs to be changed only when the VMI has a VFIO device and the state is post-migration.
If this is open to other scenarios, we may get into corner cases where we should not be.
If so I can create AdjsutResourcesOnPostMigration function
As I see it, a function name should express what it does, not who or when it is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed new changes so that only QEMU process will be adjusted on post migration
} | ||
// bump libvirtd and qemu processes locked memory limits | ||
libvirtProcesses := filterVirtLaunchersLibvirtProcesses(isolationResult, processes) | ||
log.Log.Info("debug: adjust memlock: filtered processes:") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some debug logs just fyi (i guess its known)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, please use logs verbosity, log.Log.V(X).Info instead of log.Log.Info("debug:
- KubeVirt uses verbosity levels, so you can print different messages at various levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these logs just for debugging I was going to remove them (all logs that stats with 'debug')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it is still valuable to keep it? up to you, but I think if it was valuable for you to debug, it might be useful when things go wrong ;)
Cur: uint64(memlockSize), | ||
} | ||
err = prLimit(process.Pid(), unix.RLIMIT_MEMLOCK, &rLimit) | ||
for _, process := range libvirtProcesses { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please combine this loop with the one on line 190?
libvirtProcesses := filterVirtLaunchersLibvirtProcesses(isolationResult, processes) | ||
log.Log.Info("debug: adjust memlock: filtered processes:") | ||
for _, process := range libvirtProcesses { | ||
log.Log.Infof("debug: pid %d, ppid %d, exec %s", process.Pid(), process.PPid(), process.Executable()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make the debug prints more informative. Something like: "found libvirt process with pid %d...."
pkg/virt-handler/vm.go
Outdated
@@ -2641,6 +2641,13 @@ func (d *VirtualMachineController) finalizeMigration(vmi *v1.VirtualMachineInsta | |||
if err != nil { | |||
return err | |||
} | |||
|
|||
err = d.podIsolationDetector.AdjustResources(vmi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In reality, a failure here doesn't affect the result of migration.
The migration will still show as succeeded. However, perhaps we do need to react differently in this case?
Maybe, we should at least send an alert in case we didn't manage to plug the devices back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You right, its wrong to return on AdjustResources
error, thanks for the heads up.
I will log the error instead of returning, then in case a device is failed to plug it will be logged.
/retest |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vladikr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@ormergi: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
Following kubevirt/kubevirt#5558 Kubevirt is not longer requires SYS_RESOURCE capability in order to enable SRIOV VM's migration. We still want this feature-gate to enable the user to choose whether to enable SRIOV VM migration or not until this feature is maturated. Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Following kubevirt/kubevirt#5558 Kubevirt is not longer requires SYS_RESOURCE capability in order to enable SRIOV VM's migration. We still want this feature-gate to enable the user to choose whether to enable SRIOV VM migration or not until this feature is maturated. Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Following [1], it is no longer needed to add SYS_RESOURCE capability to virt-launcher, SRIOVLiveMigration feature-gate is redundant. SR-IOV live migration in controlled by 'LiveMigration' feature gate. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1], it is no longer needed to add SYS_RESOURCE capability to virt-launcher, SRIOVLiveMigration feature-gate is redundant. SR-IOV live migration in controlled by 'LiveMigration' feature gate. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. SR-IOV live migration will be controlled by `LiveMigration` feature-gate similar to non-SR-IOV VMs. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. SR-IOV live migration will be controlled by `LiveMigration` feature-gate similar to non-SR-IOV VMs. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. SR-IOV live migration will be controlled by `LiveMigration` feature-gate similar to non-SR-IOV VMs. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. SR-IOV live migration will be controlled by `LiveMigration` feature-gate similar to non-SR-IOV VMs. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. SR-IOV live migration will be controlled by `LiveMigration` feature-gate similar to non-SR-IOV VMs. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE` capability anymore, `SRIOVLiveMigration` feature-gate is redundant. [1] kubevirt#5558 Signed-off-by: Or Mergi <ormergi@redhat.com>
What this PR does / why we need it:
Currently virt-launcher runs with SYS_RESOURCE capability on VMs with SRIOV devices when SRIOVLiveMigation feature-gate enabled.
Having this capability enables Libvirt to adjust QEMU process memory lock limits (memlock rlimit) in case its already exists
As occurs when SRIOV VM is migrated and the SRIOV host-device is re-attached to VM on target.
With Libvirt and above, Libvirt does best effort to adjusts QEMU process limits and wont try to change the limits if they are high enough.
In order to prevent using SYS_RESOURCE cap on containerized environments [1] as Kubevirt use case.
With this PR changes Kubevirt will adjust QEMU process rlimits as it does for libvirtd.
This will prevent libvirt from adjusting QEMU process rlimits (it will detect that limits are sufficient).
Then virt-launcher resources will be adjusted on post migration, in order to enable hot plugging VFIO devices
This PR also remove SYS_RESOURCE capability.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Rebased on #5328
virt-handler
is privileged whilevirt-launcher
is not.Before this change, it was already done for the libvirtd process, the only difference here is that we now need the qemu one.
Release note: