Use WHvResetPartition on windows#1360
Conversation
4d1a570 to
2b61178
Compare
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
simongdavies
left a comment
There was a problem hiding this comment.
Looks good , just a few comments, it would be good to tighten up the AMD fix targeting and also see if we can get rid of the need to call sregs reset independently
| /// | ||
| /// This does NOT restore special registers (except on windows). Call `restore_sregs` separately | ||
| /// after memory mappings are established. | ||
| // TODO: check if other state needs to be reset |
There was a problem hiding this comment.
Is this TODO still valid (presumably not on Windows) is there an issue to track this
| /// GP registers, debug registers, XSAVE, MSRs, APIC, etc.) | ||
| /// - On Linux: explicitly resets GP registers, debug registers, and XSAVE | ||
| /// | ||
| /// This does NOT restore special registers (except on windows). Call `restore_sregs` separately |
There was a problem hiding this comment.
This seems a bit fragile, why can we can restore_sregs from here?
| ..Default::default() | ||
| }; | ||
| assert_ne!( | ||
| regs.rdx, 0x4444444444444444, |
There was a problem hiding this comment.
What's the significance of this value?
| @@ -107,7 +107,7 @@ pub(crate) enum HypervisorType { | |||
| /// Minimum XSAVE buffer size: 512 bytes legacy region + 64 bytes header. | |||
| /// Only used by MSHV and WHP which use compacted XSAVE format and need to | |||
| /// | ||
| /// This bug most commonly surfaces as the `interrupt_same_thread_no_barrier` | ||
| /// integration test failing on AMD Windows hosts. @ludfjig has verified | ||
| /// the fix works on Windows Insider builds; this workaround can be |
There was a problem hiding this comment.
Can we get an issue to track this? Even better is there a way we can have a test that fails when this is fixed in windows? That will force us then to gate this code on a specific windows version or lower since I think the comment is wrong, we will need this to endure but will only need to apply it on impacted versions of windows
| /// The workaround unmaps and remaps a dummy page after each | ||
| /// `WHvResetPartition`, which forces the hypervisor to invalidate | ||
| /// NPT entries. | ||
| mod npt_flush { |
There was a problem hiding this comment.
can we also gate this on affected CPUs only?
| /// after memory mappings are established. | ||
| // TODO: check if other state needs to be reset | ||
| pub(crate) fn reset_vcpu( | ||
| pub(crate) fn reset_vm_state(&mut self) -> std::result::Result<(), RegisterError> { |
There was a problem hiding this comment.
It feels a bit of a shame to have this (ought-to-be-mostly-platform-independent) code making the call about whether or not to use reset_partition. Perhaps it would make more sense to either have a vm.reset_partition_supported() or have vm.reset_partition() return a particular failure value that we can see and fall back to the alternative approach here? (I assume the reason why the fallback is here and not in each individual HV driver is that the fallback should be more-or-less the same on all platforms of the same architecture).
That adds a bit of dynamic dispatch and a runtime branch, though---but I am not sure the performance overhead will be noticeably since it seems like this is a pretty heavy operation. If it is, I've actually been wondering for a bit why we use the dyn VirtualMachine here---could we push the dyn up a layer or two and make HyperlightVm<T: VirtualMachine>? I imagine that that would open up a bunch of opportunities for inlining in general.
There was a problem hiding this comment.
I'd like that, but we cannot currently do this since we only know which one of kmv/mshv is available at runtime. Maybe having the mshv3/kvm cargo features be mutually exclusive could help with this (although mutually exclusive featuers are discouraged). Agree with the first part though
There was a problem hiding this comment.
What I meant for the latter was that we could add a HyperlightVm trait and have the sandboxes hold a dyn HyperlightVm which at runtime is e.g. HyperlightVm<KvmVm>, on the assumption (to be checked) that having the annoying dynamic dispatch be at that higher level would be beneficial.
| // Re-initialize it when LAPIC emulation is active. | ||
| // | ||
| // set_sregs filters out APIC_BASE (the snapshot sregs | ||
| // default it to 0, which would disable the LAPIC). This |
There was a problem hiding this comment.
Would just making sure that the snapshot captures APIC_BASE (and that Snapshot::from_file sets it to the architectural default value when the hw-interrupts feature is on) work?
| } | ||
|
|
||
| // Reset VM/vCPU state before memory restore. | ||
| self.vm.reset_vm_state().map_err(|e| { |
There was a problem hiding this comment.
I assume that the removal of the sregs reset from reset_vcpu when it was renamed is related to moving this in front of the memory restore? Is there a particular reason that it should happen before?
There was a problem hiding this comment.
For some reason it only works when done before which is why I did it this way, I'm not sure why 🤷
Addresses #791 for windows (WHP), by resetting all guest visible state on the partition. Resets more vm state on
restore()at the expense of some performance (roughly ~100micros). MSHV is planned to use the same hv call, but doesn't exist in mshv crate yet, although dom0 kernel should have support already. KVM doesn't have a nice api like this, so will still manual work for KVM.Note: the second commit is a workaround for a hyper-v bug