Going through BSOD hell after upgrading to WADK 1809/MDT 3.10.10

In my daily work I’m dependent on using BigFix on VMware Workstation 12 for creating, capturing and (test-)deploying OS images before exporting them to my customer. This has worked flawlessly so far with Win7 images. Now the need to create Win10 images has come, and that is when my BSOD hell started. Over the last 4-5 days I’ve become more and more confused as to what is happening. After running around in circles, I started from “Square One” again and decided to document what confuses me.

When capturing an image, the BSOD appears after the first reboot (after loading boot.wim and running sysprep). When deploying an image, it appears after boot.wim has been loaded and booted.


This is “Square One”:

Deployment Resources:
PE 10 (1709) MDT 8450 USMT10_1709x86,USMT10_1709x64

MDT Bundle Creators and Windows Media:
Target: BIGFIXLAB2
Deployment Kit: WADK 10.0 (1709)
MDT Bundle Creator Version: 3.10.7.0
MDT Version 8450

Installed OS Deployment Servers:
Server Name: NO00999C21B588
Server Version: 7.1.120.31011

In this environment, everything worked perfectly when only Win7 was involved.

I created a VM for Win10 (1809) and customised it to my liking, then tried to capture it and soon found out that PE10 was outdated:

Completed // Verify the release ID of the image being captured is supported by PE10 version. If this step fails, you must import an MDT Bundle with a later version of PE10
Completed parameter “ImageReleaseID” = "{if (version of operating system as string contains “10.0.10240”) then “1507” else if (version of client >= “9.5”) then (releaseid of operating system) else if exists value “ReleaseId” of key “HKLM\Software\Microsoft\Windows NT\CurrentVersion” of registry then value “ReleaseId” of key “HKLM\Software\Microsoft\Windows NT\CurrentVersion” of registry as string else “1507”}"
Completed if {parameter “ImageReleaseID” as integer > 1607}
Completed parameter “Pe10rid” = "1709"
Completed parameter “isAllowedException” = "{if ((parameter “ImageReleaseID” as integer > 1809 And parameter “Pe10rid” as integer = 1809) Or (parameter “ImageReleaseID” as integer = 1709 And parameter “Pe10rid” as integer = 1703) Or (parameter “ImageReleaseID” as integer = 1803 And parameter “Pe10rid” as integer = 1709) Or (parameter “ImageReleaseID” as integer = 1809 And parameter “Pe10rid” as integer = 1803)) then “TRUE” else “FALSE”}"
Failed continue if {parameter “ImageReleaseID” as integer <= parameter “Pe10rid” as integer OR parameter “isAllowedException” = “TRUE”}
endif
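
For anyone reading along, the check in the action log above can be restated as a minimal Python sketch. The function name `capture_allowed` and its parameter names are my own for illustration and are not part of BigFix; the conditions are copied from the logged relevance, nothing more.

```python
def capture_allowed(image_release_id: int, pe10_rid: int) -> bool:
    """Restate the 'continue if' check from the logged action (illustrative only)."""
    # The logged check is only evaluated for images newer than release 1607.
    if image_release_id <= 1607:
        return True

    # The hard-coded exception pairs from the "isAllowedException" parameter.
    allowed_exception = (
        (image_release_id > 1809 and pe10_rid == 1809)
        or (image_release_id == 1709 and pe10_rid == 1703)
        or (image_release_id == 1803 and pe10_rid == 1709)
        or (image_release_id == 1809 and pe10_rid == 1803)
    )

    # Capture continues if PE10 is at least as new as the image,
    # or if the pair is one of the allowed exceptions.
    return image_release_id <= pe10_rid or allowed_exception


if __name__ == "__main__":
    # My failing case: a 1809 image with a 1709 PE10.
    print(capture_allowed(1809, 1709))  # False
    # The same image after importing a 1809 MDT bundle.
    print(capture_allowed(1809, 1809))  # True
```

Read this way, a 1809 image needs a PE10 of release 1803 or later, which is why the 1709 bundle was rejected.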

Ran Fixlet 62: Deploy Windows Assessment and deployment Kit 10 and selected: WADK for Windows 10 release id 1809 (including WinPE AddOns) towards my Win7 Relay/Bare Metal Server
Completed //Check that the operating system is Windows 8/2012 or later
Failed continue if {version of operating system >= “6.2”}
OK, so Win7 isn’t supported…
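
As a rough illustration of why that step fails: Windows 7 SP1 reports version 6.1.7601, which is below the 6.2 (Windows 8/2012) minimum. The sketch below compares dotted versions as integer tuples; the helper names are hypothetical, and I am assuming this is close enough to how the relevance comparison behaves for these values.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '6.1.7601' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(os_version: str, minimum: str = "6.2") -> bool:
    """Return True if os_version is at least the minimum (tuple comparison)."""
    return parse_version(os_version) >= parse_version(minimum)


if __name__ == "__main__":
    print(meets_minimum("6.1.7601"))    # Windows 7 SP1
    print(meets_minimum("10.0.17763"))  # Windows 10 1809
```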

My existing MDT bundle creator was on the BigFix server itself, but it did not show up as Relevant. Later I realised that this was because it expects WADK not to be already installed, i.e. it does not support upgrade to a newer WADK…
Relevance 6: NOT exists value “AdkInstallation” of keys “HKLM\SOFTWARE\Microsoft\WIMMount” of (if (x64 of operating system) then native registry else registry)

Booted up a VM with Win 10 Pro 1809 and agent installed

Ran Fixlet 46: Deploy MDT Bundle Creator against Win 10 machine
Ran Create MDT Bundle
Uploaded 1809 MDT Bundle from Win 10 machine

Capture Win10(1809) with WADK 1809/MDT 3.10.10: Bluescreen

A colleague of mine has seen the same bluescreen on occasion, and has worked around it by changing the “Hardware Compatibility” of the VM from 12x to 10x and then later setting it back to 12x. How he came up with that idea I don’t know, but I certainly had to try it, so here’s the log of trials and tribulations:

Capture after “Change Hardware Compatibility” to 10x from 12x: Bluescreen
Capture after “Change Hardware Compatibility” back to Workstation 12x: OK!

Capture Win7 with WADK 1809/MDT 3.10.10: Bluescreen
Capture Win7 with WADK 1709/MDT 3.10.7: Bluescreen
Capture with WADK 1709/MDT 3.10.7 after “Change Hardware Compatibility” to Workstation 10x from 12x: Bluescreen
Capture with WADK 1709/MDT 3.10.7 after “Change Hardware Compatibility” back to Workstation 12x: OK!

Capture Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 12.x: OK!
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 12.x: Bluescreen
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 10.x: Bluescreen
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 12.x: OK!
Capture Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 12.x: OK!
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 12.x: OK!

Now, Deployment:

PXE-boot Win7 WADK 1709/MDT 3.10.7 @Hardware Compatibility 12.x: Bluescreen
PXE-boot Win7 WADK 1709/MDT 3.10.7 @Hardware Compatibility 10.x: Bluescreen
PXE-boot Win7 WADK 1709/MDT 3.10.7 @Hardware Compatibility 12.x: Bluescreen

Created new Bare Metal Profile with WADK 1809/MDT 3.10.10
PXE-boot Win7 (Binding menu never shown in GUI at client, jumped to loading boot.wim after a few “waiting for next action”, so I don’t know which profile got selected): Bluescreen
PXE-boot Win7 WADK 1809/MDT 3.10.10 @Hardware Compatibility 10.x: Bluescreen (Binding menu showed up, selected new profile)

Created new OS MDT Resource for Win10 1809 and uploaded it
PXE-boot Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 12.x: Bluescreen
PXE-boot Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 10.x: Bluescreen
PXE-boot Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 12.x: Bluescreen

Upgraded Bare Metal OS Deployment server from 7.1.1.20.310.11 to 7.1.1.20.310.27

PXE-boot Win7 @Hardware Compatibility 12.x (Binding menu never shown in GUI at client, jumped to loading boot.wim after a few “waiting for next action”, so I don’t know which profile got selected): Bluescreen
PXE-boot Win7 WADK 1709/MDT 3.10.7 @Hardware Compatibility 10.x: Bluescreen
PXE-boot Win7 @Hardware Compatibility 12.x (Binding menu never shown in GUI at client, jumped to loading boot.wim after a few “waiting for next action”, so I don’t know which profile got selected): Bluescreen

PXE-boot Win10(1809) @Hardware Compatibility 12.x (Binding menu never shown in GUI at client, jumped to loading boot.wim after a few “waiting for next action”, so I don’t know which profile got selected): Bluescreen
PXE-boot Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 10.x: Bluescreen
PXE-boot Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 12.x (Binding menu never shown in GUI at client, jumped to loading boot.wim after a few “waiting for next action”, so I don’t know which profile got selected): Bluescreen

During all the PXE-boot testing, I sometimes saw the message “Please wait while we are searching and injecting in WinPE (id: xxxxxxxxx) drivers for this machine (model: VMware Workstation Guest)” and also “Error: a loop has been detected for activity xxxxxxxxx when target was starting from network”. I guess these are due to the BM server recognising the same machine trying to PXE-boot several times in a row, and trying to be smart and sort out any driver problems. I also guess that this “intelligence” in the BM server is the cause of my observation that the binding menu was never shown in the GUI at the client and that it jumped to loading boot.wim after a few “waiting for next action” messages, so that I don’t know which profile got selected: it is trying to be helpful and resume what was terminated by the bluescreen…

Just for the sake of it, I rounded off this session by testing capturing again:
Capture Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 12.x: OK!
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 12.x: Bluescreen
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 10.x: OK!
Capture Win7 with WADK 1809/MDT 3.10 @Hardware Compatibility 12.x: Bluescreen
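
With this many runs it is easy to lose track, so here is a small sketch of how result lines in this format could be tallied by OS, hardware compatibility level and outcome. The regular expression and the `tally` helper are my own and only cover lines that carry an “@Hardware Compatibility” tag; the sample input is the four lines from this last capture session, with spacing normalized.

```python
import re
from collections import Counter

# The four result lines from the final capture session.
LOG = """\
Capture Win10(1809) with WADK 1809/MDT 3.10.10 @Hardware Compatibility 12.x: OK!
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 12.x: Bluescreen
Capture Win7 with WADK 1709/MDT 3.10.7 @Hardware Compatibility 10.x: OK!
Capture Win7 with WADK 1809/MDT 3.10 @Hardware Compatibility 12.x: Bluescreen
"""

# Action, OS, hardware compatibility major version, and final result.
LINE = re.compile(
    r"^(?P<action>Capture|PXE-boot) (?P<os>\S+) .*"
    r"@Hardware Compatibility (?P<hw>\d+)\.x.*: (?P<result>Bluescreen|OK!)$"
)


def tally(log: str) -> Counter:
    """Count outcomes per (OS, hardware compatibility, result) combination."""
    counts = Counter()
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:  # lines in any other format are silently skipped
            counts[(match["os"], match["hw"], match["result"])] += 1
    return counts


if __name__ == "__main__":
    for key, count in sorted(tally(LOG).items()):
        print(key, count)
```

Four lines are far too few to conclude anything, but the same tally over the whole log makes the inconsistency easier to see.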

BTW, all VMs used for capture and deployment are non-UEFI.

Can anyone make any sense out of this?

Seems like the BSOD itself didn’t make it into the original post, so here it is:

Hello,
it seems to me that the issue is more related to Hardware Compatibility than to BigFix OSD.
WinPE 10 1809 is based on Windows 10 1809, so if Windows 10 1809 cannot run or has issues on that virtual hardware, WinPE will have them as well.
Is that Windows level supported on your version of VMware Workstation? Is there any issue reported?
Is the driver binding grid reporting the right drivers bound on the virtual hardware?
Thanks.
Sergio Tarchi

Sergio,
According to the VMware Compatibility Guide (VMware Compatibility Guide - System Search), Windows 10 is supported on VMware Workstation 12.

I also run Windows 10 1809 VM guests installed from .iso media without any problems. It seems to me that it is the BigFix-modified WinPE that has issues on VMware Workstation. Is VMware Workstation a supported (test) platform?

As for drivers, I’ve always seen that VMware SATA AHCI Controller (15AD.07E0.15AD.07E0) driver is missing, but it has never caused any trouble before. The driver is not to be found anywhere on the internet either.

Any more ideas? I can’t be the only one in the BigFix world who has encountered this issue (I hope :wink:).

Regards,
Harald

I ran into the same problems trying to capture an 1809 image using BigFix. After 2 weeks of troubleshooting BSODs and nearly breaking my keyboard, I decided to go a different route…

I installed ADK 10 v1809 and MDT 8450 on another workstation and used it to create a bootable ISO capable of capturing an 1809 image. Once that was done I was able to import that into BigFix… I followed some instructions I found on https://deploymentresearch.com/Research/Post/1676/Building-a-Windows-10-v1809-reference-image-using-Microsoft-Deployment-Toolkit-MDT and they worked perfectly for me… The only issue I ran into was during the import into BigFix: it complained that it didn’t have a “driverinfo” file, so I just copied one from my 1803 image and named it accordingly, and that worked…

I’ve been able to deploy 1809 now to about 100 computers and no issues…

jbennett1337,

So I’m not alone after all :sweat_smile:

Surely, this must be considered a major bug in BigFix OSD. I think I will raise a PMR with BigFix support.

Thank you for the reference to instructions for an alternative method. I will read through them and consider that as an option, but preferably BigFix OSD should be fixed.

Thanks,
Harald

Hi,
if I understood correctly and the behavior changes when the hardware compatibility is switched back and forth, it seems to me more an issue related to VMware Workstation than to BigFix OSD.
A PMR can help to investigate.
Thanks.
Sergio Tarchi

As for actually finding the driver, there are driver packages available in the VMware Tools installation directory after you install VMware Tools. I’ve previously used those for OSD driver imports.

Something definitely changed between 1709 and 1803: after my opening post I’ve run a series of tests @1803, and it shows the same problems as @1809, all @VMware compatibility level 12.x. The constant here is VMware compatibility level 12.x; the variable is the BigFix MDT/WADK version: 1803/1809.

I opened a “case” with IBM yesterday (seems it is not called a PMR anymore…), and got a notice after 24 hours that they are looking into the matter…

Regards,
Harald

The VMware drivers are “known” in my VMware Workstation BigFix environment. I’ve also followed these instructions to import all drivers: VMware Knowledge Base, but to no avail…

After upgrading VMware Workstation, from 12.x to 15.x, the problems with capture and PXE-boot of Win10 1809 are gone. For Win7 SP1, I must use Hardware Compatibility 9.x, but I can live with that.

Harald


I ran into this same bluescreen message while capturing an 1809 image at the same point in the sequence: after it syspreps and restarts the machine to start the final step of capturing the .wim. I fought it for a few days and never came to a resolution, but randomly it just worked yesterday. Pretty frustrating. I am using VirtualBox, but I’ve done this same process several times in the past without issues. We should probably have our case numbers consolidated on the issue - TS001868353

Nick,

Yes, that happened to me as well. I was asked by IBM support to produce a video of the sequence, and then it just worked, several times in a row, before the dreaded BSOD appeared again.

My case number is TS001855363, feel free to ask for a consolidation.

Regards,

Harald

That’s how it happened for me, too. I had it fail 4 or 5 times in a reproducible manner. Then randomly, while working the PMR to produce more evidence, it just goes through. >_<

The video recording request for troubleshooting is a bit insane to ask of a customer, I think. I get that it helps them isolate exactly where in the process it is failing, but the time investment to generate that is pretty high.

Hi,
when the crash occurs, judging by the descriptions provided, the target computer is running WinPE in memory and there’s no connection to the bare metal server yet, so there is no way to get information from files left on the disk or from logs sent to the bare metal server. The video seems to be the only way to get some more information, apart from the description, on what can be investigated.
Thanks.

I asked support this question:
It seems to me that the Bare Metal Server is trying to be smart, as I see messages like “Please wait while we are searching and injecting driver in PE” and “Error loop detected”, and sometimes the binding menu is bypassed and the loading of boot.wim starts directly. Is there a way to flush the BMS’s “memory” so that I can start from scratch each time I test?

I got this answer, which proved very useful in my testing:
hello
to reset the target status:

  1. Log in on the bare metal server web interface (https://) with username and password provided at install time
  2. On the menu on the left, go to -> OS Deployment -> Target monitor
  3. In the computer list on the left, identify your target computer IP address, right click and “Reset status”

After a weekend of extensive testing with UEFI instead of BIOS, both for capture and deployment of Windows 7 SP1 and Windows 10 1809, I have not encountered any BSOD. That is good news, but there is a snag: on the first reboot after the “Install Operating System” action, the process gets stuck in the “GUI” with this message at the bottom: “No task to execute; waiting next action (boot time:” etc… To get the process to continue I have to hit Ctrl+Alt+Insert (the VMware version of Ctrl+Alt+Delete) or power cycle the machine.

I’ve notified IBM support about this in my already opened case and asked if I will have to open a new case for this problem.

Harald