EC2 Boot Loop

Microsoft updates have been causing some EC2 boot loops, taking servers offline. We have fought many physical machines that entered reboot loops, lost network connectivity, or developed other problems after an update. So far we have a 100% track record repairing update issues on Windows 10, Windows 7, Windows 8, and all Windows Server operating systems on physical boxes. The December 2019 cumulative update from Microsoft was fun for some of our customers. That update, like the update in our last post, is fixed by getting to safe mode and rolling the update back. But what if you can’t go to the box and push F8, or you don’t have a VMware server with root console access when the instance is not booting correctly? This is what happens when things go sideways with AWS EC2.

Generally, if an AWS server drops out, you are left in the valley of darkness with no hope. You can right-click the instance, open the instance settings, and pull up the screen view to see whether there is any life in the server and what it might be doing, if anything. When a server is gone, most people drop the AWS server and restore it from backup. Our backups are usually 15 minutes behind real time, so data loss is minimal. But restoring is not always a viable option. With these reboot loops, an old backup may start looping as well once it is restored to EC2, depending on the state the patch was in at backup time, or other issues may come up. Here is one way to deal with nasty rebooting EC2 boxes: build a rescue server. We keep one pre-built, with rescue software and instructions on the desktop, but turned off so we do not pay for something that is rarely used. Here are the steps to recover an EC2 instance.

Instructions to Exit EC2 Boot Loop

  • Create an EC2 instance to use as the rescue server. We usually use Windows Server 2019 on a t3.large with a 30 GB hard drive. Do not place it in the domain; domain security will be a fight if the server is offline too long.
  • Detach the “C” drive ( /dev/sda1 ) from the ailing server, then attach it to the rescue instance.
  • In Computer Management -> Disk Management, set the bad drive online and note the drive letter it is assigned.
  • Open a command prompt and change to that drive letter.
  • Type: bcdedit /set {bootmgr} displaybootmenu yes (the response should say the command was successful).
  • Detach the drive from the rescue server and reattach it to the original server with the device name /dev/sda1. This makes it the root drive again.
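
One caveat on the bcdedit step: as far as we know, bcdedit with no /store option edits the boot store of the machine it is running on, which here would be the rescue server rather than the attached drive. Below is a minimal sketch of building the same command pointed at the attached volume. It assumes the volume keeps its store at Boot\BCD under the assigned drive letter, which may not be true for your image, so check where the BCD file really is first. The helper names are ours and purely illustrative.

```python
import subprocess


def bcdedit_show_boot_menu(drive_letter: str, bcd_relpath: str = r"Boot\BCD") -> list[str]:
    """Build the bcdedit argument list for the offline store on the attached drive.

    bcd_relpath is an ASSUMPTION about where the store lives on the volume.
    """
    letter = drive_letter.strip().rstrip(":\\").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"expected a single drive letter, got {drive_letter!r}")
    store = f"{letter}:\\{bcd_relpath}"
    # Same setting as in the steps above, but aimed at the attached volume's store.
    return ["bcdedit", "/store", store, "/set", "{bootmgr}", "displaybootmenu", "yes"]


def run_on_rescue(cmd: list[str]) -> None:
    """Run the command; only meaningful in an elevated prompt on the rescue server."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "D" stands in for whatever letter Disk Management assigned.
    print(" ".join(bcdedit_show_boot_menu("D")))
```

Printing the command first and pasting it into the elevated prompt by hand is the cautious route; run_on_rescue is there only if you want to script it.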

Start the instance. We have had good luck with this when experiencing this issue.
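
If you would rather script the drive shuffle than click through the console, here is a rough sketch using the boto3 EC2 client. It is not how we do it on every job, just one way to lay the steps out. The instance IDs, volume ID, and the rescue device name xvdf are placeholders we made up, and the plan/run helpers are hypothetical names. The plan stops the instance that currently holds the drive before detaching, since a root volume generally cannot be detached from a running instance.

```python
from typing import Any

Step = tuple[str, str, dict[str, Any]]  # (kind, boto3 method or waiter name, kwargs)


def plan_move_volume(volume_id: str, from_instance: str, to_instance: str,
                     device: str, start_target: bool = False) -> list[Step]:
    """List the EC2 calls needed to move one volume between two instances."""
    steps: list[Step] = [
        ("call", "stop_instances", {"InstanceIds": [from_instance]}),
        ("wait", "instance_stopped", {"InstanceIds": [from_instance]}),
        ("call", "detach_volume", {"VolumeId": volume_id, "InstanceId": from_instance}),
        ("wait", "volume_available", {"VolumeIds": [volume_id]}),
        ("call", "attach_volume",
         {"VolumeId": volume_id, "InstanceId": to_instance, "Device": device}),
        ("wait", "volume_in_use", {"VolumeIds": [volume_id]}),
    ]
    if start_target:
        steps.append(("call", "start_instances", {"InstanceIds": [to_instance]}))
    return steps


def run_plan(ec2: Any, plan: list[Step]) -> None:
    """Execute a plan against a boto3 EC2 client (or anything shaped like one)."""
    for kind, name, kwargs in plan:
        if kind == "call":
            getattr(ec2, name)(**kwargs)
        else:
            ec2.get_waiter(name).wait(**kwargs)


if __name__ == "__main__":
    # Placeholder IDs; print the plan instead of touching a real account.
    out = plan_move_volume("vol-EXAMPLE", "i-ORIGINAL", "i-RESCUE", "xvdf")
    back = plan_move_volume("vol-EXAMPLE", "i-RESCUE", "i-ORIGINAL", "/dev/sda1",
                            start_target=True)
    for step in out + back:
        print(step)
```

To run it for real you would pass boto3.client("ec2") to run_plan, once for the trip to the rescue server and once for the trip back, with the bcdedit work done in between.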

Pro Tip: If this does not fix your issue, try the AWS rescue trick again. This time, download the AWS rescue utilities and follow the directions. The first thing we try at this point is the AWS restore utility, which sets the server to last known good. Then run the bcdedit command again, transfer the drive back to the production server, and give it a try; it might do the trick. After this point the recovery work really begins. One other tip: if you have a separate AWS account available, make an AMI from your server if you can. Share the AMI (modify image permissions) with the other account and launch it there as an instance. This way you can leave your current server intact while you look for the solution. This might matter in a forensics investigation, should more information be needed regarding the server’s untimely demise.
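
For the AMI tip, the image permission change can also be made through the API. The sketch below only builds the request arguments for boto3's create_image and modify_image_attribute calls; the helper names, the image name, and the IDs are placeholders of ours. Treat it as a starting point: images backed by encrypted volumes need extra key sharing that is not covered here.

```python
from typing import Any


def create_image_request(instance_id: str, name: str) -> dict[str, Any]:
    """Arguments for ec2.create_image; NoReboot avoids restarting the instance."""
    return {"InstanceId": instance_id, "Name": name, "NoReboot": True}


def share_ami_request(image_id: str, target_account_id: str) -> dict[str, Any]:
    """Arguments for ec2.modify_image_attribute granting launch permission."""
    if not (target_account_id.isdigit() and len(target_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "ImageId": image_id,
        "LaunchPermission": {"Add": [{"UserId": target_account_id}]},
    }


if __name__ == "__main__":
    # Placeholder values; nothing here talks to AWS.
    print(create_image_request("i-ORIGINAL", "boot-loop-forensics-copy"))
    print(share_ami_request("ami-EXAMPLE", "123456789012"))
```

With a real client this would be ec2.create_image(**create_image_request(...)) followed by ec2.modify_image_attribute(**share_ami_request(...)), then launching the shared image from the second account.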

There is also a rescue package to download that fixes some things automatically. We wish we could guarantee good results in all cases, but this is I.T. There is only one guarantee in this industry: “The fix is somewhere just past plan ‘Z’.”

Boot Loop Microsoft Azure: If you are struggling with this problem on Microsoft Azure, follow this link; it may be helpful. So far we have not repaired any Azure servers, but based on our reading, things do not appear to be great there either.

Microsoft products stink: Mac, Linux, and Microsoft are all challenging. There is no “out of the box,” because computers are customized. We write custom software, place it on a million hardware combinations, and expect it to be flawless. Then there are billions of little bots trying to find ways into the computers. Keeping it all working is a real challenge for every computer manufacturer. Linux and Mac are no longer obscure enough to be an exception to this rule.

Customizing a business to sit in a sweet niche is hard work, and expensive. We help with buying the best hardware and software solutions money can buy. We minimize downtime and engineer solutions to make you the top performer in your playing field.

Though we like to hate on Microsoft every time a customer experiences an outage, or we lose our holidays, nights, and weekends, we can’t even imagine what Microsoft has to do just to keep our operating systems viable and make sure we have PCs to come back to the next morning. The only secure operating system has never seen a computer. It is still in its creator’s mind.