Friday, 4 May 2012

How to recover from a BSOD (blue screen of death)

This tutorial covers BSOD codes and how to troubleshoot them. With some research and a better understanding of BSOD codes you will be able to help fix a range of problems, including some that may prevent you from cleaning malware off a PC, and to understand the deeper underlying causes of an unstable machine.

Repeated BSODs can cause a lot of problems on a machine, and are a source of deep frustration to a user. You will see cases where a user is unable to run an anti-virus scan or some of our tools without getting a BSOD, which affects our ability to remove malware. Having a better understanding of the problems that plague a machine, rather than only concentrating on malware problems, will help you become a better staff member too!




Part 1 : Logs

When dealing with any computer problem, it's always a good idea to get whatever logs you can to help you fix it. For dealing with BSODs you want to get the following:
  • The Windows STOP message, aka the BSOD code.
    eg : 0x000000EA or 0x000000E6
  • The user's minidump files
    eg : any files in C:\Windows\MiniDump
  • Event Viewer logs



A brief explanation of these three features is below


Windows STOP messages 

These occur when something has forced Windows to stop (obviously). A lot of the time this can be down to hardware issues. You will often see BSODs occurring after running particularly strong malware removal tools like GMER or ComboFix. STOP messages are identified by an 8-digit hexadecimal number, but are also commonly written in a shorthand notation; e.g., a STOP 0x0000000A may also be written Stop 0xA.
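
Since users will report STOP codes in either form, it can help to convert whatever they give you to the full 8-digit form before searching. This is only a small sketch; the function name normalize_stop_code is made up for illustration.

```python
def normalize_stop_code(text: str) -> str:
    """Return a STOP code in full 8-digit form, e.g. 'Stop 0xA' -> '0x0000000A'."""
    cleaned = text.strip().lower()
    # Drop an optional leading 'stop' and its separator ('Stop 0xA', 'Stop:0x50').
    if cleaned.startswith("stop"):
        cleaned = cleaned[4:].lstrip(" :")
    value = int(cleaned, 16)  # base 16 accepts an optional '0x' prefix
    return f"0x{value:08X}"


print(normalize_stop_code("Stop 0xA"))
print(normalize_stop_code("0x0000000A"))
```

Both calls print 0x0000000A, so the shorthand and the long form end up as the same search term.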


Minidump files

Every time a BSOD occurs Windows will save information regarding the error in a log file. This log file, or minidump file, is saved in C:\Windows\MiniDump. This allows you to easily find out when and why a BSOD occurred. However, the minidump file is not saved in a text format, so if you try to open it in a text editor like Notepad you won't be able to read it. To analyze these files you need to use a program like BlueScreenView. More on that later.
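
To see for yourself that a minidump is a binary file, you can look at its first few bytes. The sketch below assumes that kernel dump files start with the 8 ASCII bytes PAGEDUMP (32-bit) or PAGEDU64 (64-bit); that is my understanding of the header, not something BlueScreenView documents, so treat it as an assumption. The names dump_kind and list_dumps are hypothetical.

```python
from pathlib import Path

# Assumed signatures: the first 8 bytes of a kernel dump header.
DUMP_SIGNATURES = {b"PAGEDUMP": "32-bit", b"PAGEDU64": "64-bit"}


def dump_kind(header: bytes) -> str:
    """Classify a dump by its first 8 bytes; 'unknown' if nothing matches."""
    return DUMP_SIGNATURES.get(header[:8], "unknown")


def list_dumps(folder: str) -> list:
    """Return (file name, kind) pairs for every .dmp file in folder."""
    results = []
    for path in sorted(Path(folder).glob("*.dmp")):
        with path.open("rb") as handle:  # binary mode, the file is not text
            results.append((path.name, dump_kind(handle.read(8))))
    return results
```

Running list_dumps on a folder of uploaded minidumps is a quick way to spot a file that is not a dump at all before loading it into BlueScreenView.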


Event Viewer logs

The Event Log Service records application, security, and system events. With the event logs in Event Viewer, you can obtain information about your hardware, software, and system components, and monitor security events on a local or remote computer. Event logs can help you identify and diagnose the source of current system problems, or help you predict potential system problems.

While minidump files are only created when a BSOD occurs, this is not the case for EV Logs. 


Now that we have the three main areas to check when a BSOD occurs, let's move on to analysis.



Part 2 : Analysis

For the tutorial's sake, let's assume we have the user's Windows STOP message, minidump files, and EV logs. Now what?

1) The first thing to deal with is the STOP message. You will need to get your user to write down the BSOD details when it happens. You will then want to bring up these pages, which explain the majority of BSOD codes:

http://aumha.org/a/stop.htm
http://support.micro...m/search/?adv=1


Let's say your user has given you this information:

PAGE_FAULT_IN_NONPAGED_AREA
Stop:0x00000050 (0xFF5AFFF8,0x00000000,0x80544A9D,0x00000000)


If you search for that in the above aumha link you will find the following

Quote
Requested data was not in memory. An invalid system memory address was referenced. Defective memory
(including main memory, L2 RAM cache, video RAM) or incompatible software (including remote control and
antivirus software) might cause this Stop message, as may other hardware problems (e.g., incorrect SCSI
termination or a flawed PCI card).
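
If you collect a lot of these, the STOP line the user reported can be split into its code and four parameters with a few lines of Python. This is a rough sketch that assumes the line looks like the example above; the pattern and the function name parse_stop_line are my own.

```python
import re

# Matches e.g. 'Stop:0x00000050 (0xFF5AFFF8,0x00000000,0x80544A9D,0x00000000)'
STOP_LINE = re.compile(r"stop\s*:?\s*(0x[0-9a-f]+)\s*\(([^)]*)\)", re.IGNORECASE)


def parse_stop_line(line: str):
    """Return (code, [parameters]) as integers from a STOP line."""
    match = STOP_LINE.search(line)
    if match is None:
        raise ValueError(f"no STOP code found in: {line!r}")
    code = int(match.group(1), 16)
    params = [int(p, 16) for p in match.group(2).split(",") if p.strip()]
    return code, params


code, params = parse_stop_line(
    "Stop:0x00000050 (0xFF5AFFF8,0x00000000,0x80544A9D,0x00000000)"
)
print(f"0x{code:08X}", [f"0x{p:08X}" for p in params])
```

Keeping the parameters as numbers makes it easy to compare them across several crashes from the same machine.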



A good question to ask your user would be if he has installed any hardware or software recently.


The BSOD code has helped, but you always need to dig deeper so you can see the whole picture. While it may be likely that the above problem is hardware related, there is always the chance it isn't. Just as we don't rely on HijackThis as the only log needed in malware removal, you shouldn't rely only on a BSOD code.


Note : When a BSOD occurs you should look to see if there is a file name listed alongside the stop code and its parameters. There isn't always one listed, but when there is, a quick search on the file name can pinpoint the problem area.




2) The next thing you want to get are the minidump files. There are two ways to go about this

Firstly, you can have the user zip and upload all his minidump files from C:\Windows\MiniDump. You then need the program BlueScreenView to "view" these. To do this, save the user's minidumps anywhere on your own machine, open BlueScreenView, click Options > Advanced Options > navigate to the user's minidump files (let's say they are on your desktop) > click OK.

You will now have an interface to analyze the users minidump files. 



To show you an example using this method, here is the output from one minidump log

==================================================
Dump File : Mini021510-02.dmp
Crash Time : 2/15/2010 17:56:46
Bug Check String : THREAD_STUCK_IN_DEVICE_DRIVER
Bug Check Code : 0x100000ea

Parameter 1 : 0x89075750
Parameter 2 : 0x89941e18
Parameter 3 : 0xb3bb2cbc
Parameter 4 : 0x00000001
Caused By Driver : nv4_mini.sys
Caused By Address : nv4_mini.sys+c9be2
File Description : 
Product Name : 
Company : 
File Version : 
Processor : 32-bit
Computer Name : 
Full Path : C:\Documents and Settings\Marko\Desktop\minidump\Mini021510-02.dmp
Processors Count : 4
Major Version : 15
Minor Version : 2600
==================================================


The key fields above are Crash Time, Bug Check String, Bug Check Code, and Caused By Driver.

Crash Time will help you narrow down which minidump files you want to analyse. A machine that experiences regular BSODs can have a lot of minidump files. If you know the approximate date the user's problems started, you can use this to reduce the amount of analysis required. For example, if a user has been experiencing constant BSODs for the past week, there would be no point in getting minidump files from months or years ago.
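
In the example above, the file name Mini021510-02.dmp lines up with the crash date 2/15/2010, so the name itself can be used to weed out old dumps. The sketch below assumes every file follows that MiniMMDDYY-NN.dmp pattern and that the two-digit year belongs to the 2000s; other Windows versions may name dumps differently, in which case dump_date simply returns None. Both function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Assumption: names look like MiniMMDDYY-NN.dmp, as in the example dump.
NAME_PATTERN = re.compile(r"^Mini(\d{2})(\d{2})(\d{2})-(\d{2})\.dmp$", re.IGNORECASE)


def dump_date(file_name: str) -> Optional[date]:
    """Return the date encoded in a minidump file name, or None if it doesn't fit."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    month, day, year = (int(match.group(i)) for i in (1, 2, 3))
    return date(2000 + year, month, day)  # assumes a 20xx year


def recent_dumps(file_names, since: date) -> list:
    """Keep only the dumps dated on or after 'since'."""
    kept = []
    for name in file_names:
        when = dump_date(name)
        if when is not None and when >= since:
            kept.append(name)
    return kept


names = ["Mini021510-02.dmp", "Mini110509-01.dmp", "notes.txt"]
print(recent_dumps(names, since=date(2010, 2, 8)))
```

With a cut-off of 8 February 2010, only Mini021510-02.dmp is kept.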

Bug Check String can be considered the "name" of the BSOD code. 

Bug Check Code is the Windows STOP message number, aka the BSOD code. On this user's machine it is 0x100000ea.


If you search for the Bug Check Code and String at the aumha link below

http://aumha.org/a/stop.htm

You will see that this BSOD Code is explained as the following

Quote
A device driver problem has caused the system to pause indefinitely (hang). Typically, this is caused by
a display driver waiting for the video hardware to enter an idle state. This might indicate a hardware problem
with the video adapter, or a faulty video driver.



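Caused By Driver is clearly the most important field. This is the driver that probably caused the crash. In the example it is nv4_mini.sys, which fits the explanation above that this STOP code is typically caused by a display driver.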
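
If a user pastes a text report like the one above rather than uploading the dumps, the four key fields can be pulled out automatically. A minimal sketch, assuming every field sits on its own line as "Name : Value"; parse_report is a made-up name, not part of BlueScreenView.

```python
def parse_report(text: str) -> dict:
    """Turn 'Name : Value' lines into a dict; separator lines are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so times and paths keep their colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


sample = """\
==================================================
Dump File : Mini021510-02.dmp
Crash Time : 2/15/2010 17:56:46
Bug Check String : THREAD_STUCK_IN_DEVICE_DRIVER
Bug Check Code : 0x100000ea
Caused By Driver : nv4_mini.sys
File Description :
==================================================
"""

report = parse_report(sample)
for name in ("Crash Time", "Bug Check String", "Bug Check Code", "Caused By Driver"):
    print(f"{name}: {report[name]}")
```

Fields that are empty in the report, such as File Description, come back as empty strings rather than being dropped.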



3) The Event Viewer logs

There are a few ways you can get EV logs. You can get them through Windows, but it may be preferable to use something like VEW, which has more features and so gives you a better choice of EV logs.

http://images.malwar...om/vino/VEW.exe



EV logs are definitely easier to understand than the other two. You will see a wide range of things here, whether it is Windows informing the user that the latest update failed to install, or the user's AV flagging something as malware. In the example below you can see that the user's Internet Explorer has hung:


Error - 4/2/2010 8:38:53 AM | Computer Name = YOUR-6194D6D7F5 | Source = Application Hang | ID = 1002
Description = Hanging application iexplore.exe, version 8.0.6001.18702, hang module
hungapp, version 0.0.0.0, hang address 0x00000000.


Nothing terribly exciting about that; it can happen. But if the user says this happens a lot and there is a series of these entries in the EV logs, then it is reasonable to suspect that an add-on or extension in Internet Explorer is responsible for the hangs. The main use of EV logs is to support your conclusions after you have analyzed the user's BSOD code and minidump files. If you have the user's minidump files for the past week to analyze, then you will want the last week of EV logs too. This will help pinpoint the problem, or show you what could have caused it.
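
Spotting a "series" of the same event is easier if you count entries by source and ID. The sketch below assumes each entry starts with a header line shaped like the example above ("Error - date | Computer Name = ... | Source = ... | ID = ..."); the pattern is my own guess at that layout and count_events is a hypothetical name.

```python
import re
from collections import Counter

# Header line of one entry, modelled on the example log above.
HEADER = re.compile(
    r"^(?P<level>\w+) - (?P<when>[^|]+?) \| .*?"
    r"Source = (?P<source>[^|]+?) \| ID = (?P<id>\d+)"
)


def count_events(log_text: str) -> Counter:
    """Count log entries per (source, event ID) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            counts[(match.group("source"), int(match.group("id")))] += 1
    return counts


log = """\
Error - 4/2/2010 8:38:53 AM | Computer Name = YOUR-6194D6D7F5 | Source = Application Hang | ID = 1002
Description = Hanging application iexplore.exe, version 8.0.6001.18702, hang module
hungapp, version 0.0.0.0, hang address 0x00000000.
"""

for (source, event_id), total in count_events(log).most_common():
    print(f"{source} (ID {event_id}): {total}")
```

A pair that shows up many times in the week before the crashes is a good candidate to ask the user about.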



Part 3 : Solution

Now that we have all the information relating to your user's BSOD, it's time to try and fix it so that they won't be continuously plagued with this problem. The following link contains some troubleshooting methods for BSOD codes:

http://aumha.org/a/stop.php#general

However it can be a complicated read so I will break it down here even more.


The main thing to keep in mind when looking for a solution is to try the simple methods first. Some of the fixes for BSODs can be very complicated, and potentially dangerous in the hands of an inexperienced user, so rather than throw the kitchen sink at the problem from the start, use the simpler methods, which do work in most cases. They are the following:
  • clean up any malware on the machine
  • run sfc /scannow
  • run chkdsk /r
  • do a windows repair
  • If you’ve recently added new hardware, remove it.

Those steps should fix most problems. Test to see if this is the case; if not, you need to move on to the more complicated ones.

  • Run hardware diagnostics supplied by the manufacturer.
  • Make sure device drivers and system BIOS are up-to-date. Updating the BIOS requires you to flash it using a boot disk and well-planned steps.
  • If you’ve installed new drivers just before the problem appeared, try rolling them back to the older ones.
  • Open the box and make sure all hardware is correctly installed, well seated, and solidly connected.
  • Confirm that all of your hardware is on the Hardware Compatibility List. If some of it isn’t, then pay particular attention to the non-HCL hardware in your troubleshooting.
  • Investigate recently added software.
  • Examine (and try disabling) BIOS memory options such as caching or shadowing.


Some of these steps are too complicated for certain users, so be aware of this before you recommend any. To quote one of our most esteemed techs, Artellos

Quote
I always take the BIOS update as something to do last.




Part 4 : Turning on logs

In some cases you will find that the Event Viewer and minidumps are turned off. This can be down to the user's preference or caused by malware. Of course, if these are turned off then it's going to be rather hard to analyse and fix the user's problem. The Event Log service can be re-enabled manually via services.msc.


To turn on minidumps do the following

Go to the Control Panel and follow these steps:

1. Click the System Icon
2. Advanced Tab
3. Startup and Recovery -> Settings
4. Enable Write an Event to the system log
5. Disable Automatically Restart
6. Select the following debugging information:
  • Small memory dump (64 KB)
  • Small Dump Directory : %SystemRoot%\Minidump

7. Confirm all and restart the computer. 
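
If you want to confirm that the steps above took effect, the same settings can be read back from the registry. This is a sketch under assumptions: as far as I know these options live under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl as CrashDumpEnabled (3 meaning small memory dump), LogEvent, AutoReboot and MinidumpDir, but check this on the user's Windows version before relying on it. The function names are hypothetical, and read_crash_control only runs on Windows.

```python
# Values that match steps 4 to 6 above (registry names are assumptions).
RECOMMENDED = {
    "CrashDumpEnabled": 3,  # assumed: 3 = small memory dump
    "LogEvent": 1,          # write an event to the system log
    "AutoReboot": 0,        # do not restart automatically
}
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def read_crash_control() -> dict:
    """Read the current settings from the registry (Windows only)."""
    import winreg  # imported here so the rest of the file loads anywhere

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        for name in list(RECOMMENDED) + ["MinidumpDir"]:
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                values[name] = None  # value not present
    return values


def find_mismatches(current: dict) -> dict:
    """Return {name: (current value, recommended value)} for settings that differ."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if current.get(name) != wanted
    }


# Example with made-up values, as they might look with dumps turned off:
print(find_mismatches({"CrashDumpEnabled": 0, "LogEvent": 1, "AutoReboot": 1}))
```

On the user's machine you would call find_mismatches(read_crash_control()); an empty result means the settings already match the steps above.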


Now that you have these two key areas turned back on, all you need to do is wait for the user's next crash so you can analyse the logs.



Part 5 : Conclusion

While this tutorial could be considered "tech related", its aim is to increase the overall knowledge of all users. Troubleshooting problems like this can be intimidating, but like everything else, it's all about your level of experience and research. Hopefully more people will now be comfortable trying to troubleshoot BSOD codes rather than feeling they have to send a user to another part of the forum.

Here are some good reference links that are worth reading

STOP Codes

http://aumha.org/a/stop.htm

General troubleshooting

http://aumha.org/a/stop.php#general

A download link for BlueScreenView and more information about how to use it

http://www.nirsoft.n...creen_view.html

Microsoft Search Index ( the best site for researching BSOD stop codes )


http://support.microsoft.com/search/?adv=1
