How to use the hardware watchdog in our new RevPi Connect
Posted: 09 Aug 2018, 00:13
You may have noticed that our new RevPi Connect is equipped with a hardware watchdog. How does it work, what is it good for, and how should you operate it?
Here are the answers:
How does it work?
We have built in a MAX6370 chip and hardwired it to a timeout of 60 seconds. The function of this chip is simple: you need to toggle the WDI input pin at least once every 60 seconds to keep the WDO output from being triggered. Every toggle of the WDI pin resets the internal WD timer. If the timer is not reset and reaches 60 seconds, it triggers the output. We have connected this output to a mono-flop which generates a 1 second pulse. The pulse has to pass an AND gate where it is combined with the WD enable signal from the external WD pin (if that pin is connected to GND, the WDO signal is blocked at this gate). The output of this first AND gate disables the 24 V to 5 V DC/DC converter that powers the Compute Module for 1 second, thus forcing a cold start. The same signal can also pass a second AND gate to switch the relay (opening the normally closed contact on the X2 connector). This second gate is controlled by a blocking or enabling signal from a GPIO of the FTDI chip (you need a command-line script to set or reset this GPIO). The 1 second WDO pulse can therefore also be used to interrupt the power supply line of external devices such as GSM routers.
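To make the signal chain easier to follow, here is a small toy model in Python. It is only a sketch of the logic described above, not a driver and not our firmware: the class name `WatchdogModel` and its fields are made up for illustration, and it only models the first timeout after the last toggle.

```python
from dataclasses import dataclass

WD_TIMEOUT_S = 60  # hardwired MAX6370 timeout, as described above
PULSE_S = 1        # length of the mono-flop pulse, as described above


@dataclass
class WatchdogModel:
    """Toy model of the signal chain (hypothetical names, no hardware access)."""
    wd_enabled: bool = True       # False models the external WD pin tied to GND
    relay_enabled: bool = False   # models the FTDI GPIO that controls the second gate
    last_toggle_s: float = 0.0    # time of the most recent WDI toggle

    def toggle_wdi(self, now_s: float) -> None:
        """Any toggle of WDI restarts the internal 60 s timer."""
        self.last_toggle_s = now_s

    def outputs(self, now_s: float) -> dict:
        """Return the state of both gated outputs at time now_s."""
        elapsed = now_s - self.last_toggle_s
        # Mono-flop: a 1 s pulse that starts when the timer expires.
        pulse = WD_TIMEOUT_S <= elapsed < WD_TIMEOUT_S + PULSE_S
        # First AND gate: pulse AND watchdog enable (cuts Compute Module power).
        power_cut = pulse and self.wd_enabled
        # Second AND gate: first gate output AND relay enable (switches the X2 relay).
        relay_switched = power_cut and self.relay_enabled
        return {"power_cut": power_cut, "relay_switched": relay_switched}
```

With `wd_enabled=False` neither output ever becomes active, which corresponds to blocking the WDO signal at the first gate. With `relay_enabled=False` the Compute Module is still power-cycled, but the relay stays untouched.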
What is it good for?
Any watchdog that has to be configured by software (at boot time) is unsafe: if the configuration does not produce the desired result, the WD will not work as intended. A pure hardware solution is therefore the most reliable way to monitor a cyclical process. The process needs to toggle the WDI pin in every cycle. If the process crashes, the system is forced into a cold start, which will most likely restore functionality. Such a WD is therefore ideal for remote systems that cannot easily be reset by an operator. Please note that resetting such a system remotely is only possible if remote access still works after the cyclical process has crashed. This is often not the case, as buggy software might block the whole system.
How should you operate it?
If you look at the aim, you will see the answer: if the aim is to always have remote access (e.g. over the internet), you need to cyclically monitor the availability of your remote connection and stop toggling WDI whenever the connection to your server or cloud is lost. So use, for example, a cyclical ping to find out whether your server is reachable. We have combined this ping monitor with monitoring of other functions that need to run (in our case a cyclically running Python script). Only if both functions (ping and Python script) are running okay do we toggle the WDI pin.
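A minimal sketch of such a combined monitor could look like the following. Please treat it as an illustration under assumptions, not as our actual script: the server address, the heartbeat file path, the time limits and all function names are hypothetical, and `toggle_wdi` is a placeholder callable that you have to fill with whatever toggles WDI on your system. The ping call assumes the Linux `ping` command with its `-c` and `-W` options.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Iterable

SERVER = "192.0.2.1"                    # hypothetical server address (documentation range)
HEARTBEAT = Path("/tmp/app.heartbeat")  # hypothetical file the monitored script touches every cycle
MAX_AGE_S = 30.0                        # hypothetical limit for a "fresh" heartbeat
CHECK_INTERVAL_S = 20.0                 # assumption: well below the 60 s watchdog timeout


def server_reachable(host: str = SERVER) -> bool:
    """Send one echo request with the Linux ping command; True if it was answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def heartbeat_fresh(path: Path = HEARTBEAT, max_age_s: float = MAX_AGE_S) -> bool:
    """True if the monitored script has touched its heartbeat file recently."""
    try:
        age = time.time() - path.stat().st_mtime
    except OSError:
        return False  # a missing file counts as a failed check
    return age <= max_age_s


def monitor_once(checks: Iterable[Callable[[], bool]],
                 toggle_wdi: Callable[[], None]) -> bool:
    """Toggle WDI only if every check passes; return whether it was toggled."""
    if all(check() for check in checks):
        toggle_wdi()
        return True
    return False


def run(toggle_wdi: Callable[[], None]) -> None:
    """Endless monitoring loop; if a check keeps failing, the watchdog expires."""
    while True:
        monitor_once([server_reachable, heartbeat_fresh], toggle_wdi)
        time.sleep(CHECK_INTERVAL_S)
```

The important design point is that the monitor never toggles WDI unconditionally: as soon as the ping or the heartbeat check fails for longer than 60 seconds, the hardware takes over and forces the cold start.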
Please note that there is little point in using a small script that does nothing but cyclically toggle the WDI. Such a script would monitor nothing but its own status, so it would only detect a completely stuck system. So always be more specific. The higher your monitoring level, the more specific your control of the system. The lower (closer to the OS) your monitoring level, the less reliably a specific malfunction of your system leads to a cold start. You could of course use such a simple script while debugging to avoid unwanted resets. But you can easily achieve the same result with the external wire bridge at the X4 connector (set the bridge to disable the WD during development and debugging, and remove it when you are ready for production).