OBJECTIVE: Building a Self-Healing Network
ISSUE: A script is needed so that the Nagios client can restart failed Windows services.
SOLUTION:
I created the win_service_restart.cmd batch file and tested it on Microsoft Windows 2003 servers.
--- Start of code ---
@echo off 
:: *****************************************************************************
:: File: win_service_restart.cmd
:: Author: Vadims Zenins http://vadimszenins.blogspot.com
:: Version: 1.07
:: Date: 16/11/2010 12:28:45
:: Windows Failed Service restart batch file for Nagios Event Handler
::
:: Copy win_service_restart.cmd to \NSClient++\scripts\ folder.
::
:: Nagios commands.cfg:
:: define command{
:: command_name win_service_restart
:: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c win_service_restart -a "$SERVICEDESC$" $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
:: }
::
:: Nagios template-services_common-win.cfg
:: define service{
:: name generic-service-win-wuauserv
:: service_description wuauserv
:: display_name Automatic Updates
:: event_handler win_service_restart
:: event_handler_enabled 1
:: check_command check_nt!SERVICESTATE!-d SHOWALL -l $SERVICEDESC$
:: }
::
:: NSCLIENT++ version 0.3.8 NSC.ini:
:: [Settings]
:: allowed_hosts=192.168.1.1/32 ; your Nagios server IP
:: [NRPE]
:: allow_arguments=1
:: allow_nasty_meta_chars=1
:: [Script Wrappings]
:: cmd=scripts\%SCRIPT% %ARGS%
:: [External Script]
:: allow_arguments=1
:: allow_nasty_meta_chars=1
:: [External Scripts]
:: command[win_service_restart]=scripts\win_service_restart.cmd "$ARG1$" $ARG2$ $ARG3$ $ARG4$
::
::
:: Additional examples on http://vadimszenins.blogspot.com/2008/12/nagios-restart-windows-failed-services.html
::
:: Tested platform:
:: Windows 2003 R2 x64 SP2, Nagios 3.2.0, NSClient++ 0.3.8.76
::
:: Version 1.07 revision:
:: Description is changed for NSCLIENT++ version 0.3.8 NSC.ini
:: Version 1.06 revision:
:: Description is changed for NSCLIENT++ version 0.3.8 NSC.ini
:: Version 1.05 revision:
:: Logging changes, stop and start services commands have changed. Log examples added.
:: Version 1.04 revision:
:: Double restart of the service is fixed
:: Version 1.03 revision:
:: Description is changed
:: Version 1.02 revision:
:: @NET changed to @SC
:: Version 1.01 revision:
:: Problem with spaces in service names is fixed
::
:: This code is made available as is, without warranty of any kind. The entire
:: risk of the use or the results from the use of this code remains with the user.
:: *****************************************************************************
::echo 1: %1 2: %2 3: %3 4: %4
@SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
:: Grab a file name and extension only
SET SCRIPTNAME=%~nx0
:: Replace "
SET SCRIPTNAME=%SCRIPTNAME:"=%
SET LOGDIR=C:\tools\logs
SET SERVICENAME=%1
:: Replace "
SET SERVICENAME1=%SERVICENAME:"=%
SET LOGFILE=%1
SET LOGFILE=%LOGFILE:"=%
:: Replace space by _
SET LOGFILE=%LOGFILE: =_%
SET LOGFILE=%LOGDIR%\%LOGFILE%.log
if "%SERVICENAME1%"=="" SET LOGFILE=%LOGDIR%\NO_SERVICENAME.log
::@echo servicename: %SERVICENAME%
::@echo logfile: %LOGFILE%
::@echo SERVICENAME1: %SERVICENAME1%
:: =============================================================================
if not exist %LOGDIR% md %LOGDIR%
echo. >>%LOGFILE%
echo ============================================================================= >>%LOGFILE%
echo %DATE% %TIME% %SCRIPTNAME% has started >>%LOGFILE%
echo ============================================================================= >>%LOGFILE%
@if "%SERVICENAME1%"=="" goto usage
@if "%SERVICENAME1%"=="/?" goto usage
@if "%SERVICENAME1%"=="-?" goto usage
@echo Variables 1: %1 2: %2 3: %3 4: %4 >>%LOGFILE%
@SC query %SERVICENAME% >>%LOGFILE%
@SC query %SERVICENAME% | FIND /I "RUNNING" >>%LOGFILE%
if .%ERRORLEVEL%.==.0. (
SET RETURN=Service %SERVICENAME% is running
goto END
)
:RESTART
@echo %DATE% %TIME% Restarting %SERVICENAME% services... >>%LOGFILE%
@SC stop %SERVICENAME% >>%LOGFILE% 2>&1
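:: Wait about 2 seconds; ping is used because sleep.exe is not part of a default Windows install
@ping -n 3 127.0.0.1 >NUL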
SET RETURN=Service %SERVICENAME% start pending
@SC start %SERVICENAME% | FIND /I "FAILED"
if .%ERRORLEVEL%.==.0. (
SET RETURN=Start Service %SERVICENAME% FAILED
@SC start %SERVICENAME% >>%LOGFILE% 2>&1
goto END
)
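:: Wait about 5 seconds before checking the service state (ping instead of sleep.exe)
@ping -n 6 127.0.0.1 >NUL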
@SC query %SERVICENAME% | FIND /I "RUNNING"
if .%ERRORLEVEL%.==.0. (
SET RETURN=Service %SERVICENAME% has started
@SC query %SERVICENAME% >>%LOGFILE%
goto END
)
@goto end
:USAGE
@echo Usage: >>%LOGFILE%
@echo win_service_restart "<service_name>" ^<state^> ^<state_type^> ^<attempt^> >>%LOGFILE%
@echo ^<service_name^> is "Service name", do not mix with "Display name" >>%LOGFILE%
@echo ^<state^>, ^<state_type^> and ^<attempt^> are optional >>%LOGFILE%
::exit 128
:END
echo %DATE% %TIME% %SCRIPTNAME% has finished with code >>%LOGFILE%
echo %RETURN% >>%LOGFILE%
@echo %SCRIPTNAME%: %RETURN%
exit 0
--- End of code ---
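The script receives $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as its second to fourth arguments but only logs them; the restart decision rests on the SC query result alone. Below is a minimal sketch of an additional guard, assuming it is placed just before the :RESTART label. The choice of the third soft attempt is an arbitrary assumption for illustration, not something the original script does.

```batch
:: Hypothetical guard, to be placed just before the :RESTART label.
:: %2 = $SERVICESTATE$, %3 = $SERVICESTATETYPE$, %4 = $SERVICEATTEMPT$
:: If no state was passed (manual run), keep the original behaviour.
if "%~2"=="" goto RESTART
:: Do nothing on OK, WARNING or UNKNOWN.
if /I not "%~2"=="CRITICAL" (
    SET RETURN=State is %~2, restart skipped
    goto END
)
:: Assumption: restart on any HARD state or on the 3rd SOFT attempt.
if /I "%~3"=="HARD" goto RESTART
if "%~4"=="3" goto RESTART
SET RETURN=CRITICAL %~3 attempt %~4, restart skipped
goto END
```

With this guard, a call such as win_service_restart.cmd "wuauserv" OK HARD 1 ends without touching the service, while a call with only the service name behaves as before. How many soft attempts occur before the hard state depends on max_check_attempts in the Nagios service definition.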
Additional examples:
template-services_common-win.cfg
define service{
name generic-service-win-backup-agent
service_description BackupExecAgentAccelerator
display_name Backup Exec Remote Agent
event_handler win_service_restart
event_handler_enabled 1
check_command check_nt!SERVICESTATE!-d SHOWALL -l $SERVICEDESC$
register 0
}
services_common-win.cfg
define service{
use generic-service-win-backup-agent,generic-service-office
hostgroup_name winsrv-office ; Assign group of servers
host_name !SERVER11,!SERVER12 ; use this to exclude some servers or delete this row
}
group_windows.cfg
define hostgroup{
hostgroup_name winsrv-office ; The name of the hostgroup
alias Office Servers
}
host-server01.cfg
define host{
use windows-server ; Inherit default values from a template
host_name server01 ; The name we're giving to this host
alias server 01 ; A longer name associated with the host
hostgroups winsrv-office ; Group of servers
address 192.168.1.1 ; IP address of the host
}
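On the client side, the log message "No handler for command 'win_service_restart'" means that no loaded NSClient++ module recognises the command. One possible cause, offered here as an assumption rather than a confirmed diagnosis, is that the module serving the [External Scripts] section is not enabled; in NSClient++ 0.3.x the CheckExternalScripts.dll line in [modules] is often left commented out. A minimal NSC.ini fragment that enables it alongside the NRPE listener might look like this:

```ini
[modules]
; Listens for check_nrpe requests on the NRPE port
NRPEListener.dll
; Serves commands defined under [External Scripts]
CheckExternalScripts.dll
```

The NSClient++ service has to be restarted after NSC.ini is edited so that the change is picked up.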
DOWNLOADS:
Download the latest version of the script from the mirror.
md5: dbf0663a9e6648886eb8015cee8c9ce0 *win_service_restart.zip
Download the previous version 1.05 of the script from the mirror. md5: 8b90ba7654227f1bf07c694368843b9
Download the previous version 1.04 of the script from the mirror. md5: fd00753533e5fb655d824c3bf1d36d4
Links: exchange.nagios.org
34 comments:
Incredible. Well done.
Hi Vadim!
Great job on the script but could you explain it a bit more please?
I've added the command to commands.cfg and edited the templates.cfg with the service template.
How do i call this service template?
For example, I'm checking the MySQL service on several computers/servers; if it fails, I need NSClient++ to restart the service automatically with your win_service_restart.cmd.
Can you work out an example or something?
Thanks in advance,
Kind regards,
Dennis de Vries
The Netherlands
Email: ultrac00l@hotmail.com
Thanks for this tip.
BTW, the script in my configuration is always executed twice: I think it's because event handlers are executed when there is a problem, but also when there is a recovery. Is it a problem in my configuration, or should you maybe add a check in your script to make sure the service is down?
The batch file checks the service status first and should restart the service only if it is not running.
1. Please post your log
2. Please check and post service's dependencies.
Regards,
Vadim
Apache crashed one time and gets restarted twice : see the second restart, it's happening whereas the status is OK
This is the log from the windows script :
======================
16.02.2009 8:18:37.62 SCRIPT is started
Variables 1: "Apache2" 2: CRITICAL 3: SOFT 4: 1
SERVICE_NAME: Apache2
TYPE : 10 WIN32_OWN_PROCESS
STATE : 1 STOPPED
(NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN))
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
16.02.2009 8:18:37.64 Restarting "Apache2" services...
The Apache2 service is starting.
The Apache2 service was started successfully.
16.02.2009 8:18:40.98 SCRIPT is finished
======================
16.02.2009 8:19:37.48 SCRIPT is started
Variables 1: "Apache2" 2: OK 3: SOFT 4: 2
SERVICE_NAME: Apache2
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN))
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
16.02.2009 8:19:37.50 Restarting "Apache2" services...
The Apache2 service is stopping......
The Apache2 service was stopped successfully.
The Apache2 service is starting.
The Apache2 service was started successfully.
16.02.2009 8:19:55.84 SCRIPT is finished
Thanks!
What are your Windows version and language?
Please post the output of the following command, run from a command prompt while Apache is running:
SC query Apache2
Regards,
Vadim
OS: Windows 2003 SP2 Standard Edition
Language : En
Result of the command SC query Apache2:
=================================
C:\Documents and Settings>SC query Apache2
SERVICE_NAME: Apache2
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN))
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
C:\Documents and Settings>
=============================
Please find this line:
@SC query %SERVICENAME% | FIND /I "STATE : 4" 2>>&1
Replace it with the following line:
@SC query %SERVICENAME% | FIND /I "RUNNING" 2>>&1
Perfect ! It works ! I changed the two rows containing the "STATE : 4" string.
Thanks!
I am able to restart the Automatic Updates service but not any others. Is there a special trick?
Hi Eugene,
No tricks.
logs of some restarted services on my servers:
15/07/2009 13:54:32.94 Restarting "BackupExecAgentAccelerator" services...
The Backup Exec Remote Agent for Windows Servers service is starting.
The Backup Exec Remote Agent for Windows Servers service was started successfully.
14/07/2010 6:01:19.32 Restarting "MSExchangeADTopology" services...
The Microsoft Exchange Active Directory Topology Service service is starting.
The Microsoft Exchange Active Directory Topology Service service was started successfully.
10/01/2010 11:43:48.96 Restarting "MSExchangeTransport" services...
The Microsoft Exchange Transport service is starting............................
The Microsoft Exchange Transport service was started successfully.
10/03/2010 10:43:29.17 Restarting "wuauserv" services...
The Automatic Updates service is starting.
The Automatic Updates service was started successfully.
10/01/2010 11:47:41.29 Restarting "Apache2.2" services...
The Apache2.2 service is starting.
The Apache2.2 service was started successfully.
Please copy and paste your log.
Regards,
Vadim Zenin
I am using nsclient++ version 3.8 latest one as of today, this is what I am seeing in nagios.log
2010-08-01 18:32:31: message:include\NSCHelper.cpp:238: No handler for command 'win_service_restart'.
I do not see your scripts log as specified in c:\tools\logs
Also latest nsclient does not have a portion
:: [NRPE Handlers]
:: command[win_service_restart]=scripts\win_service_restart.cmd "$ARG1$" $ARG2$ $ARG3$ $ARG4$
I had to add it by hand
any ideas ?
Hi Eugene,
>I am using nsclient++ version 3.8 latest one
Yes, the NSC.ini configuration has changed a little. Please follow the description below.
:: NSCLIENT++ version 0.3.8 NSC.ini:
:: [Settings]
:: allowed_hosts=192.168.1.1/32 ; your Nagios server IP
:: [NRPE]
:: allow_arguments=1
:: allow_nasty_meta_chars=1
:: [Script Wrappings]
:: cmd=scripts\%SCRIPT% %ARGS%
:: [External Scripts]
:: command[win_service_restart]=scripts\win_service_restart.cmd "$ARG1$" $ARG2$ $ARG3$ $ARG4$
Regards,
Vadim
Light shining in the darkness.
Hi Vadim
I have followed all of your instructions but, like Eugene, I still get the same error every time I try to call an event handler bound to a service:
2010-08-01 18:32:31: message:include\NSCHelper.cpp:238: No handler for command 'win_service_restart'
Nagios seems to be working so i think it must be on the windows side.
I have tried it on W2K3 32 & 64.
Thanks
Thanks Vadim for this post. This is a great job.
I have been able to make it work. I use the command:
check_nrpe -H 192.168.1.120 -p 5666 -c win_service_restart -a adselfserviceplus
and the service restart as desired.
But something happened, I don't know what, but it stopped working. It seems there is a problem to pass argument. When doing the same command as above, I have the following error:
1: $ARG1$ 2: $ARG2$ 3: $ARG3$ 4: $ARG4$
servicename: $ARG1$
logfile: C:\tools\logs\$ARG1$.log
SERVICENAME1: $ARG1$
[SC] StartService: OpenService FAILED 1060:
win_service_restart.cmd: Start Service $ARG1$ FAILED
Do you have an idea what can be broken?
Thanks a lot,
Bernard
Hi,
Thanks for your response.
Please check in NSC.ini file:
[NRPE]
allow_arguments=1
allow_nasty_meta_chars=1
[External Script]
allow_arguments=1
allow_nasty_meta_chars=1
[Script Wrappings]
cmd=scripts\%SCRIPT% %ARGS%
[External Scripts]
win_service_restart=scripts\win_service_restart.cmd "$ARG1$" $ARG2$ $ARG3$ $ARG4$
Regards,
Vadim
Thanks again Vadims,
allow_arguments in section [External Script] was not set to 1.
Too simple.... Once we understand! ;-)
Bernard
I have read all the configs and posts etc. I now get the following message in my nsclient log.
nsclient.log
2010-08-26 15:43:43: message:NSClient++.cpp:1157: No handler for command: 'win_service_restart'
2010-08-26 15:43:43: message:include\NSCHelper.cpp:238: No handler for command 'win_service_restart'.
I have edited NSC.ini
[NRPE]
port=5666
allow_arguments=1
allow_nasty_meta_chars=1
[Script Wrappings]
cmd=scripts\%SCRIPT% %ARGS%
[External Scripts]
win_service_restart=scripts\win_service_restart.cmd "$ARG1$" $ARG2$ $ARG3$ $ARG4$
I run /usr/local/nagios/libexec/check_nrpe -H x.x.x.x
and get "seem to be doing fine"
Please Help!
Hi Vadim
I have a problem with services whose names contain a $ (MSSQL$ARCSERVE_DB).
My implementation is different from yours: I use Nagios with NagiosQL.
The services are checked with:
:: $USER1$/check_nt -H $HOSTADDRESS$ -v SERVICESTATE -l 'MSSQL$$ARCSERVE_DB' -s xxx -p 12489
I have defined a display_name
'MSSQL$$ARCSERVE_DB'
The handler is in a template:
:: $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c win_service_restart -a "$SERVICEDISPLAYNAME$"
All other services restart correctly, and Nagios shows the service state correctly. But when this service is down, I get the following in the server log:
2010-09-15 09:02:51: debug:NSClient++.cpp:1142: Injected Result: OK 'OK: All services are in their appropriate state.'
2010-09-15 09:02:51: debug:NSClient++.cpp:1143: Injected Performance Result: ''
2010-09-15 09:02:53: debug:NSClient++.cpp:1106: Injecting: win_service_restart: MSSQLARCSERVE_DB
2010-09-15 09:02:53: debug:NSClient++.cpp:1142: Injected Result: OK ''sleep' is not recognized as an internal or external command,
operable program or batch file.
[SC] StartService: OpenService FAILED 1060:
win_service_restart.cmd: Start Service "MSSQLARCSERVE_DB" FAILED'
The problem is the missing $ (it should be MSSQL$ARCSERVE_DB).
Do you have any idea?
Kind regards,
Matthew
Switzerland
Vadims,
I can get the script to run on the remote Windows box if I execute the command from the command line on the Nagios box. However, when Nagios flags the service as critical (down), it does not execute the command.
Any ideas? Thanks in advance
command.cfg
define command{
command_name sysaid_server_service_restart
command_line $USER1$/check_npre -H $HOSTADDRESS$ -p 5666 -c sysaid_server_service_restart -t 60 #-a "$SERVICEDESC$" $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}
nsc.ini
[NRPE Handlers]
command[sysaid_server_service_restart]=scripts\sysaid_server_service_restart.cmd "$ARG1$" $ARG2$ $ARG3$ $ARG4$
;# COMMAND ALLOW NASTY META CHARS
; This option determines whether or not the NRPE daemon will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow_nasty_meta_chars=1
[Script Wrappings]
cmd=scripts\%SCRIPT% %ARGS%
[External Script]
;# COMMAND TIMEOUT
; This specifies the maximum number of seconds that the NRPE daemon will allow plug-ins to finish executing before killing them off.
command_timeout=60
;
;# COMMAND ARGUMENT PROCESSING
; This option determines whether or not the NRPE daemon will allow clients to specify arguments to commands that are executed.
allow_arguments=1
;
;# COMMAND ALLOW NASTY META CHARS
; This option determines whether or not the NRPE daemon will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow_nasty_meta_chars=1
[NRPE]
;# NRPE PORT NUMBER
; This is the port the NRPEListener.dll will listen to.
port=5666
;
;# COMMAND TIMEOUT
; This specifies the maximum number of seconds that the NRPE daemon will allow plug-ins to finish executing before killing them off.
command_timeout=60
;
;# COMMAND ARGUMENT PROCESSING
; This option determines whether or not the NRPE daemon will allow clients to specify arguments to commands that are executed.
allow_arguments=1
;
;# COMMAND ALLOW NASTY META CHARS
; This option determines whether or not the NRPE daemon will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow_nasty_meta_chars=1
;
;# USE SSL SOCKET
; This option controls if SSL should be used on the socket.
;use_ssl=0
;
;# BIND TO ADDRESS
; Allows you to bind server to a specific local address. This has to be a dotted ip adress not a hostname.
; Leaving this blank will bind to all avalible IP adresses.
; bind_to_address=
;
;# ALLOWED HOST ADDRESSES
; This is a comma-delimited list of IP address of hosts that are allowed to talk to NRPE deamon.
; If you leave this blank the global version will be used instead.
allowed_hosts=nagioshost
;
;# SCRIPT DIRECTORY
; All files in this directory will become check commands.
; *WARNING* This is undoubtedly dangerous so use with care!
script_dir=scripts\
;
;# SOCKET TIMEOUT
; Timeout when reading packets on incoming sockets. If the data has not arrived withint this time we will bail out.
socket_timeout=60
Hi Matthew,
Thanks for your question.
Have you allowed meta chars?
Please check in NSC.ini file:
[NRPE]
allow_nasty_meta_chars=1
[External Script]
allow_nasty_meta_chars=1
Regards,
Vadim
This is in response to the post on 15 September, 2010 18:09
I figured out my error. Typo in the command.cfg file. All that time I had check_npre instead of check_nrpe. Uggg. Funny - sort of.
Syntax, syntax, syntax!
There is a minor flaw in the script: it will not check or restart a Windows service if the service name has spaces in it. To correct this I had to put quotes around %servicename% in several places where the SC command is used. I would have posted my script, but the blog's comment length limit prevented it.
Example:
SET SERVICENAME=%SERVICENAME:my service that has a space in it%
@SC stop "%SERVICENAME%"
BTW, thanks for the initial script. This is awesome, as it saves me from having to restart an occasional failed service.
IS VERY GOOD
Very good once you get it going. I had to make a change to the NSC.ini file to get handler to work:
I had to change [External Scripts]
to [NRPE Handlers].
I hope this saves someone a little bit of time.
I am using NSClient++ 0.3.8 x64
Hi Vadim,
below is my nsc.ini setup as you have described. I have stopped wuauserv manually to test it, but it does not start when I force a check from Nagios.
And this is what I get on the nsclient screen:
d \NSClientListener.cpp(146) Data: nagios&5&ShowAll&wuauserv
d \NSClientListener.cpp(171) Data: ShowAll&wuauserv
d NSClient++.cpp(1106) Injecting: checkServiceState: ShowAll, wuauserv,
d NSClient++.cpp(1142) Injected Result: CRITICAL 'wuauserv: Stopped'
ini setup
[NRPE]
allow_arguments=1
allow_nasty_meta_chars=1
[External Script]
allow_arguments=1
allow_nasty_meta_chars=1
[Script Wrappings]
cmd=scripts\%SCRIPT% %ARGS%
[External Scripts]
win_service_restart=scripts\win_service_restart.cmd "$ARG1$" $ARG2$ $ARG3$ $ARG4$
Vadim,
Great script, thanks!
I have made a minor modification to the cmd file to support spaces in Service names.
IE, in Windows, the service is named - "Zabbix Agent".
In the nagios service definition it has had to be created as Zabbix\ Agent (otherwise it only passes Zabbix).
Modification (line 74 onwards) -
SET SERVICENAME=%1
:: Replace "
SET SERVICENAME=%SERVICENAME:\=%
SET SERVICENAME1=%SERVICENAME:"=%
SET LOGFILE=%SERVICENAME1%
SET LOGFILE=%LOGFILE:"=%
Cheers!
Running into an issue of my own...
[root@951e7136b etc]# /usr/local/nagios/libexec/check_nrpe -H 10.1.1.1 -p 5666 -c restartsvc -a "SNMP"
Request contained arguments (not currently allowed, check the allow arguments option).
http://paste2.org/p/2199075
Tried this config..
Just can't get it to work. If I run restartsvc with nscp test running, it restarts the service (with arguments) perfectly.. ARGH!
Any ideas?
Hi Doug,
Have you included
[NRPE]
allow_arguments=1
allow_nasty_meta_chars=1
in .ini file?
Hi Vadims.
How can I configure the plugin to try restarting the services more than once?
Thank you in advance
Ariel
Ari_plane@yahoo.com.ar
Hi Vadim,
I have Nagios 3.2.0 without NRPE.
How can I restart a remote Windows computer with your script?
Thanks
Regards
Thierry
Hi Thierry,
The script "Restart a Windows Failed SERVICES" only restarts Windows services, not the computer.
Regards,
Vadim
The plugin requires NSCLIENT++ client with NRPE