cpqhealth(4)                                                      cpqhealth(4)


NAME
       cpqhealth - hp ProLiant Advanced System Management Driver

SYNOPSIS
       /etc/init.d/cpqasm [start | stop | status]


DESCRIPTION
       The hp ProLiant Advanced Server Management Driver collects and monitors
       important operational data on your server to ensure that the system  is
       operating  nominally.   Any abnormal conditions are logged into the Non
       Volatile RAM (NVRAM) Integrated Management Log (IML).

       ProLiant Servers are equipped with hardware  and  firmware  to  monitor
       certain  abnormal conditions such as abnormal temperature readings, fan
       failures, ECC memory errors, etc.  The cpqhealth driver monitors  these
       conditions and reports them to the administrator by printing a message
       on the console and logging the condition in the ProLiant Integrated
       Management Log (IML).  The Insight Manager 7 agents can also be
       used to notify the administrator of abnormal conditions.

       The following is a list  of  features  supported  by  the  hp  ProLiant
       Advanced System Management Driver:

       Monitoring abnormal temperature conditions
              If  the  normal  operating temperature is exceeded, or a cooling
              fan fails, the hp ProLiant  Advanced  Server  Management  driver
              does the following:

       *      Displays a message to the console stating the problem.

       *      Makes an entry in the Integrated Management Log (IML).

       *      Shuts  the  system  down  (optionally) to avoid hardware damage.
              Use hp ProLiant ROM Based Setup (System  Configuration)  Utility
              (RBSU) to control the option.

       Monitoring fan failures
              If  a cooling fan fails, the hp ProLiant Advanced Server Manage-
              ment driver does the following:

       *      Displays a message to the console stating the problem.

       *      Makes an entry in the Integrated Management Log (IML).

       *      Shuts the system down (optionally)  to  avoid  hardware  damage.
              Use  hp  ProLiant ROM Based Setup (System Configuration) Utility
              (RBSU) to control the option.

       Monitoring the system Fault Tolerant Power Supply
              If the primary power  supply  fails,  the  system  automatically
              switches  over  to  a  backup  power  supply.   The  hp ProLiant
              Advanced System Management driver does the following:

       *      Displays a message to the console stating the problem.

       *      Makes an entry in the Integrated Management Log (IML).

       Monitoring ECC memory errors
              If an ECC memory error occurs, the hp ProLiant  Advanced  System
              Management driver logs the error in the health log, including the
              address at which the error occurred.  If too many errors occur at
              the same memory location, the driver disables the ECC error
              interrupts to avoid flooding the console with warnings (the
              hardware automatically corrects the ECC error).

       Automatic Server Recovery (ASR)
              The Automatic Server Recovery is implemented using a "heartbeat"
              timer that  continually  counts  down.   The  driver  frequently
              reloads  the  counter  to prevent it from counting down to zero.
              If the ASR counts down to 0, it is assumed  that  the  operating
              system  is  locked  up  and the system automatically attempts to
              reboot.  Events which may contribute  to  the  operating  system
              locking up include:

       *      A  peripheral  device  (such as a PCI adapter) failing in such a
              way that numerous spurious interrupts are generated.

       *      A high priority software application consumes all the  available
              CPU  cycles and does not allow the operating system scheduler to
              run the ASR timer reset process.

       *      A software or kernel application consumes all  available  memory
              including  the virtual memory space (i.e. swap).  This may cause
              the operating system scheduler to cease functioning.

       *      A critical operating system component  such  as  a  file  system
              fails  and  causes the operating system scheduler to cease func-
              tioning.

       *      Any other event besides an ASR timeout which causes a  Non-Mask-
              able Interrupt (NMI) to be generated.
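
       The heartbeat countdown described above can be illustrated with a small
       simulation.  This is only a sketch of the general idea, not the driver's
       implementation: the function name asr_simulate, the tick granularity and
       the reload value are all hypothetical.  Each argument after the reload
       value says whether the driver managed to reload the counter on that tick
       (1) or was stalled (0).

```shell
# Simulate a countdown ("heartbeat") timer.
# $1 = reload value in ticks (hypothetical); remaining arguments are
# 1 if the driver reloads the counter on that tick, 0 if it is stalled.
asr_simulate() {
    reload=$1; shift
    counter=$reload
    tick=0
    for alive in "$@"; do
        tick=$((tick + 1))
        if [ "$alive" -eq 1 ]; then
            counter=$reload            # driver reloads the counter
        else
            counter=$((counter - 1))   # nobody reloaded: keep counting down
        fi
        if [ "$counter" -le 0 ]; then
            echo "tick $tick: counter reached 0, system would reboot"
            return 1
        fi
    done
    echo "counter never reached 0 in $tick ticks"
    return 0
}
```

       For example, "asr_simulate 3 1 1 0 0 1" survives a two tick stall, while
       "asr_simulate 3 1 0 0 0" reaches zero on the fourth tick.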


       The ProLiant ASR feature is a hardware based timer.  If a true hardware
       failure occurs, the ProLiant Advanced Server  Management  driver  might
       not  be  called but the server will be reset as if the power switch was
       pressed.  The ProLiant ROM code may log an event to the ProLiant  Inte-
       grated Management Log (IML) when the server reboots.

       The ProLiant Advanced Server Management driver is notified via a
       Non-Maskable Interrupt (NMI).  If possible, the driver then does the
       following:

       *      Displays a message on the console stating the problem.

       *      Makes  an entry in the ProLiant Integrated Management Log (IML).

       *      Attempts to gracefully shut down the operating system  to  close
              the file systems.

       There is no guarantee that the operating system will shut down
       gracefully.  This depends on the type (software or hardware) and severity
       of  the  error condition.  There is more information about the ProLiant
       Automatic Server Recovery (ASR) feature later on in this document.

Getting the status of the ProLiant Server
       There are multiple ways to get the operational status of  the  ProLiant
       server.   The ideal way is to load the Insight Manager 7 agents and use
       a tool such as HP OpenView or Insight Manager 7 to monitor  the  status
       of all the ProLiant servers.  For those customers who do not have auto-
       matic monitoring tools, the servers can be checked using a standard Web
       browser  as  long  as the Insight Manager 7 agents have been installed.
       The Insight Manager 7 Web Agent listens on port 2301, and on port 2381
       for browsers that support SSL encryption.  For example, the browser could be
       pointed to:   http://192.1.1.20:2301  or  http://localhost:2301.   Note
       that  the  "http://" is required.  Until the agents are customized, the
       user name and password are both "administrator".
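
       As a convenience, the two Web Agent addresses for a given host can be
       printed with a small helper.  This is a sketch only: the function name
       agent_urls is hypothetical, and the use of "https://" for port 2381 is
       an assumption based on that port being the SSL port.

```shell
# Print the Web Agent URLs for a host (default: localhost).
# Assumption: port 2381 is reached over https, since it is the SSL port.
agent_urls() {
    host=${1:-localhost}
    printf 'http://%s:2301\n' "$host"
    printf 'https://%s:2381\n' "$host"
}

agent_urls 192.1.1.20
```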

       The Insight Manager 7 Web Agent allows the  administrator  to  remotely
       view the IML and the status of individual features (e.g. temperature).
       Other ProLiant Server specific  information  is  also  available.   The
       Insight Manager 7 Agents may currently be obtained at:

       www.compaq.com/support/files/server/us/

       This  link may change in the future as a result of the HP - Compaq con-
       solidation.

       The UID (blue) Light utility (/sbin/hpuid)

       There is a utility, /sbin/hpuid, which allows a user to:

       *      Turn on the UID (blue) light

       *      Turn off the UID (blue) light

       *      Get the status of the UID (blue) light

       You must be logged on as the "root" user.  Enter "hpuid" with no
       arguments at the command line prompt to display the parameter
       definitions.  An example script showing how to use the hpuid utility is
       located in /opt/compaq/cpqhealth/hpuid_example.sh.  Note that the UID
       light is not available on all ProLiant servers.


       The "/proc" file system entries

       There are also "/proc" file entries available to allow quick checks  to
       be made.

       *      "/proc/cpqtemp"  shows the current temperature and the threshold
              levels of all temperature sensors.

       *      "/proc/cpqfan" shows the current status of all fans.

       *      "/proc/cpqpwr" shows the current status of all power supplies.
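
       The three entries above can be dumped in one pass with a short helper.
       This is a minimal sketch; the function name show_health and its optional
       directory argument are hypothetical, and the argument exists only so the
       function can be tried against sample files on a machine without the
       driver.

```shell
# Print the cpqhealth /proc entries that exist.
# $1 = directory holding the entries (default: /proc).
show_health() {
    dir=${1:-/proc}
    for entry in cpqtemp cpqfan cpqpwr; do
        file=$dir/$entry
        if [ -r "$file" ]; then
            echo "== $file =="
            cat "$file"
        else
            echo "== $file: not available (is the driver loaded?) =="
        fi
    done
}

show_health
```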

       There is a graphical maintenance utility named cpqimlview(8).  The
       cpqimlview utility can be run under the graphical (X11) interface for
       full functionality; a limited text based (ncurses) version is available
       for  use on "Blade" servers or Telnet sessions.  The cpqimlview utility
       will automatically start the correct IML viewer based on  the  terminal
       type.  See the cpqimlview(8) man page for more information.

       Most errors which are logged to the NVRAM based  Integrated  Management
       Log  (IML)  are  also  logged  to  the  standard  "messages" file (i.e.
       /var/log/messages).

Installing on patched Linux Kernels and remote deployment
       The cpqhealth driver has been designed to work with patched Linux  ker-
       nels.  There is a single source file which can compile and link against
       the patched Linux kernel sources.  Additionally,  a  shell  script  has
       been  provided to aid in the packaging of the driver into a new RPM for
       remote deployment.  This was done to allow customers to build once  and
       deploy  many times to servers which may not have build tools available.

       If the server has the build tools and the source files for the  patched
       Linux  kernel,  the  boot  time  scripts  will automatically attempt to
       rebuild the driver and install it.  Errors will  be  displayed  on  the
       screen  and  logged  to /opt/compaq/cpqhealth/cpqhealth_boot.log if the
       driver cannot be built on a patched Linux kernel.

       The requirements to build and deploy are:

       The sources for the "Patched" Linux kernel must be loaded
              The sources for the "Patched" Linux kernel must be loaded on the
              system.

       The build environment must be properly created
              The  build  scripts  provided expect a standard Linux 2.4 kernel
              build  environment.   The  sources  should  be  linked  to   the
              "/lib/modules/`uname  -r`/build" directory.  The command "ls -ld
              /lib/modules/`uname -r`/build" should point to where the patched
              Linux kernel sources were loaded.

              Additionally,  the standard build tools such as the gcc compiler
              and make must also be loaded.

       To create a custom cpqhealth RPM package, perform the following steps:

       *      Load the patched Linux kernel sources and development tools.

       *      Make sure that a directory which corresponds to  the  output  of
              "uname -r" exists in the "/lib/modules" directory.

       *      Make  sure that a link named "build" in the "/lib/modules/`uname
              -r`/" directory points to the correct kernel  source  directory.
              You  can  validate  this by making sure that the file "/lib/mod-
              ules/`uname  -r`/build/include/linux/version.h"  has  a  version
              which matches the output of the "uname -r" command.

       *      If  all of the above conditions are met, you are ready to build.

              Run the shell script "sh custom_cpqhealth.sh".  This script will
              create  a new RPM SPEC file and attempt to build the driver.  If
              the build of the driver is successful, a new RPM package will be
              created and copied to the /opt/compaq/cpqhealth directory.  This
              package can then be deployed in the usual way.

       *      Typical errors

              Usually there will be compiler or linker warnings which indicate
              that  kernel  drivers should not use regular header files.  This
              is an indication that the kernel sources are not loaded  or  not
              installed correctly.  You need to make sure the "version.h" file
              listed above matches the output of "uname -r".  The  initializa-
              tion  script  "/etc/init.d/cpqasm"  does  check  to  see  if the
              "/lib/modules/`uname   -r`/build/include/linux/version.h"   file
              exists and matches the output of "uname -r".

       If you are building a custom kernel, you are responsible for making sure
       the correct "version.h" file is created and located in the correct
       kernel header file directory (i.e.
       "/lib/modules/`uname -r`/build/include/linux/version.h").

       If all the above conditions are met and the driver still does not build,
       you might want to try building the kernel itself.  If the kernel cannot
       be built successfully, this is an indication that the particular kernel
       release may have some issues.
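
       The build environment checks described in this section can be scripted.
       The following is a sketch under stated assumptions, not the logic used
       by /etc/init.d/cpqasm: the function name check_build_env is
       hypothetical, and it assumes that version.h contains the release string
       in double quotes (as the UTS_RELEASE definition does on 2.4 kernels).

```shell
# Check that the kernel build link and version.h match a kernel release.
# $1 = kernel release (default: output of "uname -r")
# $2 = modules directory (default: /lib/modules); exposed for testing only.
check_build_env() {
    release=${1:-$(uname -r)}
    modules=${2:-/lib/modules}
    build=$modules/$release/build
    version_h=$build/include/linux/version.h

    if [ ! -d "$modules/$release" ]; then
        echo "missing directory: $modules/$release"; return 1
    fi
    if [ ! -d "$build" ]; then
        echo "missing or dangling build link: $build"; return 1
    fi
    if [ ! -f "$version_h" ]; then
        echo "missing file: $version_h"; return 1
    fi
    # Assumption: version.h carries the quoted release string.
    if ! grep -q -F "\"$release\"" "$version_h"; then
        echo "version.h does not mention release $release"; return 1
    fi
    echo "build environment looks consistent for $release"
}
```

       Run "check_build_env" with no arguments to test the running kernel
       before starting custom_cpqhealth.sh.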


cpqhealth error messages
       The next few sections of this document are dedicated to error messages.
       The  best  way to locate a particular message is to search for it using
       the "/" key.  The searches are case  sensitive  and  require  an  exact
       match.   The  errors  are  categorized into installation, compatibility,
       general, temperature,  fan,  power  supply,  memory,  automatic  server
       recovery and critical server type errors.


cpqhealth installation messages
       The following messages are logged in
       /opt/compaq/cpqhealth/cpqhealth_boot.log or
       /opt/compaq/cpqhealth/cpqhealth_boot.log.old.  These messages are logged
       during the RPM installation of the cpqhealth module as well as when
       booting the Linux operating system.

       The "/var/log/messages" file, the output from "dmesg", and the
       "/opt/compaq/cpqhealth/cpqhealth_boot.log" and
       "/opt/compaq/cpqhealth/cpqhealth_boot.log.old" files should always be
       sent with any queries concerning installation or rebuilding issues.
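
       Gathering these files can be scripted.  The sketch below copies whatever
       is readable into one directory and captures the dmesg output alongside
       it.  The function name collect_cpqhealth_logs and the dmesg.txt file
       name are hypothetical.

```shell
# Copy the diagnostic files named in this section into one directory.
# $1 = destination directory; remaining arguments = files to copy
# (default: the log files listed above).
collect_cpqhealth_logs() {
    dest=$1; shift
    if [ "$#" -eq 0 ]; then
        set -- /var/log/messages \
               /opt/compaq/cpqhealth/cpqhealth_boot.log \
               /opt/compaq/cpqhealth/cpqhealth_boot.log.old
    fi
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -r "$f" ]; then
            cp "$f" "$dest/" && echo "copied $f"
        else
            echo "skipped $f (not readable)"
        fi
    done
    # dmesg may be restricted to root on some systems; tolerate failure.
    dmesg > "$dest/dmesg.txt" 2>/dev/null || echo "dmesg output not captured"
}
```

       For example, "collect_cpqhealth_logs /tmp/cpqhealth-report" run as root
       leaves a directory that can be archived and attached to a query.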

       Message:
              "WARNING: cpqasm: casmd already running!"
              "         You must stop the process first."
              "         usage:  /etc/init.d/cpqasm stop"


       Description:
              This is an indication that the /etc/init.d/cpqasm  script
              was run multiple times with the "start" parameter.

       Action:
              None.

       ============================

       Message:
              "The hp ProLiant Event Logging module is not available"
              "for this Linux kernel:  ${THIS_KERNEL}"

       Description:
              This  is  an  indication that the package has either been
              installed on a patched Linux kernel (i.e. an errata Linux
              kernel) or the wrong binary package has been installed.

       Action:
              Make  sure  the  correct  distribution  package  has been
              installed.  If there is not a cpqhealth package  for  the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.

       ============================

       Message:
              "The hp ProLiant Event Logging module failed to load!"
              "Linux  Kernel  Symbol  Conflict  - Attempting rebuild to
              resolve."

       Description:
              This is an indication that the package  has  either  been
              installed on a patched Linux kernel (i.e. an errata Linux
              kernel) or the wrong binary package has been installed.

       Action:
              Make sure  the  correct  distribution  package  has  been
              installed.   If  there is not a cpqhealth package for the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  If the correct Linux kernel source is present,
              the  boot  scripts  will attempt to automatically rebuild
              the cpqhealth module and reload the drivers.  There  will
              be  a  message  at  the end of the rebuild process if the
              modules load successfully.

       ============================

       Message:
              "WARNING!  Not able to rebuild  the  cpqevt.o  module  on
              this kernel!"
              "           See  /opt/compaq/cpqhealth/cpqhealth_boot.log
              for details."

       Description:
              This is an indication that the package  has  either  been
              installed on a patched Linux kernel (i.e. an errata Linux
              kernel) or the wrong binary package has  been  installed.
              In  either  case,  there was no source available or there
              were compilation / linker errors.

       Action:
              Make sure  the  correct  distribution  package  has  been
              installed.   If  there is not a cpqhealth package for the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  The errors located in
              /opt/compaq/cpqhealth/cpqhealth_boot.log will need to be
              reviewed and corrected.  This may require "wrapper" file
              changes if the Linux kernel header files have been drastically
              modified in the installed distribution.

       ============================

       Message:
              "The hp ProLiant Advanced Server Management module is not
              available"
              "for this Linux kernel:  ${THIS_KERNEL}"

       Description:
              This  is  an  indication that the package has either been
              installed on a patched Linux kernel (i.e. an errata Linux
              kernel) or the wrong binary package has been installed.

       Action:
              Make  sure  the  correct  distribution  package  has been
              installed.  If there is not a cpqhealth package  for  the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.

       ============================

       Message:
              "The hp ProLiant Advanced Server Management module failed
              to load!"
              "Linux Kernel Symbol Conflict  -  Attempting  rebuild  to
              resolve."

       Description:
              This  is  an  indication that the package has either been
              installed on a patched Linux kernel (i.e. an errata Linux
              kernel) or the wrong binary package has been installed.

       Action:
              Make  sure  the  correct  distribution  package  has been
              installed.  If there is not a cpqhealth package  for  the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  If the correct Linux kernel source is present,
              the boot scripts will attempt  to  automatically  rebuild
              the  cpqhealth  module and reload the drivers.  A message
              will be displayed at the end of the  rebuild  process  if
              the modules are successfully loaded.

       ============================

       Message:
              "WARNING!   Not  able  to  rebuild the cpqasm.o module on
              this kernel!"
              "          See   /opt/compaq/cpqhealth/cpqhealth_boot.log
              for details."

       Description:
              This  is  an  indication that the package has either been
              installed on a patched Linux kernel (i.e. an errata Linux
              kernel) or the wrong binary package has been installed.

       Action:
              Make  sure  the  correct  distribution  package  has been
              installed.  If there is not a cpqhealth package  for  the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  The errors located in
              /opt/compaq/cpqhealth/cpqhealth_boot.log will need to be
              reviewed and corrected.  This may require "wrapper" file
              changes if the Linux kernel header files have been drastically
              modified in the installed distribution.

       ============================

       Message:
              "/lib/modules/${THIS_KERNEL}/build does not exist"
              "This  is  an indication that the sources for this kernel
              (${THIS_KERNEL}) are not loaded."
              "Please load the appropriate sources to rebuild  module".

       Description:
              The cpqhealth driver package follows a standard Linux 2.4
              kernel distribution.  The Linux kernel source files  must
              be  loaded  as  indicated  in  the  message.  The "build"
              directory is actually a symbolic link and must exist.

       Action:
              Make sure  the  correct  distribution  package  has  been
              installed.   If  there is not a cpqhealth package for the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  If the correct Linux kernel source is present,
              the  boot  scripts  will attempt to automatically rebuild
              the cpqhealth module and reload the drivers.

       ============================

       Message:
              "/lib/modules/${THIS_KERNEL}/build/include/linux/version.h
              does not exist"
              "Please  load the appropriate sources to rebuild module".

       Description:
              This message usually occurs only on SuSE Linux distributions,
              where the file specified does not exist.

       Action:
              Make  sure  the  correct  distribution  package  has been
              installed.  If there is not a cpqhealth package  for  the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  If the correct Linux kernel source is present,
              the boot scripts will attempt  to  automatically  rebuild
              the  cpqhealth  module  and reload the drivers.  For SuSE
              distributions, there should be a file
              "/boot/vmlinuz.version.h" which needs to be moved to the
              directory listed in the message.

       ============================

       Message:
              "/lib/modules/${THIS_KERNEL}/build/include/linux/autoconf.h
              does not exist"
              "Please  load the appropriate sources to rebuild module".

       Description:
              This message usually occurs only on SuSE Linux distributions,
              where the file specified does not exist.

       Action:
              Make  sure  the  correct  distribution  package  has been
              installed.  If there is not a cpqhealth package  for  the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  If the correct Linux kernel source is present,
              the boot scripts will attempt  to  automatically  rebuild
              the  cpqhealth  module  and reload the drivers.  For SuSE
              distributions, there should be a file
              "/boot/vmlinuz.autoconf.h" which needs to be moved to the
              directory listed in the message.

       ============================

       Message:
              "There does not appear to be kernel sources  which  match
              the current booting Linux kernel.  There must be a
              directory named "/lib/modules/${THIS_KERNEL}" and there
              must be a valid directory linked to
              "/lib/modules/${THIS_KERNEL}/build"."
              "Please load the appropriate  Linux  sources  to  rebuild
              module".

       Description:
              This  is  an  indication that the package has either been
              installed on a patched Linux kernel (i.e. an errata Linux
              kernel) or the wrong binary package has been installed.

       Action:
              Make  sure  the  correct  distribution  package  has been
              installed.  If there is not a cpqhealth package  for  the
              installed Linux distribution, make sure the kernel source
              files have been installed as previously described in this
              document.  If the correct Linux kernel source is present,
              the boot scripts will attempt  to  automatically  rebuild
              the   cpqhealth  module  and  reload  the  drivers.   The
              cpqhealth RPM installation has failed and the RPM  should
              be immediately removed.

       ============================

       Message:
              "cpqasm:   You  should also stop the driver otherwise the
              Automatic"
              "         Server Recovery (ASR) Feature  may  reboot  the
              server."
              "         usage:  rmmod cpqasm"

       Description:
              When  the  driver is stopped using the /etc/init.d/cpqasm
              script, the ASR timer will continue to run.   The  cpqasm
              driver  needs  to  also  be terminated to keep the server
              from automatically  shutting  down  when  the  ASR  timer
              expires.

       Action:
              Use  the  "rmmod  cpqasm" command to unload the driver or
              restart the daemon  using  the  /etc/init.d/cpqasm  start
              command.
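
              The stop-then-unload sequence can be wrapped so that the driver
              is never left loaded with the daemon stopped.  This is a sketch:
              the helper names run and stop_cpqasm and the DRY_RUN variable
              are hypothetical, while the two commands are the ones named in
              the message above.

```shell
# Stop the daemon, then unload the driver so the ASR timer cannot expire.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_cpqasm() {
    # rmmod is attempted only if the init script reports success.
    run /etc/init.d/cpqasm stop && run rmmod cpqasm
}

# Preview only; unset DRY_RUN (and run as root) to perform the steps.
DRY_RUN=1
stop_cpqasm
```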

       ============================

       Message:
              "hp  ProLiant  Advanced Server Management driver will not
              be loaded."

       Description:
              The casm driver cannot be initialized at this time due to
              a conflict in ROM internal tables, or the server is not
              supported.  This driver is supported only on servers
              which have the ProLiant Advanced Server Management ASIC
              (PCI identifier 0x0e11a0f0) or the ProLiant Integrated
              Lights Out Management ASIC (PCI identifier 0x0e11b203).
              No other ProLiant servers are supported.

       Action:
              Check to  see  that  the  appropriate  ProLiant  Advanced
              Server  Management  ASIC is present.  This can be done by
              using the following commands:

              cat /proc/bus/pci/devices | grep -i 0e11a0f0

              cat /proc/bus/pci/devices | grep -i 0e11b203

              One of these commands must succeed  and  return  informa-
              tion.  You might also check to see if a later ROM version
              is available for this server.
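
              The two checks can be combined into a single test that succeeds
              if either ASIC is present.  A minimal sketch; the function name
              has_asm_asic is hypothetical, and the optional file argument
              exists only so the check can be tried against a saved copy of
              the device list.

```shell
# Succeed if either supported management ASIC appears in the PCI device list.
# $1 = device list file (default: /proc/bus/pci/devices).
has_asm_asic() {
    devices=${1:-/proc/bus/pci/devices}
    [ -r "$devices" ] || { echo "cannot read $devices"; return 2; }
    # Match either PCI identifier, ignoring case.
    if grep -i -q -e 0e11a0f0 -e 0e11b203 "$devices"; then
        echo "supported management ASIC found"
    else
        echo "no supported management ASIC found"
        return 1
    fi
}
```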


Driver Messages For Configuration Or Compatibility Issues
       Most of the following  messages  will  be  seen  prepended  with
       "casm:  "  to  indicate that they are from the casm driver. This
       section deals with driver initialization issues as the driver is
       loaded.
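
       Because of this prefix, the driver's messages can be pulled out of the
       system log with a fixed string search.  A minimal sketch, assuming the
       messages reach /var/log/messages; the function name casm_messages is
       hypothetical.

```shell
# List log lines that carry the "casm: " prefix.
# $1 = log file (default: /var/log/messages).
casm_messages() {
    log=${1:-/var/log/messages}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # -F treats the prefix as a fixed string rather than a pattern.
    grep -F 'casm: ' "$log"
}
```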

       Message:
              "Detected %d Physical/Logical processors installed"
              "but only %d recognized by operating system!"

       Description:
              The  casm driver is able to take an inventory of the pro-
              cessors physically present to compare against the  number
              of processors the Linux operating system detects (or rec-
              ognizes).  If there are more  processors  available  than
              what  is recognized, this message is displayed.  The casm
              driver will continue to operate normally.

       Action:
              There are multiple reasons for this message to be generated.
              The most common reason is that a single processor Linux kernel
              is installed on a multiprocessor server.  Other reasons
              include   the  APIC  table  setting.   On  multiprocessor
              servers the APIC setting should be "Full Table -  Mapped"
              or "Full Table".  This setting can be checked via the ROM
              Based Setup Utility (RBSU) available during POST when the
              server  is  booted  (usually  the  "F9" key prompt) or by
              reviewing the "/proc/casmdbug" file.   Please  note  that
              the  "/proc/casmdbug"  file  is  primarily  designed  for
              developer debug and is subject to change.   The  features
              in  this  file  are not very useful without full hardware
              system  specifications  for  the  ProLiant  server.   The
              "top(1)" utility can be used to review the number of
              processors the operating system recognizes.  The
              "/proc/cpuinfo" file can also be used to determine how
              many processors the Linux operating system recognizes (or
              has enabled).  A review of the operating system "boot"
              messages (such as boot.log, usually found in the
              "/var/log" directory) may also provide some insight as to
              why the Linux operating system fails to recognize all the
              processors present in the server.
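
              The processor count seen by the operating system can be read
              from "/proc/cpuinfo" directly and compared with the second
              number in the message.  A minimal sketch; the function name
              os_cpu_count is hypothetical.

```shell
# Count the processors the operating system recognizes.
# $1 = cpuinfo file (default: /proc/cpuinfo).
os_cpu_count() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Each recognized processor has one line beginning with "processor".
    grep -c '^processor' "$cpuinfo"
}

if [ -r /proc/cpuinfo ]; then
    os_cpu_count
fi
```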

       ============================

       Message:
              "There is no SHAFT Record in this ROM!"

       Description:
              This  is  an  indication  of  a  ROM  problem or that the
              cpqhealth  driver  has  been  loaded  on  an  unsupported
              server.

       Action:
              Remove the cpqhealth package.

       ============================

       Message:
              "Health Environment Parsing failed!"

       Description:
              This is an indication of a ROM internal table problem.

       Action:
              Please report this to Customer Service for follow up.

       ============================

       Message:
              "SHAFT Parsing failed!"

       Description:
              This is an indication of a ROM internal table problem.

       Action:
              Please report this to Customer Service for follow up.

       ============================

       Message:
              "SHAFT  and  Patch signature strings do not match at byte
              #%d"

       Description:
              Two tables internal to the ROM do not have matching  sig-
              nature  strings. Not necessarily an indication of a prob-
              lem.

       Action:
              If the  casm  driver  is  loaded  no  further  action  is
              required.   Optionally,  one  could upgrade to the latest
              ROM version for this server, if available.

       ============================

       Message:
              "Neither SMBIOS or SIT is present!"

       Description:
              This is an indication of a ROM internal table problem.

       Action:
              Please report this to Customer Service for follow up.

       ============================

       Message:
              "Unknown casmc_crom_ioctl Cmd: 0x%x"

       Description:
              This message is displayed when an application such as the
              Insight  Manager 7 Agent makes a request of the cpqhealth
              driver which the cpqhealth driver does not understand.

       Action:
              The message usually indicates slightly reduced  function-
              ality  for  the application making the request.  Check to
              see that the application and  the  cpqhealth  driver  are
              both at the latest release.

       ============================

       Message:
              "Unknown casmc_ecc_ioctl Cmd: 0x%x"

       Description:
              This message is displayed when an application such as the
              Insight Manager 7 Agent makes a request of the  cpqhealth
              driver which the cpqhealth driver does not understand.

       Action:
              The  message usually indicates slightly reduced function-
              ality for the application making the request.   Check  to
              see  that  the  application  and the cpqhealth driver are
              both at the latest release.

       ============================

       Message:
              "Unknown casmc_asr_ioctl Cmd: 0x%x"

       Description:
              This message is displayed when an application such as the
              Insight  Manager 7 Agent makes a request of the cpqhealth
              driver which the cpqhealth driver does not understand.

       Action:
              The message usually indicates slightly reduced  function-
              ality  for  the application making the request.  Check to
              see that the application and  the  cpqhealth  driver  are
              both at the latest release.

       ============================

       Message:
              "Unknown casmc_event_ioctl Cmd: 0x%x"

       Description:
              This message is displayed when an application such as the
              Insight Manager 7 Agent makes a request of the  cpqhealth
              driver which the cpqhealth driver does not understand.

       Action:
              The  message usually indicates slightly reduced function-
              ality for the application making the request.   Check  to
              see  that  the  application  and the cpqhealth driver are
              both at the latest release.


Driver Messages For General Environmental Issues
       Most of the following  messages  will  be  seen  prepended  with
       "casm:  "  to  indicate that they are from the casm driver. This
       section deals with driver general environment monitoring events.
       Specific Environment Issues follow this section.

       ============================

       Message:
              "Monitoring of fan #%d has been disabled."

       Description:
              Monitoring of the indicated fan has been disabled because
              the interrupt threshold was exceeded. This is an  indica-
              tion  that  the  fan  or the fan controller is generating
              spurious interrupts.

       Action:
              The fan specified in the message may need to be replaced.

       ============================

       Message:
              "Power  supply  %d  revision  is  %d.%d,  %d.%d is recom-
              mended."

       Description:
              The power supply is of a version other than the recommended
              version.

       Action:
              Contact Hewlett-Packard  ProLiant  support  to  determine
              what needs to be done.
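
       As an illustration only, the revision message above could be
       picked out of a log line with a short Python sketch.  The function
       name is hypothetical and the pattern assumes the message is logged
       with single spaces exactly as documented here.

```python
import re

# Pattern built from the message text documented above; each %d becomes a
# decimal capture group.
PSU_REVISION = re.compile(
    r"Power supply (\d+) revision is (\d+)\.(\d+), (\d+)\.(\d+) is recommended\."
)


def parse_psu_revision(line):
    """Return (supply, installed, recommended), or None if the line does not match.

    installed and recommended are (major, minor) integer tuples, so they
    compare numerically rather than as strings.
    """
    match = PSU_REVISION.search(line)
    if match is None:
        return None
    supply, cur_major, cur_minor, rec_major, rec_minor = map(int, match.groups())
    return supply, (cur_major, cur_minor), (rec_major, rec_minor)
```

       Comparing the two tuples shows whether the installed revision is
       older or newer than the recommended one.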

       ============================

       Message:
              "Monitoring of Health has been disabled."

       Description:
              System Health is no longer being monitored.

       Action:
              This may be due to a hardware failure.  It is usually caused
              by a device interrupting the driver at a very fast rate.

       ============================

       Message:
              "Temperature sensor #%d has been disabled."

       Description:
              The indicated temperature sensor has been disabled.

       Action:
              Call Hewlett-Packard ProLiant Support for further  assis-
              tance.

       ============================

       Message:
              "Monitoring of VRM #%d has been disabled."

       Description:
              Monitoring of the indicated VRM has been disabled because
              the interrupt threshold was exceeded. This is an  indica-
              tion that the VRM is generating spurious interrupts.

       Action:
              The indicated VRM may need to be replaced.

       ============================

       Message:
              "Monitoring of power supply #%d has been disabled."

       Description:
              Monitoring  of  the  indicated power supply has been dis-
              abled because the interrupt threshold was exceeded.  This
              is an indication that the power supply is generating spu-
              rious interrupts.

       Action:
              The indicated power supply may need to be replaced.

       ============================

       Message:
              "Spurious interrupt:  Feature %d has been previously dis-
              abled!"

       Description:
              This is an indication that a feature (identified by a
              number) had been disabled.  There may have been one more
              event in the queue to be processed.

       Action:
              This is usually the result of some other event (such as a
              fan failure).  Once the  previous  event  has  been  cor-
              rected, no other action will be required.

       ============================

       Message:
              "Feature %d has been disabled"

       Description:
              This is usually because a feature (or device) has exceeded
              its interrupt threshold limit.

       Action:
              A previous message will have been displayed indicating
              that a device has exceeded its set threshold limit.  The
              failing device should be replaced.

       ============================

       Message:
              "The system is NOT configured  to shutdown on  non-criti-
              cal  thermal failures - (configurable via RBSU Utility)."

       Description:
              This message accompanies the previous message if the sys-
              tem will not shut down for non-critical thermal failures.

       Action:
              If desired, configure the system via RBSU to shut down on
              non-critical failures.  The RBSU Utility is usually started
              by pressing the "F9" function key during POST when
              prompted.


Driver Messages For Temperature Violations:
       Most  of  the  following  messages  will  be seen prepended with
       "casm: " to indicate that they are from the  casm  driver.  This
       section  deals  with detected temperature violations.  Note that
       there may be multiple messages each  giving  slightly  different
       details (such as location) but all having similar causes.

       Events  which  are  corrected will use the same message with the
       phrase "has been repaired" appended to the end of  the  message.
       This simplifies matching failures with corrections in the system
       message logs.
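
       A minimal sketch of that matching, in Python, is shown here for
       illustration.  It assumes each log line contains the "casm: "
       prefix and that a correction is the original message followed by
       a space and "has been repaired"; the exact separator and the
       function name are assumptions, not taken from the driver.

```python
PREFIX = "casm: "
REPAIRED_SUFFIX = "has been repaired"


def unrepaired_events(lines):
    """Return casm messages that have no later matching "has been repaired" line.

    Every casm message without the suffix is treated as an open event, so
    purely informational messages should be filtered out by the caller.
    """
    open_events = []
    for line in lines:
        start = line.find(PREFIX)
        if start < 0:
            continue  # not a casm message (or logged without the prefix)
        message = line[start + len(PREFIX):].strip()
        if message.endswith(REPAIRED_SUFFIX):
            # Strip the suffix to recover the original failure text.
            base = message[: -len(REPAIRED_SUFFIX)].rstrip()
            if base in open_events:
                open_events.remove(base)
        else:
            open_events.append(message)
    return open_events
```

       Feeding the function the lines of the system message log, oldest
       first, leaves only the failures that have not yet been corrected.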


       ============================

       Message:
              "Approaching Dangerous Temperature. The %s  Thermal  Sen-
              sor(#%d) is reporting overheating conditions."

       Description:
              A  thermal sensor is reporting high temperatures. Thermal
              shutdown may be triggered if  the  temperature  increases
              beyond the threshold.

       Action:
              The ambient temperature in the environment must be below
              35C.  If this condition is met, there may be something
              blocking the air flow to the server; check the front of
              the server for a blockage.  If the Thermal Sensor indicates
              that this is a CPU, this may be an indication of an
              improperly mounted CPU Heat Sink.  A failed fan could also
              lead to this condition in a warm environment.


       ============================

       Message:
              "System  Overheating  (Zone  %s, Location %s, Temperature
              %s)"

              "External Chassis Overheating (Chassis %s, Zone %s, Loca-
              tion %s, Temperature%s)"

              "Internal Storage System Overheating (%sSlot %s, Zone %s,
              Location %s, Temperature %s)"

              "Server Blade Enclosure Overheating  (Zone  %s,  Location
              %s, Temperature %s, %s)"

              "Power  Enclosure Overheating (Zone %s, Location %s, Tem-
              perature %s, %s)"

       Description:
              This message  indicates that the  indicated  location  in
              the  system is overheating.  Another message will be dis-
              played if a system shutdown will occur.

       Action:
              On some servers the fans will increase to full speed in
              an attempt to cool the server.  If the server does not
              cool down within 60 seconds, the operating system will
              most likely be shut down to close the file systems.

              Check  for  blocked  air  flow to the indicated location.
              Check air conditioning system in environment.


Driver Messages For Fan Related Events:
       Most of the following  messages  will  be  seen  prepended  with
       "casm:  "  to  indicate that they are from the casm driver. This
       section deals with detected fan related events.  Note that there
       may  be multiple messages each giving slightly different details
       (such as location) but all having similar causes.

       Events which are corrected will use the same  message  with  the
       phrase  "has  been repaired" appended to the end of the message.
       This simplifies matching failures with corrections in the system
       message logs.


       ============================

       Message:
              "Fan Failure (Fan %s, Location %s)"

              "External  Chassis Fan Failure (Chassis %s, Fan %s, Loca-
              tion %s)"

              "External Storage System Fan Failure (%sSlot %s, Fan  %s,
              Location %s)"

              "Internal  Storage System Fan Failure (%sSlot %s, Fan %s,
              Location %s)"

       Description:
              This message indicates that a fan in the specified  loca-
              tion has failed.   Another message will be displayed if a
              system shutdown will occur.

       Action:
              On some servers such as the ProLiant Dense Line (DL), a
              fan failure will trigger a shutdown even if Thermal
              Shutdown has been disabled in RBSU.  There is a 60-second
              grace period to allow hot plug fans to be replaced in the
              case of a redundant fan failure.  Another message will be
              displayed if a system shutdown will occur.  The RBSU setup
              utility can be used to override "Thermal Shutdown" in the
              event of a bad signal from the fan.  Any fan which shows a
              failure should be replaced as soon as possible even if the
              fan continues to operate.


       ============================

       Message:
              "System Fan Inserted (Fan %s, Location %s)"

              "External Chassis Fan Inserted (Chassis %s, Fan %s, Loca-
              tion %s)"

              "External Storage System Fan Inserted (%sSlot %s, Fan %s,
              Location %s)"

       Description:
              This  message  indicates  that the indicated fan has been
              inserted.

       Action:
              This is just an information message. No action  required.


       ============================

       Message:
              "System Fan Removed (Fan %s, Location %s)"

              "External  Chassis Fan Removed (Chassis %s, Fan %s, Loca-
              tion %s)"

              "External Storage System Fan Removed (%sSlot %s, Fan  %s,
              Location %s)"

       Description:
              This  message  indicates  that the indicated fan has been
              removed.

       Action:
              This is just an information message. No action  required.


       ============================

       Message:
              "System Fans Not Redundant (Location %s)"

       Description:
              This  message   indicates  that  the  fans  are no longer
              redundant.  This message usually follows  a  Fan  Failure
              message.

       Action:
              Correct  the  previous  fan error (failure or removal) to
              restore redundancy.
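
       The message templates in this section use printf-style
       placeholders (%s, %d, %x).  As a sketch only, such a template can
       be converted into a regular expression for searching logs.  The
       helper name below is hypothetical, and a trailing "has been
       repaired" phrase would need to be removed before matching because
       the pattern is anchored at the end of the line.

```python
import re


def template_to_regex(template):
    """Convert a printf-style message template into a compiled regex.

    %s captures any text (as little as possible), %d a decimal integer,
    and %x a hexadecimal number.  All other text is matched literally.
    """
    pattern = ""
    for part in re.split(r"(%[sdx])", template):
        if part == "%s":
            pattern += r"(.*?)"
        elif part == "%d":
            pattern += r"(-?\d+)"
        elif part == "%x":
            pattern += r"([0-9a-fA-F]+)"
        else:
            pattern += re.escape(part)
    # Anchor at the end so a trailing %s captures the rest of the line.
    return re.compile(pattern + r"$")
```

       Using search() rather than match() on the compiled pattern allows
       for a "casm: " prefix or a syslog header in front of the message.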


Driver Messages For Power Supply Related Events:
       Most of the following  messages  will  be  seen  prepended  with
       "casm:  "  to  indicate that they are from the casm driver. This
       section deals with detected power supply related  events.   Note
       that there may be multiple messages each giving slightly differ-
       ent details (such as location) but all having similar causes.

       Events which are corrected will use the same  message  with  the
       phrase  "has  been repaired" appended to the end of the message.
       This simplifies matching failures with corrections in the system
       message logs.


       ============================

       Message:
              "System Power Supply: %s (Power Supply %s)"

              "External  Chassis  Power  Supply:  %s (Chassis %s, Power
              Supply %s)"

              "External Storage System Power  Supply:  %s  (%sSlot  %s,
              Power Supply %s)"

       Description:
              This  message  indicates  that the specified power supply
              has failed or electric power to the supply has been  dis-
              continued.

       Action:
              Check to see that the power source (i.e. the plug) to the
              power supply is still providing electricity.  If power is
              available,  the power supply may have failed and needs to
              be replaced.


       ============================

       Message:
              "System Power Supply Removed (Power Supply %s)"

              "External Chassis Power Supply Removed (Chassis %s, Power
              Supply %s)"

              "External Storage System Power Supply Removed (%sSlot %s,
              Power Supply %s)"

       Description:
              This message  indicates that the specified  power  supply
              has been removed from the system.

       Action:
              No  action  required  as this is just an information mes-
              sage.


       ============================

       Message:
              "System Power Supply Inserted (Power Supply %s)"

              "External Chassis  Power  Supply  Inserted  (Chassis  %s,
              Power Supply %s)"

              "External  Storage  System  Power Supply Inserted (%sSlot
              %s, Power Supply %s)"

       Description:
              This message  indicates that the specified  power  supply
              has been inserted into the system.

       Action:
              No  action  required  as this is just an information mes-
              sage.


       ============================

       Message:
              "System Power Supplies Not Redundant"

              "External Chassis Power Supplies Not  Redundant  (Chassis
              %s)"

              "External  Storage  System  Power  Supplies Not Redundant
              (%sSlot %s)"


       Description:
              This message  indicates that the indicated power supplies
              are  no longer redundant.  This message usually follows a
              Power Supply Failure message.

       Action:
              Correct the  previous  power  supply  error  (failure  or
              removal) to restore redundancy.


Driver Messages For Memory Subsystem Related Events:
       Most  of  the  following  messages  will  be seen prepended with
       "casm: " to indicate that they are from the  casm  driver.  This
       section  deals  with  detected  Memory Subsystem related events.
       Note that there may be multiple messages  each  giving  slightly
       different  details  (such  as  location)  but all having similar
       causes.

       Events which are corrected will use the same  message  with  the
       phrase  "has  been repaired" appended to the end of the message.
       This simplifies matching failures with corrections in the system
       message logs.


       ============================

       Message:
              "Corrected Memory Error threshold exceeded (Slot %s, Mem-
              ory Module %s)"

              "Corrected Memory Error threshold exceeded  (System  Mem-
              ory)"

              "Corrected Memory Error threshold exceeded (Slot %s, Bank
              %s)"

              "Corrected Memory Error threshold exceeded  (System  Mem-
              ory, Bank %s)"

       Description:
              This  message indicates that a memory module has exceeded
              the prefailure threshold for correctable memory errors.

       Action:
              The memory module should be replaced as soon as possible.


       ============================

       Message:
              "Uncorrectable Memory Error (Slot %s, Memory Module %s)"

              "Uncorrectable Memory Error (System Memory)"

       Description:
              This message indicates that a memory module has failed.
              This problem could be intermittent due to the way memory
              fails, so the system may reboot even though a memory
              module indicated a failure.

       Action:
              The memory module should be replaced as soon as possible.
              There  is  the possibility that the server may get a Non-
              Maskable Interrupt and be halted if this error occurs.


       ============================

       Message:
              "Memory Cartridge Removed (Slot %s)"

              "Memory Board Removed (Slot %s)"

       Description:
              This message indicates that a memory  cartridge  /  board
              has been removed.

       Action:
              No action required as this is just an information message.


       ============================

       Message:
              "Memory Cartridge Inserted (Slot %s)"

              "Memory Board Inserted (Slot %s)"

       Description:
              This message indicates that a memory  cartridge  /  board
              has been inserted.

       Action:
              No  action  required  as this is just an information mes-
              sage.


       ============================

       Message:
              "Memory Cartridge Unlocked (Slot %s)"

       Description:
              This message indicates that a memory  cartridge  /  board
              has been manually unlocked.

       Action:
              No action required as this is just an information message.


       ============================

       Message:
              "Memory Cartridge locked (Slot %s)"

       Description:
              This message indicates that a memory  cartridge  /  board
              has been manually locked.

       Action:
              No action required as this is just an information message.


       ============================

       Message:
              "Memory Cartridge Bus Fault (Slot %s)"

              "Memory Cartridge Power Fault (Slot %s)"

       Description:
              This message indicates that a memory  cartridge  /  board
              has a fault.

       Action:
              Contact hp ProLiant support for further assistance.


       ============================

       Message:
              "Memory Cartridge Configuration Error (Slot %s, Memory
              Module %s)"

              "Memory Board Configuration Error (Slot %s, Memory Module
              %s)"

       Description:
              This message indicates that a memory cartridge / board
              has a configuration error.  The usual cause of this error
              is using DIMMs which do not match in size and speed.

       Action:
              Use  identical DIMMs in all memory cartridge / board con-
              figurations if using multiple memory cartridges / boards.


       ============================

       Message:
              "Online  Spare Memory Engaged for Faulty Module (Slot %s,
              Memory Module %s)"

              "Online Spare Memory Engaged for Faulty Module (Slot  %s,
              Bank %s)"

              "Online  Spare  Memory  Engaged for Faulty Module (System
              Memory, Memory Module %s)"

              "Online Spare Memory Engaged for  Faulty  Module  (System
              Memory, Bank %s)"

       Description:
              This message indicates that the server was configured to
              use the "Online Spare Memory" option of the ProLiant
              Advanced Memory Protection feature and was forced to fail
              over due to a memory module exceeding the prefailure
              correctable error threshold limit.  There should be a
              preceding message in the ProLiant Integrated Management
              Log indicating which memory module exceeded the
              prefailure correctable error threshold limit.

       Action:
              Shut down the server and replace the DIMM indicated in the
              message.


       ============================

       Message:
              "Mirrored Memory Engaged for Faulty Module (Slot %s, Mem-
              ory Module %s)"

              "Mirrored Memory Engaged for Faulty Module (Slot %s, Bank
              %s)"

              "Mirrored  Memory  Engaged for Faulty Module (System Mem-
              ory, Memory Module %s)"

              "Mirrored Memory Engaged for Faulty Module  (System  Mem-
              ory, Bank %s)"

       Description:
              This message indicates that the server was configured to
              use the "Mirrored Memory" option of the ProLiant Advanced
              Memory Protection feature and was forced to fail over due
              to a failed memory module.

       Action:
              Most ProLiant servers which have this feature allow the
              failed memory board to be "hot plugged" out of a live
              system.  The boards have indicator lights to let the user
              know which board should be removed to replace the failed
              DIMM.  When the DIMM has been replaced, the system will
              automatically return to the redundant (mirrored) state.


       ============================

       Message:
              "Memory Subsystem Not Mirrored"

       Description:
              This  message  indicates  that the memory mirror has been
              broken.

       Action:
              No action required as this is just  an  information  mes-
              sage.


Driver Messages For Automatic Server Recovery (ASR) Events
       The following messages are displayed when an Automatic Server
       Recovery (ASR) timeout has occurred.  The order of the messages
       is very important.  When the ProLiant Advanced Server Management
       driver detects an ASR timeout, the driver will attempt to
       gracefully shut down the operating system.  If the graceful
       shutdown attempt is successful, a message will be logged
       indicating this; otherwise the server will hard reboot as if the
       power switch had been momentarily pressed.

       Most of the following  messages  will  be  seen  prepended  with
       "casm: " to indicate that they are from the casm driver.


       ============================

       Message:
              "NMI  - Automatic Server Recovery timer expiration - Hour
              %d - %d/%d/%d"

       Description:
              This message indicates that the ProLiant Advanced Server
              Management driver detected an ASR timeout and is
              attempting to gracefully shut down the operating system.
              If this message is not present, this may be an indication
              of a critical hardware failure (such as a non-correctable
              ECC error on a memory DIMM) or some other severe event.
              This is the first of a series of messages displayed to the
              console.  This message will NOT be logged to the ProLiant
              Integrated Management Log and will most likely not be in
              any system logs.

       Action:
              Review all the messages logged to the ProLiant Integrated
              Management Log to see if any previous errors have been
              logged.  Diagnosing these types of errors can take some
              detective work.


       ============================

       Message:
              "ASR Lockup Detected: %s"

       Description:
              This message indicates that the ProLiant Advanced Server
              Management driver detected an ASR timeout and is
              attempting to gracefully shut down the operating system.
              If this message is not present, this may be an indication
              of a critical hardware failure (such as a non-correctable
              ECC error on a memory DIMM) or some other severe event.
              This will be the first message logged to the ProLiant
              Integrated Management Log (if logging is possible).

       Action:
              Review all the messages logged to the ProLiant Integrated
              Management Log to see if any previous errors have been
              logged.  Diagnosing these types of errors can take some
              detective work.


       ============================

       Message:
              "casm: ASR performed a successful OS shutdown"

       Description:
              This message indicates that the ProLiant Advanced Server
              Management driver detected an ASR timeout and was able to
              successfully perform a graceful shutdown of the operating
              system.  If this message is not present, this may be an
              indication of a hardware failure (such as a
              non-correctable ECC error on a memory DIMM), a high
              priority process consuming all the available CPU cycles
              (software failure), or possibly a device such as a storage
              or network controller flooding the system with interrupts.
              This will be the second message logged to the ProLiant
              Integrated Management Log if logging is possible.

              If this message is  present,  this  usually  indicates  a
              software  type error such as a high priority process con-
              suming all the available CPU cycles.  Tools such  as  SAR
              can  be  used  in  conjunction  with  the ASR facility to
              locate the errant process at the time of failure.

       Action:
              Review all the messages logged to the ProLiant Integrated
              Management Log to see if any previous errors have been
              logged.  Diagnosing these types of errors can take some
              detective work.


       ============================

       Message:
              "ASR Detected by System ROM"

       Description:
              This  message  indicates  that  the  ProLiant  Server ROM
              detected an ASR timeout.  This message is  almost  always
              present in the ProLiant Integrated Management Log when an
              ASR timeout occurs.  If this is the  ONLY  "ASR"  message
              logged  to  the  ProLiant Integrated Management Log, this
              may be indicative of a hardware failure (such as  a  non-
              correctable ECC error on a memory DIMM).  The ASR feature
              on a ProLiant server will hard reset the server when  the
              timeout expires with no software intervention required.

       Action:
              Review all the messages logged to the ProLiant Integrated
              Management Log to see if any previous errors have been
              logged.  Diagnosing these types of errors can take some
              detective work.
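
       The descriptions of the ASR messages above suggest a rough way to
       read an IML record of an ASR event.  The Python sketch below is
       illustrative only: the function name and the returned labels are
       hypothetical, and the result is a hint for further investigation,
       not a diagnosis.

```python
ROM_DETECTED = "ASR Detected by System ROM"
LOCKUP_DETECTED = "ASR Lockup Detected"
CLEAN_SHUTDOWN = "ASR performed a successful OS shutdown"


def classify_asr(iml_messages):
    """Return a coarse hint based on which ASR messages were logged."""
    def seen(text):
        return any(text in message for message in iml_messages)

    if seen(CLEAN_SHUTDOWN):
        # The driver managed a graceful shutdown: usually a software-type
        # error such as a high priority process consuming the CPU.
        return "likely software"
    if seen(LOCKUP_DETECTED):
        # The driver saw the timeout but no graceful shutdown was logged:
        # hardware failure, runaway process, or interrupt flooding.
        return "undetermined"
    if seen(ROM_DETECTED):
        # Only the ROM logged the event: may indicate a hardware failure.
        return "possible hardware"
    return "no ASR event"
```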


       ============================

       Message:
              "Automatic Operating System Shutdown Initiated Due to Fan
              Failure"

              "Automatic  Operating  System  Shutdown  Initiated Due to
              Overheat Condition"

              "Automatic Operating System Shutdown Initiated Due to VRM
              Failure"

              "Automatic  Operating System Shutdown Initiated by a Soft
              Power Down"

              "Automatic Operating System Shutdown Initiated by a soft-
              ware"

              "Server  Blade Enclosure Blade Shutdown Via Power Manage-
              ment Software (Slot %s)"

       Description:
              This message indicates that a graceful operating system
              shutdown will take place unless the failing condition is
              immediately corrected.  For most events, there is a one
              minute delay period to allow the opportunity for the
              failing condition to be corrected.  For example, the user
              may need to remove two fans (as part of a Field
              Replaceable Unit) to correct a failed fan.  This gives the
              user one minute to put the working pair of fans back into
              the system (assuming there was a redundant fan solution
              available for the ProLiant server).

       Action:
              If  replacing  a failed fan (which is permitted to be hot
              replaced), there is a one minute grace period  to  insert
              the working fan into the system.


       ============================

       Message:
              "Automatic Operating System Shutdown Aborted"

              "Automatic  Operating  System Shutdown Due to Fan Failure
              Aborted"

              "Automatic Operating  System  Shutdown  Due  to  Overheat
              Aborted"

       Description:
              This  message indicates that the scheduled graceful shut-
              down of the operating system was aborted.  Execution will
              continue.

       Action:
              Information message.  No action required.
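
       Because an "Initiated" message may be followed by an "Aborted"
       message when the condition is corrected in time, a monitoring
       script can track whether a shutdown is still pending.  The
       following Python sketch is an illustration under that assumption;
       the function name is hypothetical and only the "Automatic
       Operating System Shutdown" messages listed above are considered.

```python
SHUTDOWN = "Automatic Operating System Shutdown"
INITIATED = SHUTDOWN + " Initiated"
ABORTED = "Aborted"


def shutdown_pending(messages):
    """Return True if the last shutdown announcement has not been aborted.

    messages must be in the order in which they were logged.
    """
    pending = False
    for message in messages:
        text = message.strip()
        if INITIATED in text:
            pending = True
        elif SHUTDOWN in text and text.endswith(ABORTED):
            pending = False
    return pending
```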


Driver Messages For Critical Hardware Events (NMI)
       Most  of  the  following  messages  will  be seen prepended with
       "casm: " to indicate that they are from the  casm  driver.  This
       section deals with Non-Maskable Interrupt (NMI) errors which are
       common.  There are other NMI type errors which  may  occur.   In
       general, all NMI type errors are usually related to hardware and
       customer support will need to be engaged to provide a  solution.
       The  list  below  covers  the  more  common  errors which may be
       displayed.

       ============================

       Message:
              "(MCA) Processor BINIT in progress!"

       Description:
              An Intel Processor Machine Check Architecture  event  has
              occurred.

       Action:
              The  server  will  be  forced  down  hard.  The processor
              should be replaced.

       ============================

       Message:
              "casm:  NMI Handler has been called on processor %d!"

       Description:
              This is a message which is logged for all NMIs.  If no
              other messages are logged or displayed, this may be an
              indication of an Uncorrectable Memory Error.  These types
              of errors are difficult to log because the casm device
              driver code may actually be physically located on a
              failed DIMM.  This will be the first message, with other
              details following if the source of the NMI can be
              detected.  The ProLiant Automatic Server Recovery (ASR)
              feature uses the NMI facility to alert the ProLiant
              Advanced Server Management driver that the ASR timer is
              about to expire.


       Action:
              If no other messages are displayed, try moving the DIMMs
              to different slots and see whether the error recurs.
              Otherwise, check the subsequent messages, which indicate
              the source of the problem.

       ============================

       Message:
              "casm: Spinning for 2 seconds!"

       Description:
              All NMIs are processed by the bootstrap processor.  If
              an NMI is received on a processor other than the
              bootstrap processor, the casm driver spins to allow the
              NMI to be processed.

       Action:
              This message, along with other NMI messages, can help
              identify the source of the problem that generated the
              NMI.

       ============================

       Message:
              "NMI - Uncorrectable memory error - Hour %d - %d/%d/%d
              Bank %d DIMMs"

       Description:
              The DIMMs in the indicated bank have generated an
              Uncorrectable Memory Error.

       Action:
              The failed DIMMs need to be replaced.

       ============================

       Message:
              "NMI - Uncorrectable memory error - Hour %d - %d/%d/%d
              Slot: %d Module %d"

       Description:
              The specific DIMM indicated in the message has  generated
              an Uncorrectable Memory Error.

       Action:
              The failed DIMM needs to be replaced.

       ============================

       Message:
              "NMI  - Automatic Server Recovery timer expiration - Hour
              %d - %d/%d/%d"

       Description:
              The Advanced Server Management (ASM) watchdog timer has
              expired.  Either a software application consumed all of
              the processor resources, such that the operating system
              was unable to schedule, or a major event (such as a
              Non-Maskable Interrupt (NMI)) halted the operating
              system.  See the earlier section on ProLiant Automatic
              Server Recovery (ASR).

       Action:
              Use the messages in the Integrated Management  Log  (IML)
              and  the  operating  system  event logs to determine what
              caused the operating system to cease  functioning  or  to
              "lock up".

       ============================

       Message:
              "NMI  -  Unexpected Slot Power Loss (Bus %d, dev %d, func
              %d) Hour %d - %d/%d/%d"

       Description:
              This is a result of opening a PCI Hot Plug slot while the
              slot is powered on.

       Action:
              If no PCI Hot Plug slot was opened, this could be an
              indication of a slot failure.  Check the slot LEDs for
              proper operation.

       ============================

       Message:
              "NMI  -  PCI  Bus  parity error (Bus %d, dev %d, func %d)
              Hour %d - %d/%d/%d"

       Description:
              A PCI device has indicated that a parity error has
              occurred.

       Action:
              The specified PCI device may be failing.  If no other
              errors occurred before this one, the device may have
              failed or be about to fail.  If other errors have
              occurred, analyze this error in the context of those
              previous errors.

       ============================

       Message:
              "NMI - Dump Switch has been pressed - Hour %d -
              %d/%d/%d"

       Description:
              Some ProLiant servers have a "debug" switch which
              generates a Non-Maskable Interrupt (NMI).  This message
              indicates that this switch was pressed.

       Action:
              None.

       ============================

       Message:
              "Unrecoverable Non-Maskable Interrupt (NMI) error"

       Description:
              This is an NMI whose source the ProLiant server ROM was
              not able to identify.  The cause is either a problem with
              the ROM code or a hardware failure of a product not
              shipped as part of the server (i.e. a third-party
              hardware device).

       Action:
              Contact customer support for assistance.

       ============================

       Message:
              "Unknown  Non-Maskable  Interrupt (NMI) error (0x%x) Hour
              %d - %d/%d/%d"

       Description:
              This message indicates that an unknown NMI was generated.
              The hexadecimal value returned is an internal code from
              the Server ROM which customer support can interpret.

       Action:
              Contact customer support for assistance.
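
       The message formats listed in this section can be matched
       against console or log lines mechanically.  The following is a
       minimal sketch in Python, not part of the driver: the names
       NMI_FORMATS, format_to_regex and classify are hypothetical, only
       a few of the formats are included, and whitespace is matched
       loosely because the exact spacing the driver emits is not
       documented here.

```python
import re

# printf-style formats copied from the message list above (subset).
# The short keys are hypothetical labels chosen for this sketch.
NMI_FORMATS = {
    "asr_timeout":
        "NMI - Automatic Server Recovery timer expiration - Hour %d - %d/%d/%d",
    "pci_parity":
        "NMI - PCI Bus parity error (Bus %d, dev %d, func %d) Hour %d - %d/%d/%d",
    "slot_power_loss":
        "NMI - Unexpected Slot Power Loss (Bus %d, dev %d, func %d) Hour %d - %d/%d/%d",
    "unknown_nmi":
        "Unknown Non-Maskable Interrupt (NMI) error (0x%x) Hour %d - %d/%d/%d",
}


def format_to_regex(fmt):
    """Turn a printf-style format into a compiled regular expression."""
    parts = []
    for token in re.split(r"(%d|%x|\s+)", fmt):
        if token == "%d":
            parts.append(r"(\d+)")            # decimal field
        elif token == "%x":
            parts.append(r"([0-9a-fA-F]+)")   # hexadecimal field
        elif token.isspace():
            parts.append(r"\s+")              # tolerate any run of spaces
        else:
            parts.append(re.escape(token))    # literal text
    return re.compile("".join(parts))


PATTERNS = {name: format_to_regex(fmt) for name, fmt in NMI_FORMATS.items()}


def classify(line):
    """Return (label, captured fields) for a known message, else None."""
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)  # search() tolerates a "casm: " prefix
        if match:
            return name, match.groups()
    return None
```

       The captured fields are returned as strings in the order they
       appear in the format; the sketch does not interpret the order of
       the three date fields, since this page does not define it.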


BUGS
       Limited Hardware Platforms
              This driver works only on ProLiant servers that have
              either the ProLiant Advanced Server Management (ASM)
              ASIC (PCI ID 0x0E11A0F0) or the ProLiant iLO Advanced
              Server Management ASIC (PCI ID 0x0E11B203).
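
              One way to check for a supported ASIC before loading the
              driver is to look for these IDs in a PCI device listing.
              The following is a minimal sketch in Python under stated
              assumptions: each 32-bit PCI ID is read as a 16-bit
              vendor ID followed by a 16-bit device ID, and the input
              is text in the style of "lspci -n" output, with
              vendor:device pairs in hexadecimal.  The function names
              are hypothetical.

```python
import re

# PCI IDs from the text above.  Reading them as vendor (high 16 bits)
# and device (low 16 bits) is an assumption of this sketch.
SUPPORTED_IDS = (0x0E11A0F0, 0x0E11B203)


def split_pci_id(pci_id):
    """Split a 32-bit PCI ID into (vendor, device)."""
    return (pci_id >> 16) & 0xFFFF, pci_id & 0xFFFF


def find_supported_asics(lspci_n_output):
    """Return the supported vendor:device pairs found in the listing."""
    wanted = {"%04x:%04x" % split_pci_id(pci_id) for pci_id in SUPPORTED_IDS}
    found = []
    for line in lspci_n_output.splitlines():
        # Look for four-hex-digit pairs such as "0e11:a0f0".
        for match in re.finditer(r"\b([0-9a-f]{4}:[0-9a-f]{4})\b", line.lower()):
            if match.group(1) in wanted:
                found.append(match.group(1))
    return found
```

              The listing could be obtained, for example, with
              subprocess.run(["lspci", "-n"], ...) and passed to
              find_supported_asics() as a string; an empty result
              suggests that neither ASIC is present.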

       Initialization time
              After it is loaded, the driver needs about one minute to
              initialize fully.  Specifically, faulty hardware that
              returns to normal operation might not be recognized as
              "working" within the first minute of operation.

FILES
       /opt/compaq/cpqhealth
              The default directory for the scripts and binaries.  It
              contains sub-directories for the cpqasm and cpqevt
              drivers, each with further sub-directories for each
              supported Linux kernel.

       /opt/compaq/cpqhealth/custom_cpqhealth.sh
              The  shell  script  which  will rebuild and repackage the
              cpqhealth driver.

       /opt/compaq/cpqhealth/cpqhealth_boot.log
              A log file containing the results of the last boot of the
              system.  RPM errors are also logged here.  This file and
              the previous version
              ("/opt/compaq/cpqhealth/cpqhealth_boot.log.old") should
              always be sent with any queries about the installation
              or removal of the health driver.
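
              Gathering both log files for a support query can be
              scripted.  The following is a minimal sketch in Python;
              the function name bundle_boot_logs and the choice of a
              gzip-compressed tar archive are assumptions made for
              illustration, not a format that support requires.

```python
import os
import tarfile

LOG_DIR = "/opt/compaq/cpqhealth"
LOG_NAMES = ("cpqhealth_boot.log", "cpqhealth_boot.log.old")


def bundle_boot_logs(log_dir, archive_path):
    """Pack whichever boot logs exist in log_dir into a .tar.gz file.

    Returns the list of file names that were added.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in LOG_NAMES:
            path = os.path.join(log_dir, name)
            if os.path.isfile(path):
                # Store the file under its bare name, without the directory.
                archive.add(path, arcname=name)
                added.append(name)
    return added
```

              For example, bundle_boot_logs(LOG_DIR,
              "/tmp/cpqhealth_logs.tar.gz") collects the current and
              previous boot logs if both are present.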


       /etc/init.d/cpqasm
              This file is linked into the multiuser initstate
              directories and controls the loading of the cpqasm and
              cpqevt drivers.  The script also determines whether the
              drivers need to be rebuilt.
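
              The script accepts the start, stop and status arguments
              shown in the SYNOPSIS.  The following is a minimal sketch
              in Python of calling it from a monitoring script; the
              helper names are hypothetical, and the meaning of the
              script's exit status and output is not documented on this
              page, so the sketch only returns them unchanged.

```python
import subprocess

CPQASM_SCRIPT = "/etc/init.d/cpqasm"
ACTIONS = ("start", "stop", "status")  # arguments listed in the SYNOPSIS


def cpqasm_command(action):
    """Build the argument list for the init script, rejecting other actions."""
    if action not in ACTIONS:
        raise ValueError("unsupported cpqasm action: %r" % (action,))
    return [CPQASM_SCRIPT, action]


def run_cpqasm(action):
    """Run the init script and return (exit status, combined output)."""
    result = subprocess.run(
        cpqasm_command(action),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

              Starting or stopping the drivers normally requires root
              privileges.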


SEE ALSO
       cpqimlview(8)

       www.compaq.com/support/files/server/us/

       www.compaq.com/products/software/linux/index.html

AUTHOR
       Hewlett-Packard Company  <http://www.hp.com>.


Copyright Notice
       Copyright 2002 Compaq Information Technologies Group, L.P.


                               30 September 2002                  cpqhealth(4)