****************
  Introduction
****************

This document describes the Nagios plugins mainly used to monitor NorduGrid
ARC compute elements and related resources, but some probes should also be
usable to test non-ARC resources.  The package includes commands for:

* LDAP queries and tests on the information system, including GLUE 2.0 and
  legacy schemas.

* Job submission and monitoring of jobs with additional custom checks.

* Transfers to and from storage elements using various protocols.

The following chapters will cover the probes related to each of these topics.
This chapter will describe common configuration and options.

**Acknowledgements.**
This work is co-funded by the EC EMI project under the FP7 Collaborative
Projects Grant Agreement Nr. INFSO-RI-261611.


.. _configuration-files:

Configuration Files
===================

The configuration is merged from a list of INI-format files, where
settings from later files take precedence.  By default, files matching
``/etc/arc/nagios/*.ini`` are read in lexicographical order, but this can be
overridden by setting ``$ARCNAGIOS_CONFIG`` to a colon-separated list of the
files to load.  A naming scheme like the following is suggested::

    /etc/arc/nagios/20-dist.ini         - comes with the default package
    /etc/arc/nagios/60-egi.ini          - comes with the EGI package
    /etc/arc/nagios/90-local.ini        - suggested for local changes

An alternative to ``/etc/arc/nagios`` can be specified in the environment
variable ``$ARCNAGIOS_CONFIG_DIR``.
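
The lookup and merge rules described above can be sketched in Python with the
standard ``configparser`` module.  This is only an illustration under stated
assumptions, not the probes' actual implementation: the function names
``config_paths`` and ``load_config`` are hypothetical, and the sketch assumes
that ``$ARCNAGIOS_CONFIG``, when set, takes priority over the directory scan.

.. code-block:: python

    import glob
    import os
    from configparser import ConfigParser


    def config_paths(environ=None):
        """Return the configuration files to read, in merge order (hypothetical helper)."""
        environ = os.environ if environ is None else environ
        explicit = environ.get("ARCNAGIOS_CONFIG")
        if explicit:
            # A colon-separated list overrides the directory scan.
            return [path for path in explicit.split(":") if path]
        config_dir = environ.get("ARCNAGIOS_CONFIG_DIR", "/etc/arc/nagios")
        # sorted() gives the lexicographical order mentioned in the text.
        return sorted(glob.glob(os.path.join(config_dir, "*.ini")))


    def load_config(paths):
        """Merge the files; values from later files replace earlier ones."""
        parser = ConfigParser()
        parser.read(paths)
        return parser

With the suggested naming scheme, a value set in ``90-local.ini`` would replace
the value of the same variable in ``20-dist.ini``, while variables set only in
``20-dist.ini`` remain visible.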

Under the same prefix, a default job script template is installed::

    /etc/arc/nagios/20-dist.d/default.xrsl.j2

You can provide a modified script by placing it e.g. in
``/etc/arc/nagios/90-dist.d/default.xrsl.j2``, but be careful with this in a
production environment, since later versions of the probes may require changes
to the script that make the modified version incompatible.

Each probe has a main configuration section named after the probe or, for the
``check_arcce_*`` probes, the shared section ``[arcce]``.  In this section
you can provide defaults for string-valued command-line options.  The name of
the configuration variable corresponding to an option is obtained by stripping
the initial "``--``" and replacing "``-``" with "``_``", e.g.
"``--home-dir``" becomes "``home_dir``".

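The naming rule for configuration variables can be written as a one-line
transformation.  The helper name ``option_to_config_key`` below is
hypothetical and only serves to illustrate the rule; it is not part of the
package.

.. code-block:: python

    def option_to_config_key(option):
        """Map a long option such as ``--home-dir`` to its configuration variable."""
        if not option.startswith("--"):
            raise ValueError("expected a long option starting with '--'")
        # Drop the leading dashes and any "=<value>" suffix, then swap dashes.
        name = option[2:].split("=", 1)[0]
        return name.replace("-", "_")

For example, ``option_to_config_key("--home-dir")`` yields ``home_dir``.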

Common Options
==============

The following options are common to most of the probes:

``--home-dir=<dir>``
    Override $HOME at startup. This is a workaround for external commands
    which store things under $HOME on systems where the user account running
    Nagios does not have an appropriate or writable home directory.

``--loglevel=(debug|info|warning|error)``
    This option allows you to increase the verbosity of the Nagios probes.
    Additional messages will appear as extended status lines in Nagios.

``--multiline-separator=<chars>``
    Replacement for newlines when submitting multi-line results to passive
    services. Pass the empty string to drop extra lines. This option exists
    because Nagios currently does not support multi-line passive results.

``--command-file=<path>``
    The path of the Nagios command file.  By default $NAGIOS_COMMANDFILE is
    used, which is usually the right thing.

``--how-invoked=(nagios|manual)``, ``--dump-options``
    These are only needed for debugging purposes.

``--arcnagios-spooldir``
    Top level directory for storing state information and for use as a working
    area.  The default is ``/var/spool/arc/nagios``.  If you need to debug an
    issue related to CE jobs, look under the ``ce-*`` subdirectories.
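
The effect of ``--multiline-separator`` can be pictured with the following
sketch.  It is an assumption about the behaviour based on the description
above, not code taken from the probes; in particular, the function name
``flatten_passive_output`` is hypothetical, and "drop extra lines" is read
here as keeping only the first line.

.. code-block:: python

    def flatten_passive_output(text, separator):
        """Collapse multi-line plugin output into the single line a passive result allows."""
        lines = text.splitlines()
        if not lines:
            return ""
        if separator == "":
            # Empty separator: keep only the first line (assumed interpretation).
            return lines[0]
        return separator.join(lines)

With the separator ``" | "``, the two-line output ``"OK\nJob finished"``
becomes ``"OK | Job finished"``; with the empty string it becomes ``"OK"``.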


.. _x509-proxy:

Proxy Certificate
=================

The ``check_arcce_*`` and ``check_gridstorage`` probes require a proxy
certificate to succeed.  The probes will maintain a proxy when provided an X509
certificate and key.  You can place these in a common section:

.. code-block:: ini

    [gridproxy]
    default_voms = <voms>
    user_key = <path>
    user_cert = <path>
    #user_proxy = <path> # Optionally override the path of the generated proxy.

The probes which require an X509 proxy have a ``--voms=<voms>`` option to
specify the VOMS server to contact instead of ``default_voms``.  When a
``user_key`` and ``user_cert`` pair is given, the default ``user_proxy`` path
is unique to the selected VOMS.

To use a pre-initialized proxy, make sure ``user_key`` and ``user_cert`` are
not set.  You will probably want to use a non-default location for the
proxy.  Either point to it with the environment variable ``X509_USER_PROXY``
or set it in the configuration file:

.. code-block:: ini

    [gridproxy]
    user_proxy = <path>

If you use several VOs which require different certificates, you can replace
the above section with one section ``gridproxy.<voms>`` per ``<voms>`` and use
the ``--voms`` option to select which section to use.  These sections don't
have the ``default_voms`` setting.
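
As an illustration of the per-VO layout, the sketch below parses a
configuration with two ``gridproxy.<voms>`` sections and picks the one
matching a ``--voms`` value.  The VO names ``vo-a`` and ``vo-b``, the file
paths, and the helper ``proxy_section`` are hypothetical, and the fallback to
a plain ``gridproxy`` section is an assumption rather than documented
behaviour.

.. code-block:: python

    from configparser import ConfigParser

    # Hypothetical configuration with one section per VO.
    EXAMPLE = """
    [gridproxy.vo-a]
    user_key = /etc/grid-security/vo-a/userkey.pem
    user_cert = /etc/grid-security/vo-a/usercert.pem

    [gridproxy.vo-b]
    user_key = /etc/grid-security/vo-b/userkey.pem
    user_cert = /etc/grid-security/vo-b/usercert.pem
    """


    def proxy_section(config, voms):
        """Return the name of the section holding the proxy settings for ``voms``."""
        name = "gridproxy.%s" % voms
        if config.has_section(name):
            return name
        # Assumed fallback: the common section described earlier.
        return "gridproxy"


    config = ConfigParser()
    config.read_string(EXAMPLE)
    section = proxy_section(config, "vo-b")
    print(section, config[section]["user_cert"])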


Running Probes from the Command-Line
====================================

The following instructions apply to ``check_arcce_submit``,
``check_arcce_monitor``, ``check_arcce_clean``, and ``check_gridstorage``.
The other probes can be invoked from the command-line without special
attention.

For testing and debugging, it can be convenient to invoke the probes manually
as a regular user.  This can be done as follows.  Choose a directory where you
can store run-time state.  Below, we use ``/tmp``, but it may be tidier to
create a fresh directory.  Then, create a configuration like

.. code-block:: ini

    [DEFAULT]
    arcnagios_spooldir = /tmp/arc-nagios-testing

    [gridproxy]
    default_voms = <your-vo>

    [gridproxy.<your-vo>]
    user_proxy = /tmp/x509up_u<your-user-id>

substituting suitable values for the ``<your-*>`` meta-variables.  You may
need to add additional settings depending on what you test, of course.
Acquire a proxy certificate (if needed) and point to the set of
configurations you need, including the above:

.. code-block:: sh

    arcproxy -S <your-vo>
    export ARCNAGIOS_CONFIG=/etc/arc/nagios/20-dist.ini:<your-config>

The probes can now be run as

.. code-block:: sh

    check_arcce_submit --how-invoked=manual ...
    check_arcce_monitor --how-invoked=manual ...
    check_arcce_clean --how-invoked=manual ...

The main purpose of ``--how-invoked=manual`` is to tell the probe that any
passive results shall be printed to the screen rather than submitted to the
Nagios command pipe.  It is not strictly needed for active-only probes.
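
The difference between the two modes can be sketched as follows.  The line
format is the standard Nagios ``PROCESS_SERVICE_CHECK_RESULT`` external
command; everything else is an assumption made for illustration, including
the function names, the host and service names in the usage note, and the
idea that manual mode prints the same line that would otherwise be written
to the command file.  The probes' actual screen output may differ.

.. code-block:: python

    import sys
    import time


    def format_passive_result(host, service, status, output, now=None):
        """Format one passive service result as a Nagios external command line."""
        timestamp = int(time.time() if now is None else now)
        return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
            timestamp, host, service, status, output)


    def deliver(line, how_invoked, command_file=None, out=None):
        """Print the result in manual mode, else append it to the command file."""
        if how_invoked == "manual":
            (sys.stdout if out is None else out).write(line)
        else:
            with open(command_file, "a") as pipe:
                pipe.write(line)

For instance, ``deliver(format_passive_result("ce.example.org", "example
service", 0, "OK"), "manual")`` writes the line to standard output without
touching the command file.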
