IT360 Best Practices Guide

This document describes the best practices recommended for the optimum functioning of ManageEngine™ IT360. The best practices are categorized under the following sections:
  1. System Requirements
  2. Applications and Servers
  3. Networks
  4. Console
I System Requirements          

The software and hardware requirements for the various editions of IT360 are available at the link below.
http://www.manageengine.com/it360/help/meitms/setup-meit360/system_requirements.html

II Applications and Servers
  1. Increase data collection threads : The Applications & Servers module uses a separate thread for every data collection of a monitor. The default number of threads used is controlled by the respective schedulers specified in <IT360_Home>/applications/working/conf/threads.conf. The default number of threads used for URL Sequence Monitoring is 5 and for other monitors is 12.

    When the number of monitors is greater than 100, you can increase the data collection threads (default is 10) by editing the 'Data Collection' attribute in the file

    <IT360_Home>/applications/working/conf/threads.conf

    The 'Data Collection' thread value can be increased to a value between 20 and 25.
    Similarly, when the number of URL Sequence monitors is large (in the range of 100), you can increase the URL monitoring thread count by setting the 'URL Monitor' attribute to a value between 20 and 25. However, the total number of threads, including both URL monitor and other data collection threads, must not exceed 40.

    Note: Increasing the 'Data Collection' threads will lead to a slight increase in CPU usage.
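
The limits above can be checked before editing threads.conf. The following Python sketch is only an illustration and not part of IT360: the function name and message wording are hypothetical, and only the limits of 25 and 40 come from this guide.

```python
MAX_TOTAL_THREADS = 40   # combined limit for URL monitor + data collection threads (from this guide)
MAX_PER_SETTING = 25     # upper end of the suggested 20 to 25 range (from this guide)

def check_thread_plan(data_collection, url_monitor):
    """Return a list of problems with a proposed thread plan (empty list if none)."""
    problems = []
    total = data_collection + url_monitor
    if total > MAX_TOTAL_THREADS:
        problems.append("total of %d threads exceeds %d" % (total, MAX_TOTAL_THREADS))
    for name, count in (("Data Collection", data_collection), ("URL Monitor", url_monitor)):
        if count > MAX_PER_SETTING:
            problems.append("%s count %d is above %d" % (name, count, MAX_PER_SETTING))
    return problems

if __name__ == "__main__":
    print(check_thread_plan(25, 15))   # within both limits
    print(check_thread_plan(25, 25))   # each value is allowed, but the total is too high
```

Note that both values cannot be raised to 25 at the same time, because the total would then be 50.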

  2. Increase number of connections in database connection pool : When the number of monitors is greater than 100, the connection pool can be increased by editing the NON_TRANS_CONNECTIONS field in <IT360_Home>/applications/working/conf/database_params.conf

    The default value of 6 can be raised to 90 percent of the number of data collection threads.
    Note: Increasing the 'Connection Pool' will lead to a slight increase in CPU usage.
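
As a worked example of the 90 percent rule, the Python sketch below derives a NON_TRANS_CONNECTIONS value from a thread count. The function name is hypothetical, and rounding down and never going below the default of 6 are assumptions; the guide does not say how to round.

```python
def recommended_non_trans_connections(threads, default=6):
    """Suggest NON_TRANS_CONNECTIONS as 90 percent of the data collection threads."""
    suggested = threads * 9 // 10    # 90 percent, rounded down (assumption)
    return max(default, suggested)   # never go below the shipped default (assumption)

if __name__ == "__main__":
    for threads in (12, 20, 25):
        print(threads, recommended_non_trans_connections(threads))
```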

  3. Downtime Scheduler : When you do not need monitoring during a specific time period for some monitors, use the ‘Downtime Scheduler’ option available under the ‘Admin’ tab. Monitors configured for ‘Downtime Scheduler’ do not use the data collection threads during the specified period, which leaves the threads free for other monitors and hence improves performance.
  4. Poll Intervals : When performance polling is set, only the availability and health checks happen at every poll, while other performance data is collected only at the end of the scheduled number of polls. This reduces the load on the system when you want to monitor only health and availability.

    This option is available under ‘Admin’ -> ‘Performance Polling’, where you can set the number of polls before collecting performance data for server monitors. We recommend a Poll Interval of 10 minutes for every monitor and a performance poll value of 3.
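
To see what the recommended values mean in practice, the Python sketch below lists which polls would also collect performance data. It is an illustration only: the function name is hypothetical, and it assumes that the first poll happens one interval after start and that performance data is collected on every Nth poll.

```python
def poll_schedule(poll_interval_min, polls_before_perf, total_polls):
    """Return (minute, collects_performance_data) for each poll."""
    schedule = []
    for n in range(1, total_polls + 1):
        minute = n * poll_interval_min
        # Availability and health are checked on every poll;
        # performance data only on every polls_before_perf-th poll.
        collects_perf = (n % polls_before_perf == 0)
        schedule.append((minute, collects_perf))
    return schedule

if __name__ == "__main__":
    for minute, perf in poll_schedule(10, 3, 6):
        print(minute, "availability + performance" if perf else "availability only")
```

With a Poll Interval of 10 minutes and a performance poll value of 3, performance data is collected once every 30 minutes under these assumptions.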

  5. Increase JVM heap size : Memory tuning can be achieved by editing the following parameters in "wrapper.conf" file available under the directory <IT360_Home>/applications/working/conf
    • wrapper.java.initmemory – Initial Java Heap Size
    • wrapper.java.maxmemory – Maximum Java Heap size (Optimum recommended value is 1024)
  6. Network Availability check : When the IT360 Server is out of the network or is not connected to the network, the status of all the monitors that are currently being monitored will be shown as 'Down'. You can avoid this by enabling the 'Check for Network Availability' option. When this option is enabled, IT360 generates alerts for the unavailability of resources only if the specified host is reachable in the network. For example, assume that the system/host which runs IT360 has been isolated from the network. Enable this option and specify a hostname in the network (preferably not the hostname where IT360 runs). IT360 then pings that machine to check its availability in the network. If it is not available, alerts are not generated and resources are not shown as down.

    You can also specify the IP address of your routers, gateways, etc., to check whether the system/host which runs IT360 is present in the network.
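
The gating behaviour described above can be pictured with a small Python sketch. It is not IT360 code: the function and the resource names are hypothetical, and the reachability of the reference host is passed in as a plain boolean rather than measured.

```python
def resources_to_alert(resource_up, reference_host_reachable):
    """Return the resources to alert on.

    resource_up maps a resource name to True (up) or False (down).
    If the reference host cannot be reached, the monitoring server itself is
    assumed to be cut off from the network, so nothing is reported as down.
    """
    if not reference_host_reachable:
        return []
    return sorted(name for name, up in resource_up.items() if not up)

if __name__ == "__main__":
    status = {"db-server": False, "web-server": True, "mail-server": False}
    print(resources_to_alert(status, reference_host_reachable=True))
    print(resources_to_alert(status, reference_host_reachable=False))
```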

  7. URL Availability check : When IT360 is out of the network or if external proxy settings are not configured, the status of all the URLs that are currently being monitored will be shown as 'Down'. You can avoid this by enabling the 'Check URL Availability' option. 

    When this option is enabled, IT360 generates alerts for the unavailability of a URL only if the other specified URL is available. For example, assume that the system/host which runs IT360 has been isolated from the network. Enable this option and specify another URL which is expected to be up always. IT360 then checks that URL for its availability. If it is not available, alerts are not generated and the monitored URLs are not shown as down. Further, a mail is sent to the configured mail address to report this.

  8. Consecutive polls check : You can use this option to set the number of consecutive polls for which an error must exist before it is reported to the system. The consecutive polls count in 'Admin' -> Action / Alert Settings can be increased from the default value of 1 to 2, so that alerts are generated only after two consecutive polls, which helps avoid false alarms.
  9. It is recommended to use SNMP or WMI mode for monitoring Windows machines and SSH or Telnet for monitoring UNIX based machines. SSH should be preferred, as it is much more secure than Telnet.
  10. Alerting on Monitor Polling problem :
    1. Connect to the IT360 Probe web console. Go to the Admin --> Servers & Applications --> Action / Alarm Settings link.
    2. On that page, select the "Monitor Error Mail" attribute and set a suitable value for "consecutive polls before sending error".
    3. We suggest setting the value for "consecutive polls before sending error" to 5 or more. This avoids flooding of e-mail alerts.
    4. The e-mail alert notification will be sent to the e-mail address of the 'admin' user.
  11. To overcome Delay Polling issue :
    1. Observe the current CPU usage.
      • If there is no high CPU usage/spike :
        • Increase the number of data collection threads to 25.
          • Update the thread count in the <IT360_HOME>\applications\working\conf\threads.conf file.
          • Threads specific to a particular monitor type can be increased as needed (for example, URLMonitors, KeyValue_Monitors).
        • Increase the non-transaction connections to 20.
          • Change the value of NON_TRANS_CONNECTIONS in the <IT360_HOME>\applications\working\conf\database_params.conf file.
          • This value can be increased relative to the threads. If you increase the thread count beyond 25, increase NON_TRANS_CONNECTIONS accordingly. (Generally it is 90 percent of the thread count in threads.conf.)
      • If there is high CPU usage/spike, check the number of CPU cores. The processor should have a minimum of 8 cores; if it has fewer, increase the number of CPU cores.
  12. Event Log Rules:
    1. If a large number of Event Log Rules are to be created, define them based on the source instead of the Event Id. After creating an Event Log Rule, associate action profiles with that attribute. When you associate action profiles with the event log rule attribute, you will get the details of the event log message in the email notification.
  13. To avoid false alarms while checking availability :
    • Open the file IT360\applications\conf\AvailabilityTests.conf. 
      • Change the value of "am.ping.retries" to 1. If the device is down, IT360 will not raise an alarm immediately; it will retry once more before raising an alarm. In total, am.ping.retries+1 ping attempts are made.
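
The retry arithmetic above (am.ping.retries+1 attempts in total) can be sketched in Python as follows. This only illustrates the counting and is not how IT360 implements it; the function name is hypothetical and the ping itself is replaced by any callable that returns True or False.

```python
def is_available(ping, retries):
    """Try ping() up to retries + 1 times; return True on the first success."""
    for _ in range(retries + 1):
        if ping():
            return True
    return False   # every attempt failed, so an alarm would be raised

if __name__ == "__main__":
    attempts = []

    def fails_once():
        # Simulated ping: the first attempt fails, the second succeeds.
        attempts.append(1)
        return len(attempts) >= 2

    print(is_available(fails_once, retries=1), "after", len(attempts), "attempts")
```

With a value of 1, a single lost ping no longer raises an alarm, because the second attempt still has a chance to succeed.
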
  14. Target Server Restart detection :
      • To detect a server restart between two data collection cycles, associate the Error<HostName> attribute to health. This generates an alarm if the target server is restarted between two data collection cycles.
        This configuration also removes the need to set PollInterval to 1 just to check the server's availability, since the above Error attribute gives the target server restart time during the next data collection cycle.
  15. Send Mail when Data Collection (DC) stops
    1. Stop IT360 services.
    2. Add the attribute am.senddcstoppedmail.enabled=true to the AMServer.properties file, which is available under the <IT360-Home>/applications/conf directory.
    3. Start IT360 services.
  16. To Avoid False Positive Alerts 

To accomplish this, first determine the number of consecutive polls. If you do not want IT360 to generate an alert when a threshold condition is crossed for the first time in the Applications module, you can use this option to specify the number of consecutive polls before generating an alert or before reporting an error to the system. The consecutive polls check can be set in the following ways:

A) Globally:

 The consecutive polls check set using the steps below is applicable to all servers/applications in IT360.  

  • Set the consecutive polls count under 'Admin -> Action / Alert Settings'. This can be increased from the default value of 1 to 2, so that alerts are generated only after two consecutive polls, which helps avoid false alarms.
    • Poll 2 times consecutively before reporting a service is down or an attribute is critical 
    • Poll 2 times consecutively before reporting an attribute is warning 
    • Poll 2 times consecutively before reporting a service is up or an attribute is clear
  • This is applicable to Availability and to attributes such as CPU Utilization, Memory Utilization, etc., which decide the Health status of a resource. 
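
The effect of a consecutive polls count can be illustrated with a short Python sketch. This is a simplified model and not IT360's implementation: the class name and state labels are hypothetical, and a single count is used for every transition, whereas the product has separate settings for down/critical, warning, and up/clear.

```python
class ConsecutivePollFilter:
    """Report a new state only after it is seen in `polls` consecutive polls."""

    def __init__(self, polls, initial_state="up"):
        self.polls = polls
        self.reported = initial_state   # state currently shown to the user
        self._candidate = None          # differing state that is being counted
        self._count = 0

    def observe(self, state):
        """Feed one poll result and return the state that would be reported."""
        if state == self.reported:
            # Back to the reported state: discard any pending change.
            self._candidate, self._count = None, 0
        else:
            if state == self._candidate:
                self._count += 1
            else:
                self._candidate, self._count = state, 1
            if self._count >= self.polls:
                self.reported = state
                self._candidate, self._count = None, 0
        return self.reported

if __name__ == "__main__":
    flt = ConsecutivePollFilter(polls=2)
    for result in ["down", "up", "down", "down", "up", "up"]:
        print(result, "->", flt.observe(result))
```

In this run the single "down" at the first poll is never reported; the state changes only after two matching polls in a row.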

B) Based on resource type - for the attribute "Availability" only:

The consecutive polls check set using the steps below is applicable to a specific resource type and to the Availability attribute only.

  • Navigate to 'Admin --> Configure Alarms --> Select Monitor Type --> Select the Monitor (for which the threshold profile is to be updated)'.
  • The attributes are listed under the 'Configure Alarms for Attributes' section. 
  • Click the Associate link against the Availability attribute.
  • In the window that pops up, enable the Show Advanced Options checkbox. The Set consecutive polls count section shows up. Do the following:
  • If the alarm is to be raised only after two consecutive ping failures, set the value to 2 in the box below:
        Poll [2] times consecutively before reporting the monitor is down 
  • If the monitor is to be reported as up only after two consecutive ping successes, set the value to 2 in the box below:
        Poll [2] times consecutively before reporting the monitor is up

C) Based on resource type - for attributes such as CPU Utilization, Memory Utilization, etc.:

    The consecutive polls check set using the steps below is applicable to a specific resource type and to attributes such as CPU Utilization, Memory Utilization, etc. For this, first create a threshold profile and then apply it to the desired attribute.

    1. Creating a new threshold profile:

      To override attribute level configurations, i.e., if the alarm has to be generated only after 3 consecutive threshold violations, it is necessary to create a threshold profile for that particular attribute. 

    • Create the new threshold profile under 'Admin --> Threshold & Anomalies --> Add New'.
    • Update the 'Polls to try' value to 3.
      Example: To reduce alarms due to a temporary surge in CPU Utilization, set 'Polls to try' to 3 or more. If you set it to 3, the alarm will be generated only if the CPU Utilization has crossed the threshold during 3 consecutive polls.

         Now, the new threshold profile is created. 

    2. Applying the created threshold profile to an attribute:

    • Navigate to 'Admin --> Configure Alarms --> Select Monitor Type --> Select the Monitor (for which the threshold profile is to be updated)'.
    • The attributes are listed under the 'Configure Alarms for Attributes' section. 
    • Click the Associate link against the attribute whose threshold profile is to be updated. Example: If the threshold profile is to be updated for the CPU Utilization attribute, click Associate against CPU Utilization.
    • In the window that pops up, choose the newly created threshold profile from the Associate Threshold drop-down. Click Save.

III Networks
    1. Before Discovery : The Network module relies on the communication protocols SNMP, WMI, Telnet, and SSH for classification and monitoring. Make sure the following two configurations are done before triggering discovery:
      • Configuring the relevant SNMP, WMI, and CLI credentials
      • Defining Device Templates
    2. Configuring Discovery Parameters : The IT360 Network module pings the devices for discovery and later for determining availability; 4 ping packets are sent by default. If there is network latency, some devices may not be discovered or, after discovery, may not be polled for status. This can be addressed by configuring a few ping parameters.
      Steps to achieve this:
      1. From the <IT360-Home>/networks/conf folder, open the file ping.properties.
      2. Un-comment the timeout parameter (remove the # symbol) and specify the ping timeout depending on the latency.
      3. Similarly, you can increase the number of ping retries by configuring the value of the retries parameter. Make sure you un-comment this parameter too for the configuration to take effect.
      4. Save the changes to the file.
      5. The IT360 service requires a restart when changes are made to this file, so restart IT360 for the changes to take effect.

      Note: The above configuration is recommended only if there is latency.
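
If the edit to ping.properties has to be repeated on several installations, it can be scripted. The Python sketch below works on the file content as a string and assumes that the parameters appear as commented `key=value` lines such as `#timeout=1`; that layout, the sample content, and the values used are assumptions for illustration, so check them against the actual file before using anything like this.

```python
def set_property(text, key, value):
    """Un-comment `key` if it is commented out and set it to `value`.

    Assumes simple `key=value` lines, optionally prefixed with '#'.
    If the key is not present at all, it is appended at the end.
    """
    out, found = [], False
    for line in text.splitlines():
        stripped = line.lstrip("# \t")
        name = stripped.split("=", 1)[0].strip()
        if not found and "=" in stripped and name == key:
            out.append("%s=%s" % (key, value))
            found = True
        else:
            out.append(line)
    if not found:
        out.append("%s=%s" % (key, value))
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    sample = "# ping settings\n#timeout=1\n#retries=0\n"   # hypothetical file content
    updated = set_property(sample, "timeout", 5)
    updated = set_property(updated, "retries", 2)
    print(updated)
```

Take a backup of the original file first, and restart IT360 afterwards as described in the steps above.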

    3.  Addressing SNMP Timeout Issue : The default timeout for SNMP queries to variables in a device is 5 seconds. If the agent response is delayed for some devices, you can increase the SNMP timeout globally as follows:
      1. From the <IT360-Home>/networks/conf folder, open the file NmsProcessesBE.conf.
      2. Look for the following default entry in this file:

        PROCESS com.adventnet.nms.poll.Collector

        ARGS POLL_OBJECTS_IN_MEMORY 25 POLL_JDBC true MAX_OIDS_IN_ONE_POLL 15 AUTHORIZATION true DATA_COLLECTION_QUERY_INTERVAL 120000 PASS_THRO_ALL_POLLING_OBJECTS true CLEAN_DATA_INTERVAL 999999

      3. Include the additional parameter DATA_COLLECTION_SNMP_TIMEOUT 15. The changed entry will be as shown below:

        PROCESS com.adventnet.nms.poll.Collector

        ARGS POLL_OBJECTS_IN_MEMORY 25 POLL_JDBC true MAX_OIDS_IN_ONE_POLL 15 AUTHORIZATION true DATA_COLLECTION_QUERY_INTERVAL 120000 PASS_THRO_ALL_POLLING_OBJECTS true CLEAN_DATA_INTERVAL 999999 DATA_COLLECTION_SNMP_TIMEOUT 15
      4. Save the changes and restart the IT360 service.
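
The same edit can be sketched as a small Python helper that appends a parameter to the ARGS line following the Collector PROCESS line. This is a minimal sketch under stated assumptions: it treats the file as plain text with PROCESS and ARGS lines as shown above, the function name is hypothetical, and `com.example.Other` in the sample is a made-up entry used only to show that other processes are left alone.

```python
COLLECTOR = "com.adventnet.nms.poll.Collector"

def add_collector_arg(conf_text, name, value):
    """Append `name value` to the ARGS line that follows the Collector PROCESS line.

    The ARGS line is left unchanged if `name` is already present on it.
    """
    out, in_collector = [], False
    for line in conf_text.splitlines():
        tokens = line.split()
        if tokens[:1] == ["PROCESS"]:
            in_collector = COLLECTOR in tokens
        elif tokens[:1] == ["ARGS"] and in_collector:
            if name not in tokens:
                line = line.rstrip() + " %s %s" % (name, value)
            in_collector = False
        out.append(line)
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    sample = (
        "PROCESS com.example.Other\n"
        "ARGS A 1\n"
        "PROCESS com.adventnet.nms.poll.Collector\n"
        "\n"
        "ARGS POLL_JDBC true CLEAN_DATA_INTERVAL 999999\n"
    )
    print(add_collector_arg(sample, "DATA_COLLECTION_SNMP_TIMEOUT", 15))
```

Running the helper twice gives the same result, so the parameter is not added a second time.
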
    4. SNMP Data-collection : By default, IT360 uses 12 threads for SNMP polling. The assumption is that each monitored device has a minimum of 10 polled data (monitored resources such as CPU, memory, incoming traffic, outgoing traffic, errors, etc.). Each Interface object has 11 polled data, which include RxTraffic, TxTraffic, Bandwidth Utilization, Errors, Discards, etc. Depending on the number of polled data, you can increase the number of datapoll threads. The steps to achieve this are given below:
      • From the <IT360-Home>/networks/conf folder, open the file threads.conf.
      • Increase the value of datapoll threads from 12 to the required number of threads for SNMP polling.
      • Save the changes and restart the IT360 service.
      • Use the following reference table to increase the number of threads:

        Number of devices/interfaces        | Number of datapoll threads | Number of SNMP polled data                                      | Monitoring interval
        Up to 500 devices / 5000 interfaces | 12 (default)               | Up to 50000                                                     | 15 mins
        Beyond the above numbers            | 13 - 20                    | More than 50000: 1 additional thread for every 5000 polled data | 15 mins
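
The rule in the reference table above can be written as a small Python calculation. It is a sketch only: the function name is hypothetical, and rounding a partial block of 5000 polled data up to a full extra thread and capping the result at 20 (the top of the 13 - 20 range) are assumptions.

```python
import math

DEFAULT_THREADS = 12                  # default datapoll threads
BASE_POLLED_DATA = 50000              # polled data covered by the default
POLLED_DATA_PER_EXTRA_THREAD = 5000   # one more thread per 5000 polled data beyond that
MAX_THREADS = 20                      # upper end of the 13 - 20 range (used as a cap, assumption)

def datapoll_threads(polled_data):
    """Suggest a datapoll thread count for a given number of SNMP polled data."""
    if polled_data <= BASE_POLLED_DATA:
        return DEFAULT_THREADS
    extra = math.ceil((polled_data - BASE_POLLED_DATA) / POLLED_DATA_PER_EXTRA_THREAD)
    return min(MAX_THREADS, DEFAULT_THREADS + extra)

if __name__ == "__main__":
    for count in (40000, 50000, 60000, 75000, 200000):
        print(count, datapoll_threads(count))
```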

    5. Database Connection Pool : If the number of polled data is over 50000, the number of non-transaction connections can be increased to a value in the range of 7 to 10 (the default is 6 connections). To configure this:
      • From the <IT360-Home>/networks/conf folder, open the file database_params.conf.
      • Increase the value of the NON_TRANS_CONNECTIONS parameter to the required number.
      • Save the changes and restart the IT360 service.
    6. Disabling Unnecessary Polling during scheduled maintenance : Whenever maintenance is scheduled in the network for some devices, you can suspend polling for those devices by scheduling downtime in IT360 under Admin --> Networks --> Downtime Scheduler. This prevents unnecessary requests to network resources that would result in false alerts. Performance also improves, as the devices covered by the schedule do not use the data poll threads.
    7. Disabling polling for a category : From Admin --> Networks --> Monitoring Intervals, clear the selection for the category for which you want to disable polling.
    8. Specifying Polling Intervals for Devices : From Admin --> Networks --> Monitoring Intervals, configure a smaller monitoring interval for critical categories like servers or routers and a longer one for other categories like printers. The recommended interval for very critical devices is 5 minutes, although you can set an interval as low as 1 minute for a very few devices.
    9. Device Dependencies : False alerts are triggered when a set of monitored devices is behind another device (a firewall, router, etc.). The requests sent to the devices are routed through the firewall or router, and if this dependent device is down, all devices behind it are deemed down. Configuring device dependencies prevents unnecessary polling of the devices behind the dependent device.
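
The idea behind device dependencies can be shown with a short Python sketch that skips every device sitting behind a device already known to be down. It is a conceptual illustration rather than IT360's logic; the function names and the device names in the example are hypothetical.

```python
def has_down_ancestor(device, parent_of, down):
    """Return True if any device on the path above `device` is known to be down."""
    seen = set()
    parent = parent_of.get(device)
    while parent is not None and parent not in seen:
        if parent in down:
            return True
        seen.add(parent)            # guards against accidental dependency loops
        parent = parent_of.get(parent)
    return False

def devices_to_poll(parent_of, down):
    """Return the devices that are still worth polling, in sorted order."""
    return sorted(d for d in parent_of if not has_down_ancestor(d, parent_of, down))

if __name__ == "__main__":
    # Each device maps to the device it sits behind (None for top-level devices).
    parent_of = {"router": None, "switch": "router", "server1": "switch", "printer": None}
    print(devices_to_poll(parent_of, down={"router"}))
    print(devices_to_poll(parent_of, down=set()))
```

When the router is down, only the router itself and the unrelated printer are polled, so the switch and the server behind it do not produce separate alerts.
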
    10. Increase JVM heap size : Memory tuning can be achieved by editing the following parameters in "wrapper.conf" file available under the directory <IT360_Home>/networks/conf
      • wrapper.java.initmemory – Initial Java Heap Size
      • wrapper.java.maxmemory – Maximum Java Heap size (Below 2.5K interfaces - 2048, 2.5K to 5K interfaces - 4096, 5K to 8K interfaces - 4096)
    11. Alerting on Datacollection Problem :
      1. Go to the <IT360_Home>\networks\conf directory and take a backup of the NmsProcessesBE.conf file in some other directory (for example, C:\).
      2. Look for "PROCESS     com.adventnet.nms.poll.Collector" in that file.
      3. The next line starts with "ARGS". Add the attribute "GENERATE_DATACOLL_EVENT true" at the end of that line. Once added, the line will look as given below:
        ARGS  POLL_OBJECTS_IN_MEMORY 25 POLL_JDBC true MAX_OIDS_IN_ONE_POLL 15 AUTHORIZATION true DATA_COLLECTION_QUERY_INTERVAL 120000 PASS_THRO_ALL_POLLING_OBJECTS true CLEAN_DATA_INTERVAL 999999 GENERATE_DATACOLL_EVENT true
      4. Save this file and restart the IT360 Probe service for the change to take effect.
      5. Once restarted, whenever a problem occurs in data collection, an alert will be generated in the Networks module. The alert can be viewed under the 'Alarms' tab.
      6. A word of caution: if there are many SNMP timeouts or other SNMP related problems during SNMP data collection, a large number of alerts may be generated in the IT360 Probe service.

IV Console
    1. User Sync Interval from Active Directory : The synchronization interval for importing users from Active Directory should not be set to less than one day if the number of users is high. The setting can be found under Admin --> General --> Active Directory --> Import users from Active Directory.
    2. Increase JVM heap size : Memory tuning can be achieved by editing the following parameters in "wrapper.conf" file available under the directory <IT360_Home>/applications/working/conf
      • wrapper.java.initmemory – Initial Java Heap Size
      • wrapper.java.maxmemory – Maximum Java Heap size (Optimum recommended value is 1024)
