Windows Stat Collection with Telegraf, InfluxDB and Grafana

A small screenshot of the Telegraf & Influx Windows Host Overview Dashboard.

Pre-requisites

  1. InfluxDB

    • Not in scope of this guide.
  2. Grafana

    • Not in scope of this guide.
  3. Create a new database for Telegraf stats.

    Replace INFLUXDB_HOST with the hostname or IP address of your InfluxDB server.

    curl -XPOST http://INFLUXDB_HOST:8086/query --data-urlencode "q=CREATE DATABASE telegraf"
    
  4. Add an InfluxDB data source in Grafana

    • Click the settings gear icon and choose the Data Sources option.

    • Click the Add Data Source button.

    • Find the InfluxDB data source and choose Select.

    • Set the HTTP URL setting to the IP address and port of the InfluxDB server (or container), for example:

      http://172.17.0.2:8086

    • Set the database name to the name of the database you created in the previous step (telegraf in this guide).

    • Click the Save and Test button to verify that Grafana can connect to InfluxDB.
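
The database creation in step 3 can also be scripted instead of run through curl. The following is a minimal Python sketch, assuming an InfluxDB 1.x server that exposes the /query endpoint on port 8086 without authentication. The helper names build_query_request and run_query are hypothetical and are not part of any InfluxDB client library.

```python
import urllib.parse
import urllib.request


def build_query_request(host, query, port=8086):
    """Build (but do not send) a POST to the InfluxDB 1.x /query endpoint."""
    url = f"http://{host}:{port}/query"
    # Form-encode the statement, as curl's --data-urlencode does.
    body = urllib.parse.urlencode({"q": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def run_query(host, query, port=8086, timeout=5):
    """Send the statement and return the raw JSON response text."""
    request = build_query_request(host, query, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling run_query("INFLUXDB_HOST", "CREATE DATABASE telegraf"), with INFLUXDB_HOST replaced by your server, would send the same statement as the curl command. Building the request separately from sending it makes it possible to inspect the URL and body without a running server.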

Telegraf for Windows installation

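  1. Download Telegraf for Windows and extract it to a local drive.

  2. Generate and update telegraf.conf

    • Create the install directory for Telegraf and generate a default configuration file in it. This guide uses c:\telegraf-1.17.0, so the configuration file is c:\telegraf-1.17.0\telegraf.conf. The last command below assumes telegraf.exe is in that directory; copy it there from the extracted archive if it is not.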

      mkdir "c:\telegraf-1.17.0"
      cd "c:\telegraf-1.17.0"
      telegraf.exe config > telegraf.conf
      
    • Then open telegraf.conf in a text editor and adjust it as needed. At a bare minimum, update:

    • The agent section’s interval key: set it to 10 seconds.

      [agent]
      ## Default data collection interval for all inputs
      interval = "10s"
      
    • The InfluxDB output section’s urls and database keys: set urls to the URL of your InfluxDB server, and set database only if you do not want to use the default, which is telegraf.

      [[outputs.influxdb]]
      urls = ["http://192.168.2.221:8086"]
      database = "telegraf"
      
    • Finally, locate the [[inputs.win_perf_counters]] section and replace it completely with the following.

      See the Collector Configuration Details section at https://grafana.com/grafana/dashboards/1902 for more details.

      [[inputs.win_perf_counters]]
      [[inputs.win_perf_counters.object]]
      # Processor usage, alternative to native, reports on a per core.
      ObjectName = "Processor"
      Instances = ["*"]
      Counters = [
          "% Idle Time",
          "% Interrupt Time",
          "% Privileged Time",
          "% User Time",
          "% Processor Time"
      ]
      Measurement = "win_cpu"
      # Set to true to include _Total instance when querying for all (*).
      #IncludeTotal=false
      
      [[inputs.win_perf_counters.object]]
      # Disk times and queues
      ObjectName = "LogicalDisk"
      Instances = ["*"]
      Counters = [
          "% Idle Time",
          "% Disk Time",
          "% Disk Read Time",
          "% Disk Write Time",
          "% User Time",
          "% Free Space",
          "Current Disk Queue Length",
          "Free Megabytes",
          "Disk Read Bytes/sec",
          "Disk Write Bytes/sec"
      ]
      Measurement = "win_disk"
      # Set to true to include _Total instance when querying for all (*).
      #IncludeTotal=false
      
      [[inputs.win_perf_counters.object]]
      ObjectName = "System"
      Counters = [
          "Context Switches/sec",
          "System Calls/sec",
          "Processor Queue Length",
          "Threads",
          "System Up Time",
          "Processes"
      ]
      Instances = ["------"]
      Measurement = "win_system"
      # Set to true to include _Total instance when querying for all (*).
      #IncludeTotal=false
      
      [[inputs.win_perf_counters.object]]
      # Example query where the Instance portion must be removed to get data back,
      # such as from the Memory object.
      ObjectName = "Memory"
      Counters = [
          "Available Bytes",
          "Cache Faults/sec",
          "Demand Zero Faults/sec",
          "Page Faults/sec",
          "Pages/sec",
          "Transition Faults/sec",
          "Pool Nonpaged Bytes",
          "Pool Paged Bytes"
      ]
      # Use 6 x - to remove the Instance bit from the query.
      Instances = ["------"]
      Measurement = "win_mem"
      # Set to true to include _Total instance when querying for all (*).
      #IncludeTotal=false
      
      [[inputs.win_perf_counters.object]]
      # more counters for the Network Interface Object can be found at
      # https://msdn.microsoft.com/en-us/library/ms803962.aspx
      ObjectName = "Network Interface"
      Counters = [
          "Bytes Received/sec",
          "Bytes Sent/sec",
          "Packets Received/sec",
          "Packets Sent/sec"
      ]
      Instances = ["*"] # Use 6 x - to remove the Instance bit from the query.
      Measurement = "win_net"
      #IncludeTotal=false #Set to true to include _Total instance when querying for all (*).
      
      [[inputs.win_perf_counters.object]]
      # Process metrics
      ObjectName = "Process"
      Counters = [
          "% Processor Time",
          "Handle Count",
          "Private Bytes",
          "Thread Count",
          "Virtual Bytes",
          "Working Set"
      ]
      Instances = ["*"]
      Measurement = "win_proc"
      #IncludeTotal=false #Set to true to include _Total instance when querying for all (*).
      
  3. Install Telegraf as a service

    • Start cmd.exe as an administrator.

    • Run the following command to install Telegraf as a service:

      telegraf.exe --service install --config "c:\telegraf-1.17.0\telegraf.conf"

    • Start the service, since installing it does not start it:

      net start telegraf
      
  4. Import the Telegraf & Influx Windows Host Overview dashboard (https://grafana.com/grafana/dashboards/1902) into Grafana.
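
If the imported dashboard stays empty, it can help to confirm that the measurements defined in the win_perf_counters configuration above are actually reaching InfluxDB. The following is a minimal Python sketch, assuming an InfluxDB 1.x server without authentication and the standard JSON response shape of SHOW MEASUREMENTS. The function names are hypothetical helpers written for this guide.

```python
import json
import urllib.parse

# Measurement names taken from the win_perf_counters config in this guide.
EXPECTED = {"win_cpu", "win_disk", "win_system", "win_mem", "win_net", "win_proc"}


def show_measurements_url(host, database="telegraf", port=8086):
    """URL for an InfluxDB 1.x SHOW MEASUREMENTS query (sent with GET)."""
    params = urllib.parse.urlencode({"db": database, "q": "SHOW MEASUREMENTS"})
    return f"http://{host}:{port}/query?{params}"


def measurement_names(response_text):
    """Extract measurement names from the JSON body of SHOW MEASUREMENTS."""
    payload = json.loads(response_text)
    names = set()
    for result in payload.get("results", []):
        # "series" is absent when the database has no measurements yet.
        for series in result.get("series", []):
            for row in series.get("values", []):
                names.add(row[0])
    return names


def missing_measurements(response_text, expected=EXPECTED):
    """Return the expected measurements that InfluxDB has not seen yet."""
    return sorted(set(expected) - measurement_names(response_text))
```

To use it, fetch show_measurements_url("INFLUXDB_HOST") with a browser, curl, or urllib.request.urlopen, and pass the response body to missing_measurements. An empty list suggests Telegraf is writing all six measurements. Any names returned point at the counter objects to recheck in telegraf.conf.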