A friend of mine and I run some Grafana instances to monitor various things via Telegraf. John was having storage space issues on his machine and noticed that in our default installations we had never changed the data retention policy from its default, which keeps data forever!
Below is John Mora’s write-up of how he addressed the issue. Thank you, sir! I have used the process as documented and, unsurprisingly, it fixed my problem as well.
Problem
Influxdb used up all allocated drive space
Current Installation
Grafana, unifipoller and influxdb in docker containers on Hyper-V
Solution
View the retention policies and saved shards within influxdb. Remove unwanted shards to reclaim drive space, and update the retention policy so that shards are removed after a set period. (Instead of creating a new Retention Policy, I simply altered the default policy that was auto-generated when influxdb was installed.)
Questions
-
What are influxdb Retention Policies?
From: https://oznetnerd.com/2017/06/11/influxdb-retention-policies-shard-groups/
The part of InfluxDB’s data structure that describes for how long InfluxDB keeps data (duration), how many copies of those data are stored in the cluster (replication factor), and the time range covered by shard groups (shard group duration). Retention Policies (RPs) are unique per database and along with the measurement and tag set define a series.
When you create a database, InfluxDB automatically creates a Retention Policy called autogen with an infinite duration, a replication factor set to one, and a shard group duration set to seven days. See Database Management for retention policy management.
Summary
RPs define for how long data is kept. The default autogen RP is set to infinite while the default shard group duration (which is part of the autogen RP), is set to seven days.
It was at this point I found myself getting confused. What is the difference between an RP duration and a Shard Group duration? And how can you have an expiry date on data which is configured to be kept infinitely? More on this later.
-
What is an influxdb Shard?
From: https://oznetnerd.com/2017/06/11/influxdb-retention-policies-shard-groups/
Shard
A shard contains the actual encoded and compressed data, and is represented by a TSM file on disk. Every shard belongs to one and only one shard group. Multiple shards may exist in a single shard group. Each shard contains a specific set of series. All points falling on a given series in a given shard group will be stored in the same shard (TSM file) on disk.
Shard Groups
Shard groups are logical containers for shards. Shard groups are organized by time and retention policy. Every retention policy that contains data has at least one associated shard group. A given shard group contains all shards with data for the interval covered by the shard group. The interval spanned by each shard group is the shard duration.
Shard Duration
The shard duration determines how much time each shard group spans. The specific interval is determined by the SHARD DURATION of the retention policy. See Retention Policy management for more information.
For example, given a retention policy with SHARD DURATION set to 1w, each shard group will span a single week and contain all points with timestamps in that week.
-
What within influxdb is using up all my drive space?
Answer: The Shards (see above).
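To make the difference between the two durations concrete, here is a minimal Python sketch. It assumes that a shard group's expiry_time is its end_time plus the retention policy duration, which is consistent with shard group 19 in the listing shown in the steps below, and that a duration of 0s means the group never expires. The function name shard_group_expiry is made up for illustration and is not part of InfluxDB.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def shard_group_expiry(end_time: datetime, rp_duration: timedelta) -> Optional[datetime]:
    """Return when a shard group becomes eligible for removal.

    Assumption: expiry = end_time + retention policy duration, and a zero
    duration (shown as 0s) means "keep forever", so there is no expiry.
    """
    if rp_duration == timedelta(0):
        return None
    return end_time + rp_duration


# Shard group 19 from the listing: one week of data ending 2021-04-26.
end = datetime(2021, 4, 26, tzinfo=timezone.utc)

print(shard_group_expiry(end, timedelta(0)))        # None
print(shard_group_expiry(end, timedelta(weeks=2)))  # 2021-05-10 00:00:00+00:00
```

In other words, the shard group duration decides how much time each group covers, while the RP duration decides how long a finished group is kept.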
Solution Steps
-
Open Console session
-
Connect to Docker Influxdb Container Console Session:
docker exec -it influxdb bash
-
Connect to influxdb
# influx
Connected to http://localhost:8086 version 1.8.3
InfluxDB shell version: 1.9.3
-
Query influxdb for Installed Databases
> show databases;
name: databases
----
_internal
unifipoller
-
Set Database to Query
> use unifipoller;
Using database unifipoller
-
Show Database Retention Policy(ies)
> show retention policies;
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
-
Show Database Shards
> show shard groups
name: shard groups
id database    retention_policy start_time           end_time             expiry_time
-- --------    ---------------- ----------           --------             -----------
18 _internal   monitor          2021-04-23T00:00:00Z 2021-04-24T00:00:00Z 2021-05-01T00:00:00Z
16 unifipoller autogen          2021-03-19T00:00:00Z 2021-04-19T00:00:00Z 2021-05-10T00:00:00Z
19 unifipoller autogen          2021-04-19T00:00:00Z 2021-04-26T00:00:00Z 2021-05-10T00:00:00Z
12 telegraf    autogen          2021-03-19T00:00:00Z 2021-04-19T00:00:00Z 2021-05-10T00:00:00Z
17 telegraf    autogen          2021-04-19T00:00:00Z 2021-04-26T00:00:00Z 2021-05-10T00:00:00Z
-
Update Database Retention Policy
> ALTER RETENTION POLICY autogen ON unifipoller DURATION 2w REPLICATION 1 SHARD DURATION 1w
> INSERT INTO autogen measure1 value=0
-
Verify Updated Database Retention Policy
> show retention policies;
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 336h0m0s 168h0m0s           1        true
-
Delete Database Shards (reclaim Drive Space)
- (Repeat for each Shard id.) You can determine which ones are no longer in use by the end_time: if it is in the past and you don’t want the data, you can remove the shard.
> drop shard 12
Repeat for each Shard Id
-
Verify Database Shard has been Deleted
Remaining Shards after deletion of all desired Shards should appear as follows (for this example.)
> show shard groups
name: shard groups
id database    retention_policy start_time           end_time             expiry_time
-- --------    ---------------- ----------           --------             -----------
18 _internal   monitor          2021-04-23T00:00:00Z 2021-04-24T00:00:00Z 2021-05-01T00:00:00Z
19 unifipoller autogen          2021-04-19T00:00:00Z 2021-04-26T00:00:00Z 2021-05-10T00:00:00Z
17 telegraf    autogen          2021-04-19T00:00:00Z 2021-04-26T00:00:00Z 2021-05-10T00:00:00Z
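The rule used in the deletion step (an end_time in the past marks a candidate for removal) can be sketched in Python. This is only an illustration: the rows are copied by hand from the listing taken before deletion, the function names parse_utc and droppable_groups are hypothetical, and the reference time is an assumed value chosen to fall inside the week the example was captured. The sketch only prints candidate ids and never talks to InfluxDB.

```python
from datetime import datetime, timezone

# (id, database, end_time) copied from the "show shard groups" output
# taken before any shards were dropped.
SHARD_GROUPS = [
    (18, "_internal", "2021-04-24T00:00:00Z"),
    (16, "unifipoller", "2021-04-19T00:00:00Z"),
    (19, "unifipoller", "2021-04-26T00:00:00Z"),
    (12, "telegraf", "2021-04-19T00:00:00Z"),
    (17, "telegraf", "2021-04-26T00:00:00Z"),
]


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp such as 2021-04-19T00:00:00Z as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def droppable_groups(rows, now):
    """Return the ids whose end_time is already in the past at `now`."""
    return [gid for gid, _db, end in rows if parse_utc(end) < now]


# Assumed reference time: some point during the week of 2021-04-19.
now = datetime(2021, 4, 23, 12, 0, tzinfo=timezone.utc)
print(droppable_groups(SHARD_GROUPS, now))  # [16, 12]
```

The two ids it reports are the ones missing from the listing after deletion. One caveat: the listing shows shard group ids, so it is worth confirming the shard id with show shards before running drop shard, in case the two differ on your installation.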
Summary
After performing the above steps, I reclaimed over 10GB of drive space. Altering the autogen Retention Policy allows influxdb to delete any Shards older than two weeks, preserving drive space.
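If it is not obvious why a policy set with DURATION 2w is listed as 336h0m0s, the conversion can be checked with a small Python sketch. The helper names parse_duration and as_hms are invented for this example, only the s, m, h, d and w units are handled, and the formatting is only meant to match the listing for durations of a whole number of hours.

```python
import re
from datetime import timedelta

# Seconds per unit for the subset of duration units handled here.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(literal: str) -> timedelta:
    """Convert a literal such as '2w' or '7d' to a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", literal)
    if match is None:
        raise ValueError(f"unsupported duration literal: {literal!r}")
    return timedelta(seconds=int(match.group(1)) * UNIT_SECONDS[match.group(2)])


def as_hms(duration: timedelta) -> str:
    """Format a timedelta as hours, minutes and seconds, e.g. 336h0m0s."""
    total = int(duration.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h{minutes}m{seconds}s"


print(as_hms(parse_duration("2w")))  # 336h0m0s
print(as_hms(parse_duration("1w")))  # 168h0m0s
```

Both values match the duration and shardGroupDuration columns shown after the policy was altered.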