Ceph is an open source software-defined storage solution designed to address the block, file and object storage needs of modern enterprises. Its highly scalable architecture sees it being adopted as the new norm for high-growth block storage, object stores, and data lakes. Ceph provides reliable and scalable storage while keeping CAPEX and OPEX costs in line with underlying commodity hardware prices.
What is Ceph?
Ceph is a distributed storage system that is massively scalable and high-performing, with no single point of failure. Ceph is software-defined storage (SDS), meaning it can run on any hardware that matches its requirements.
Ceph consists of multiple components:
- Ceph Monitors (MON) maintain the master copy of the cluster map and are responsible for forming the cluster quorum. All cluster nodes report to the monitors and share information about every change in their state.
- Ceph Object Storage Daemons (OSD) are responsible for storing objects on local storage and providing access to them over the network. Usually, one OSD daemon is tied to one physical disk in the cluster. Ceph clients interact with OSDs directly.
- Ceph Manager (MGR) provides additional monitoring and interfaces to external monitoring and management systems.
- The Reliable Autonomic Distributed Object Store (RADOS) is at the core of Ceph storage clusters. This layer ensures that stored data always remains consistent, and performs data replication, failure detection and recovery, among other tasks.
To read/write data from/to a Ceph cluster, a client will first contact Ceph MONs to obtain the most recent copy of their cluster map. The cluster map contains the cluster topology as well as the data storage locations. Ceph clients use the cluster map to figure out which OSD to interact with and initiate a connection with the associated OSD.
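You can see this lookup from the command line: ceph osd map asks the cluster which placement group a given object name hashes to and which OSDs currently serve it. The pool and object names below are placeholders, and the object doesn’t even have to exist yet.
$ ceph osd map mypool myobject
# prints the osdmap epoch, the placement group the object hashes to,
# and the up/acting set of OSDs that currently serve it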
How Ceph works
If you’ve just started working with Ceph, you already know there’s a lot going on under the hood. To help you in your journey to becoming a Ceph master, here’s a list of 10 commands every Ceph cluster administrator should know. Print it out, stick it to your wall and let it feed your Ceph mojo!
1. Check or watch cluster health: ceph status || ceph -w
If you want to quickly verify that your cluster is operating normally, use ceph status to get a bird’s-eye view of cluster status (hint: typically, you want your cluster to be active+clean). You can also watch cluster activity in real time with ceph -w; you’ll typically use this when you add or remove OSDs and want to see the placement groups adjust.
2. Check cluster usage stats: ceph df
To check a cluster’s data usage and data distribution among pools, use ceph df. This provides information on available and used storage space, plus a list of pools and how much storage each pool consumes. Use this often to check that your cluster is not running out of space.
3. Check placement group stats: ceph pg dump
When you need statistics for the placement groups in your cluster, use ceph pg dump. You can get the data in JSON as well in case you want to use it for automatic report generation.
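For example, to feed a reporting script, the same statistics can be dumped as JSON. The exact field layout varies a little between Ceph releases, so it’s worth inspecting the structure before scripting against it; jq is just one convenient way to do that.
$ ceph pg dump --format json-pretty > pg_stats.json
$ jq 'keys' pg_stats.json    # inspect the top-level structure before scripting against it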
4. View the CRUSH map: ceph osd tree
Need to quickly identify the physical data center, room, row and rack of a failed OSD while troubleshooting? Use ceph osd tree, which produces an ASCII-art CRUSH tree showing each host, its OSDs, whether they are up, and their weights.
5. Create or remove OSDs: ceph osd create || ceph osd rm
Use ceph osd create to add a new OSD to the cluster. If no UUID is given, it will be set automatically when the OSD starts up. When you need to remove an OSD, use ceph osd rm with the OSD’s ID; to take its entry out of the CRUSH map as well, use ceph osd crush remove.
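A rough sketch of that removal flow, assuming the OSD in question has ID 12; on recent releases, ceph-volume or cephadm handles most of the creation side for you.
$ ceph osd create                # allocates the next free OSD ID and prints it
$ ceph osd out 12                # mark the OSD out so data drains off it first
$ ceph osd crush remove osd.12   # drop its entry from the CRUSH map
$ ceph auth del osd.12           # remove its authentication key
$ ceph osd rm 12                 # finally remove it from the OSD map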
6. Create or delete a storage pool: ceph osd pool create || ceph osd pool delete
Create a new storage pool with a name and number of placement groups with ceph osd pool create. Remove it (and wave bye-bye to all the data in it) with ceph osd pool delete.
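A minimal sketch, using a hypothetical pool called mypool. Note that deletion is deliberately awkward: the pool name must be typed twice, and the monitors must have mon_allow_pool_delete enabled before they will accept it.
$ ceph osd pool create mypool 128 128            # pool name, pg_num, pgp_num
$ ceph osd pool application enable mypool rbd    # tag the pool with its intended use
$ ceph osd pool delete mypool mypool --yes-i-really-really-mean-it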
7. Repair an OSD: ceph osd repair
Ceph is a self-repairing cluster. Tell Ceph to attempt repair of an OSD by calling ceph osd repair with the OSD identifier.
8. Benchmark an OSD: ceph tell osd.* bench
Added an awesome new storage device to your cluster? Use ceph tell to see how well it performs by running a simple throughput benchmark. By default, the test writes 1 GB in total in 4-MB increments.
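For example (osd.0 is just a placeholder, and the exact argument list varies slightly between releases):
$ ceph tell osd.0 bench                      # default: write 1 GB in 4 MB chunks
$ ceph tell osd.0 bench 1073741824 4194304   # the same, spelled out: total bytes, block size
$ ceph tell osd.* bench                      # benchmark every OSD in the cluster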
9. Adjust an OSD’s crush weight: ceph osd crush reweight
Ideally, you want all your OSDs to be the same in terms of throughput and capacity, but this isn’t always possible. When your OSDs differ in their key attributes, use ceph osd crush reweight to modify their weights in the CRUSH map so that the cluster is properly balanced and OSDs of different types receive an appropriately adjusted number of I/O requests and data.
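A couple of examples; osd.7 and the weights are placeholders. Note that crush reweight changes the permanent CRUSH weight (conventionally the device capacity in TiB), while the separate ceph osd reweight command applies a temporary 0-1 override on top of it.
$ ceph osd crush reweight osd.7 3.63899   # permanent CRUSH weight, conventionally the capacity in TiB
$ ceph osd reweight 7 0.95                # temporary 0-1 override for fine-tuning data placement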
10. List cluster keys: ceph auth list
Ceph uses keyrings to store one or more Ceph authentication keys and capability specifications. The ceph auth list command provides an easy way to keep track of keys and capabilities.
If you follow best practices for deployment and maintenance, Ceph becomes a much easier beast to tame and operate. Here’s a look at some of the most fundamental and useful Ceph commands we use on a day-to-day basis to manage our own internal Ceph clusters.
1. status
First and foremost is ceph -s, or ceph status, which is typically the first command you’ll want to run on any Ceph cluster. The output consolidates many other command outputs into a single pane of glass that provides an instant view into cluster health, size, usage, activity, and any immediate issues that may be occurring.
HEALTH_OK is the one to look for; it’s an immediate sign that you can sleep at night, as opposed to HEALTH_WARN or HEALTH_ERR, which could indicate drive or node failure or worse.
Other key things to look for are how many OSDs you have in vs out, how many other services you have running, such as rgw or cephfs, and how they’re doing.
$ ceph -s
cluster:
id: 7c9d43ce-c945-449a-8a66-5f1407c7e47f
health: HEALTH_OK
services:
mon: 1 daemons, quorum danny-mon (age 2h)
mgr: danny-mon(active, since 2h)
osd: 36 osds: 36 up (since 2h), 36 in (since 2h)
rgw: 1 daemon active (danny-mgr)
task status:
data:
pools: 6 pools, 2208 pgs
objects: 187 objects, 1.2 KiB
usage: 2.3 TiB used, 327 TiB / 330 TiB avail
pgs: 2208 active+clean
2. osd tree
Next up is ceph osd tree, which provides a list of every OSD along with its class, weight, status, the node it’s on, and any reweight or priority. In the case of an OSD failure, this is the first place to look: whether you need to check OSD logs or investigate a local node failure, it will send you in the right direction. OSDs are typically weighted against each other based on size, so a 1TB OSD will have twice the weight of a 500GB OSD, in order to ensure that the cluster fills OSDs at an equal rate.
If there’s an issue with a particular OSD in your tree, or you are running a very large cluster and want to quickly check a single OSD’s details without grepping or scrolling through a wall of text, you can also use ceph osd find. This command lets you identify an OSD’s IP address, rack location and more in a single step.
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 329.69476 root default
-3 109.89825 host danny-1
0 hdd 9.15819 osd.0 up 1.00000 1.00000
1 hdd 9.15819 osd.1 up 1.00000 1.00000
2 hdd 9.15819 osd.2 up 1.00000 1.00000
3 hdd 9.15819 osd.3 up 1.00000 1.00000
4 hdd 9.15819 osd.4 up 1.00000 1.00000
5 hdd 9.15819 osd.5 up 1.00000 1.00000
6 hdd 9.15819 osd.6 up 1.00000 1.00000
-7 109.89825 host danny-2
12 hdd 9.15819 osd.12 up 1.00000 1.00000
13 hdd 9.15819 osd.13 up 1.00000 1.00000
14 hdd 9.15819 osd.14 up 1.00000 1.00000
15 hdd 9.15819 osd.15 up 1.00000 1.00000
16 hdd 9.15819 osd.16 up 1.00000 1.00000
17 hdd 9.15819 osd.17 up 1.00000 1.00000
-5 109.89825 host danny-3
24 hdd 9.15819 osd.24 up 1.00000 1.00000
25 hdd 9.15819 osd.25 up 1.00000 1.00000
26 hdd 9.15819 osd.26 up 1.00000 1.00000
27 hdd 9.15819 osd.27 up 1.00000 1.00000
28 hdd 9.15819 osd.28 up 1.00000 1.00000
$ ceph osd find 37
{
"osd": 37,
"ip": "172.16.4.68:6804/636",
"crush_location": {
"datacenter": "pa2.ssdr",
"host": "lxc-ceph-main-front-osd-03.ssdr",
"physical-host": "store-front-03.ssdr",
"rack": "pa2-104.ssdr",
"root": "ssdr"
}
}
3. df
Similar to the *nix df command, which tells us how much space is free on most Unix and Linux systems, Ceph has its own df command, ceph df, which provides an overview and breakdown of the amount of storage we have in our cluster, how much is used vs how much is available, and how that breaks down across our pools and storage classes.
Filling a cluster to the brim is a very bad idea with Ceph – you should add more storage well before you get to the 90% mark, and ensure that you add it in a sensible way to allow for redistribution. This is particularly important if your cluster has lots of client activity on a regular basis.
$ ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 330 TiB 327 TiB 2.3 TiB 2.3 TiB 0.69
TOTAL 330 TiB 327 TiB 2.3 TiB 2.3 TiB 0.69
POOLS:
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.rgw.root 1 32 1.2 KiB 4 768 KiB 0 104 TiB
default.rgw.control 2 32 0 B 8 0 B 0 104 TiB
default.rgw.meta 3 32 0 B 0 0 B 0 104 TiB
default.rgw.log 4 32 0 B 175 0 B 0 104 TiB
default.rgw.buckets.index 5 32 0 B 0 0 B 0 104 TiB
default.rgw.buckets.data 6 2048 0 B 0 0 B 0 104 TiB
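As noted above, you don’t want the cluster anywhere near full. The thresholds behind the nearfull and full health warnings can be inspected, and raised in an emergency; the values shown below are the usual defaults, and raising them buys time rather than fixing anything.
$ ceph osd dump | grep ratio       # shows full_ratio, backfillfull_ratio and nearfull_ratio
$ ceph osd set-nearfull-ratio 0.85
$ ceph osd set-full-ratio 0.95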
4. osd pool ls detail
This is a useful one for getting a quick view of pools, with a lot more information about their configuration: whether a pool is erasure-coded or triple-replicated, which CRUSH rule is in place, what the min_size is, how many placement groups the pool has, and which application the pool is being used for.
$ ceph osd pool ls detail
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 64 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 68 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 73 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 71 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 76 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn last_change 83 lfor 0/0/81 flags hashpspool stripe_width 0 application rgw
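If you only need one attribute, or want to change one, ceph osd pool get and ceph osd pool set work on individual settings; the pool name and values below are just examples.
$ ceph osd pool get default.rgw.buckets.data all        # every setting for one pool
$ ceph osd pool get default.rgw.buckets.data size
$ ceph osd pool set default.rgw.buckets.data min_size 2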
5. osd crush rule dump
At the heart of any Ceph cluster are the CRUSH rules. CRUSH is Ceph’s placement algorithm, and the rules help us define how we want to place data across the cluster – be it drives, nodes, racks or datacentres. For example, if we need at least one copy of our image store’s data at each of our sites, we’d assign a CRUSH rule to the image store pool that mandates that behaviour, regardless of how many nodes we have at each site.
ceph osd crush rule dump is a good way to quickly get a list of our CRUSH rules and how we’ve defined them in the cluster. If we then want to make changes, there is a whole host of crush commands we can use to make modifications, or we can download and decompile the CRUSH map to edit it manually, recompile it and push it back up to the cluster.
$ ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
]
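The decompile-and-edit workflow mentioned above looks roughly like this; the file names are arbitrary, and the compile step will catch syntax errors before anything touches the cluster.
$ ceph osd getcrushmap -o crush.bin      # grab the compiled CRUSH map
$ crushtool -d crush.bin -o crush.txt    # decompile it into editable text
$ vi crush.txt                           # edit rules, buckets, weights...
$ crushtool -c crush.txt -o crush.new    # recompile; syntax errors are caught here
$ ceph osd setcrushmap -i crush.new      # push the new map to the cluster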
6. versions
With a distributed cluster running in production, upgrading everything at once and praying for the best is clearly not the best approach. For this reason, each cluster-wide daemon in Ceph has its own version and can be upgraded independently. This means that we can upgrade daemons on a gradual basis and bring our cluster up to date with little or no disruption to service.
As long as we keep our versions somewhat close to one another, daemons with differing versions will work alongside each other perfectly happily. This does mean that we could potentially have hundreds of different daemons and respective versions to manage during an upgrade process. Enter ceph versions – a very easy way to see how many daemons of each type are running each version.
$ ceph versions
{
"mon": {
"ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 1
},
"mgr": {
"ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 1
},
"osd": {
"ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 36
},
"mds": {},
"rgw": {
"ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 1
},
"overall": {
"ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 39
}
}
7. auth print-key
If we have lots of different clients using our cluster, we’ll need to get their keys off the cluster so they can authenticate. ceph auth print-key is a pretty handy way of quickly viewing any single key, rather than fishing through configuration files. Another useful and related command is ceph auth list, which will show us a full list of all the authentication keys across the cluster, for both clients and daemons, and what their respective capabilities are.
$ ceph auth print-key client.admin
AQDgrLhg3qY1ChAAzzZPHCw2tYz/o+2RkpaSIg==
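Creating a new key is just as straightforward with ceph auth get-or-create; the client name, pool and capabilities below are purely illustrative.
$ ceph auth get-or-create client.backup mon 'allow r' osd 'allow rw pool=default.rgw.buckets.data'
$ ceph auth print-key client.backup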
8. crash ls
Daemon crashed? There could be all sorts of reasons why this may have happened, but ceph crash ls is the first place to look. We’ll get an idea of what crashed and where, so we can diagnose further. Often these will be minor warnings or easy-to-address errors, but crashes can also indicate more serious problems. Related useful commands are ceph crash info <id>, which gives more detail on the crash ID in question, and ceph crash archive-all, which archives all of our crashes if they’re warnings we’re not worried about, or issues we’ve already dealt with.
$ ceph crash ls
1 daemons have recently crashed
osd.9 crashed on host danny-1 at 2021-03-06 07:28:12.665310Z
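From there, drilling down and tidying up looks like this; the crash ID is a placeholder for whatever ceph crash ls reported.
$ ceph crash info <crash-id>    # full backtrace and metadata for one crash
$ ceph crash archive-all        # acknowledge everything and clear the health warning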
9. osd flags
There are a number of OSD flags that are incredibly useful. For a full list, see OSDMAP_FLAGS, but the most common ones are:
- pauserd, pausewr – read and write requests will no longer be answered.
- noout – Ceph won’t mark OSDs as out of the cluster if their daemons fail for some reason.
- nobackfill, norecover, norebalance – recovery and rebalancing are disabled.
We can see below how to set these flags with the ceph osd set command, and how this impacts our health messaging. Another useful and related trick is taking out multiple OSDs with a simple bash expansion.
$ ceph osd out {7..11}
marked out osd.7. marked out osd.8. marked out osd.9. marked out osd.10. marked out osd.11.
$ ceph osd set noout
noout is set
$ ceph osd set nobackfill
nobackfill is set
$ ceph osd set norecover
norecover is set
$ ceph osd set norebalance
norebalance is set
$ ceph osd set nodown
nodown is set
$ ceph osd set pause
pauserd,pausewr is set
$ ceph health detail
HEALTH_WARN pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
OSDMAP_FLAGS pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
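Once the maintenance work is done, the flags are cleared with ceph osd unset, and the OSDs can be brought back in the same way they were taken out.
$ ceph osd unset noout
$ ceph osd unset nobackfill
$ ceph osd unset norecover
$ ceph osd unset norebalance
$ ceph osd unset nodown
$ ceph osd unset pause
$ ceph osd in {7..11}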
10. pg dump
All data in Ceph is stored in placement groups, which provide an abstraction layer – a bit like data buckets (not S3 buckets) – over our storage, and allow the cluster to easily decide how to distribute data and best react to failures. It’s often useful to get a granular look at how our placement groups are mapped across our OSDs, or the other way around. We can do both with pg dump, and while many of the placement group commands can be very verbose and difficult to read, ceph pg dump osds does a good job of distilling this into a single pane.
$ ceph pg dump osds
dumped osds
OSD_STAT USED AVAIL USED_RAW TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
31 70 GiB 9.1 TiB 71 GiB 9.2 TiB [0,1,2,3,4,5,6,8,9,12,13,14,15,16,17,18,19,20,21,22,23,30,32] 175 72
13 70 GiB 9.1 TiB 71 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,12,14,24,25,26,27,28,29,30,31,32,33,34,35] 185 66
25 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,24,26] 180 64
32 83 GiB 9.1 TiB 84 GiB 9.2 TiB [0,1,2,3,4,5,6,7,12,13,14,15,16,17,18,19,20,21,22,23,31,33] 181 73
23 102 GiB 9.1 TiB 103 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,22,24,25,26,27,28,29,30,31,32,33,34,35] 191 69
18 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,17,19,24,25,26,27,28,29,30,31,32,33,34,35] 188 67
11 64 GiB 9.1 TiB 65 GiB 9.2 TiB [10,12,21,28,29,31,32,33,34,35] 0 0
8 90 GiB 9.1 TiB 91 GiB 9.2 TiB [1,2,7,9,14,15,21,27,30,33] 2 0
14 70 GiB 9.1 TiB 71 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,13,15,24,25,26,27,28,29,30,31,32,33,34,35] 177 64
33 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,32,34] 187 80
3 89 GiB 9.1 TiB 90 GiB 9.2 TiB [2,4,8,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] 303 74
30 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,9,12,13,14,15,16,17,18,19,20,21,22,23,29,31] 179 76
15 71 GiB 9.1 TiB 72 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,10,11,14,16,24,25,26,27,28,29,30,31,32,33,34,35] 178 72
7 70 GiB 9.1 TiB 71 GiB 9.2 TiB [6,8,15,17,30,31,32,33,34,35] 0 0
28 90 GiB 9.1 TiB 91 GiB 9.2 TiB [0,1,2,3,4,5,6,7,9,12,13,14,15,16,17,18,19,20,21,22,23,27,29] 188 73
16 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,15,17,24,25,26,27,28,29,30,31,32,33,34,35] 183 66
1 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,2,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] 324 70
26 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,25,27] 186 61
22 89 GiB 9.1 TiB 90 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,11,21,23,24,25,26,27,28,29,30,31,32,33,34,35] 178 80
0 103 GiB 9.1 TiB 104 GiB 9.2 TiB [1,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] 308 83
5 70 GiB 9.1 TiB 71 GiB 9.2 TiB [4,6,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] 312 69
21 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,20,22,24,25,26,27,28,29,30,31,32,33,34,35] 187 63
4 96 GiB 9.1 TiB 97 GiB 9.2 TiB [3,5,10,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] 305 77
34 96 GiB 9.1 TiB 97 GiB 9.2 TiB [0,1,2,3,4,5,6,8,9,12,13,14,15,16,17,18,19,20,21,22,23,33,35] 189 73
17 96 GiB 9.1 TiB 97 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,16,18,24,25,26,27,28,29,30,31,32,33,34,35] 185 72
24 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,10,12,13,14,15,16,17,18,19,20,21,22,23,25] 186 73
10 76 GiB 9.1 TiB 77 GiB 9.2 TiB [4,9,11,15,17,18,25,29,34,35] 1 0
27 89 GiB 9.1 TiB 90 GiB 9.2 TiB [0,1,2,3,4,5,6,10,12,13,14,15,16,17,18,19,20,21,22,23,26,28] 185 75
2 77 GiB 9.1 TiB 78 GiB 9.2 TiB [1,3,8,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] 310 62
19 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,18,20,24,25,26,27,28,29,30,31,32,33,34,35] 184 77
20 77 GiB 9.1 TiB 78 GiB 9.2 TiB [0,1,2,3,4,5,6,7,8,9,10,11,19,21,24,25,26,27,28,29,30,31,32,33,34,35] 183 69
35 96 GiB 9.1 TiB 97 GiB 9.2 TiB [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,34] 187 78
9 77 GiB 9.1 TiB 78 GiB 9.2 TiB [1,8,10,12,13,16,21,23,32,35] 1 0
6 83 GiB 9.1 TiB 84 GiB 9.2 TiB [5,7,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] 323 58
12 89 GiB 9.1 TiB 90 GiB 9.2 TiB [0,1,2,3,4,5,6,8,9,10,11,13,24,25,26,27,28,29,30,31,32,33,34,35] 189 78
29 64 GiB 9.1 TiB 65 GiB 9.2 TiB [0,1,2,3,4,5,6,9,12,13,14,15,16,17,18,19,20,21,22,23,28,30] 185 74
sum 2.8 TiB 327 TiB 2.9 TiB 330 TiB
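To look at the mapping from the other direction, that is, which placement groups a single OSD is carrying or where one particular PG lives, these two are handy (osd.0 and the PG ID are just examples).
$ ceph pg ls-by-osd osd.0    # every PG that has a replica on osd.0
$ ceph pg map 6.1a           # the up and acting OSD sets for one PG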
Finally, a quick-reference cheat sheet of the commands covered above, plus a few more for placement group and daemon-level debugging:
- ceph status -> Ceph cluster overall status
- ceph osd status -> OSD status
- ceph osd df -> OSD disk usage
- ceph osd utilization -> OSD utilization summary (max and min)
- ceph osd pool stats -> Pool status along with I/O rates
- ceph osd tree -> View the CRUSH tree with each host, its OSDs, whether they are up, and their weights
- ceph pg stat -> PG status in summary
- ceph status || ceph -w -> Check or watch cluster health
- ceph df -> Check cluster usage stats
- ceph pg dump -> Check placement group stats
- ceph osd repair -> Repair an OSD
- ceph osd pool create/delete -> Create or delete a storage pool
- ceph tell osd.* bench -> Benchmark an OSD; by default, the test writes 1 GB in total in 4-MB increments
- ceph osd crush reweight -> Adjust an OSD’s CRUSH weight
- ceph auth list -> List cluster keys
- ceph pg {pg-id} query -> Query statistics and other metadata about a PG
- ceph pg {pg-id} list_missing -> List missing/unfound objects in a PG
- ceph pg {pg-id} mark_unfound_lost revert|delete -> Revert unfound objects to previous versions, or delete them
- ceph pg dump_stuck inactive -> Dump stuck placement groups, if any are inactive
- ceph pg dump_stuck unclean -> Dump stuck placement groups, if any are unclean
- ceph pg dump_stuck stale -> Dump stuck placement groups, if any are stale
- ceph pg dump_stuck undersized -> Dump stuck placement groups, if any are undersized
- ceph pg dump_stuck degraded -> Dump stuck placement groups, if any are degraded
- ceph pg scrub {pg-id} -> Instruct a PG to scrub
- ceph pg deep-scrub {pg-id} -> Instruct a PG to deep-scrub
- ceph pg repair {pg-id} -> Instruct a PG to repair itself
- ceph daemon {osd.id} dump_ops_in_flight -> List current in-flight operations for an OSD
- ceph daemon {osd.id} help -> List the commands the daemon supports
- ceph daemon mon.{mon-id} mon_status -> Status information for this MON
- ceph daemon {osd|mon|radosgw} perf dump -> Show performance statistics