Workaround for a Ceph cluster with no remaining disk space
     
  Add Date : 2017-04-13      
         
         
         
  Fault description

In an OpenStack + Ceph cluster, a virtual machine copied a large amount of new data, rapidly consuming the cluster's disk capacity. With no free space left, the virtual machines could no longer be operated and no Ceph cluster operations could be performed.

Symptoms

Restarting the virtual machine through OpenStack has no effect.
Removing the block device directly with the rbd command also fails:
[root@controller ~]# rbd -p volumes rm volume-c55fd052-212d-4107-a2ac-cf53bfc049be
2015-04-29 05:31:31.719478 7f5fb82f7760  0 client.4781741.objecter  FULL, paused modify 0xe9a9e0 tid 6
Checking the cluster status:
cluster 059f27e8-a23f-4587-9033-3e3679d03b31
 health HEALTH_ERR 20 pgs backfill_toofull; 20 pgs degraded; 20 pgs stuck unclean; recovery 7482/129081 objects degraded (5.796%); 2 full osd(s); 1 near full osd(s)
 monmap e6: 4 mons at {node-5e40.cloud.com=10.10.20.40:6789/0,node-6670.cloud.com=10.10.20.31:6789/0,node-66c4.cloud.com=10.10.20.36:6789/0,node-fb27.cloud.com=10.10.20.41:6789/0}, election epoch 886, quorum 0,1,2,3 node-6670.cloud.com,node-66c4.cloud.com,node-5e40.cloud.com,node-fb27.cloud.com
 osdmap e2743: 3 osds: 3 up, 3 in
        flags full
  pgmap v6564199: 320 pgs, 4 pools, 262 GB data, 43027 objects
        786 GB used, 47785 MB / 833 GB avail
        7482/129081 objects degraded (5.796%)
             300 active+clean
              20 active+degraded+remapped+backfill_toofull
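The pgmap line reports 786 GB used out of 833 GB total. A quick calculation (using exactly those figures from the status output above) confirms the cluster is right at the full threshold region; note that Ceph enforces the full ratio per OSD, so individual OSDs reach 95% even though the cluster-wide average is slightly lower:

```shell
# Rough cluster-wide utilisation from the pgmap figures above.
awk 'BEGIN { printf "overall usage: %.1f%%\n", 786 / 833 * 100 }'
```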

HEALTH_ERR 20 pgs backfill_toofull; 20 pgs degraded; 20 pgs stuck unclean; recovery 7482/129081 objects degraded (5.796%); 2 full osd(s); 1 near full osd(s)
pg 3.8 is stuck unclean for 7067109.597691, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.7d is stuck unclean for 1852078.505139, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.21 is stuck unclean for 7072842.637848, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.22 is stuck unclean for 7070880.213397, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.a is stuck unclean for 7067057.863562, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.7f is stuck unclean for 7067122.493746, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.5 is stuck unclean for 7067088.369629, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.1e is stuck unclean for 7073386.246281, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.19 is stuck unclean for 7068035.310269, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.5d is stuck unclean for 1852078.505949, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.1a is stuck unclean for 7067088.429544, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.1b is stuck unclean for 7072773.771385, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.3 is stuck unclean for 7067057.864514, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.15 is stuck unclean for 7067088.825483, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.11 is stuck unclean for 7067057.862408, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.6d is stuck unclean for 7067083.634454, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.6e is stuck unclean for 7067098.452576, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.c is stuck unclean for 5658116.678331, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.e is stuck unclean for 7067078.646953, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.20 is stuck unclean for 7067140.530849, current state active+degraded+remapped+backfill_toofull, last acting [0,2]
pg 3.7d is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.7f is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.6d is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.6e is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.5d is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.20 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.21 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.22 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.1e is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.19 is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.1a is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.1b is active+degraded+remapped+backfill_toofull, acting [0,2]
pg 3.15 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.11 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.c is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.e is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.8 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.a is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.5 is active+degraded+remapped+backfill_toofull, acting [2,0]
pg 3.3 is active+degraded+remapped+backfill_toofull, acting [2,0]
recovery 7482/129081 objects degraded (5.796%)
osd.0 is full at 95%
osd.2 is full at 95%
osd.1 is near full at 93%
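With this much output, counting the affected PGs by hand is error-prone. A simple grep does it; the sketch below embeds two sample lines in a file for illustration, whereas in practice you would save the real health output first (e.g. redirect it to a file):

```shell
# Count PGs blocked on backfill_toofull from a saved copy of the health output.
# pg_health.txt here embeds two sample lines purely for illustration.
cat > pg_health.txt <<'EOF'
pg 3.8 is stuck unclean for 7067109.597691, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
pg 3.7d is stuck unclean for 1852078.505139, current state active+degraded+remapped+backfill_toofull, last acting [2,0]
EOF
grep -c 'backfill_toofull' pg_health.txt
```

Run against the full output above, the same count would be 20, matching the "20 pgs backfill_toofull" summary line.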
Solution one (verified)

Add OSD nodes. This is the practice recommended by the official documentation: once new nodes are added, Ceph begins rebalancing the data and OSD space usage starts to fall:

2015-04-29 06:51:58.623262 osd.1 [WRN] OSD near full (91%)
2015-04-29 06:52:01.500813 osd.2 [WRN] OSD near full (92%)
Solution two (theoretical, not validated)

If no new disks are available, another approach is needed. In this state Ceph refuses all read and write operations, so the usual Ceph commands are of no help. The idea is to raise the full ratio that Ceph enforces: the log above shows the full ratio is 95%, so we temporarily increase it, then delete data as quickly as possible so the usage drops back below the threshold.

Setting the value directly with injectargs was attempted, but it failed: the cluster did not resynchronise data, presumably because the monitor services still need to be restarted.
ceph mon tell \* injectargs '--mon-osd-full-ratio 0.98'
The alternative is to modify the configuration file and restart the monitor services. Out of concern for the data, this method was not attempted at the time; follow-up on the mailing list confirmed that it should not affect the data, provided no virtual machine writes anything to Ceph during recovery.
By default the full ratio is 95% and the nearfull ratio is 85%, so this configuration should be adjusted to the actual situation.

[global]
    mon osd full ratio = .98
    mon osd nearfull ratio = .80
Analysis and summary

Cause

As described in the official Ceph documentation, when an OSD reaches the full ratio of 95%, the cluster stops accepting read and write requests from Ceph clients. That is why the virtual machine could not start when it was restarted.
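As a rough illustration of that threshold (Ceph actually enforces the ratio per OSD, not cluster-wide): with the ~833 GB of raw capacity reported above and the default full ratio of 0.95, writes stop once roughly this much space is consumed:

```shell
# Illustrative only: approximate cluster-wide level at which the 0.95
# full ratio bites, given 833 GB of raw capacity.
awk 'BEGIN { printf "writes blocked above ~%.0f GB used\n", 833 * 0.95 }'
```

At the time of the incident the cluster had 786 GB used with two OSDs already at 95%, so the threshold had effectively been reached.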

Solution

Following the official recommendation, the preferred approach is to add new OSDs. Raising the full ratio works as a temporary measure, but it is not recommended: the data still has to be deleted manually, and the next node failure can push the cluster back to full. Expanding capacity is the best solution.

Reflections

Two points from this failure are worth reflecting on:

Monitoring: a DNS misconfiguration during server setup prevented the monitoring e-mails from being sent, so the Ceph WARN messages were never received.
The cloud platform itself: because of the way Ceph storage is allocated under OpenStack, capacity is usually oversubscribed. From the user's perspective, copying large amounts of data is not wrong; but the cloud platform had no early-warning mechanism, so the problem could not be caught before it occurred.
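An early-warning check need not be elaborate. One minimal sketch, assuming the per-OSD utilisation lines shown earlier ("osd.N is full at X%" / "osd.N is near full at X%") have been saved from the health output, flags any OSD at or above the default nearfull threshold of 85%; the sample lines embedded below are the ones from this incident:

```shell
# Flag OSDs at or above the 85% nearfull threshold, from saved health output.
cat > osd_usage.txt <<'EOF'
osd.0 is full at 95%
osd.2 is full at 95%
osd.1 is near full at 93%
EOF
awk '/ at [0-9]+%$/ {
    pct = $NF; sub(/%/, "", pct)          # strip the trailing % sign
    if (pct + 0 >= 85) print $1, "at", pct "%"
}' osd_usage.txt
```

Wiring a check like this into cron or an existing monitoring system would have surfaced the WARN condition well before the OSDs hit 95%.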
     
         
         
         
  CopyRight 2002-2022 newfreesoft.com, All Rights Reserved.