  Several new methods and ideas for Ceph performance optimization (notes after the 2015 Shanghai Ceph Day)
  Add Date : 2017-04-13      
  A week ago, on October 18, the Shanghai Ceph Day was held, jointly organized by Intel and Red Hat. At the meeting, a number of experts gave more than a dozen excellent talks. This article tries to summarize, based on my own understanding, the Ceph performance optimization knowledge and methods mentioned in those talks.

0. Conventional Ceph performance optimization

(1) Hardware level

Hardware planning: CPU, memory, network
SSD selection: use SSDs for journal storage
BIOS settings: enable Hyper-Threading (HT), disable power saving, disable NUMA, and so on

(2) Software level

Linux OS: MTU, read_ahead, etc.
Ceph configuration and PG number adjustment: calculate the PG count using the formula Total PGs = (Total_number_of_OSD * 100) / max_replication_count.

For more information, refer to the following articles:

Ceph performance optimization summary (v0.94)
Measure Ceph RBD performance in a quantitative way (parts 1 and 2)
Ceph tuning - Journal and tcmalloc
Ceph Benchmarks
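
The PG formula quoted in section 0 can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, the integer division is my own choice, and rounding the result up to a power of two is a commonly recommended extra step that the talks summarized here did not state, so treat it as an assumption.

```python
def total_pgs(num_osds: int, max_replication_count: int, per_osd_target: int = 100) -> int:
    """Raw PG total from: Total PGs = (Total_number_of_OSD * 100) / max_replication_count."""
    if num_osds <= 0 or max_replication_count <= 0:
        raise ValueError("num_osds and max_replication_count must be positive")
    # Integer division is an illustrative choice; the article only gives the formula.
    return (num_osds * per_osd_target) // max_replication_count


def round_up_to_power_of_two(n: int) -> int:
    """Smallest power of two >= n (assumed extra step, not part of the quoted formula)."""
    if n < 1:
        return 1
    return 1 << (n - 1).bit_length()


# Hypothetical cluster: 40 OSDs, 3 replicas.
raw = total_pgs(num_osds=40, max_replication_count=3)
print(raw, round_up_to_power_of_two(raw))  # 1333 2048
```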

1. Use a tiered caching layer - Tiered Cache

  Obviously this is not a new feature of Ceph. At the meeting, experts in this area described in detail the principles and usage of the feature, as well as the details of combining it with erasure coding.

Brief summary:

Each cache tier uses one RADOS pool. The cache pool must be of the replicated type, while the backing pool may be either replicated or erasure-coded.
Different tiers use different hardware media, and the media used by the cache pool must be faster than the media used by the backing pool: for example, an ordinary storage medium such as a conventional HDD or SATA SSD for the backing pool, and a fast medium such as a PCIe SSD for the cache pool.
Each tier uses its own CRUSH rules, so that data is written to the different storage media.
librados supports tiered caching internally. In most cases it knows which tier the client's data needs to be placed on, so no changes are needed in the RBD, CephFS or RGW clients.
The OSDs handle the flow of data between the two tiers on their own: promotion (HDD -> SSD) and eviction (SSD -> HDD). However, this data movement is expensive and time consuming (the cache takes a long time to "warm up").
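
To make the promotion and eviction flow above concrete, here is a toy Python model of a two-tier store. It is a sketch under stated assumptions, not Ceph's implementation: the class name, the write-back behaviour and the least-recently-used eviction policy are all illustrative choices of mine, and the real cache tiering agent decides differently.

```python
from collections import OrderedDict


class TieredCacheModel:
    """Toy two-tier store: a small fast cache tier in front of a slow backing tier."""

    def __init__(self, cache_capacity: int):
        if cache_capacity < 1:
            raise ValueError("cache_capacity must be at least 1")
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # fast tier, least recently used entry first
        self.backing = {}           # slow tier
        self.promotions = 0
        self.evictions = 0

    def write(self, key, value):
        # Assumption: writes land in the cache tier first (write-back style).
        self.cache[key] = value
        self.cache.move_to_end(key)
        self._evict_if_full()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.backing[key]  # raises KeyError if the object does not exist
        # Promotion: copy the object from the backing tier into the cache tier.
        self.cache[key] = value
        self.promotions += 1
        self._evict_if_full()
        return value

    def _evict_if_full(self):
        # Eviction: flush the least recently used objects down to the backing tier.
        while len(self.cache) > self.cache_capacity:
            old_key, old_value = self.cache.popitem(last=False)
            self.backing[old_key] = old_value
            self.evictions += 1


store = TieredCacheModel(cache_capacity=2)
for name in ("a", "b", "c"):
    store.write(name, name.upper())
store.read("a")  # cache miss: "a" is promoted and "b" is evicted
print(store.promotions, store.evictions, list(store.cache))  # 1 2 ['c', 'a']
```

Even in this tiny model, one read of a cold object triggers both a promotion and an eviction, which hints at why the data movement between tiers is described as expensive.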

2. Better SSD - Intel NVM Express (NVMe) SSD

     Ceph clusters often use SSDs as the journal and caching media to improve cluster performance. In the test data presented, a cluster using SSDs as the journal increased 64K sequential write speed 1.5 times compared with an all-HDD cluster, while 4K random write speed increased 32 times.

Placing the journal and the OSD on separate SSDs, rather than sharing one SSD, also improves performance. In the data presented, with both on the same SATA SSD, compared with using two separate SSDs (journal on a PCIe SSD, OSD on a SATA SSD), 64K sequential write speed dropped by 40% and 4K random write speed dropped by 13%.

Therefore, a more advanced SSD naturally improves Ceph cluster performance further. SSD media (flash chips) have so far gone through roughly three generations, each more advanced than the last, with higher density (larger capacity) and faster reads and writes. Currently, the most advanced is the Intel NVMe SSD, which has the following characteristics:

A standardized software interface designed for PCIe drives
Designed specifically for SSDs (the others were made for PCIe)
The ratio of SSD journals to HDD OSDs rises from the conventional 1:5 to 1:20
For all-SSD clusters, an all-NVMe Ceph cluster naturally performs best, but its cost is too high and its performance is often limited by NIC / network bandwidth. So in an all-SSD environment, the recommended configuration is to use NVMe SSDs for the journal and conventional SSDs for the OSDs.

Meanwhile, Intel SSDs can also be combined with Intel Cache Acceleration Software (CAS), which intelligently places data on SSD or HDD according to the characteristics of the data:

Test configuration: Intel NVMe SSD as the cache, using Intel CAS Linux 3.0 with the hinting feature (to be released later this year)
Test results: with a 5% cache, throughput doubled and latency halved
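
As a quick worked example of what the journal ratio change in this section means for hardware planning, the sketch below counts journal SSDs for a given number of HDD OSDs. The function name and the 60-OSD figure are hypothetical, chosen only for illustration; only the 1:5 and 1:20 ratios come from the talk.

```python
import math


def journal_ssds_needed(hdd_osds: int, osds_per_journal: int) -> int:
    """Number of journal SSDs needed when one SSD serves `osds_per_journal` HDD OSDs."""
    if hdd_osds < 0 or osds_per_journal < 1:
        raise ValueError("hdd_osds must be >= 0 and osds_per_journal must be >= 1")
    return math.ceil(hdd_osds / osds_per_journal)


# Hypothetical cluster with 60 HDD OSDs, at the 1:5 and 1:20 ratios.
for ratio in (5, 20):
    print(f"1:{ratio} -> {journal_ssds_needed(60, ratio)} journal SSDs")
```

For 60 HDD OSDs this gives 12 journal SSDs at 1:5 and 3 at 1:20.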

3. Use better network equipment - Mellanox NICs and switches, etc.

3.1 Higher-bandwidth, lower-latency network cards

    Mellanox is a company based in Israel with approximately 1,900 employees worldwide. It focuses on high-end network equipment, and its 2014 revenue was 463.6M. (Just today I saw on the Mizuki BBS that pay at the company's China branch is also quite good.) Its main ideas and products:

Ceph's scale-out nature requires higher network throughput and lower latency for replication, sharing and metadata (file) traffic
10 GbE can no longer meet the needs of high-performance Ceph clusters (it basically cannot serve clusters with more than 20 SSDs), and the industry has begun to enter the 25, 50 and 100 GbE era. Currently, 25GbE has a relatively high cost.
Most network equipment companies use Qualcomm chips, while Mellanox uses self-developed chips whose latency is the lowest in the industry (220 ns)
A Ceph cluster needs two high-speed networks: a public network for client access, and a cluster network for heartbeat, replication, recovery and rebalancing.
Ceph clusters now make wide use of SSDs, and fast storage devices need faster network equipment

The actual test:

(1) Test environment: the cluster network used a 40GbE switch; the public network was tested with 10 GbE and with 40GbE equipment for comparison

(2) Test results: the cluster using 40GbE equipment delivered 2.5 times the throughput of the 10 GbE cluster, and IOPS increased by 15%.

    Some companies already use Mellanox network equipment to build all-SSD Ceph servers. For example, SanDisk's InfiniFlash uses the company's 40GbE NICs, with two Dell R720 servers as OSD nodes and 512 TB of SSD, for a total throughput of 71.6 Gb/s. Fujitsu and Monash University are other examples.

3.2 RDMA technology

    Traditionally, accessing hard disk storage took tens of milliseconds, while the network and protocol stack took a few hundred microseconds. In that period, 1 Gb/s networks were typical, with the SCSI protocol used to access local storage and iSCSI used to access remote storage. With SSDs, the time to access local storage has dropped to a few hundred microseconds, so if the network and protocol stack do not improve correspondingly, they become the performance bottleneck. This means better network bandwidth is needed, such as 40 Gb/s or even 100 Gb/s. iSCSI is still used for remote storage access, but TCP is no longer enough, and this is where RDMA comes in.

    RDMA stands for Remote Direct Memory Access. It was created to reduce the latency of server-side data processing during network transmission. It transfers data over the network directly into a computer's memory, moving data quickly from one system into the memory of a remote system without involving the operating system, so it needs little of the computer's processing power. It eliminates external memory copies and context switches, which frees up memory bus bandwidth and CPU cycles to improve application performance. The conventional approach requires the system to analyze and mark incoming information before storing it in the correct area.

 Mellanox is the industry leader in this technology. It is implemented through kernel bypass and protocol offload, providing high bandwidth, low latency and low CPU usage. Currently, the company has implemented XioMessenger in Ceph, so that Ceph messages travel over RDMA instead of TCP, which can improve cluster performance. This implementation is available in the Ceph Hammer release.

4. Use better software - Intel SPDK related technologies

4.1 Mid-Tier Cache scheme

    This scheme adds a caching layer between client applications and the Ceph cluster to improve client access performance. The layer has the following characteristics:

Provides iSCSI / NVMF / NFS protocol support to Ceph clients;
Uses two or more nodes to improve reliability;
Adds a cache for faster access;
Uses a write log to ensure data consistency across the nodes;
Connects to the backend Ceph cluster using RBD.

4.2 Using Intel DPDK and UNS technology

 With this technology, Intel implements entirely in user space a DPDK NIC driver, a TCP/IP protocol stack (UNS), an iSCSI target and an NVMe driver, to improve the performance of iSCSI access to Ceph. Benefits:

Compared with the Linux-IO Target (LIO), its CPU overhead is only 1/7.
The user-space NVMe driver uses 90% less CPU than the kernel-space NVMe driver.

   A major feature of this scheme is its use of a user-mode NIC. To avoid conflicts with the kernel-mode NIC, an actual deployment can use SR-IOV to virtualize one physical NIC into multiple virtual NICs, which are then assigned to applications such as OSDs. Running entirely in user mode also avoids dependence on the kernel version.

  Currently, Intel offers reference designs for Intel DPDK, UNS and the optimized storage stack; using them requires signing a usage agreement with Intel. The user-mode NVMe driver is already open source.

4.3 CPU data storage acceleration - ISA-L technology

    This code library uses the new instruction sets of the Intel E5-2600 / 2400 and Atom C2000 product family CPUs to implement the corresponding algorithms, making maximum use of the CPU and greatly improving data access speed. However, it currently supports only single-core x64 Xeon and Atom CPUs. In the example presented, EC (erasure coding) speed increased several times and overall cost was reduced by 25 to 30 percent.
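
For readers unfamiliar with what an erasure code computes, here is a deliberately simple Python sketch using single XOR parity (k data chunks plus one parity chunk). It is not ISA-L and not the scheme Ceph uses; production erasure codes are typically Reed-Solomon style and tolerate more than one lost chunk. The function names are my own, and the point is only to show the kind of byte-wise arithmetic that optimized CPU instructions can speed up.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks: list[bytes]) -> bytes:
    """Return one parity chunk for a list of equal-length data chunks."""
    if not chunks or len({len(c) for c in chunks}) != 1:
        raise ValueError("need at least one chunk, all of the same length")
    return reduce(xor_bytes, chunks)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data chunk from the survivors and the parity chunk."""
    return reduce(xor_bytes, surviving, parity)


data = [b"ceph", b"osd0", b"osd1"]
parity = encode(data)
rebuilt = recover([data[0], data[2]], parity)  # pretend data[1] was lost
print(rebuilt)  # b'osd0'
```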

5. Use systematic tools and methods - summary of Ceph performance testing and tuning tools

The meeting also presented a number of Ceph performance testing and tuning tools.

5.1 Intel CeTune

This Intel tool can be used to deploy, benchmark, analyze and tune a Ceph cluster, and it is now open source. Key features include:

Users can configure CeTune through its WebUI
Deployment module: deploy Ceph using the CeTune CLI or GUI
Performance test module: supports qemurbd, fiorbd, cosbench and others for performance testing
Analysis module: iostat, sar, interrupt, performance counter and other analysis tools
Report views: supports configuration download and chart view

5.2 Common performance testing and tuning tools

The slides for this part covered the following topics (the images are not included here):

Ceph software stack (possible failure points and performance tuning opportunities)
Summary of performance visibility tools
Summary of benchmarking tools
Summary of tuning tools

6. Evaluation

    Compared with traditional performance optimization methods, the methods above each bring something new:

Better hardware, including SSDs and network devices, naturally leads to better performance, but cost rises accordingly and the performance gain varies from case to case, so a tradeoff is needed among application scenario, cost and optimization effect;
Better software is mostly not yet open source and mostly still in a testing state, some distance away from use in a production environment, and it is closely tied to Intel hardware;
More comprehensive tools and methods are something Ceph professionals need to study carefully and use in day-to-day work, so that performance problems can be located and solved more efficiently;
Intel's investment in Ceph is very large. If customers have Ceph cluster performance problems, they can send the relevant data to Intel, which will provide recommendations accordingly.

Note: all of the above comes from the talks given at this meeting and the materials sent out afterwards. If any of the content published here is inappropriate, please contact me. Thanks again to Intel and Red Hat for organizing this meeting.