Testing MongoDB Micro-Sharding Performance with YCSB
Add Date: 2017-03-03

MongoDB's database-level lock

MongoDB is the most popular NoSQL database; its natural document data model, flexible schema, ease of use, and horizontal scalability have won the favor of many developers. But nothing is perfect: MongoDB has its weaknesses, and its database-level lock is a performance bottleneck people often complain about. In simple terms, MongoDB's database-level lock means that all write operations to a particular database must acquire a single mutex before they can proceed. This sounds bad, but in practice the lock is held only for the instant a write operation updates data in memory, so each write typically holds the lock for only nanoseconds. Because of this, in most real applications the database-level lock does not produce the significant performance impact people worry about.

In some ultra-high-concurrency write scenarios, however, the database-level lock can become a bottleneck. This can be observed via the DB Lock % metric in MongoDB's MMS monitoring (or in the output of the mongostat command-line tool). As a rule of thumb, if DB Lock % stays above 70-80% for sustained periods, the lock can be considered saturated. How do we solve this problem?

Option one: sharding
If you have enough hardware resources, this is MongoDB's standard answer. Sharding is the ultimate solution to most performance bottlenecks.

Option two: splitting into multiple databases
This is a very effective alternative. The approach is to distribute your data across several databases and implement a routing layer in the application's data-access code, so that each read or write is directed to the appropriate database. A good example is a census database: you can create a separate database per province, with the 31 databases together forming one large logical database. This approach is not always applicable, though; if, for example, you need to query and sort a large amount of data across the whole logical database, coordinating the results from multiple databases becomes cumbersome or impossible.
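The routing layer described above can be quite thin. Here is a minimal sketch of the idea; the `census_<province>` naming scheme and the `route_database` helper are hypothetical illustrations, not from the original article:

```python
# Sketch of an application-side router that maps a record's province
# to one of several physical databases ("sub-databases").
# The census_<province> naming scheme is a hypothetical illustration.

def route_database(province: str) -> str:
    """Return the database name that holds data for the given province."""
    return "census_" + province.lower()

# In real code you would hand this name to your MongoDB client, e.g.:
#   db = mongo_client[route_database(record["province"])]
#   db.people.insert_one(record)
#
# Writes for different provinces then acquire different database-level
# locks, so they no longer contend with each other.
print(route_database("Zhejiang"))   # census_zhejiang
print(route_database("Beijing"))    # census_beijing
```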

Option three: wait
MongoDB 2.8 is about to be released, and its biggest change is moving from database-level locks to document-level locks. The performance problems caused by the database-level lock should then be greatly alleviated.

Option four: micro-sharding
Micro-sharding means using MongoDB's sharding technology, but running several or all of the shard mongod processes on the same server (which can be a physical machine or a virtual machine). Because of the database-level lock, and because a single mongod does not make particularly good use of multi-core CPUs, micro-sharding can be a nice performance-tuning tool when the following conditions hold:
1) the server has a multi-core CPU (4 or 8 cores or more)
2) the server has not yet hit an IO bottleneck
3) there is enough memory to hold the hot data (no frequent page faults)

In this article we run some performance tests to see how much improvement micro-sharding actually delivers.

The YCSB performance testing tool

Before starting the tests, I'd like to take a moment to introduce YCSB. I have often seen development engineers or DBAs run benchmarks using very simple home-grown clients for highly concurrent insert or read tests. MongoDB is itself a high-performance database; with appropriate tuning, its concurrency can reach tens of thousands of operations per second. If the client code is simplistic, or even single-threaded, the first bottleneck in the test will be the client itself rather than the server. So choosing an efficient client is an important first step toward a good performance test.

YCSB is a new-generation database benchmarking tool developed by Yahoo; the full name is Yahoo! Cloud Serving Benchmark. They developed the tool to have a standard way of measuring the performance of different databases. YCSB includes many client-side optimizations, for example using primitive byte arrays wherever possible to reduce the time spent creating and converting data objects. YCSB's major features:
* Support for the common database operations: insert, update, delete, and read
* Multi-threading support. YCSB is implemented in Java, which has very good multi-threading support.
* Flexible workload definition files. You can specify test-scenario parameters flexibly, e.g. 100% insert, or 50% read / 50% write, etc.
* Request distributions: uniform (random), zipfian (a small portion of the data receives most of the access requests), and latest
* Extensibility: you can modify or extend YCSB's functionality by extending its Workload classes

Installing YCSB

Since YCSB itself bears a heavy workload, it is generally recommended to deploy it on a separate machine, preferably with a 4-8 core CPU and 8GB or more of memory. Ensure at least gigabit bandwidth between YCSB and the database server; 10-gigabit is best.
* Install JDK 1.7
* Download a pre-compiled YCSB build that includes the MongoDB driver
* Unzip it
* Change into the ycsb directory and run (assuming a local Mongo database listening on port 27017):
./bin/ycsb run mongodb -P workloads/workloada
* If YCSB runs, the installation was successful.
You can also pull the source down with Git and compile it yourself; this requires the JDK and Maven. The GitHub address is https://github.com/achille/YCSB and build instructions for the MongoDB binding are at https://github.com/achille/YCSB/tree/master/mongodb

YCSB workload files

To test different scenarios, YCSB only needs to be given a different workload file; it automatically generates client requests according to the properties in that file. In our tests we will use the following five scenarios:
Scenario S1: 100% insert. Used to load the test data.
Scenario S2: write-heavy, 10% read, 90% update
Scenario S3: mixed read/write, 65% read, 25% insert, 10% update
Scenario S4: read-heavy, 90% read, 10% insert/update
Scenario S5: 100% read

Here are the contents of the workload file for scenario S2:
recordcount=5000000
operationcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.1
updateproportion=0.9
scanproportion=0
insertproportion=0
requestdistribution=uniform
insertorder=hashed
fieldlength=250
fieldcount=8
mongodb.url=mongodb://localhost:27017
mongodb.writeConcern=acknowledged
threadcount=32

Some notes:
* The test data consists of 5 million documents (recordcount)
* Each document is approximately 2KB (fieldlength x fieldcount). The total data size is about 10GB, plus 600MB of indexes.
* mongodb.url points at the MongoDB database to test
* The MongoDB write concern (mongodb.writeConcern) is acknowledged
* The number of client threads is 32 (threadcount)
* Documents are inserted in hashed (random) order (insertorder)
* Update operations: 90% (0.9)
* Read operations: 10% (0.1)
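The 2KB document size and roughly 10GB total quoted above follow directly from the workload parameters; a quick sanity check:

```python
# Sanity-check the data-size figures implied by the workload file.
recordcount = 5_000_000
fieldlength = 250   # bytes per field
fieldcount = 8

doc_bytes = fieldlength * fieldcount          # payload per document
total_gb = recordcount * doc_bytes / 1024**3  # payload for the whole dataset

print(doc_bytes)            # 2000 bytes, i.e. ~2KB per document
print(round(total_gb, 1))   # ~9.3 GiB of raw field data, ~10GB with overhead
```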

Download all five workload files (S1 - S5) and unzip them into the ycsb directory created above.

MongoDB configuration

The tests were run on an AWS virtual machine with the following server configuration:
* OS: Amazon Linux (substantially similar to CentOS)
* CPU: 8 vCPU
* RAM: 30G
* Storage: 160G SSD
* Journal: 25G EBS with 1000 PIOPS
* Log: 10G EBS with 250 IOPS
* MongoDB: 2.6.0
* Readahead: 32

Some notes:
MongoDB's data, recovery journal (journal), and system log (log) were placed on three separate storage volumes. This is a common optimization that ensures journal and log writes do not compete with the IO of flushing data to disk. The server's readahead setting was also changed to the recommended value of 32.

Single-server baseline

Before testing micro-sharding we first need to establish the best single-server performance. Start the target MongoDB server, log in, and delete the ycsb database if one already exists:
# mongo
> use ycsb
> db.dropDatabase()

Scenario S1: data load
Then start YCSB. Change into the ycsb directory and run the following command (confirm that the workload files S1, S2, S3, S4, and S5 are in the current directory):
./bin/ycsb load mongodb -P S1 -s

If it is running normally, you will see YCSB print its status every 10 seconds, including the current operations per second and average response time, such as:
Loading workload ...
Starting test.
0 sec: 0 operations;
mongo connection created with localhost:27017/ycsb
10 sec: 67169 operations; 7002.16 current ops/sec; [INSERT AverageLatency(us)=4546.87]
20 sec: 151295 operations; 7909.24 current ops/sec; [INSERT AverageLatency(us)=3920.9]
30 sec: 223663 operations; 7235.35 current ops/sec; [INSERT AverageLatency(us)=4422.63]

While the test runs you can use mongostat (or better, MMS) to monitor MongoDB's real-time metrics and check that they are broadly consistent with what YCSB reports.

At the end of the run we will see output similar to the following:
[OVERALL], RunTime(ms), 687134.0
[OVERALL], Throughput(ops/sec), 7295.168457372555
[INSERT], Operations, 5000000
[INSERT], AverageLatency(us), 4509.1105768
[INSERT], MinLatency(us), 126
[INSERT], MaxLatency(us), 3738063
[INSERT], 95thPercentileLatency(ms), 10
[INSERT], 99thPercentileLatency(ms), 37
[INSERT], Return=0, 5000000

This output tells us that inserting 5,000,000 records took 687 seconds, at an average throughput of 7,295 ops/sec with an average response time of 4.5ms. Note that this number says nothing about MongoDB performance in general: if anything in your environment differs, or the document size or number of indexes is not the same, the result will be very different. The value is useful only as a baseline against which to compare the micro-sharding results in this test.
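The reported throughput and latency should be consistent with each other under Little's law (concurrency = throughput x latency); checking this is a quick way to spot a misconfigured benchmark. Verifying the numbers above:

```python
# Little's law sanity check: N = X * R, where
# N = concurrent client threads, X = throughput, R = mean response time.
threads = 32
throughput = 7295.17       # ops/sec, from the [OVERALL] line
avg_latency_s = 4509.1e-6  # 4509 us average insert latency, in seconds

implied_threads = throughput * avg_latency_s
print(round(implied_threads, 1))  # ~32.9, close to the 32 threads configured
```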

With MongoDB, pay special attention to the page faults, network, DB Lock %, and other metrics reported by mongostat or the MMS page. If your network is 1Gb/s and mongostat reports around 100MB/s of traffic, your network is basically saturated: 1Gb/s of bandwidth corresponds to a transfer rate of 128MB/s. In this test my network traffic stayed at around 14-15MB/s, consistent with the throughput times the document size (7,300 x 2KB).
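The saturation arithmetic in this paragraph is easy to verify:

```python
# Network-saturation arithmetic for a 1Gb/s link.
link_MBps = 1024 / 8    # 1 Gb/s ~= 128 MB/s transfer rate
ops_per_sec = 7300      # observed throughput
doc_kb = 2              # ~2KB per document

traffic_MBps = ops_per_sec * doc_kb / 1024       # observed write traffic
print(round(traffic_MBps, 1))                    # ~14.3 MB/s, matching 14-15 MB/s
print(round(traffic_MBps / link_MBps * 100))     # only ~11% of the link used
```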

To find the ideal number of client threads, I repeated the same run several times, changing the threadcount value in the workload file each time. The tests showed that throughput peaked at around 30 threads; increasing the thread count further did not improve performance. That is why the threadcount in my workload files is set to 32.

Now that there are five million test documents in the database, we can run the other scenarios. Note the first argument to YCSB, which selects the test phase: we used "load" to import the data; with the data in place, every subsequent run uses "run".

Scenario S2: write-heavy (10% read, 90% update)
./bin/ycsb run mongodb -P S2 -s
[OVERALL], Throughput(ops/sec), 12102.2928384723

Scenario S3: mixed read/write (65% read)
./bin/ycsb run mongodb -P S3 -s
[OVERALL], Throughput(ops/sec), 15982.39239483840

Scenario S4: read-heavy (90% read)
./bin/ycsb run mongodb -P S4 -s
[OVERALL], Throughput(ops/sec), 19102.39099223948

Scenario S5: 100% read
./bin/ycsb run mongodb -P S5 -s
[OVERALL], Throughput(ops/sec), 49020.29394020022

Micro-sharding tests

We now have single-server performance numbers for all five scenarios. Next we can measure micro-sharding performance with different numbers of shards.

First, stop the single-server MongoDB.
Next we need to build a sharded cluster. Here let me recommend a very handy MongoDB tool: mtools, https://github.com/rueckstiess/mtools

mtools is a collection of several MongoDB-related tools; one of them, mlaunch, lets us effortlessly create replica sets or sharded clusters on a single machine.

Install mtools (requires Python and a Python package manager, pip or easy_install):
# pip install mtools

or

# easy_install mtools

Then make a new directory and create a micro-sharded cluster inside it:
# mkdir shard2
# cd shard2
# mlaunch --sharded 2 --single

This command creates four processes on the same machine:
* 1 mongos on port 27017
* 1 config server (a mongod) on port 27020
* 2 shard servers (mongods) on ports 27018 and 27019

These four processes form a micro-sharded cluster with two shards. Note that although we have built a sharded cluster, at this point all data still goes to a single shard, called the primary shard. To make MongoDB distribute data across the shards, you must explicitly enable sharding on the database and the collection:
# mongo
mongos> sh.enableSharding("ycsb")
{ "ok" : 1 }
mongos> sh.shardCollection("ycsb.usertable", {_id: "hashed"})
{ "collectionsharded" : "ycsb.usertable", "ok" : 1 }

These two commands enable sharding on the "ycsb" database and its "usertable" collection. When sharding a collection you must specify a shard key. Here we use {_id: "hashed"}, meaning the hashed value of the _id field is used as the shard key. A hashed shard key is well suited to write-heavy scenarios because it distributes writes evenly across the shards.
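To see why a hashed shard key spreads writes evenly, the following sketch distributes monotonically increasing _id values across two shards by hash. Note this is an illustration only: md5-mod-shard-count here stands in for MongoDB's actual hashed-sharding implementation, which uses hashed chunk ranges.

```python
# Illustration: hashing a monotonically increasing key spreads documents
# evenly across shards, whereas range-based sharding would send all new
# inserts to the single shard owning the highest range.
import hashlib

def shard_for(key: bytes, num_shards: int) -> int:
    """Pick a shard by hashing the key (stand-in for MongoDB's hashed key)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

counts = [0, 0]
for i in range(10_000):                        # simulate 10k sequential _ids
    counts[shard_for(str(i).encode(), 2)] += 1

print(counts)  # roughly [5000, 5000] - each shard receives about half the writes
```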

Then we can run the following five scenarios in order and collect the results (note the first argument to ycsb):
./bin/ycsb load mongodb -P S1 -s
./bin/ycsb run mongodb -P S2 -s
./bin/ycsb run mongodb -P S3 -s
./bin/ycsb run mongodb -P S4 -s
./bin/ycsb run mongodb -P S5 -s

After you finish testing, use the following command to shut down the entire cluster:
# mlaunch stop

In the same way, you can create 4-, 6-, and 8-shard micro-sharded clusters in separate directories and repeat the five test scenarios against each.
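When repeating the five scenarios over several cluster sizes, it is handy to pull the `[OVERALL], Throughput` line out of each saved YCSB run automatically. A small parser sketch (saving each run's output to a log file is an assumption on my part, not a step from the original procedure):

```python
# Extract the overall throughput figure from saved YCSB output,
# e.g. after redirecting each run to a file such as s3_4shards.log
# (the file-naming scheme is hypothetical).
import re

def parse_throughput(ycsb_output: str) -> float:
    """Return the ops/sec value from a YCSB '[OVERALL], Throughput' line."""
    m = re.search(r"\[OVERALL\], Throughput\(ops/sec\), ([\d.]+)", ycsb_output)
    if m is None:
        raise ValueError("no [OVERALL] Throughput line found")
    return float(m.group(1))

sample = ("[OVERALL], RunTime(ms), 687134.0\n"
          "[OVERALL], Throughput(ops/sec), 7295.168457372555\n")
print(round(parse_throughput(sample)))  # 7295
```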

Conclusions

From the results table we can draw the following conclusions:

* In the right scenarios, micro-sharding can significantly increase MongoDB's throughput

* Micro-sharding offers no benefit for read-only scenarios

* Micro-sharding helps mixed read/write scenarios (the most common in practice) the most: a 275% improvement

* Six micro-shards is roughly the saturation point; adding more shards brings no significant further improvement. This number may vary with the environment.