TCP was invented in 1974 and, after more than 30 years of development, has become one of the most important protocols on the Internet. In wired network environments TCP performs very well, but in mobile Internet and similar network environments it falls somewhat short.
The defining characteristic of the mobile Internet is instability: the network jitters and connections are unreliable. Although 4G has increased mobile bandwidth, terminals move around and the signal is not always stable, for example on a long-distance bus ride or a suburban train, in a complex physical environment, or where many people are online at the same time.
The following discussion assumes Linux servers and a mobile Internet environment. It records the TCP shortcomings I currently know of; if anything is inaccurate, corrections are welcome.
One. Three-way handshake
Data can be sent only after the three-way handshake completes, which is somewhat redundant. The industry has proposed the TCP Fast Open (TFO) extension, which lets application data be sent before the full handshake has finished. This requires kernel-level support on both the client and the server: Linux kernel 3.6 for the client and 3.7 for the server.
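As a rough illustration of checking that kernel support, the sketch below decodes the Linux `net.ipv4.tcp_fastopen` sysctl. The sysctl path and the bit meanings (0x1 for client, 0x2 for server) come from the Linux kernel documentation, not from this article, and the function names are made up for the example.

```python
# Sketch: decode the Linux tcp_fastopen sysctl bitmask.
# Assumption: bit 0x1 enables TFO for clients, bit 0x2 for servers
# (per the kernel's ip-sysctl documentation).
TFO_CLIENT = 0x1
TFO_SERVER = 0x2


def decode_tcp_fastopen(value: int) -> dict:
    """Return which TFO roles the given sysctl value enables."""
    return {
        "client": bool(value & TFO_CLIENT),
        "server": bool(value & TFO_SERVER),
    }


def read_tcp_fastopen(path: str = "/proc/sys/net/ipv4/tcp_fastopen"):
    """Read and decode the sysctl; return None if it is unavailable."""
    try:
        with open(path) as f:
            return decode_tcp_fastopen(int(f.read().strip()))
    except (OSError, ValueError):
        return None  # not Linux, or the kernel has no TFO support


if __name__ == "__main__":
    print(read_tcp_fastopen())
```

A result of `None` or a `False` entry for the role you need would suggest the kernel must be upgraded or the sysctl changed before TFO can help.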
Two. Slow start
For an HTTP request where the application sends a large HTML page, several round trips (Round-Trip Time, RTT) pass before the congestion window grows to its maximum, and this ramp-up phase is rather wasteful. The initial window directly affects system throughput: the higher the throughput, the lower the latency. How large to set it, however, is a choice that depends on business needs.
Before kernel 3.0, the initial congestion window (initcwnd) was 3. A newly established connection can then send 3 MSS in its first flight: with an MSS of 1400 bytes that is about 4 KB in one go, whereas with an initcwnd of 10 it is about 13 KB.
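The arithmetic above can be sketched as follows. This is an idealized model, assuming the window doubles every round trip, there is no packet loss, and the receive window is never the limit; the function names are illustrative only.

```python
import math


def first_flight_bytes(initcwnd: int, mss: int = 1400) -> int:
    """Bytes that fit in the very first flight of a new connection."""
    return initcwnd * mss


def round_trips_to_send(total_bytes: int, initcwnd: int, mss: int = 1400) -> int:
    """Round trips needed under idealized slow start (cwnd doubles per RTT)."""
    remaining = math.ceil(total_bytes / mss)  # segments still to send
    cwnd = initcwnd
    rtts = 0
    while remaining > 0:
        remaining -= cwnd  # send one full window this round trip
        cwnd *= 2          # slow start: window doubles after each RTT
        rtts += 1
    return rtts


if __name__ == "__main__":
    for cwnd in (3, 10):
        print(cwnd, first_flight_bytes(cwnd), round_trips_to_send(100 * 1024, cwnd))
```

Under these assumptions a 100 KB page needs 5 round trips with an initcwnd of 3 and 4 with an initcwnd of 10, so the larger window saves a full RTT on a high-latency mobile link.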
Based on its research, Google recommends setting initcwnd to 10 for mobile Internet Web environments, and from Linux kernel 3.0 onward the default is 10. On older kernels it has to be set manually.
In a LAN environment, or where large transfers such as files are required, the value can be raised further as appropriate.
If the connection is long-lived and every message is small, say under 4 KB of binary data per transfer, then whether slow start is tuned or not makes little difference.
Three. Head-of-line blocking (HOL)
TCP delivers data strictly in order, which can be thought of as a FIFO queue. When an earlier segment is lost, the data behind it can only wait: the subsequent packets are delivered to the receiving application only after the lost segment has been retransmitted and acknowledged. This is called head-of-line (HOL) blocking. It wastes server bandwidth and lowers system performance.
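The effect can be seen in a toy model of in-order delivery. This is only a sketch: it uses whole-segment sequence numbers rather than TCP's byte offsets, and the class name is invented for the example.

```python
class InOrderReceiver:
    """Toy receiver that hands data to the application strictly in order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number the application expects
        self.pending = {}   # out-of-order segments parked in the buffer

    def receive(self, seq: int, payload: str) -> list:
        """Accept one segment and return whatever can now be delivered."""
        self.pending[seq] = payload
        delivered = []
        # Deliver only a contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


if __name__ == "__main__":
    rx = InOrderReceiver()
    print(rx.receive(1, "b"))  # segment 0 is missing, nothing is delivered
    print(rx.receive(2, "c"))  # still blocked behind segment 0
    print(rx.receive(0, "a"))  # the retransmission arrives, all three are released
```

Segments 1 and 2 have already crossed the network, yet the application sees nothing until segment 0 is retransmitted.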
1. Multiplexing is not ideal
HTTP/2 introduced multiplexing at the application level. Although this goes some way toward solving the one-request-at-a-time limitation of HTTP/1.*, it still runs on TCP and is therefore subject to head-of-line blocking. When an upper-layer protocol multiplexes over TCP and head-of-line blocking occurs, every multiplexed stream stalls together, so transmission failures across multiple business streams have to be handled carefully.
2. The TCP Keepalive mechanism fails
In theory the TCP Keepalive extension keeps a connection alive, but when head-of-line blocking occurs and sent data is stuck unacknowledged, Keepalive probes are not sent and the mechanism is completely ineffective.
Systems such as the NFS file system generally enable TCP Keepalive on both ends, so that if head-of-line blocking disables Keepalive on one end, the other end can still detect promptly whether its peer is alive.
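A minimal sketch of enabling Keepalive on one end is shown below; for the two-way arrangement each end would do the same on its own socket. The idle, interval, and count values are arbitrary examples, and the `TCP_KEEP*` options are guarded because they are not available on every platform.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP Keepalive and, where supported, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-style tuning knobs; skipped on platforms that lack them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Because probes stop while data is unacknowledged, this alone does not cover the blocked sender; it only lets the idle side notice a dead peer.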
3. Head-of-line blocking timeout
When a packet is sent, a timer waiting for its acknowledgment is started, and the packet is retransmitted when the timer expires. If the retransmissions also go unacknowledged, subsequent data keeps piling up in the send queue; the timeout algorithm that governs this blocking is quite complicated. The upper application eventually receives a "No route to host" error from the kernel stack, by default after no more than about 16 minutes. In a test (with no application-level heartbeat), I forcibly cut the terminal's network before the server sent data and captured packets with tcpdump; after about 15 minutes the kernel reported EHOSTUNREACH, which the application sees as "No route to host".
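The roughly 15-minute figure is consistent with a simple back-of-the-envelope model. The sketch below assumes Linux defaults that are not stated in this article: `tcp_retries2` of 15, a minimum RTO of 200 ms, and a maximum RTO of 120 s, with the timer doubling after each retransmission. The real kernel derives the RTO from measured round-trip times, so this is only an approximation.

```python
def retransmission_give_up_seconds(retries: int = 15,
                                   rto_min: float = 0.2,
                                   rto_max: float = 120.0) -> float:
    """Approximate time until TCP gives up on unacknowledged data.

    Models exponential backoff: each wait doubles, capped at rto_max.
    One initial wait plus one wait per retransmission is counted.
    """
    return sum(min(rto_min * 2 ** i, rto_max) for i in range(retries + 1))


if __name__ == "__main__":
    total = retransmission_give_up_seconds()
    print(f"{total:.1f} s, about {total / 60:.1f} minutes")
```

With those assumed defaults the total is 924.6 seconds, a little over 15 minutes, which matches the observation above.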
Four. Four-way close
After a connection has been established, closing it takes four further exchanges between the two ends, which is somewhat redundant in a mobile Internet environment. What is wanted is a fast close and a fast response; the redundant exchanges just occupy network bandwidth.
Five. Delivery confirmation to the upper application?
This is more of a wish. The upper-layer application calls a kernel interface to send a large chunk of data; once the kernel has finished sending it and has received the peer's acknowledgment, it notifies the application that the data was delivered successfully. In some scenarios this would save many application-level interaction steps.
Six. NAT gateway timeout
Because IPv4 addresses are limited, LANs use NAT routing devices to increase the number of terminals that can get online. For a long-lived TCP connection, the NAT device has to maintain a mapping between the internal terminal's IP:PORT and the external IP:PORT used to reach the server. Maintaining this mapping consumes memory, so a timeout timer cleans up idle entries; otherwise memory would be exhausted.
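A toy version of such a mapping table might look like the following. It is a sketch only: real NAT devices track full 5-tuples and protocol state, and the class name, starting port, and 300-second timeout are made up for illustration.

```python
class NatTable:
    """Toy NAT mapping table with idle-timeout cleanup."""

    def __init__(self, idle_timeout: float = 300.0):
        self.idle_timeout = idle_timeout
        self._entries = {}        # (internal ip, port) -> [external port, last seen]
        self._next_port = 20000   # hypothetical start of the external port pool

    def translate(self, internal: tuple, now: float) -> int:
        """Return the external port for a flow, creating the mapping if needed."""
        entry = self._entries.get(internal)
        if entry is None:
            entry = [self._next_port, now]
            self._next_port += 1
            self._entries[internal] = entry
        entry[1] = now  # any traffic refreshes the idle timer
        return entry[0]

    def expire(self, now: float) -> int:
        """Drop mappings idle for longer than the timeout; return how many."""
        stale = [key for key, (_, seen) in self._entries.items()
                 if now - seen > self.idle_timeout]
        for key in stale:
            del self._entries[key]
        return len(stale)


if __name__ == "__main__":
    nat = NatTable()
    flow = ("10.0.0.2", 5000)
    print(nat.translate(flow, now=0))    # mapping created
    print(nat.expire(now=400))           # idle too long, mapping removed
    print(nat.translate(flow, now=401))  # the same flow now gets a new port
```

Once the mapping is gone, packets from the server no longer reach the terminal even though both endpoints still believe the TCP connection is open.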
Timeout values differ from one NAT device to another, so heartbeats are needed to keep the connection through the NAT device alive and avoid it being dropped for sitting idle too long. For example, on China Mobile's network the heartbeat interval is generally set to no more than five minutes. Because networks differ slightly, introducing an adaptive ("smart") heartbeat mechanism is appropriate.
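One possible shape for such an adaptive heartbeat is sketched below: lengthen the interval while heartbeats keep succeeding, and step back and settle once one fails. The class, its parameters, and the probing policy are assumptions for illustration, not a description of any particular product.

```python
class AdaptiveHeartbeat:
    """Probe for the longest heartbeat interval the network tolerates."""

    def __init__(self, start: int = 60, step: int = 30, ceiling: int = 300):
        self.interval = start    # seconds between heartbeats currently in use
        self.step = step         # how much to lengthen or shorten per adjustment
        self.ceiling = ceiling   # never exceed this (e.g. five minutes)
        self.settled = False     # True once probing has stopped

    def on_success(self) -> int:
        """A heartbeat at the current interval was answered."""
        if not self.settled:
            self.interval = min(self.interval + self.step, self.ceiling)
            if self.interval == self.ceiling:
                self.settled = True
        return self.interval

    def on_failure(self) -> int:
        """The connection was found dead: fall back and stop probing."""
        self.interval = max(self.interval - self.step, self.step)
        self.settled = True
        return self.interval


if __name__ == "__main__":
    hb = AdaptiveHeartbeat()
    print(hb.on_success(), hb.on_success(), hb.on_failure(), hb.on_success())
```

A production version would likely re-probe after a network change, since the tolerated interval differs between carriers and WiFi.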
Seven. Terminal IP Roaming
Mobile terminals often switch between 2G/3G/4G and WiFi, so their IP addresses change frequently. As a consequence, in-flight request-response exchanges are dropped and terminated, and the request has to be re-issued, either automatically or by the user, which wastes resources.
Terminals that support Multipath TCP can use their 2G/3G/4G and WiFi connections at the same time, establishing multiple paths that speed up downloads and back each other up. This handles the case where several networks coexist: the loss of one network no longer interrupts request processing, which improves the reliability and stability of the device's connections.
Of course, Multipath TCP can also be used between servers with multiple network links to increase throughput.
At present, only iOS 7 and later support it.
Support can be seen in an experimental Linux kernel 3.10 branch, but when it will be merged into the mainline is not yet known.
Eight. TCP bufferbloat
When the queue of packets at a router grows beyond its length limit, the router generally drops packets at random to relieve the bloat. For the upper application this shows up as increased latency, apparent data loss, or a lost connection.
In such cases the general recommendation is to send packets promptly so that part of the data is not lost. The kernel should be upgraded to the latest version as early as possible, and should be no older than 3.6.
Nine. TCP is not infallible
Both the IP and TCP headers carry a checksum for error detection: a 16-bit one's-complement sum whose result is then inverted (for details, see the principles and implementation of the TCP checksum). Ordinary errors are easily detected, but when the corruption leaves the sum of the 16-bit words unchanged, for example when two 16-bit words are swapped, the checksum cannot detect it.
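This blind spot is easy to demonstrate. The sketch below implements the 16-bit one's-complement checksum in the style of RFC 1071 over a bare byte string (a real TCP checksum also covers a pseudo-header, which is omitted here) and shows that swapping two 16-bit words leaves the result unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF  # invert the sum


if __name__ == "__main__":
    original = bytes([0x12, 0x34, 0x56, 0x78])
    swapped = bytes([0x56, 0x78, 0x12, 0x34])  # the two words exchanged
    print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))
```

Both calls return the same value because addition is commutative, so a reordering of words is invisible to this checksum.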
The CRC32 checksum on Ethernet frames is generally reliable, but when the two ends are separated by several routers, problems can still slip through.
Routers occasionally suffer hardware or memory faults, so that a received IP packet ends up with one or more bits flipped, or with bytes swapped. If the corruption falls in the payload area and escapes the link-layer, network-layer, and transport-layer checksums, it can only be detected by an application-layer checksum. Applications should therefore add their own data validation wherever possible.
Large file downloads should carry a checksum to guarantee data integrity; MD5 is commonly used, and it also serves as a safeguard against tampering.
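An application-level check of this kind can be as small as the sketch below, which hashes a download chunk by chunk with Python's `hashlib` and compares the result with a digest published by the sender. The function names and the idea of passing the expected digest as a hex string are illustrative assumptions.

```python
import hashlib


def md5_of_chunks(chunks) -> str:
    """Hash an iterable of byte chunks without holding the whole file in memory."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(chunks, expected_hex: str) -> bool:
    """Return True if the received data matches the published digest."""
    return md5_of_chunks(chunks) == expected_hex.lower()


if __name__ == "__main__":
    published = hashlib.md5(b"hello world").hexdigest()
    print(verify_download([b"hello ", b"world"], published))  # intact
    print(verify_download([b"hello ", b"w0rld"], published))  # corrupted in transit
```

The same structure works with any other `hashlib` algorithm by swapping the constructor.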
In a world where TCP is everywhere, a major overhaul of TCP is unlikely, because it is baked into existing system kernels and firmware. Upgrading terminal systems and firmware (Android, iOS, and so on), Linux server kernels, and intermediary devices (routers and the like) would be a huge undertaking and is not realistic at the moment.
TCP lives in the system kernel, and upgrading and fixing kernel space is the most troublesome part. Servers can at least be upgraded in part, but upgrading the systems on users' terminals is truly difficult. User-space applications, by contrast, are far easier to upgrade and modify in a controlled way. On this basis, Google engineers built the QUIC protocol directly on top of UDP and run it in user space, combining the light weight of UDP with the reliability of TCP, which is a relatively new direction.
If I were to state my expectations for future underlying transport protocols:
Custom protocols can be implemented in user space (user mode), similar to QUIC
Traditional TCP/UDP can run in user space, bypassing the kernel
A complete protocol stack is offered to upper applications as a static library
Upper applications depend on the protocol stack's static library, which is bundled in at compile time
Packet I/O frameworks such as DPDK/netmap plus a user-space protocol stack deliver data from the NIC directly to the upper application
The Linux kernel becomes less important, serving mainly routine system maintenance over SSH
Although TCP has problems of one kind or another, it remains network infrastructure that cannot be bypassed, and understanding some of its current shortcomings should help us use it better.