I have used MPI and the C# thread pool, taken part in programming contests, and picked up some real experience along the way. After almost a year my interest in multithreaded programming has not faded and I have kept following the topic, so I decided to write an article summarizing what I know. If I get anything wrong, corrections from the experts are welcome :)
Core counts keep climbing. Moore's Law still holds, but single-core performance has hit a serious bottleneck, so an ordinary desktop PC is expected to reach 24 cores (or 16 cores / 32 threads) around late 2017 or early 2018. How do we cope with this sudden jump in core count? Programming has to keep up with the times. I'll venture a prediction: the on-chip interconnect between CPU cores will become 4-way set-associative :), because a full point-to-point mesh is too complex to wire, while a single shared bus is not powerful enough. I also expect asymmetric multi-core processors, perhaps mixing in a few DSP or stream-processor cores.
2. The difference between multithreading and parallel computing
(1) Multithreading is not just for parallel computing; it has many other very useful roles.
Back in the single-core era, multithreading already had a very wide range of uses. At that time it was mostly used to reduce the CPU idle time caused by blocking, meaning polling code along the lines of
while (flag == 1) sleep(1);
Note that this leaves the CPU idle rather than actively wasting it; remove the sleep(1) and it becomes a pure waste (a busy-wait).
When does blocking occur? Typically while waiting for IO (disk, database, network, and so on). In a single-threaded program the CPU just sits there doing nothing useful; as far as this program is concerned, running other programs in the meantime does not help it, so it is terribly inefficient. If an IO operation takes 10 milliseconds, the CPU is blocked for close to 10 ms, and that is a huge waste when you remember that a CPU lives on a nanosecond timescale.
So we hand the time-consuming IO to a dedicated thread to perform on our behalf. The code that created the thread is then no longer blocked by the IO and can carry on doing other things in the same process, instead of waiting idly (or yielding the CPU to other programs).
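To make this concrete, here is a minimal Java sketch (the class name, method names, and the 10 ms sleep standing in for real IO are all my own illustration): the blocking call runs on a worker thread, so the caller stays free to do other work.

```java
import java.util.concurrent.*;

// Sketch: move a blocking "IO" call onto a worker thread so the calling
// thread is not stuck waiting. The sleep simulates a 10 ms disk/network read.
public class IoOffload {
    // Stand-in for a slow, blocking IO operation.
    static String slowRead() {
        try { Thread.sleep(10); } catch (InterruptedException e) { }
        return "data";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<String> pending = worker.submit(IoOffload::slowRead);
        // The main thread is NOT blocked here; it can keep doing useful work.
        doOtherWork();
        System.out.println("IO result: " + pending.get()); // join only when needed
        worker.shutdown();
    }

    static void doOtherWork() { /* e.g., update the UI, handle other requests */ }
}
```

The key design point is that `Future.get()` is only called when the result is actually needed; everything between `submit` and `get` overlaps with the IO wait.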
In that same single-core era, using multithreading to eliminate blocking was called "concurrency", which is fundamentally different from parallelism. Concurrency is "pseudo-parallelism": it looks parallel, but a single CPU is still executing everything, just switching between tasks too fast for us to notice. Take a UI program (in plain words, one with a graphical interface): if clicking a button triggers an event handler that runs for 10 seconds, the program appears frozen, because it is busy executing and has no spare cycles to respond to the user's other actions. If instead you hand that button's work to a thread and start the thread, the program does not freeze and keeps responding to the user. The price, however, is thread mutual exclusion and synchronization, deadlock, and related problems; see the literature for details.
Now we are in the multi-core era, and these mutual-exclusion and synchronization problems have become more severe: much of what counted as safe concurrency on a single core behaves very differently on multiple cores. Why? See the literature for the full story; briefly, a volatile variable used to solve most problems, for example a flag shared by several threads. Under single-core concurrency this is basically fine (PS: when does it go wrong? When there are several flags, or an array of them; then only careful logic can save you, though a few extra idle spins don't matter as long as nothing fatal happens), because there is only one CPU and only one thread can be touching the flag at any instant. With multiple cores that is no longer true, so volatile alone is unlikely to solve the problem. You then need the "semaphore" facilities of your specific language and environment: Mutex, Monitor, Lock, and so on. These effectively "disable interruption" at the hardware level to achieve a "primitive" (atomic) effect, so that access to the critical section cannot be interleaved. I won't explain the details here; readers can consult "Modern Operating Systems".
(2) Parallel computing can also be achieved by other means; multithreading is only one of them.
The other means include multiple processes (which in turn split into shared-memory, distributed multi-machine, and hybrid forms) and instruction-level parallelism.
ILP (instruction-level parallelism): on x86 this shows up as SMT (simultaneous multithreading, marketed by Intel as Hyper-Threading), while MIPS-style architectures pursue it through superscalar and out-of-order execution. The two approaches differ, but both exploit parallelism at the instruction level. The user cannot control it, and it is outside the scope of programming; only limited optimization is possible, and even that limited optimization falls under the compiler's jurisdiction, so there is very little the user can do.
(3) Languages typically suited to parallel computing
Erlang and MPI: the former is a language in its own right, while the latter is an extension for C/C++ and Fortran, but the effect is the same: both achieve parallelism through multiple processes. Erlang processes share no memory and communicate by message passing, while MPI can run in a hybrid of distributed and shared-memory modes.
C# .NET 4.0: the new 4.0 release lets you write a parallel For loop with very little code, where earlier versions needed quite convoluted code for the same functionality. This is parallel computing via multithreading. Java and C# 3.5 both have a thread pool (ThreadPool), a very good thread-management class that makes using multiple threads convenient and efficient.
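Java's closest analog of .NET 4.0's parallel For loop (as of Java 8) is a parallel stream, which splits an index range across a built-in thread pool automatically. A minimal sketch, with my own method name:

```java
import java.util.stream.IntStream;

// A "parallel for" in a few lines: the range is partitioned across worker
// threads by the common fork/join pool, with no explicit thread code.
public class ParallelSum {
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()                  // fan the loop out across cores
                        .mapToLong(i -> (long) i * i)
                        .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000)); // 1^2 + 2^2 + ... + 1000^2
    }
}
```

Note the loop body must be free of shared mutable state for this to be safe; the reduction (`sum()`) is what merges the per-thread results.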
CUDA: still not widely used, with great development potential, but for now its applications are limited. It only supports C (not even C99), which is fairly low-level, and you cannot use function pointers. Personally I feel this follows naturally from the hardware's limits (little memory available per core on average, and long latencies when communicating with system memory), so it only suits scientific computing, static image processing, and video encoding/decoding; in other areas it is no match for a high-end CPU. Once operating systems can fully schedule GPU resources, the GPU may become a real powerhouse. As for game physics acceleration, a multi-core CPU can in fact already do that well.
Other languages... hmm... left for a future discussion.
3. Are more threads always better? When do we need multithreading?
More threads are not necessarily better: thread switching has a cost. Adding a thread is only good value when the blocking time it eliminates exceeds the overhead it introduces.
Since the 2.6 kernel, Linux schedules different threads onto different cores; Windows has supported this since NT 4.0.
So when should we use multithreading? Let's discuss four cases:
a. Multi-core CPU, compute-intensive task. Here multithreading genuinely improves execution efficiency, e.g. encryption/decryption or compression/decompression of data (video, audio, or general data); otherwise you saturate one core while the other cores sit idle.
b. Single-core CPU, compute-intensive task. The task already consumes 100% of the CPU, so extra threads neither can nor need to improve computational efficiency; on the contrary, if the program also involves human-computer interaction, it is best to use multiple threads so the computation does not lock the user out.
c. Single-core CPU, IO-intensive task. Use multithreading both to hide the IO waits and to keep the interaction responsive.
d. Multi-core CPU, IO-intensive task. This goes without saying: the same reasoning as the single-core case applies.
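The four cases above boil down to a pool-sizing rule of thumb, sketched below in Java (the formula is a common heuristic, not something from this article, and the 0.9 blocked-fraction is purely illustrative): compute-bound pools use about one thread per core, while IO-bound pools can be much larger because their threads spend most of their time blocked rather than computing.

```java
import java.util.concurrent.*;

// Heuristic thread-pool sizing for CPU-bound vs. IO-bound work.
public class PoolSizing {
    static int cpuBoundThreads() {
        return Runtime.getRuntime().availableProcessors(); // ~one per core
    }

    static int ioBoundThreads(double blockedFraction) {
        // If each thread is blocked on IO a fraction of the time, scale the
        // pool up so the cores stay busy: cores / (1 - blockedFraction).
        int cores = Runtime.getRuntime().availableProcessors();
        return (int) Math.ceil(cores / (1.0 - blockedFraction));
    }

    public static void main(String[] args) {
        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuBoundThreads());
        ExecutorService ioPool  = Executors.newFixedThreadPool(ioBoundThreads(0.9));
        System.out.println("cpu pool: " + cpuBoundThreads()
                + " threads, io pool: " + ioBoundThreads(0.9) + " threads");
        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```

On a single core this degenerates correctly: the CPU-bound pool has one thread (case b), while the IO-bound pool still grows (case c).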
4. Skills/techniques the programmer needs
(1) Reduce serialized code to improve efficiency. (Yes, I know, this one goes without saying.)
(2) Distribute single shared data: replicate the data into many copies so that different threads can access them simultaneously without contention.
(3) Load balancing, which comes in two flavors, static and dynamic. See the literature for details.
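Points (2) and (3) can be sketched together in Java (all names are mine; the array sum stands in for real, possibly uneven, tasks). `sumStatic` gives each thread a private partial sum over a fixed chunk: replicated data plus static load balancing, with no locking in the hot loop. `sumDynamic` has threads pull the next index from a shared counter, so faster threads automatically take on more work: dynamic load balancing.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class BalanceDemo {
    // Static: pre-assigned equal chunks, one private accumulator per thread.
    static long sumStatic(long[] data, int nThreads) {
        long[] partial = new long[nThreads];      // one private copy per thread
        Thread[] ts = new Thread[nThreads];
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t, lo = t * chunk, hi = Math.min(data.length, lo + chunk);
            ts[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) partial[id] += data[i]; // no contention
            });
            ts[t].start();
        }
        joinAll(ts);
        long total = 0;
        for (long p : partial) total += p;        // merge once, at the end
        return total;
    }

    // Dynamic: threads grab the next task index from a shared "bag of tasks".
    static long sumDynamic(long[] data, int nThreads) {
        AtomicInteger next = new AtomicInteger(0);
        AtomicLong total = new AtomicLong(0);
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            ts[t] = new Thread(() -> {
                int i;
                while ((i = next.getAndIncrement()) < data.length)
                    total.addAndGet(data[i]);     // stand-in for an uneven task
            });
            ts[t].start();
        }
        joinAll(ts);
        return total.get();
    }

    static void joinAll(Thread[] ts) {
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { }
        }
    }
}
```

Static partitioning is cheapest when tasks cost roughly the same; the shared counter in the dynamic version adds a little contention but absorbs uneven task sizes, which is exactly the trade-off between the two flavors.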