Home PC Games Linux Windows Database Network Programming Server Mobile  
  Home \ Programming \ Machine Learning: Classification of the curse of dimensionality     - Getting Started with Linux system to learn: how to install autossh (Linux)

- GitLab remote backup of Linux Shell Scripting (Linux)

- Linux boot process (Linux)

- Good wireless network security information spread in the air (Linux)

- Oracle Linux 7.1 install Oracle 12C RAC (Database)

- How to fix fatal error: security / pam_modules.h: No such file or directory (Linux)

- Java development environment to build under Ubuntu (Linux)

- CentOS 6.5 / 6.6 modify the default SSH port number (Linux)

- Hadoop new and old version of the difference in the size of the InputSplit (Server)

- GCC and gfortran write MEX program (Matlab2012a) under Ubuntu 14.04 (Programming)

- Ubuntu 14.04 install Nmap 6.46.1 (Linux)

- To change CentOS7 runlevel (Linux)

- CentOS / Debian configuration Gitlab 7.1x to build self Git repository (Linux)

- Java-- get the reflection object information (Programming)

- Java rewrite the hashcode method (Programming)

- Build ftp server under CentOS 6.5 (Server)

- A command to install Sublime Text 3 on Manjaro / Archlinux (Linux)

- lolcat: an output terminal rainbow effects in the Linux command-line tool (Linux)

- CentOS ClamAV antivirus package updates (Linux)

- VMware ghost Linux card error (Linux)

  Machine Learning: Classification of the curse of dimensionality
  Add Date : 2018-11-21      
  When looking at the paper machine learning, often see the author mentions "curse of dimensionality", Chinese translated as "curse of dimensionality", which in the end is what kind of "disaster"? This article will introduce this annoying "curse of dimensionality" and its importance in the classification problem through an example.

Suppose now that there is a group of photos, each photo has a cat or a dog. We wanted to design a classifier can automatically identify the animal in the picture off. To achieve this goal, we first need to consider how the feature photos of the animals expressed in digital form. What is the biggest difference between a cat and a dog is? One might first think of cats and dogs is not the same color, while others may think of cats and dogs sample sizes. Suppose from color to distinguish cat and dog, you can design three characteristics: the average of the red, green and blue mean averages to determine which pictures belong to a class of animals:

1 if 0.5 * red + 0.3 * green + 0.2 * blue> 0.6:
2 return cat
3 else:
4 return dog
However, only these three traits, classification may not be a satisfactory result. Thus, it can add some features: size, texture. Perhaps after adding features, classification results will be increased. However, the feature is not better?
To see the performance of the classifier with the increasing number of variation, over a certain value, performance falling instead of rising. This phenomenon is called "curse of dimensionality."

Continue the previous example. Suppose the number of cats and dogs on the planet is unlimited. Due to limited time and computing power, we only selected 10 photos as training samples. Our aim is based on these 10 photographs training a linear classifier, linear classifier can make the remaining cat or dog photos correctly classified. We only use a feature to identify cats and dogs start
If only one of the characteristics of the words, cats and dogs are almost uniformly distributed in this segment, it is difficult to 10 photos linear classification. So, the situation is what will happen after adding a feature

Add a feature we found that you still can not find a line to separate the cats and dogs. So, consider the need to add a feature
At this time, we finally found a plane separate from cats and dogs. Note that only one feature, the feature space is assumed as a segment length of 5, the sample density is 10/2 = 5. There are two features, the feature size is the 5 * 5 = 25, sample density is 10/25 = 0.4. There are three characteristics, the feature size is the 5 * 5 * 5 = 125, sample density is 10/125 = 0.08. If we continue to increase the number of features, sample density will be more sparse, it is easier to find a hyper-plane training sample separately. Because as the number of features tends to infinity, the sample density is very sparse, the training sample is the possibility of misclassification tends to zero. When we classify the results of the high-dimensional space is mapped to the low-dimensional space, a serious problem arises
See the three-dimensional feature space mapped to the results of a two-dimensional feature space after. Although linear separable training samples in high dimensional feature space, but after mapped into low-dimensional space, the result is just the opposite. In fact, increasing the number of features such high-dimensional space linearly separable, the equivalent of a low-dimensional space in a training complex nonlinear classifier. However, this non-linear classifier too "smart" just learned a few exceptions. If it is used to identify those who never appear in the training sample test sample, the results are usually less than ideal. In fact, this is what we learn in high school had a machine overfitting.
Using only two characteristics of linear classifier misclassification some training samples, high accuracy rate does not seem to FIG. 4, however, the use of two characteristic linear classifier generalization than using three characteristics of linear classifier stronger. Because the use of two characteristic linear classifier learning to not just a special case, but an overall trend for those samples has not been seen to be better distinguish open. In other words, by reducing the number of features, you can avoid overfitting, thus avoiding the "curse of dimensionality."
Another point of interpretation of the "curse of dimensionality." Assuming that only one feature, the feature of the range is 0-1, only cats and dogs each characteristic values ​​are unique. If we want the training sample covering 20% ​​of the value range of features, you will need 20% of the total number of cats and dogs. We've added a feature after feature in order to continue covering 20% ​​of the value range would require 45% of the total number of cats and dogs (0.45 ^ 2 = 0.2). Continue to add a feature, it needs 58% of the total number of cats and dogs (0.58 ^ 3 = 0.2). With the increasing number of features, in order to cover 20% of the characteristic value range, we need more training samples. If there is not enough training samples may occur overfitting.

Through the above examples, we can see the number of features more training samples will be more sparse, classifier parameter estimates will be less accurate, more prone to overfitting. Another effect of "curse of dimensionality" is the sparsity of training samples are not evenly distributed. Training at the center of the sample is more sparse than training samples around.
Suppose there is a two-dimensional feature space rectangle as shown in FIG. 8, inside the rectangle has an inscribed circle. Because the closer the center of the sample is sparse, therefore, compared to the sample within the circle, those samples located in the four corners of the rectangle is more difficult to classify. So, with the increasing number of features, circular area will not change? Here we assume that hypercube (hypercube) edge length d = 1, then calculate the radius of the hypersphere (hypersphere) volume of 0.5 (volume) of the formula.

As seen in the increasing number of features, super-spherical volume is gradually reduced until it goes to zero, but hypercube volume is constant. This result was unexpected, but some described the classification of the "curse of dimensionality": In the high-dimensional feature space, most of the training sample is located in the corner of the hypercube.
Under different dimensions, the distribution of samples. In 8-dimensional feature space, a total of 2 ^ 8 = 256 corners, while 98% of the sample distribution in these corners. With the increasing dimension, Equation 2 will tend to 0, where dist_max and dist_min denote sample-to-center distance between the maximum and minimum.

Thus, in the high-dimensional feature space for sample measurement distance meaningless. Since the basic classifier relies on as Euclidean distance, Manhattan distance, so the excessive number of features, classifier performance will decline.

So, how do we avoid the "curse of dimensionality"? Figure 1 shows the performance of the classifier with the increasing number of variation, over a certain value, performance falling instead of rising. Here's how much a certain value in the end what is it? Currently, there is no way to determine the classification in question is how much this threshold, depending on the number of training samples, the decision complexity and the type of classification boundaries. Theoretically, if the number of training samples infinite, then there will be no "curse of dimensionality", we can use any number of features to train a classifier. In fact, the number of training samples is limited, so you should not use too many features. In addition, those who need accurate nonlinear decision boundary classifiers, such as neural network, knn, decision trees and other generalization often not very good, more prone to overfitting. Therefore, in designing these classifiers should be carefully considered when the number of features. In contrast, those better generalization ability classifier, such as naive Bayesian, linear classifier, etc., may be appropriate to increase the number of features.

Given the characteristics of the N, how do we choose from the best features of the M? The most simple and crude way is to try a combination of all the features, the M pick out the best features. In fact, it is very time-consuming, or impractical. In fact, there have been many feature selection algorithm (feature selection algorithms) to help us determine the number of features and feature selection. In addition, there are many feature extraction methods (feature extraction methods), such as the PCA and the like. Cross-validation (cross-validation) it is also often used to detect and avoid over-fitting problems.
- 10 Codes of good practice PHP (Programming)
- Linux 10 useful examples of command-line completion (Linux)
- Debugging with GDB tool Go (Programming)
- DB2 commonly used scripting sort out (Database)
- How to network to share files between Windows, MAC and Linux (Linux)
- Using Ruby to build a simple HTTP service and sass environment (Server)
- Summary Linux bond of multi-interface load balancing (Linux)
- Ubuntu install image browser and manager Phototonic 1.6.17 (Linux)
- Linux uses shared memory communication process synchronization Withdrawal (Programming)
- Linux system security check notes on performance (Linux)
- Hands to teach you to solve Ubuntu error message (Linux)
- Redis master-slave replication switch (Database)
- Construction Spark source and application development environment (Server)
- VirtualBox virtual machine can not start to solve under Ubuntu (Linux)
- Oracle multi-user concurrency and transaction processing (Database)
- Oracle Listener can not start (TNS-12555, TNS-12560, TNS-00525) (Database)
- To teach you a trick to find the real IP address (Linux)
- PHP parsing algorithm of the interview questions (Programming)
- CentOS 6.5 install VNC-Server (Linux)
- Compare Several MySQL environmental issues (Database)
  CopyRight 2002-2022 newfreesoft.com, All Rights Reserved.