1. What is machine learning?
The term "machine learning" can be puzzling. It is a literal translation of the English name Machine Learning (ML for short), and in the computing world "machine" generally means computer. The name is anthropomorphic: it suggests a technology that lets machines "learn". But a computer is an inanimate thing; how could it "learn" the way humans do?

Traditionally, if we want a computer to do some work, we give it a sequence of instructions, and it executes them step by step. There is a cause, there is an effect; everything is explicit. But this does not describe machine learning. Machine learning does not accept instructions at all; it accepts data! In other words, machine learning is a way of letting the computer use data, rather than instructions, to carry out all kinds of work. This sounds incredible, but it turns out to be entirely feasible. The idea of "statistics" accompanies machine learning at every turn, and correlation, rather than causality, is its core supporting concept. It will upend the causal thinking underlying all the programs you have written before.



Below I will use a story to explain simply what machine learning is. The story is well suited to clarifying the concept. It is not where machine learning research started, but the relevant content and core ideas are all there. If you just want a simple understanding of what machine learning is, reading this story is enough. If you want to learn more about machine learning and the cutting edge of contemporary technology, keep reading; richer content follows.


This example comes from my real life. When I thought back on it, I suddenly realized that the whole process can be expanded into a complete machine learning process, so I decided to use it to open the introduction. The story is called "waiting for someone".


I believe everyone has made appointments to meet others, and has had to wait. In reality not everyone is punctual, so when you meet with people who love to be late, some of your time is inevitably wasted. I ran into exactly such a case.

One of my friends, let's call him Little Y, is not so punctual; his most common failing is that he is often late. Once I arranged to meet him at 3 o'clock at a McDonald's, and the moment I went out the door a question suddenly occurred to me: should I leave right now? Or will I arrive on time only to spend 30 minutes waiting for him? I decided to adopt a strategy to solve this problem.


There are several ways to approach this problem. The first is knowledge: I search for knowledge that can solve it. Unfortunately, nobody teaches "how to wait for someone" as a body of knowledge, so I cannot find existing knowledge for this problem. The second is asking others: I ask someone else for the answer. But equally, nobody can answer it, because probably nobody has run into exactly my situation. The third is the rule-based method: I ask myself whether I have set up any rule for such problems. For example, "no matter what others do, I always arrive on time." But I am not such a rigid person; I have never set up such a rule.



In fact, I believe there is a method more suitable than the three above. I replay in my mind my past experience with Little Y: how many times we have met, and in what proportion of them he was late. I use this to predict the probability that he will be late this time. If that probability exceeds a threshold in my mind, I choose to delay leaving for a while. Suppose I have met Little Y about 5 times and he was late once; then his on-time rate is 80%, and my mental threshold is 70%. I judge that this time Little Y should not be late, so I go out on time. If instead Little Y had been late 4 out of those 5 times, i.e. his on-time rate is 20%, then because that value is below my threshold, I choose to postpone going out. At the level of usage this could be called the empirical method. In replaying the experience in my mind, I effectively used all the past data; it could equally be called judging based on data.
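
To make this empirical method concrete, here is a minimal sketch in Python, using the made-up counts and the 70% threshold from the story (the function name is mine, purely illustrative):

    # Frequency-based "empirical method" from the story above.
    def should_leave_on_time(times_met, times_late, threshold=0.70):
        """Leave on time only if the friend's on-time rate clears my threshold."""
        on_time_rate = (times_met - times_late) / times_met
        return on_time_rate >= threshold

    print(should_leave_on_time(5, 1))  # on-time rate 0.8 >= 0.7 -> True: go out now
    print(should_leave_on_time(5, 4))  # on-time rate 0.2 <  0.7 -> False: wait a while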



Judging based on data is fundamentally consistent with the idea of machine learning.



In the thought process just now, I considered only one attribute: "frequency". In real machine learning this would hardly count as an application. A typical machine learning model considers at least two kinds of quantities. One is the dependent variable, the result we want to predict; in this case, the judgment of whether Little Y will be late. The other is the independent variable, the quantity used to predict whether Little Y will be late. Suppose I take the day of the week as an independent variable: for example, I find that Little Y is basically late only on Fridays, and basically on time on other days. Then I can build a model that predicts the probability of Little Y being late from whether the day is a Friday.


When we consider only one independent variable, the situation is relatively simple. But suppose we add another independent variable: for example, Little Y is more often late when he drives over (you could attribute it to his poor driving, or to heavier traffic on his route). I can take this information into account and build a more complex model containing two independent variables and one dependent variable.



Going further, Little Y's lateness also has something to do with the weather, say with rain, so now I need to consider three independent variables.



If I want to predict not just whether but how many minutes Little Y will be late, I can build a model relating the number of minutes he was late each time to the amount of rainfall and the independent variables considered earlier. Then my model can predict a value, for example roughly how many minutes late he will be, which helps me plan my departure better. In such cases a decision tree cannot help, because decision trees can only predict discrete values. We can build this model with the linear regression method described in Section 2.
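
As a hedged sketch of such a multi-variable model, the snippet below fits a linear regression on fabricated observations (the feature values, and the use of scikit-learn's LinearRegression, are my own illustrative choices, not something the story prescribes):

    # Predicting how many minutes Little Y will be late from three
    # independent variables. All data here is invented for illustration.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Features per meeting: [is_friday, drove_here, rainfall_mm]
    X = np.array([
        [1, 0,  0.0],
        [1, 1,  5.0],
        [0, 0,  0.0],
        [0, 1, 10.0],
        [1, 1, 20.0],
    ])
    y = np.array([15, 30, 0, 10, 45])  # minutes late on each occasion

    model = LinearRegression().fit(X, y)

    # Today: a rainy Friday, and he is driving over.
    print(model.predict([[1, 1, 12.0]]))  # predicted minutes late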



Now, if I hand the model-building process over to the computer, feeding in all the independent and dependent variables and letting the computer generate a model for me, and then letting the computer, given my current situation, advise me whether I need to leave late and by how many minutes, then the computer's execution of this decision-support process is a machine learning process.



Machine learning is a method by which the computer uses existing data (experience) to derive a model (the pattern of lateness) and uses this model to predict the future (whether he will be late).



From the analysis above, we can see that machine learning resembles the human thought process, except that it can consider more factors and execute more complex computations. In fact, one of the main purposes of machine learning is to transform the human process of inducing rules from experience into a computational process of deriving models from data. Models derived by computers can then solve many flexible and complex problems in a human-like way.



Below I begin a formal introduction to machine learning, covering its definition, scope, methods, applications, and more.



2. Definition of machine learning


Broadly speaking, machine learning is a method of endowing machines with the ability to learn, letting them perform functions that direct programming cannot. In practical terms, machine learning is a method that uses data to train a model, and then uses the model to make predictions.



Let's look at an example.


Figure 4 Example of house prices



Take housing, that national topic. Suppose I have a house in hand that needs to be sold. How should I price it? The house's area is 100 square meters; should the price be 1 million, 1.2 million, or 1.4 million?



Obviously, I want to find some pattern relating house price to area. How do I get this pattern? Average the listed prices in the newspaper? Look up someone else's house of similar area? Either way, it does not seem very reliable.



What I want is a relationship between area and price that is reasonable and as accurate as possible. So I survey some houses similar to mine and obtain a set of data. The data contains the areas and prices of houses large and small. If I can find the pattern relating area to price in this data, I can get the price of my house.



Finding the pattern is simple: fit a straight line that "passes through" all the points, keeping its distance to each point as small as possible.



Through this line, I obtain the rule that best reflects the relationship between house price and area. The line corresponds to the following formula:

  House price = area * a + b




Here a and b are the parameters of the line. Having obtained these parameters, I can calculate the price of my house.



Suppose a = 0.75 and b = 50; then price = 100 * 0.75 + 50 = 125, i.e. 1.25 million (prices here are in units of 10,000). This result differs from all three candidate prices above: 1 million, 1.2 million, and 1.4 million. Because the line takes most of the data into account, it is, "statistically" speaking, the most reasonable prediction.
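
A minimal NumPy sketch of this fit (the survey numbers below are invented so that the least-squares line comes out near a = 0.75, b = 50; prices are in units of 10,000):

    import numpy as np

    area  = np.array([ 50,  80, 100, 120, 150])   # square meters
    price = np.array([ 88, 110, 125, 140, 163])   # surveyed prices (x 10,000)

    a, b = np.polyfit(area, price, deg=1)          # degree-1 = straight-line fit
    print(a, b)                                    # roughly 0.75 and 50
    print(a * 100 + b)                             # price for 100 m^2: about 125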



In the process of solving, two messages are revealed:
1. The shape of the price model depends on the type of function chosen for the fit. If it is a straight line, we fit a straight-line equation; if it is another type of curve, such as a parabola, we fit a parabolic equation. Machine learning has many algorithms, and some powerful ones can fit complex nonlinear models that reflect situations a straight line cannot express.
2. The more data I have, the more situations the model can take into account, and the better its predictions for new situations are likely to be. This reflects the machine learning community's idea that "data is king": in general (though not absolutely), more data means the trained model predicts better.



The straight-line fitting exercise lets us review the complete machine learning process. First, we store historical data in the computer. Then we process that data with a machine learning algorithm; this step is called "training". The result of training, which can be used to predict on new data, is called the "model". Applying the model to new data is called "prediction". "Training" and "prediction" are the two phases of machine learning, and the "model" is the intermediate output: "training" produces the "model", and the "model" guides "prediction".



Let us compare the machine learning process with the process by which humans induce rules from historical experience.



In growing up and living, humans accumulate a great deal of history and experience. They periodically "induce" rules of life from these experiences. When facing unknown problems or needing to "speculate" about the future, humans apply these "rules" to the unknown and the future, guiding their own lives and work.



The "training" and "forecasting" processes in machine learning can correspond to human "induction" and "speculation" processes. Through this correspondence, we can find that the idea of ​​machine learning is not complicated, just a simulation of the growth of human beings in life. Since machine learning is not based on the result of programming, its processing is not causal logic, but rather through the conclusion of the relevance of the conclusions drawn.



This also suggests why humans study history: history is a summary of humanity's past experience. There is a saying that "history does not repeat itself, but it is always strikingly similar." Through studying history we induce from it the laws of life and of nations to guide our next steps, which is of great value. Some contemporaries ignore this original value of history and instead treat it as a means of flaunting achievement, which is actually a misuse of history's true value.



3. The scope of machine learning


The above describes what machine learning is, but not yet the scope of machine learning.



In fact, machine learning has deep connections with fields such as pattern recognition, statistical learning, data mining, computer vision, speech recognition, and natural language processing.



In terms of scope, machine learning is similar to pattern recognition, statistical learning, and data mining. Meanwhile, combining machine learning with processing techniques from other fields has produced interdisciplinary subjects such as computer vision, speech recognition, and natural language processing. Therefore, when people speak of data mining in general, it can be roughly equated with machine learning. And what we usually call machine learning applications should be understood as general purpose: not limited to structured data, but covering images, audio, and more.



The introduction to these related fields in this section will help us understand machine learning's application scenarios and research scope, and better understand the algorithm and application content that follows.



The figure below shows some of the related disciplines and research areas that machine learning involves.



Pattern recognition. Pattern recognition = machine learning. The main difference between the two is that the former grew out of industry, while the latter came mainly from computer science. In the famous book "Pattern Recognition And Machine Learning", Christopher M. Bishop says at the outset: "pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years."



Data mining. Data mining = machine learning + databases. The concept of data mining has been hyped beyond recognition in recent years, almost to the level of speculation. Whenever data mining comes up, people boast about it: digging gold out of data, turning obsolete data into value, and so on. But although I may dig out gold, I may also dig out "stones". The point is that data mining is a way of thinking: it tells us we should try to mine knowledge from data, but not every piece of data yields gold, so do not mythologize it. A system never becomes omnipotent just because a data mining module is bolted on (something IBM loves to boast about); on the contrary, what matters is a person with data mining thinking who also has deep knowledge of the data, so that models derived from the data can guide business improvement. Most algorithms in data mining are machine learning algorithms optimized for databases.



Statistical learning. Statistical learning is approximately equal to machine learning. Statistical learning is a discipline that heavily overlaps machine learning, because most machine learning methods come from statistics; one can even argue that the development of statistics drove the prosperity of machine learning. For example, the famous support vector machine algorithm originated in the discipline of statistics. But the two differ in one way: statistical learning researchers focus on developing and optimizing statistical models, leaning toward mathematics, while machine learning researchers care more about solving problems, leaning toward practice, and so they focus on improving the efficiency and accuracy of learning algorithms executed on computers.



Computer vision. Computer vision = image processing + machine learning. Image processing techniques turn images into inputs suitable for machine learning models, and machine learning is responsible for recognizing the relevant patterns in the images. Computer vision has very many applications, such as Baidu's image recognition, handwritten character recognition, license plate recognition, and so on. It is a field with hot application prospects and a popular research direction. With the development of deep learning, a new area of machine learning, computer image recognition has been greatly advanced, so the future of computer vision is immeasurable.



Speech recognition. Speech recognition = speech processing + machine learning. Speech recognition is the combination of audio processing techniques with machine learning. Speech recognition technology is generally not used alone; it is usually combined with natural language processing. Current applications include Apple's voice assistant Siri, among others.



Natural language processing. Natural language processing = text processing + machine learning. Natural language processing is the field of making machines understand human language. In NLP, heavy use is made of techniques related to compiler theory, such as lexical analysis and syntactic analysis; at the level of understanding, semantic understanding, machine learning, and other techniques are used. As the only symbol system created by humans themselves, natural language has always been a direction of continuous study in machine learning. As Baidu machine learning expert Yu Kai put it: "listening and seeing, to put it plainly, are things cats and dogs can do too; only language is unique to humans." How to use machine learning for the deep understanding of natural language has always been a focus of industry and academia.



It can be seen that machine learning extends into and is applied in many fields. The development of machine learning technology has driven progress in many intelligent domains, improving our lives.



4. Machine learning methods


Through the previous section we know the rough scope of machine learning; so what classic algorithms does machine learning contain? In this section I briefly introduce the classic methods of machine learning, focusing on their ideas; mathematical and practical details will not be discussed here.



1. Regression algorithms



In most machine learning courses, regression is the first algorithm introduced, for two reasons. First, regression is relatively simple, letting people migrate smoothly from statistics to machine learning. Second, regression is the cornerstone of several powerful algorithms introduced later; without understanding regression, those algorithms cannot be learned. Regression has two important subclasses: linear regression and logistic regression.



Linear regression is what we used earlier to solve the house price problem. How do I fit a straight line that best matches all my data? The usual answer is the "least squares method". The idea of least squares is this: assume the fitted line represents the true values of the data, while the observations represent values carrying errors. To minimize the effect of the errors, we solve for a line that minimizes the sum of squared errors. Least squares turns the optimization problem into a problem of finding the extremum of a function. In mathematics we generally find extrema by setting the derivative to 0, but that approach does not suit computers: it may be unsolvable, or too expensive to compute.



Computer science has a discipline called "numerical computation", devoted to improving the accuracy and efficiency of computers in solving various kinds of problems. For example, the famous "gradient descent" method and "Newton's method" are classical algorithms in numerical computation, and are also very well suited to finding the extrema of functions. Gradient descent is one of the simplest and most effective ways of solving a regression model. Strictly speaking, because neural networks and recommendation algorithms contain linear regression factors inside them, gradient descent is also applied in those later algorithm implementations.
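
Below is a sketch of gradient descent applied to the same straight-line fit; the learning rate, feature scaling, and iteration count are illustrative assumptions, not prescribed values:

    import numpy as np

    area  = np.array([ 50,  80, 100, 120, 150], dtype=float)
    price = np.array([ 88, 110, 125, 140, 163], dtype=float)

    x = area / 100.0          # scale the feature so one learning rate suffices
    a, b = 0.0, 0.0
    lr = 0.1

    for _ in range(5000):
        err = a * x + b - price
        a -= lr * (err * x).mean()   # gradient of mean squared error w.r.t. a
        b -= lr * err.mean()         # gradient of mean squared error w.r.t. b

    print(a / 100.0, b)       # undo the scaling: converges toward ~0.75 and ~50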



Logistic regression is an algorithm very similar to linear regression, but in essence the type of problem it handles is different. Linear regression handles numerical problems: the final prediction is a number, such as a house price. Logistic regression belongs to the classification algorithms: its predictions are discrete classes, such as whether an email is spam, or whether a user will click on an ad.



In terms of implementation, logistic regression simply adds a sigmoid function on top of the result of linear regression, transforming the numerical result into a probability between 0 and 1 (the shape of the sigmoid function may not look intuitive; you only need to know that the larger the input, the closer the output gets to 1, and the smaller the input, the closer to 0). We can then make predictions based on this probability: for example, if the probability is greater than 0.5, classify the email as spam, or the tumor as malignant. Intuitively speaking, logistic regression amounts to drawing a classification line.
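
A small sketch of this computation (the weights and the patient below are invented; only the shape of the calculation matters):

    import numpy as np

    def sigmoid(z):
        # Squashes any real number into a probability between 0 and 1.
        return 1.0 / (1.0 + np.exp(-z))

    w = np.array([0.04, 0.85])   # illustrative weights: [age, tumor_size]
    b = -4.0                     # illustrative bias

    x = np.array([55, 3.2])      # one hypothetical patient
    p = sigmoid(w @ x + b)       # linear regression output -> probability
    print(p, "malignant" if p > 0.5 else "benign")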




Suppose we have data on a group of tumor patients: some have benign tumors (blue points in the figure), some malignant (red points). The red and blue here can be called the "labels" of the data. Each data point also includes two "features": the patient's age and the tumor's size. Mapping the two features and the label into this two-dimensional space produces the data shown in my figure.



Now a green point arrives: should I judge the tumor malignant or benign? From the red and blue points we have trained a logistic regression model, i.e. the classification line in the figure. Since the green point lies on the left side of the classification line, we judge its label to be red, that is, malignant.



The classification line drawn by logistic regression is basically linear (there are logistic regression variants with nonlinear classification lines, but such models are very inefficient on large amounts of data), which means that when the boundary between two classes is not linear, logistic regression lacks the expressive power. The following two algorithms are among the most powerful and important in machine learning, and both can fit nonlinear classification lines.



2. Neural networks



The neural network (also known as artificial neural network, ANN) algorithm was very popular in machine learning in the 1980s, then declined in the mid-1990s. Now, riding the momentum of "deep learning", neural networks have made a comeback and become one of the most powerful machine learning algorithms.



The birth of neural networks stemmed from research on the brain's working mechanism. Early biologists used neural networks to simulate the brain; machine learning scholars then used neural networks for machine learning experiments and found the results quite good on vision and speech recognition. After the birth of the BP algorithm (a numerical algorithm that accelerates the training of neural networks), the development of neural networks entered a boom. One of the inventors of the BP algorithm is machine learning guru Geoffrey Hinton (middle of Figure 1).



Specifically, what is the learning mechanism of a neural network? In simple terms, it is decomposition and integration. In the famous Hubel-Wiesel experiment, scholars studied the visual analysis mechanism of the cat.


Figure 8 The Hubel-Wiesel experiment and the brain's visual mechanism



For example, a square is decomposed into four polylines that enter the next layer of visual processing. Four neurons each handle one polyline. Each polyline is further decomposed into two straight lines, and each straight line into black and white faces. In this way, a complex image is decomposed into a mass of details entering the neurons; after the neurons process them, they are integrated again, finally yielding the conclusion that what is being seen is a square. This is the mechanism of visual recognition in the brain, and also the working mechanism of neural networks.



Let's look at the logical architecture of a simple neural network. The network is divided into an input layer, hidden layers, and an output layer. The input layer receives signals, the hidden layers decompose and process the data, and the final result is integrated at the output layer. Each circle represents a processing unit, which can be viewed as simulating one neuron; several processing units form a layer, and several layers form a network, the "neural network".


Figure 9 Logical architecture of neural networks



In a neural network, each processing unit is in fact a logistic regression model: it receives inputs from the layer above and passes its prediction as output to the next layer. Through such a process, the neural network can accomplish very complex nonlinear classification.
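
A sketch of one forward pass through such a network, where every unit computes exactly the logistic-regression step just described (the weights are random placeholders, not a trained model):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x  = np.array([0.5, -1.2, 3.0])   # input layer: 3 features
    W1 = rng.normal(size=(4, 3))      # hidden layer: 4 units
    b1 = np.zeros(4)
    W2 = rng.normal(size=(1, 4))      # output layer: 1 unit
    b2 = np.zeros(1)

    h = sigmoid(W1 @ x + b1)          # each hidden unit: one logistic regression
    y = sigmoid(W2 @ h + b2)          # the output unit integrates the pieces
    print(y)                          # a score in (0, 1)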



The following figure shows a well-known application of neural networks in image recognition: a program called LeNet, a neural network built on multiple hidden layers. LeNet can recognize a variety of handwritten digits with high accuracy and good robustness.



Figure 10 LeNet effect display



The square at the lower right shows the image input to the computer, and the red word "answer" above the square shows the computer's output. The three vertical image columns on the left show the outputs of three hidden layers in the network; it can be seen that as the hierarchy deepens, the details being processed become progressively lower-level, for example down to the details of individual lines. LeNet's inventor is Yann LeCun (Figure 1, right), one of the machine learning gurus introduced earlier.



Entering the 1990s, the development of neural networks hit a bottleneck, mainly because, even with the BP algorithm's acceleration, training a neural network was still very difficult. In the late 1990s the support vector machine (SVM) algorithm therefore displaced neural networks from their position.



3. SVM (support vector machines)


The support vector machine algorithm was born in the statistical learning community, and is at the same time a classic algorithm that shines in machine learning.



In some sense, the support vector machine is a strengthening of the logistic regression algorithm: by imposing stricter optimization conditions on logistic regression, it can obtain a better classification boundary. Without the class of techniques called kernel functions, however, the SVM algorithm would at best be a better linear classification technique.



However, through combination with the Gaussian "kernel", a support vector machine can express very complex classification boundaries and thus achieve good classification results. A "kernel" is in fact a special kind of function, whose most typical feature is mapping a low-dimensional space into a high-dimensional space.



For example, the following figure:

Figure 11 Support vector machine illustration



How do we draw a circular classification boundary in a two-dimensional plane? It may be very difficult in two dimensions, but a "kernel" can map the two-dimensional space into three dimensions, where a linear plane achieves a similar effect. In other words, a nonlinear classification boundary in the two-dimensional plane is equivalent to a linear classification boundary in three-dimensional space: we can achieve a nonlinear partition in the plane by performing a simple linear partition in three dimensions.


Figure 12 A linear cut in three-dimensional space
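
The following sketch illustrates the idea with the classic feature map z = x^2 + y^2 (my choice for illustration): points inside a circle, inseparable by any line in the plane, become separable by the flat plane z = 1 after the mapping:

    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.uniform(-2, 2, size=(200, 2))
    inside = (pts ** 2).sum(axis=1) < 1.0   # label: inside the unit circle?

    z = (pts ** 2).sum(axis=1)              # the added third dimension
    print(((z < 1.0) == inside).all())      # True: the plane z = 1 separates them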



The support vector machine is a machine learning algorithm with very mature mathematical theory (in contrast to the neural network, which carries a strong flavor of biological science). A core step of the algorithm proves that mapping the data from low dimension to high dimension does not increase the complexity of the final computation. Thus, with the support vector machine algorithm, we both keep computation efficient and obtain very good classification results. For this reason SVMs occupied the core position in machine learning from the late 1990s on, largely replacing neural networks, until neural networks rose again through deep learning and the subtle balance between the two shifted.



4. Clustering algorithms


A salient feature of all the previous algorithms is that the training data contain labels, and the trained model predicts labels for unknown data. In the following algorithms the training data carry no labels, and the goal of the algorithm is to infer the labels through training. Such algorithms have a general name: unsupervised algorithms (algorithms on labeled data are supervised algorithms). The most typical representative of unsupervised algorithms is clustering.



Take two-dimensional data again, i.e. each data point has two features. I want a clustering algorithm to give the points different type labels: how can this be done? In short, a clustering algorithm computes distances within the population and divides the data into multiple groups according to those distances.



The most typical representative of clustering algorithms is the K-Means algorithm.
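
A bare-bones K-Means sketch on made-up two-dimensional data (real implementations choose initial centers and stopping criteria more carefully):

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, 0.5, (50, 2)),    # blob around (0, 0)
                      rng.normal(3, 0.5, (50, 2))])   # blob around (3, 3)

    k = 2
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(10):
        # Assign each point to its nearest center.
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        centers = np.array([data[labels == i].mean(axis=0) for i in range(k)])

    print(centers)   # close to the true blob centers (0, 0) and (3, 3)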



5. Dimensionality reduction algorithms


Dimensionality reduction is also an unsupervised learning algorithm; its main feature is reducing data from high dimension to low dimension. Here, dimension refers to the number of features of the data. For example, house price data might contain four features, the house's length, width, area, and number of rooms, i.e. 4-dimensional data. Clearly, length and width overlap with the information carried by area, since area = length × width. A dimensionality reduction algorithm can remove the redundant information and reduce the features to two, area and number of rooms, compressing the data from 4 dimensions to 2. Reducing data from high to low dimension is not only convenient for representation; it also brings speedups in computation.



The dimensions dropped in the reduction just described are visible to the naked eye, and the compression loses no information (because the information was redundant). If the redundancy is not visible to the naked eye, or there are no redundant features, a dimensionality reduction algorithm can still work, but some information will be lost. However, it can be proved mathematically that dimensionality reduction preserves the information of the data, from high dimension to low, to the greatest extent possible. So using dimensionality reduction still has many benefits.



The main uses of dimensionality reduction are to compress data and to improve the efficiency of other machine learning algorithms. With a dimensionality reduction algorithm, data with thousands of features can be compressed into a handful of features. Another benefit is visualization: for example, compressing 5-dimensional data to 2 dimensions and then visualizing it in a two-dimensional plane. The main representative of dimensionality reduction algorithms is PCA (principal component analysis).
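
A PCA sketch on the four-feature house example (the numbers are fabricated; scikit-learn's PCA is assumed as one convenient implementation). Note that the third feature is redundant by construction, area = length × width:

    import numpy as np
    from sklearn.decomposition import PCA

    length = np.array([ 8, 10, 12,  9, 11], dtype=float)
    width  = np.array([ 6,  8, 10,  7,  9], dtype=float)
    rooms  = np.array([ 2,  3,  4,  2,  3], dtype=float)
    X = np.column_stack([length, width, length * width, rooms])  # 4 features

    pca = PCA(n_components=2)
    X2 = pca.fit_transform(X)                # 4-D compressed to 2-D
    print(X2.shape)                          # (5, 2)
    print(pca.explained_variance_ratio_)     # most variance kept in 2 components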



6. Recommendation algorithms



The recommendation algorithm is a very hot algorithm in industry, widely used by e-commerce companies such as Amazon, Tmall, and JD.com. Its main feature is automatically recommending to users the things they are most interested in, thereby increasing purchase rates and revenue. Recommendation algorithms fall into two main categories:



One is recommendation based on item content: items similar in content to what the user has bought are recommended to the user. This requires every item to carry several labels, so that items similar to the user's purchases can be found. The advantage of this kind of recommendation is strong relevance; the drawback is that labeling every item takes a great deal of work.



The other is recommendation based on user similarity: things bought by other users whose interests resemble the target user's are recommended to the target user. For example, user A has historically bought items B and C; the algorithm finds that user D, who is similar to A, has bought item E; so item E is recommended to user A.



Both types of recommendation have their own advantages and disadvantages; in typical e-commerce applications the two are used in combination. The most famous algorithm among recommendation algorithms is collaborative filtering.
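
A minimal sketch of user-based collaborative filtering on a made-up rating matrix (users as rows, items as columns, 0 meaning not yet bought); the cosine measure and the tiny data are illustrative assumptions:

    import numpy as np

    ratings = np.array([
        [5, 4, 0, 1],   # user A (the target)
        [4, 5, 5, 1],   # user D: similar taste, also bought item 2
        [1, 0, 1, 5],   # a user with opposite taste
    ], dtype=float)

    def cosine(u, v):
        return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

    target = 0
    sims = np.array([cosine(ratings[target], r) for r in ratings])
    sims[target] = -1.0                       # exclude the user themselves
    neighbor = sims.argmax()                  # most similar user (user D)

    unseen = ratings[target] == 0
    scores = np.where(unseen, ratings[neighbor], -np.inf)
    print("recommend item", scores.argmax())  # item 2 (0-indexed), rated 5 by D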



7. Others



In addition to the algorithms above, machine learning has many others, such as Gaussian discriminant analysis, naive Bayes, decision trees, and so on. But the six classes above are the most used, most widespread, and most typical representatives. One feature of the machine learning field is precisely this abundance of algorithms and flourishing development.



To summarize: according to whether the training data carry labels, the algorithms above divide into supervised learning algorithms and unsupervised learning algorithms. Recommendation algorithms are more special: neither supervised nor unsupervised, they form a separate category.



Supervised learning algorithms:
linear regression, logistic regression, neural networks, SVM



Unsupervised learning algorithms:
clustering algorithms, dimensionality reduction algorithms



Special algorithms:
recommendation algorithms



Besides these, some other algorithms appear frequently in machine learning. They are not themselves machine learning algorithms, but were born to solve sub-problems within them. You can think of them as sub-algorithms of the algorithms above, used to speed up or enable the training process. Among them, gradient descent is used mainly in linear regression, logistic regression, neural networks, and recommendation algorithms; Newton's method mainly in linear regression; the BP algorithm mainly in neural networks; and the SMO algorithm mainly in SVM.



5. Machine learning applications - big data



Having covered machine learning methods, let us talk about applications. Undoubtedly, even before 2010, machine learning applications already played a significant role in certain areas, such as license plate recognition, network attack defense, handwritten character recognition, and so on. But since 2010, with the rise of the concept of big data, a great many machine learning applications have become tightly coupled with big data; one can almost say that big data is machine learning's best application scenario.



For example, any article introducing the magic of big data will tell you how accurately big data predicted something, such as the classic case of Google using big data to predict an H1N1 outbreak in a town in the United States.



Figure 13 Google successfully predicted H1N1



Or Baidu's predictions for the 2014 World Cup, correct from the knockout rounds all the way to the final.



Figure 14 Baidu successfully predicted the results of all World Cup matches



These cases really do look amazing. So what gives big data this magic? The answer, in short, is machine learning technology. It is on the basis of machine learning applications that data can play out its magic.



The core of big data is exploiting the value of data, and machine learning is the key technology for exploiting that value. For big data, machine learning is indispensable. Conversely, for machine learning, more data means the model's accuracy is more likely to improve, and the heavy computation of machine learning algorithms urgently needs key big data technologies such as distributed computing and in-memory computing. So the rise of machine learning is also inseparable from big data's help. Big data and machine learning promote and depend on each other.



Machine learning is closely tied to big data. However, we must be soberly aware that big data is not equivalent to machine learning, and likewise machine learning is not equivalent to big data. Big data includes distributed computing, in-memory databases, multidimensional analysis, and many other technologies. From the perspective of analysis methods alone, big data includes the following four:



1. Big data, small analysis: the OLAP analysis of the data warehouse field, i.e. multidimensional analysis.
2. Big data, big analysis: this represents data mining and machine learning analysis.
3. Streaming analysis: this mainly refers to event-driven architectures.
4. Query analysis: the classic representative is the NoSQL database.



In other words, machine learning is only one method of analysis under big data. Although machine learning's results sometimes carry great magic, and in some scenarios best demonstrate the value of big data, that does not mean machine learning is the only analysis method under big data.



The combination of machine learning and big data has produced enormous value. On the basis of machine learning technology, data can be "predicted". For humans, the richer and broader the accumulated experience, the more accurate the judgment of the future. For example, we often say that an "experienced" person has an advantage at work over a "fledgling" youngster, because the experienced person's induced rules are more accurate. In machine learning, a well-known experiment effectively confirmed the analogous theory: the more data a machine learning model gets, the better its predictions. See the figure below:



Figure 15 The relationship between machine learning accuracy and data volume



The figure shows that different algorithms all reach similarly high accuracy once the input data grows beyond a certain scale. Hence the famous saying in the machine learning community: a successful machine learning application depends not on having the best algorithm, but on having the most data!



The big data era has many advantages that let machine learning be applied more widely. For example, with the development of the Internet of Things and mobile devices, we have more and more data, including images, text, video, and other unstructured data, giving machine learning models more and more data to use. Meanwhile, big data technologies such as the distributed computing framework Map-Reduce make machine learning faster and more convenient to use. All these advantages let machine learning play its strengths to the fullest in the big data era.



6. A subclass of machine learning - deep learning


Recently, the development of machine learning has produced a new direction: "deep learning".



Although the words "deep learning" sound rather lofty, the idea is very simple: traditional neural networks extended to many hidden layers.



As mentioned above, neural networks fell quiet for a while after the 1990s, but Geoffrey Hinton, one of the inventors of the BP algorithm, never gave up on them. Because training becomes very slow once a neural network has more than two hidden layers, its practicality had fallen below that of the support vector machine. In 2006 Geoffrey Hinton published an article in the journal Science demonstrating two points:



  1. A neural network with many hidden layers has excellent feature learning ability; the features it learns characterize the data more fundamentally, which benefits visualization and classification;

  2. The difficulty of training deep neural networks can be effectively overcome through "layer-by-layer initialization".


Figure 16 Geoffrey Hinton and his student publish an article in Science



This discovery not only overcame the difficulty of training neural networks but also demonstrated the superiority of deep networks at learning. Since then, neural networks have again become a mainstream learning technology in machine learning. Neural networks with many hidden layers are called deep neural networks, and learning research based on deep neural networks is called deep learning.



Given the importance of deep learning, it has received great attention from all sides. In chronological order, the following landmark events are worth mentioning:



In June 2012, the New York Times disclosed the Google Brain project, co-led by Andrew Ng and Jeff Dean, a top expert in large-scale computer systems. Using a parallel computing platform with 16,000 CPU cores, it trained a "deep neural network" machine learning model and achieved great success in speech recognition and image recognition. Andrew Ng is one of the machine learning gurus introduced at the beginning of this article (Figure 1, left).



In November 2012, at an event in Tianjin, China, Microsoft publicly demonstrated a fully automatic simultaneous interpretation system: as the speaker spoke in English, the computer in the background completed speech recognition, English-to-Chinese machine translation, and Chinese speech synthesis in one unbroken flow, with very smooth results. The key technology supporting the system is deep learning;



In January 2013, at Baidu's annual meeting, founder and CEO Robin Li (Li Yanhong) announced with great fanfare the establishment of Baidu's research institute, whose first focus is deep learning, and for which the Institute of Deep Learning (IDL) was established.


The three machine learning gurus listed at the beginning of this article are not only experts in machine learning but also pioneers of deep learning research. The reason they are at the helm of research at the major Internet companies is not just their technical strength; it is also that deep learning, their field of research, has unlimited prospects.



At present, much of industry's progress in image recognition and speech recognition comes from the development of deep learning. Besides voice assistants such as Cortana, mentioned at the beginning of this article, there are also image recognition applications, typified by the Baidu image recognition feature shown in the figure below.

Deep learning is a subclass of machine learning. The development of deep learning has greatly raised the status of machine learning and, going further, renewed the field's attention to artificial intelligence, the founding dream of the discipline.



7. The parent class of machine learning - artificial intelligence


Artificial intelligence is the parent class of machine learning, and deep learning is a subclass of machine learning. If we show the relationship among the three in one figure, it is as below:

There is no doubt that artificial intelligence (AI) is the most groundbreaking invention humans can imagine; in a sense, like the name of the game Final Fantasy, it is humanity's ultimate dream for science and technology. Since the concept of artificial intelligence emerged in the 1950s, science and industry have explored and researched it continuously, and during that time novels and films have displayed every imaginable vision of it. That humans could invent a machine similar to themselves is a magnificent idea! In reality, however, the development of AI since the 1950s has been bumpy, without progress shocking enough to be truly felt.



To sum up, the development of artificial intelligence has gone through several stages, from early logical reasoning to the mid-period expert systems. These research advances did bring us a little closer to machine intelligence, but a long distance remained. Only after the birth of machine learning did the artificial intelligence community finally find the right direction. Image recognition and speech recognition based on machine learning have, in some vertical domains, reached a level comparable to humans. Machine learning has brought humanity, for the first time, this close to the dream of artificial intelligence.



In fact, if we draw an analogy between AI-related technology and technologies in other fields, we can see that machine learning's important position within artificial intelligence is not without reason.



What most importantly distinguishes humans from other objects, plants, and animals? In the author's view, it is "wisdom". And what best embodies wisdom?



Is it the ability to calculate? Surely not; a person quick at mental arithmetic we usually call a genius, not a sage.
Is it the ability to react? No; a person who reacts quickly we call sharp.
Is it memory? No; a person with an excellent memory we say never forgets.
Is it the ability to reason? Such a person I might call highly intelligent, a Sherlock Holmes type, but not wise.
Is it the breadth of knowledge? Such a person we call erudite, but still not wise.



Think about it: whom do we usually describe as having great wisdom? Sages, such as Zhuangzi and Laozi. Wisdom is insight into life, the accumulation of living and thinking. How similar that is to the thinking of machine learning: obtaining laws from experience to guide one's life and future. Without experience there is no wisdom.



So, from the computer's point of view, all of the abilities above have corresponding technologies.



For computing power we have distributed computing; for responsiveness we have event-driven architectures; for retrieval we have search engines; for knowledge storage we have data warehouses; for logical reasoning we have expert systems. But the abilities corresponding to the most significant feature of wisdom, induction and insight, correspond only to machine learning. This is the root reason machine learning can express wisdom.



Consider the building of robots: we already have powerful computation, massive storage, fast retrieval, rapid response, and excellent logical reasoning; if on top of these we add a powerful brain of wisdom, artificial intelligence in the true sense may be born. This is why, with machine learning now developing rapidly, artificial intelligence may no longer be a mere dream.



The realization of artificial intelligence may depend not only on machine learning but even more on deep learning, described earlier. Because deep learning imitates the human brain's layered composition, it has made significant breakthroughs in visual recognition and speech recognition that exceed the boundaries of earlier machine learning technology, so it is very likely the key technology for truly realizing the dream of artificial intelligence. Whether Google Brain or Baidu Brain, both are built from massive deep learning networks. Perhaps with the help of deep learning, in the not too distant future, a computer with human-level intelligence really can come true.



Finally, a digression: precisely because artificial intelligence is developing rapidly with the help of machine learning, it has begun to cause concern in some quarters. The real-world "Iron Man", Tesla CEO Elon Musk, is one of the worried. At a recent MIT symposium, Musk expressed his concern about artificial intelligence: "research on artificial intelligence is like summoning a demon; we must be very careful in certain places."



Figure 21 Musk and artificial intelligence



Although Musk's worry may sound alarmist, his reasoning is not without merit: "if an artificial intelligence wants to eliminate spam, its final decision may be to eliminate humanity." Musk believes that the way to prevent such outcomes is to introduce government regulation. The author's view here is similar to Musk's: at the very birth of an artificial intelligence we could add some rules, that is, use not bare machine learning but a combination of machine learning and a rule engine, which solves such problems better. For if learning has no limits, it can easily go astray; some guidance must be added. As in human society, law is the best rule: "a murderer must pay with his life" is a boundary humanity cannot cross while exploring productivity.



It must be noted here that these rules differ from the laws learned by machine learning. A learned law is not a strict criterion; it represents more of a probabilistic guide. A rule, by contrast, is sacred and inviolable. Laws can be adjusted, but rules cannot be changed. Effectively combining the characteristics of learned laws and fixed rules can lead to a reasonable, controllable artificial intelligence that learns.



8. Thoughts on machine learning - the subconscious of the computer



Finally, the author would like to share some thoughts on machine learning, mainly insights summed up in daily life.



Recall the story told in Section 1, where I enumerated my past experiences of meeting Little Y. Very few people would actually do that; most use a more direct method, namely intuition. So what is intuition? Intuition is in fact a law that your thinking induced while in a subconscious state. Just as with a machine learning algorithm, you obtain a model, and afterwards you only need to apply it. When did you work the law out? Perhaps while you were unaware of it, sleeping or walking, your brain was quietly doing work you could not notice.



I distinguish this intuition and subconscious from another human way of thinking. If a person thinks diligently, for example summarizing every day, "examining himself three times a day," or often discussing recent gains and losses at work with companions, then his model is trained by direct, conscious thinking and induction. This works very well: the impressions are strong, and the induced laws respond more effectively to reality. But most people rarely make such summaries, so the laws of life they arrive at come via the subconscious method.



Let me give an example of the author's own subconscious. I had never driven before, but I recently bought a car and now drive to and from work every day along a fixed route. Interestingly, at the beginning I was very nervous watching the road ahead, whereas now I unconsciously drive the car toward the destination. My eyes watch the road, my brain is not consciously thinking, yet my hands on the steering wheel adjust the direction automatically. That is, as the number of my drives increased, I pushed the driving actions down into the subconscious. In this process my brain recorded the images of the road ahead and, at the same time, my steering actions; through the brain's own subconscious thinking, it finally generated a subconscious that can adjust my hand movements directly from the image in front of me. Now suppose we feed the video of the road ahead to a computer and let it record the driver's actions corresponding to each image. After a period of learning, the machine learning model the computer generates can drive by itself. Amazing, isn't it? In fact this is the principle behind self-driving car technology, including Google's and Tesla's.



Besides driving, the subconscious extends to human communication. For example, one of the best ways to persuade someone is to show him some information and let him induce the conclusion we want by himself. Just as, when expounding a viewpoint, a fact or a story works much better than long stretches of argument. Throughout history the outstanding lobbyists have all used this method. In the Spring and Autumn and Warring States periods, when states maneuvered through shifting alliances and often sent lobbyists to communicate with the monarch of another state, telling the monarch directly what to do was tantamount to suicide; telling the monarch a story and letting the story bring him to a sudden realization was the correct process. There were many outstanding practitioners of this, such as Mozi and Su Qin.



In essentially every communication process, making a point through stories works far better than preaching. Why do stories work so much better than argument or other methods? Because in the process of growing up, a person has, through his own thinking, formed many laws and much subconscious. If the law you state does not match his, then out of self-protection he will instinctively reject your new law; but if you tell him a story and convey some information, sending some data over, he will think and change by himself. His thinking process is actually a machine learning process: he takes the new data into his old memories and data and retrains. If the volume of data you provide is large, large enough to adjust his model, he will then act according to the law you intended. Sometimes he instinctively refuses to carry out this thinking process, but once the data enters, whether he wants it or not, his brain will think in the subconscious state and may change his views.



What if the computer, too, had a subconscious (as in the name of this blog)? For example, let the computer gradually produce its own subconscious in the course of its work, so that you could even mention a task to it casually and it would complete that task on its own. This is a very interesting idea; I leave it here for readers to think about freely.



9. Summary



This article first introduced Internet trends, machine learning gurus, and related machine learning applications, then opened the introduction to machine learning with the story of "waiting for someone". It then covered, in order: the concept and definition of machine learning; the disciplines related to machine learning; the algorithms machine learning contains; the relationship between machine learning and big data; deep learning, the new subclass of machine learning; the development of artificial intelligence on the basis of machine learning; and finally some thoughts on machine learning and the subconscious. After reading, I believe readers now have some understanding of machine learning technology: what machine learning is, what its core ideas are (namely statistics and induction), and, through the resemblance between machine learning and human thinking, why machine learning can express wisdom, and so on.



Machine learning is the most amazing and hottest technology in today's industry. From the recommendations you see when shopping on Taobao, to self-driving car technology, to network attack defense systems, machine learning is a factor everywhere; it is also the technology most likely to help fulfill the dream of artificial intelligence. AI applications of every kind, from Microsoft's chatbot XiaoIce to computer vision technology, contain the efforts of machine learning. As a developer or manager in the contemporary computing field, or simply as someone living in this world and enjoying the convenience IT brings, you would do well to understand some machine learning knowledge and concepts, because it helps you better understand the principles behind the technology that serves you, and better understand the progress of contemporary technology.



10. Postscript

This article took the author two months and was finally completed the day before the last day of 2014. Through it, the author hopes to contribute to the popularization of machine learning in China; it has also been a process of connecting and organizing the author's own machine learning knowledge as a whole. The author ran so much knowledge through his own brain's thinking, trained out a model, and formed this article; you could say that this, too, was a machine learning process (laughs).





In the author's line of work one is exposed to a great deal of data, so processing and analyzing data is an important part of the daily job. The ideas and ways of thinking in machine learning have been of great use to the author's work, almost leading the author to relearn the value of data. Half a year ago the author was just starting to learn machine learning, and now he can half-jokingly count himself an expert (laughs). But the author has always believed that machine learning is truly absorbed not through concepts or ways of thinking but through practice; only when the technology is genuinely applied can one's understanding be said to have reached a new level. So-called "highbrow" technology must ultimately land in "everyday" usage scenarios. There is currently a certain culture in which some scholars of machine learning, at home and abroad, hold themselves aloof and believe their research is beyond ordinary people. That idea is fundamentally wrong: if your research plays no role anywhere real, what proves its worth? The author believes advanced technology must be used to change the lives of ordinary people before it realizes its fundamental value. Simple scenarios are exactly the best places to practice machine learning technology.
