Big data thinking is a way of thinking that every modern person should have.
Uncertainty is everywhere, and it cannot be predicted away by formulas. The only way to eliminate uncertainty about the future is to introduce information. This is not only the soul of information theory but also the theoretical basis of the big data thinking advocated today.
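In information-theoretic terms (a standard result, stated here only to illustrate the claim), the uncertainty of a quantity X is measured by its entropy H(X), and conditioning on related information Y can never increase it:

H(X | Y) ≤ H(X),

with equality only when Y tells us nothing about X. Introducing relevant information is therefore, quite literally, the way to reduce uncertainty.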
The decisive role of this methodology in the development of the science and technology industry can be seen in how, in the pursuit of machine intelligence, human beings moved from a mechanical way of thinking to big data thinking.
Since the birth of the electronic computer in 1946, human beings have been asking one question: can machines possess the same intelligence as human beings?
In 1956, a group of American scientists who were then young and later became famous (including Shannon, Minsky, and Simon) put forward the concept of artificial intelligence at Dartmouth College.
But how exactly should it be done?
These scientists did not have a clear line of thinking at the time.
Over the next 15 years, researchers in effect developed artificial intelligence under the inertia of mechanical thinking, that is, they tried to describe human reasoning explicitly with a set of definite rules.
Artificial intelligence research at that time was therefore rule-based.
After more than a decade of effort, this approach reached a dead end.
In the history of machine intelligence, the information theorist Frederick Jelinek is an epoch-making figure. His contribution was to recast many artificial intelligence problems as communication problems from the perspective of information theory and, using data-driven methods, to solve problems such as speech recognition and machine translation, laying the foundation for using computers to understand natural language.
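To illustrate what treating recognition as a communication problem means (this is the standard source-channel formulation from the statistical speech-recognition literature, summarized here rather than quoted from Jelinek): given an acoustic signal A, the recognizer looks for the word sequence W that most likely produced it,

W* = argmax_W P(W | A) = argmax_W P(A | W) · P(W),

where the acoustic model P(A | W) and the language model P(W) are both estimated from large amounts of data rather than written down as rules.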
Before Jelinek, using the traditional rule-based artificial intelligence methods, computers could recognize only one or two hundred words, with an error rate as high as 30%; the technology was completely impractical.
While on academic leave at IBM in 1972, Jelinek used a data-driven approach to cut the error rate of speech recognition from about 30% to under 10%, and the system could recognize more than 20,000 English words.
In this way, speech recognition could move from the laboratory toward practical application.
In the 1980s, when personal computers were just emerging and even supercomputers had limited processing power, not every scholar could clearly appreciate this way of thinking, which relies on large amounts of data rather than hand-crafted rules to solve problems.
After the 1990s, however, the advantages of data became apparent.
In the decade following the mid-1990s, the error rate of speech recognition was halved and the accuracy of machine translation doubled; roughly 20% of the improvement came from better methods, while 80% came from the increase in the amount of data.
From 2000 to 2015, more and more so-called intelligence problems, such as recognizing and understanding medical images, driving cars autonomously, reading legal documents by computer, and answering questions automatically, were solved one after another, mainly by exploiting large amounts of multidimensional, even nearly complete, data.
The concept of big data was put forward on this basis.
The completeness of big data makes it possible to accomplish complex tasks that in the past only humans could perform; Google’s driverless car is a good example.
A driverless car is essentially a robot, something that academia worldwide had failed to build for decades.
As late as 2004, economists still considered driving one of the occupations in which it would be hard for computers to replace people.
They did not reach this conclusion out of thin air: besides analyzing the technical and psychological difficulties, they also looked at the results of the self-driving car challenge organized by DARPA, in which the best car took hours to cover eight miles and then broke down.
Yet it took Google just six years to accomplish the seemingly impossible task of building a self-driving car.
Why could Google do this in such a short time?
The most fundamental reason is that it adopted a way of thinking different from that of previous scientists: it turned a robotics problem into a big data problem.
First, the self-driving car project is an extension of the Google Street View project. The Google self-driving car can only go where Street View has already “swept the streets”; in those places it knows the surrounding environment in great detail, and that is the power of the completeness of big data.
In the past, the self-driving cars developed in research institutes had to identify objects on the fly wherever they went, which is how humans drive.
Second, Google’s self-driving car carries more than a dozen sensors that scan the surroundings dozens of times per second. This not only goes beyond what people call being “sharp-eyed and keen-eared”, it also accumulates a large amount of data, giving the car an accurate understanding of local road conditions and of how vehicles move under different traffic conditions. Computers learn these “experiences” far faster than people can, and that is the multidimensional advantage of big data.
Neither of these advantages was available to academia in the past, and by relying on them Google achieved autonomous driving in a very short time.
The importance of big data lies in the fact that it is not only a technical means but also a way of thinking, which can be seen from the term Big Data itself.
Big and Large both mean “big” in English, so why is it not called Large Data?
There is a subtle difference between the two words: Large is relatively concrete, describing physical size, as in “a large table”, whereas Big is a relatively abstract concept, the opposite of small.
Big Data, in other words, is more of a way of thinking.
It breaks with the past practice of having to know the cause first and then deduce the result according to causality; instead, we find the result directly through big data analysis and then, working backwards, look for the cause.
Today, many Internet products are developed on this basis, such as the ranking of Google search results, advertising recommendations, and product recommendations on Amazon and Taobao, all of which rest on the analysis of large amounts of user data (especially click data). As for the underlying reasons, these companies do not particularly care.
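As a minimal sketch of this “result first, cause later” style (the data, names, and ranking rule below are hypothetical illustrations, not any company’s actual algorithm), one can rank items purely by the click-through rate observed in a log, without modeling why users click:

```python
from collections import defaultdict

# Hypothetical click log: (item_id, clicked) pairs, as might be collected
# from a results page. The names and values are made up for illustration.
click_log = [
    ("page_a", True), ("page_a", False), ("page_a", True),
    ("page_b", False), ("page_b", False), ("page_b", True),
    ("page_c", True), ("page_c", True), ("page_c", True),
]

def rank_by_ctr(log):
    """Rank items by observed click-through rate (clicks / impressions).

    No attempt is made to explain why an item attracts clicks; the
    correlation found in the data is used directly, mirroring the
    'result first, cause later' approach described above.
    """
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for item, clicked in log:
        impressions[item] += 1
        if clicked:
            clicks[item] += 1
    return sorted(impressions, key=lambda i: clicks[i] / impressions[i], reverse=True)

print(rank_by_ctr(click_log))  # ['page_c', 'page_a', 'page_b']
```

Real ranking systems are of course far more elaborate, but the division of labor is the same: the data supplies the answer, and the question of cause is simply not asked.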
For the past 50 years, Moore’s Law (2015 marked exactly 50 years since it was proposed) has been the strongest driving force behind the development of science, technology, and the economy worldwide, because throughout the information age semiconductor chips have rebuilt every existing industry and spawned many new ones.
Over the next two decades, the world will shift from the era of Moore’s Law to the era of big data; in other words, whoever has the data is king.
Behind this, the fundamental driving force is the use of information to eliminate all kinds of uncertainty.
The importance of big data lies in the fact that it is not only a technical means but also a methodology: we must abandon the old mechanical way of doing things that relies on rules and insists on causality, and turn instead to using information to solve problems.