From What I Read: Deep Learning

(If you came because of the Bee Gees’ excerpt, congratulations – you’ve just been click-baited.)

Recently, I came across a video on my Facebook news feed showing several Hokkien phrases used by Singaporeans – one of which was “cheem”, literally “deep” in English. It is usually used to describe someone or something as very profound or complex, typically in content and philosophy.

I perceive that despite the geographical differences, there is something of a common understanding between the East and the West on the word “deep”. The English term “shallow” can mean simplistic as well as lacking in physical depth, and the phrase “skin deep” carries a similar sense.

Of course, the term “deep learning” (DL) does not derive simply from “deep” meaning complicated, but the method of DL is certainly nothing short of complex.

For this post, I will do the write-up in a slightly different manner – one article will serve as the “anchor” in answering each section, and readings from other articles will then be added onto the foundation laid by the “anchor”. Those who pay attention will notice the pattern.

My primary readings are from the following: Bernard Marr (through Forbes), Jason Brownlee, MATLAB & Simulink, Brittany-Marie Swanson, Robert D. Hof (through MIT Technology Review), Radu Raicea (through, and Monica Anderson (through Artificial Understanding and a book by Lauren Huret). As usual, the detailed references are included below.

What is the subject about?

(Before I go into the readings, I want to go back to how the term “deep learning” came about. It first appeared in academic literature in 1986, when Rina Dechter wrote “Learning While Searching in Constraint-Satisfaction-Problems” – the paper introduced the term to Machine Learning, but did not shed light on what DL is more commonly associated with today – neural networks. It was not until the year 2000 that the term was applied to neural networks by Aizenberg & Vandewalle.)

Tracing back to my previous posts, DL is a subset of Machine Learning, which itself is a subset of Artificial Intelligence. Marr pointed out that while Machine Learning took several core ideas of AI and “focuses them on solving real-world problems…designed to mimic our own decision-making”, DL narrows the focus further to certain Machine Learning tools and techniques, applying them to solve “just about any problem which requires “thought” – human or artificial”.

Brownlee offered a different dimension to the definition of DL: “a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks”. This definition was supported by several researchers cited in the article, among them:

  • Andrew Ng (“The idea of deep learning as using brain simulations, hope to: make learning algorithms much better and easier to use; make revolutionary advances in machine learning and AI”)
  • Jeff Dean (“When you hear the term deep learning, just think of a large deep neural net. Deep refers to the number of layers typically…I think of them as deep neural networks generally”)
  • Peter Norvig (“A kind of learning where the representation you form have several levels of abstraction, rather than a direct input to output”)

The article as a whole was rather academic in nature, but also offered a simplified summary: “deep learning is just very big neural networks on a lot more data, requiring bigger computers”.

The description of DL as a larger-scale, multi-layer neural network was supported by Swanson’s article. The idea of a neural network mimicking a human brain was reiterated in Hof’s article.

How does it work?

Marr described DL as feeding a large amount of data through neural networks, which are “logical constructions asking a series of binary true/false questions, or extract a numerical value, of every bit of data which pass through them, before classifying them according to the answers received”, so that the system can make decisions about other data.

Marr’s article gave an example of a system designed to record and report the number of vehicles of a particular make and model passing along a public road. The system would first be fed a large database of car types and their details, which it would process (hence “learning”) and compare with data from its sensors – by doing so, the system could classify the type of vehicles that passed by with some probability of accuracy. Marr further explained that the system would increase that probability by “training” itself with the new data – and thus new differentiators – it receives. This, according to Marr, is what makes the learning “deep”.

Brownlee’s article, through its aggregation of prior academic research and presentations, pointed out that the “deep” refers to the multiple layers within the neural network models, which the systems use to learn representations of data “at a higher, slightly more abstract level”. The article also highlighted the key aspect of DL: “these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure”.

Raicea illustrated neural networks as neurons grouped into three different types of layers: input layer, hidden layer(s) and output layer – the “deep” refers to having more than one hidden layer. The computation is carried by connections between neurons, each associated with a (randomly set) weight which dictates the importance of the input value. The system iterates through the data set and compares its outputs with the real outputs to see how far off they are, before readjusting the weights between neurons.
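The layers, weights and error comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not Raicea’s code: the toy data set, the layer sizes, the sigmoid activation and the absence of bias terms are all assumptions, and the weight adjustment is a crude trial-and-error nudge standing in for the gradient-based methods that real DL systems use.

```python
import math
import random


def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def make_network(layer_sizes, rng):
    """One weight matrix per pair of consecutive layers, randomly initialised."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(network, inputs):
    """Pass the inputs through every layer and return the output layer's values."""
    activations = inputs
    for layer in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron_weights, activations)))
            for neuron_weights in layer
        ]
    return activations


def mean_squared_error(network, dataset):
    """How far off the network's outputs are from the real outputs, on average."""
    total = 0.0
    for inputs, targets in dataset:
        outputs = forward(network, inputs)
        total += sum((o - t) ** 2 for o, t in zip(outputs, targets))
    return total / len(dataset)


def train(network, dataset, rng, steps=200, step_size=0.5):
    """Nudge one random weight at a time; keep the nudge only if the error drops."""
    best = mean_squared_error(network, dataset)
    for _ in range(steps):
        neuron = rng.choice(rng.choice(network))
        i = rng.randrange(len(neuron))
        old = neuron[i]
        neuron[i] = old + rng.uniform(-step_size, step_size)
        error = mean_squared_error(network, dataset)
        if error < best:
            best = error
        else:
            neuron[i] = old  # undo a nudge that did not help
    return best


rng = random.Random(0)
# Toy data (made up): the output should be 1 when the two inputs differ.
dataset = [
    ([0.0, 0.0], [0.0]),
    ([0.0, 1.0], [1.0]),
    ([1.0, 0.0], [1.0]),
    ([1.0, 1.0], [0.0]),
]
# 2 inputs, two hidden layers of 3 neurons each (hence "deep"), 1 output.
network = make_network([2, 3, 3, 1], rng)
error_before = mean_squared_error(network, dataset)
error_after = train(network, dataset, rng)
print(f"error before: {error_before:.4f}, after: {error_after:.4f}")
```

Because a nudge is kept only when it lowers the error, the error after training can never be higher than before. How much it drops depends on the random starting weights.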

How does it impact (in a good way)?

Marr cited several applications of DL that are currently deployed or in progress. DL’s use in object recognition would enhance the development of self-driving cars, while DL techniques would aid in the development of medicine “genetically tailored to an individual’s genome”. Closer to home for the layman and Average Joe, DL systems can analyse data and produce reports in natural-sounding human language with corresponding infographics – this can be seen in some news reports generated by what we know as “robots”.

Brownlee’s article did not expound much on the use-cases. Nevertheless, it highlighted that “DL excels on problem domains where the input (and even output) are analog”. In other words, DL does not need its input data to be numerical or tabular, nor does its output have to be – offering a qualitative dimension to analysis as compared to conventional data analysis.

Many of the explicit benefits were discussed in the prior posts on Machine Learning and Artificial Intelligence.

What are the issues?

Brownlee recapped the issues DL faced in the 1990s through Geoff Hinton’s slide: back then, datasets were too small, computing power was too weak, and the methods of operating it were generally improper. MATLAB & Simulink pointed out that DL became useful because the first two of these factors have improved greatly over time.

Swanson briefly warned about the issue of using multiple layers in the neural network: “more layers means your model will require more parameters and computational resources and is more likely to become overfit”.
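The “more layers means more parameters” part of Swanson’s warning can be checked with simple arithmetic. The sketch below assumes a fully connected network with one bias per neuron; the layer sizes are hypothetical and chosen only for illustration. It says nothing about overfitting itself, only about how the parameter count grows.

```python
def count_parameters(layer_sizes):
    """Weights between consecutive layers, plus one bias per non-input neuron."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical networks: same input and output sizes, different depth.
shallow = count_parameters([784, 32, 10])          # one hidden layer
deep = count_parameters([784, 32, 32, 32, 10])     # three hidden layers

print(shallow)  # 25450
print(deep)     # 27562
```

Each extra hidden layer of 32 neurons adds 32 × 32 weights and 32 biases, so 1,056 more numbers for the system to learn and store.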

Hof cited points raised by DL critics, chiefly that the development of DL, and of AI in general, has moved away from considering how an actual brain functions “in favour of brute-force computing”. An example came from Jeff Hawkins, on how DL fails to take into account the concept of time: human learning (which AIs are supposed to emulate) depends on the ability to recall sequences of patterns, and not merely still images.

Hof also mentioned that current DL applications are concentrated in speech and image recognition, and that extending them beyond these areas would “require more conceptual and software breakthroughs” as well as advancements in processing power.

Many of DL’s other issues are rather similar to those faced by Machine Learning and Artificial Intelligence, which I have captured in the previous posts. One of the recurring themes is how inexplicably DL systems arrive at their outputs, or in the words of Anderson’s article, “the process itself isn’t scientific”.

How do we respond?

Usually, I would comment in this section with very forward-looking, society-challenging calls for action – and indeed I have done so for the posts on AI and Machine Learning.

But I would like to end with a couple of paragraphs from Anderson in a separate publication, which captured the anxiety about AI in general, and some hope for DL:

“A computer programmed in the traditional way has no clue about what matters. So therefore we have had programmers who know what matters creating models and entering these models into the computer. All programming is like that; a programmer is basically somebody who does reduction all day. They look at the rich world and they make models that they enter into the computer as programs. The programmers are intelligent, but the program is not. And this was true for all old style reductionist AI.

… All intelligences are fallible. That is an absolute natural law. There is no such thing as an infallible intelligence ever. If you want to make an artificial intelligence, the stupid way is to keep doing the same thing. That is a losing proposition for multiple reasons. The most obvious one is that the world is very large, with a lot of things in it, which may matter or not, depending on the situations. Comprehensive models of the world are impossible, even more so if you considered the so-called “frame problem”: If you program an AI based on models, the model is obsolete the moment you make it, since the programmer can never keep up with the constant changes of the world evolving.

Using such a model to make decisions is inevitably going to output mistakes. The reduction process is basically a scientific approach, building a model and testing it. This is a scientific form of making what some people call intelligence. The problem is not that we are trying to make something scientific, we are trying to make the scientist. We are trying to create a machine that can do the reduction the programmer is doing because nothing else counts as intelligent.

… Out of hundreds of things that we have tried to make AI work, neural networks are the only one that is actually going to succeed in producing anything interesting. It’s not surprising because these networks are a little bit more like the brain. We are not necessarily modeling them after the brain but trying to solve similar problems ends up in a similar design.”

Interesting Video Resources

But what *is* a Neural Network? | Chapter 1, deep learning – 3Blue1Brown:

How Machines *Really* Learn. [Footnote] – CGP Grey:


What Is The Difference Between Deep Learning, Machine Learning and AI? – Forbes:

What is Deep Learning? – Jason Brownlee:

What Is Deep Learning? | How It Works, Techniques & Applications – MATLAB & Simulink:

What is Deep Learning? – Brittany-Marie Swanson:

Deep Learning – MIT Technology Review:

Want to know how Deep Learning works? Here’s a quick guide for everyone. – Radu Raicea:

Why Deep Learning Works – Artificial Understanding – Artificial Understanding:

Artificial Fear Intelligence of Death. In conversation with Monica Anderson, Erik Davis, R.U. Sirius and Dag Spicer – Lauren Huret:

From What I Read: Machine Learning

Let’s be up front here: my introduction section for the Artificial Intelligence post stole quite a lot of limelight from the remaining posts within the AI series (since this post covers a subset of AI, and the next post possibly a subset of this post), so I will not bother to think too much about coming up with an introduction with a “bang”.

The other disclaimer here is that this post is not what I envisioned a month ago. This is mainly because, as I searched further on the topic, I found more and deeper ways to explain it (and did I say varied too?). And this extends beyond reading materials – there are several kinds of videos out there which aim to explain the subject (at varied edutainment levels).

But in the interest of time and effort, I will try to be rather layman-ish and bare-bone-ish in my approach to the subject of Machine Learning. I will, however, include links to resources I find interesting (some of which may not end up being used for this post) at the end of this post.

My readings for this post are from Bernard Marr on Forbes, MATLAB & Simulink, Expert System, Yufeng Guo on Towards Data Science, Danny Sullivan on MarTech Today, SAS and several Quora replies.

What is the subject about?

So what is machine learning (ML)?

It is rather widely acknowledged that ML is a subset of Artificial Intelligence (AI), and so, at a conceptual level, it shares the goal of AI: to have machines mimic human intelligence. At the subset level, as Marr mentioned in his article, ML seeks to “teach computers to learn in the same way we do” by interpreting information, classifying it, and learning from successes and failures.

An article from MATLAB & Simulink (M&S) concurs with this description, stating that ML is a “data analytics technique that teaches computers to…learn from experience”, and even adding that this learning method “comes naturally to humans and animals”.

So what do “learning from experience” and “learning from successes and failures” imply? They imply the absence of explicit programming by a programmer, as Expert System’s (ES) article explained, which further added the idea of automation in the learning process.

Guo took a different approach by defining ML as “using data to answer questions”, outlining the idea of training from an input (“data”) and the outcome of making predictions or inferences (“answer questions”). Guo further mentioned that the two parts of the definition are connected by analytical models, which SAS’ article also highlighted.

To conclude this section, we can connect the two approaches to defining ML, sloppily amalgamated as “a data analytics technique that teaches computers to learn automatically through experiences by using data, ultimately to answer questions through inferences and predictions”.

How does it work?

In explaining how ML works, many of the articles reviewed mention the two types of techniques under ML, namely supervised learning and unsupervised learning.

As M&S’ article puts it, supervised learning develops predictive models based on both input and output data to predict future outputs. Examples of its application include handwriting recognition (which leverages classification techniques like discriminant analysis and logistic regression) and electricity load forecasting (which uses regression techniques like linear and nonlinear models and stepwise regression).
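To make “input and output data to predict future outputs” concrete, here is a minimal sketch of the simplest regression technique, a straight line fitted by least squares. The temperature and load figures are made up for illustration and do not come from M&S’ article; real load forecasting would use far more data and richer models.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: known inputs (temperature in °C) paired with
# known outputs (electricity load in MW).
temperatures = [20.0, 25.0, 30.0, 35.0]
loads = [50.0, 60.0, 70.0, 80.0]

slope, intercept = fit_line(temperatures, loads)

# Predict the output for an input the model has not seen before.
predicted_load = slope * 40.0 + intercept
print(slope, intercept, predicted_load)  # 2.0 10.0 90.0
```

The “supervision” lies in the fact that every training input comes with its correct output; the model only has to learn the relationship between the two.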

Unsupervised learning seeks to find hidden patterns or intrinsic structures in input data through grouping and interpreting the data – there is no output data involved. This type of technique is usually used for exploratory data analysis, and sees applications in object recognition, gene sequence analysis and market research. The M&S article cited clustering as the most common unsupervised learning technique, which uses algorithms such as hierarchical clustering and hidden Markov models. In short, unsupervised learning is good for splitting data into clusters.
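As a rough illustration of splitting data into clusters without any output labels, the sketch below uses k-means on one-dimensional data. Note that k-means is not one of the algorithms named above; it is swapped in here only because it fits in a few lines. The spending figures and the starting centroids are made up.

```python
def k_means_1d(values, centroids, iterations=10):
    """Repeatedly assign each value to its nearest centroid, then move
    each centroid to the mean of the values assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [
            sum(c) / len(c) if c else centroids[i]  # keep an empty cluster's centroid
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up data: monthly spend of six customers, with no labels attached.
spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(spend, centroids=[0.0, 5.0])

print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

Nobody told the algorithm that there are “low spenders” and “high spenders”; the two groups emerge from the data alone, which is the point of unsupervised learning.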

ES added two further types of ML algorithms to shed more light on how ML works. Semi-supervised ML algorithms fall between supervised and unsupervised learning: they train on both labeled data (input data with accompanying output data) and unlabeled data to improve learning accuracy. Reinforcement ML algorithms interact with the environment by producing actions and discovering errors or rewards, in order to determine the ideal, optimised behavior within a specific context.


Sullivan’s article mentioned the three major parts that make up ML systems, namely the model (the system that makes predictions/identifications), the parameters (the factors used by the model to produce decisions) and the learner (the system that adjusts the parameters, and subsequently the model, by looking at differences between predictions and actual outcomes).
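The three parts can be mapped onto a few lines of code. This is only a sketch under assumptions: the straight-line model, the two parameters, the learning rate and the example data are all hypothetical choices made for illustration, not something taken from Sullivan’s article.

```python
def model(parameters, x):
    """The model: turns an input into a prediction using the parameters."""
    weight, bias = parameters
    return weight * x + bias


def learner(parameters, examples, learning_rate=0.05):
    """The learner: compares predictions with actual outcomes and
    adjusts the parameters a little after each example."""
    weight, bias = parameters
    for x, actual in examples:
        error = model((weight, bias), x) - actual
        weight -= learning_rate * error * x
        bias -= learning_rate * error
    return weight, bias


# Made-up examples that follow the rule: outcome = 2 * input + 1.
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# The parameters: start with a guess and let the learner refine them.
parameters = (0.0, 0.0)
for _ in range(2000):  # "rinse and repeat"
    parameters = learner(parameters, examples)

print(parameters)              # close to (2.0, 1.0)
print(model(parameters, 4.0))  # close to 9.0
```

The programmer never writes the rule “multiply by two and add one”; the learner recovers it by repeatedly shrinking the gap between prediction and actual outcome.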

This way of explaining the workings of ML systems is similar to how CGP Grey explains them in his video, which I find rather interesting.

Guo outlined 7 steps of ML in his separate article:

  1. Data gathering
  2. Data preparation
  3. Model selection
  4. Model training
  5. Model evaluation
  6. (Hyper)Parameter Tuning
  7. Model prediction
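The seven steps above can be walked through on a toy problem. The sketch below is not Guo’s example: the data, the choice of a k-nearest-neighbours model and the candidate values of k are all assumptions made for illustration. For brevity it also reuses one held-out set for both evaluation and tuning, whereas a careful setup would keep a separate test set.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, held_out, k):
    """Fraction of held-out points whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k) == label for x, label in held_out)
    return correct / len(held_out)


# 1. Data gathering (made-up data: the label is 1 when x is 10 or more).
data = [(float(x), 1 if x >= 10 else 0) for x in range(20)]

# 2. Data preparation: shuffle, then hold out 30% for evaluation.
rng = random.Random(42)
rng.shuffle(data)
split = len(data) * 7 // 10
train, held_out = data[:split], data[split:]

# 3. Model selection: k-nearest neighbours (an arbitrary choice here).
# 4. Model training: for this model, training is just remembering `train`.

# 5. Model evaluation on data the model has not seen.
baseline = accuracy(train, held_out, k=1)

# 6. Hyperparameter tuning: try a few values of k and keep the best.
candidates = [1, 3, 5]
best_k = max(candidates, key=lambda k: accuracy(train, held_out, k))

# 7. Prediction on a brand-new input.
prediction = knn_predict(train, 25.0, best_k)
print(baseline, best_k, prediction)
```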

Most of the steps are similar to what Sullivan’s article described and implied, including the step of training, which Sullivan described as the “learning part of machine learning” and “rinse and repeat” – this process would reshape the model to refine the predictions.

Again, this is not a technical post, so I will spare you too many details. I will, however, include links to a few videos for you to watch should you be interested.

And finally, on Quora, there is also a response that breaks down how ML works in a very relatable manner – machines are trying to do tasks the way we do them, but with infinite memory and the speed to handle millions of transactions every second.

How does it impact (in a good way)?

Many of us experience the applications of ML every day without knowing it. Take YouTube’s video recommendation system, which relies on algorithms and input data – your search and watch history. The model is further refined with other inputs, such as the “Not interested” button you clicked on some of its recommendations, and (perhaps) the percentage of each video watched.

And speaking of recommendations, how can we not include the all-too-famous Google search engine and its results recommendations? And speaking of Google, how can we not bring to mind their Google Translate feature which allows users to translate languages through visual input?

So certainly, the use cases for ML are quite prevalent in these areas that the public at large is familiar with.

M&S outlined several other areas where ML has become a key technique to solve problems, such as credit scoring in assessing credit-worthiness of borrowers, motion and object detection for automated vehicles, tumour detection and drug discovery in the field of biology, and predictive maintenance for manufacturing.

SAS’ article highlighted that ML could enable faster and more complex data analysis with better accuracy, while also being able to process the large amounts of data coming from data mining and affordable data storage.

And when ML is able to do certain tasks that would have required humans in the past, that means cost savings for the enterprises involved. This thought provides a nice segue to the next section.

What are the issues?

Now, call me lazy if you want, but as I have mentioned earlier: since ML is a subset of AI, several issues faced by AI are also faced by ML, such as the problem of input data quality (both accuracy and freedom from bias), and the difficulty in explaining how the model reached its conclusion (especially if it involves neural network techniques).

To reiterate from the previous post, we may foresee jobs being displaced as tasks are increasingly automated and taken over more efficiently by ML systems. That being said, the glass-half-full view of the situation is that job functions are being augmented and changed – if we could get the workforce to adapt to these job functions, the impact could be minimised.

As ML becomes widely adopted, there will be a greater demand for skilled resources. This sounds like a solution under the glass-half-full view mentioned, but seeing that the field of ML technology is still relatively new, it probably means higher cost and difficulty in acquiring ML expertise, let alone in training the existing workforce.

And as ML becomes increasingly widely used, the hunger for data will become more insatiable. We as a society may increasingly find ourselves having to address the question of how much personal data we should be sharing, as Doromal writes in his Quora reply to a question.

But to get to wide adoption, there is a need for the democratisation of ML, since investments in ML can presently be hefty, which makes ML exclusive: the more advanced systems are available only to users who can afford them.

How do we respond?

My answer to this question does not run far from what was posed in the post on AI. But to add to that, as I have mentioned in the earlier section, we as a society need to take a hard look at how we perceive data privacy, since ML is dependent on the availability of data to form better predictions and inferences.

There is growing interest in ML among companies upon seeing the benefits it can reap. Perhaps it is through this high demand that there will be a greater push in the development and subsequent democratisation of the technology. That said, companies need to find the balance between deploying ML and managing a workforce that may become increasingly redundant.

The teaching and learning of ML should become more widespread to meet the increased need for such a skilled workforce. A better level of awareness about ML among individuals in society will also be needed in future, so that they can understand how certain automated decisions they face are derived.

ML is unlike the other topics mentioned in this blog, in that the technology is already here today, up and running (while things like ICOs and even the commercial use of blockchain are yet to be seen). And as implied and mentioned, the applications have already become rather prevalent. Individuals in society, however, are probably still some way off from having a good understanding of ML, but that will probably change soon as widespread automation looms on the horizon.

Interesting video resources

How Machines Learn – CGP Grey:

Machine Learning & Artificial Intelligence: Crash Course Computer Science #34 – CrashCourse:

What is Machine Learning? – Google Cloud Platform:

But what *is* a Neural Network? | Chapter 1, deep learning – 3Blue1Brown:


What Is Machine Learning – A Complete Beginner’s Guide In 2017 – Forbes:

What Is Machine Learning? | How It Works, Techniques & Applications – MATLAB & Simulink:

What is Machine Learning? A definition – Expert System:

How Machine Learning Works, As Explained By Google – MarTech Today:

How do you explain Machine Learning and Data Mining to non Computer Science people? – Quora:

Machine Learning: What it is and why it matters | SAS:

What is Machine Learning? – Towards Data Science:

The 7 Steps of Machine Learning – Towards Data Science:

5 Common Machine Learning Problems & How to Beat Them – Provintl:

What are the main problems faced by machine learning engineers at Google? – Quora:

An Honest Guide to Machine Learning: Part One – Axiom Zen Team – Medium:

These are three of the biggest problems facing today’s AI – The Verge: