From What I Read: Deep Learning

(If you came because of the Bee Gees’ excerpt, congratulations – you’ve just been click-baited.)

Recently, I came across a video on my Facebook news feed that showed several Hokkien phrases used by Singaporeans – one of which was “cheem”, literally “deep” in English. It is usually used to describe someone or something very profound or complex, especially in content and philosophy.

I perceive that, despite the geographical differences, there is something of a common understanding between the East and the West of the word “deep”. The English term “shallow” can mean simplistic as well as lacking in physical depth, and the phrase “skin deep” carries the same sense.

Of course, the term “deep learning” (DL) does not simply derive from the word “deep” being complicated, but certainly the method of DL is nothing short of being complex.

For this post, I will do the write-up in a slightly different manner – one article will be the “anchor” in answering each section, and readings from other articles will then be added on to the foundation laid by the “anchor”. Those who pay attention will notice the pattern.

My primary readings will be from the following: Bernard Marr (through Forbes), Jason Brownlee, MATLAB & Simulink, Brittany-Marie Swanson, Robert D. Hof (through MIT Technology Review), Radu Raicea (through, and Monica Anderson (through Artificial Understanding and a book by Lauren Huret). As usual, the detailed references are included below.

What is the subject about?

(Now, before I go into the readings, I want to go back to how the term “deep learning” was first derived. It first appeared in academic literature in 1986, when Rina Dechter wrote “Learning While Searching in Constraint-Satisfaction-Problems” – the paper introduced the term to Machine Learning, but did not shed light on what DL is more commonly associated with today – neural networks. It was not until the year 2000 that the term was introduced to neural networks by Aizenberg & Vandewalle.)

Tracing back to my previous posts, DL is a subset of Machine Learning, which itself is a subset of Artificial Intelligence. Marr pointed out that while Machine Learning took several core ideas of AI and “focuses them on solving real-world problems…designed to mimic our own decision-making”, DL puts further focus on certain Machine Learning tools and techniques, applying them to solve “just about any problem which requires “thought” – human or artificial”.

Brownlee offered a different dimension to the definition of DL: “a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks”. This definition was supported by several more researchers cited in the article, among them:

  • Andrew Ng (“The idea of deep learning as using brain simulations, hope to: make learning algorithms much better and easier to use; make revolutionary advances in machine learning and AI”)
  • Jeff Dean (“When you hear the term deep learning, just think of a large deep neural net. Deep refers to the number of layers typically…I think of them as deep neural networks generally”)
  • Peter Norvig (“A kind of learning where the representation you form have several levels of abstraction, rather than a direct input to output”)

The article as a whole was rather academic in nature, but also offered a simplified summary: “deep learning is just very big neural networks on a lot more data, requiring bigger computers”.

The description of DL as a larger-scale, multi-layer neural network was supported by Swanson’s article. The idea of a neural network mimicking a human brain was reiterated in Hof’s article.

How does it work?

Marr described DL as feeding a large amount of data through neural networks – “logical constructions asking a series of binary true/false questions, or extract a numerical value, of every bit of data which pass through them, before classifying them according to the answers received” – in order to make decisions about other data.

Marr’s article gave an example of a system designed to record and report the number of vehicles of a particular make and model passing along a public road. The system would first be fed a large database of car types and their details, which it would process (hence “learning”) and compare with data from its sensors – by doing so, the system could classify the type of vehicles that passed by with some probability of accuracy. Marr further explained that the system would increase that probability by “training” itself with the new data – and thus new differentiators – it receives. This, according to Marr, is what makes the learning “deep”.

Brownlee’s article, through its aggregation of prior academic research and presentations, pointed out that the “deep” refers to the multiple layers within the neural network models – which the systems use to learn representations of data “at a higher, slightly more abstract level”. The article also highlighted the key aspect of DL: “these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure”.

Raicea illustrated the idea of neural networks as neurons grouped into three different types of layers: input layer, hidden layer(s) and output layer – the “deep” would refer to having more than one hidden layer. The computation is facilitated by connections between neurons, each associated with a (randomly set) weight which dictates the importance of the input value. The system would iterate through the data set and compare its outputs with the real outputs to see how far off they are, before readjusting the weights between neurons.
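To make the description above a little more concrete, here is a minimal sketch in plain Python. It is not taken from Raicea's article: the sigmoid activation, the squared-error comparison, the layer sizes, the learning rate and the tiny logical-OR data set are all assumptions chosen only for illustration, and the function names are hypothetical. It builds an input layer, two hidden layers and an output layer with randomly set weights, then repeatedly compares its outputs with the real outputs and readjusts the weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(layer_sizes, rng):
    """Create randomly weighted connections between consecutive layers."""
    return [
        {
            "weights": [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
            "biases": [0.0] * n_out,
        }
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(network, inputs):
    """Return the activations of every layer, starting with the input layer."""
    activations = [inputs]
    for layer in network:
        previous = activations[-1]
        activations.append([
            sigmoid(sum(w * a for w, a in zip(row, previous)) + bias)
            for row, bias in zip(layer["weights"], layer["biases"])
        ])
    return activations


def train_step(network, inputs, targets, learning_rate):
    """Compare the output with the real output, then readjust the weights."""
    activations = forward(network, inputs)
    outputs = activations[-1]
    # Error signal at the output layer (squared error through a sigmoid).
    deltas = [(o - t) * o * (1.0 - o) for o, t in zip(outputs, targets)]
    for index in reversed(range(len(network))):
        layer = network[index]
        layer_inputs = activations[index]
        # Pass the error signal back one layer before changing this layer's weights.
        if index > 0:
            upstream = [
                sum(layer["weights"][j][k] * deltas[j] for j in range(len(deltas)))
                * layer_inputs[k] * (1.0 - layer_inputs[k])
                for k in range(len(layer_inputs))
            ]
        else:
            upstream = []
        for j, delta in enumerate(deltas):
            for k, value in enumerate(layer_inputs):
                layer["weights"][j][k] -= learning_rate * delta * value
            layer["biases"][j] -= learning_rate * delta
        deltas = upstream


def mean_squared_error(network, data):
    return sum((forward(network, x)[-1][0] - t[0]) ** 2 for x, t in data) / len(data)


rng = random.Random(0)
# Toy data set (an assumption for illustration): the logical OR of two inputs.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
# 2 inputs, two hidden layers of 3 neurons each, 1 output.
network = init_network([2, 3, 3, 1], rng)

loss_before = mean_squared_error(network, data)
for _ in range(2000):
    for inputs, targets in data:
        train_step(network, inputs, targets, learning_rate=0.5)
loss_after = mean_squared_error(network, data)

print(f"error before training: {loss_before:.4f}")
print(f"error after training:  {loss_after:.4f}")
```

Real DL systems use far larger networks and specialised libraries, but the loop is the same in spirit: predict, measure how far off the prediction is, and nudge the weights.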

How does it impact (in a good way)?

Marr cited several applications of DL that are currently deployed or in progress. DL’s use in object recognition would enhance the development of self-driving cars, while DL techniques would aid in the development of medicine “genetically tailored to an individual’s genome”. Closer to the layman and the Average Joe, DL systems are able to analyse data and produce reports in natural-sounding human language with corresponding infographics – this can be seen in some news reports generated by what we know as “robots”.

Brownlee’s article did not expound much on the use-cases. Nevertheless, it highlighted that “DL excels on problem domains where the input (and even output) are analog”. In other words, DL does not need its input data to be numerical and tabular, and neither does the data it produces – offering a qualitative dimension to analysis as compared to conventional data analysis.

Many of the explicit benefits were discussed in the prior posts on Machine Learning and Artificial Intelligence.

What are the issues?

Brownlee recapped the issues DL faced in the 1990s through Geoff Hinton’s slide: back then, datasets were too small, computing power was too weak, and the methods used were generally improper. MATLAB & Simulink pointed out that DL became useful because the first two of these factors have seen great improvements over time.

Swanson briefly warned on the issue of using multiple layers in the neural network: “more layers means your model will require more parameters and computational resources and is more likely to become overfit”.
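Swanson's point about parameters can be checked with simple arithmetic. The sketch below assumes a plain fully connected network (every neuron connected to every neuron in the next layer, plus one bias per neuron); the layer sizes and the function name are hypothetical and picked only to show the trend.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


shallow = [100, 64, 10]          # one hidden layer
deeper = [100, 64, 64, 64, 10]   # three hidden layers

print(count_parameters(shallow))  # 7114
print(count_parameters(deeper))   # 15434
```

Two extra hidden layers more than double the number of values the model has to learn, which means more computation and, with limited data, more room to overfit.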

Hof cited points raised by DL critics, chiefly on how the development of DL and AI in general has moved away from considering how an actual brain functions “in favour of brute-force computing”. An example was given by Jeff Hawkins on how DL fails to take into consideration the concept of time: human learning (which AI is supposed to emulate) depends on the ability to recall sequences of patterns, and not merely still images.

Hof also mentioned that current DL applications are within speech and image recognition, and to extend the applications beyond them would “require more conceptual and software breakthroughs” as well as advancements in processing power.

Many of DL’s other issues are rather similar to those faced by Machine Learning and Artificial Intelligence, which I have captured in the previous posts. One of the recurring themes is how inexplicably DL systems arrive at their output, or in the words of Anderson’s article, “the process itself isn’t scientific”.

How do we respond?

Usually, I would comment in this section with very forward-looking, society-challenging calls for action – and indeed I have done so for the posts on AI and Machine Learning.

But I would like to end with a couple of paragraphs from Anderson in a separate publication, which captured the anxiety about AI in general, and some hope for DL:

“A computer programmed in the traditional way has no clue about what matters. So therefore we have had programmers who know what matters creating models and entering these models into the computer. All programming is like that; a programmer is basically somebody who does reduction all day. They look at the rich world and they make models that they enter into the computer as programs. The programmers are intelligent, but the program is not. And this was true for all old style reductionist AI.

… All intelligences are fallible. That is an absolute natural law. There is no such thing as an infallible intelligence ever. If you want to make an artificial intelligence, the stupid way is to keep doing the same thing. That is a losing proposition for multiple reasons. The most obvious one is that the world is very large, with a lot of things in it, which may matter or not, depending on the situations. Comprehensive models of the world are impossible, even more so if you considered the so-called “frame problem”: If you program an AI based on models, the model is obsolete the moment you make it, since the programmer can never keep up with the constant changes of the world evolving.

Using such a model to make decisions is inevitably going to output mistakes. The reduction process is basically a scientific approach, building a model and testing it. This is a scientific form of making what some people call intelligence. The problem is not that we are trying to make something scientific, we are trying to make the scientist. We are trying to create a machine that can do the reduction the programmer is doing because nothing else counts as intelligent.

… Out of hundreds of things that we have tried to make AI work, neural networks are the only one that is actually going to succeed in producing anything interesting. It’s not surprising because these networks are a little bit more like the brain. We are not necessarily modeling them after the brain but trying to solve similar problems ends up in a similar design.”

Interesting Video Resources

But what *is* a Neural Network? | Chapter 1, deep learning – 3Blue1Brown:

How Machines *Really* Learn. [Footnote] – CGP Grey:


What Is The Difference Between Deep Learning, Machine Learning and AI? – Forbes:

What is Deep Learning? – Jason Brownlee:

What Is Deep Learning? | How It Works, Techniques & Applications – MATLAB & Simulink:

What is Deep Learning? – Brittany-Marie Swanson:

Deep Learning – MIT Technology Review:

Want to know how Deep Learning works? Here’s a quick guide for everyone. – Radu Raicea:

Why Deep Learning Works – Artificial Understanding – Artificial Understanding:

Artificial Fear Intelligence of Death. In conversation with Monica Anderson, Erik Davis, R.U. Sirius and Dag Spicer – Lauren Huret:

From What I Read: Artificial Intelligence

“Doesn’t look like anything to me.”

If you get the reference of that quote, let’s do a virtual high-five, for I have found another fellow fan of the HBO series, Westworld.

When talking about Artificial Intelligence, or AI, it is often too easy for the layman to think of human-like robots, like those from Westworld. While it is true that robots with high cognitive function do operate on AI, robots are far from being representative of what AI is.

So, what is AI? As in my other posts, we will explore how AI works, AI’s impact and issues, and how we should respond.

My readings for this article are mainly from SAS, McKinsey, Nick Heath on ZDNet, Bernard Marr, Erik Brynjolfsson and Andrew McAfee on Harvard Business Review, and Tom Taulli on Forbes.

What is the subject about?

The definition of AI seems to be rather fluid, as some of the articles pointed out. But one thing is for sure: the phrase was first coined by Minsky and McCarthy in their Dartmouth College summer conference paper in 1956. Heath summarised the idea by Minsky and McCarthy on AI as “any task performed by a program or a machine that, if a human carried out the same activity … the human had to apply intelligence to accomplish the task”. (Click here if you are interested in the proposal paper.)

The broadness of the initial definition of AI unfortunately meant that the debate on what constitutes AI would be far and wide.

Subsequent definitions did not do much to refine the original definition. McKinsey referred to AI as the ability of machines to exhibit human-like intelligence, while Marr perceived AI as “simulating the capacity for abstract, creative, deductive thought – and particularly the ability to learn – using the digital, binary logic of computers”.

In short, machines emulating human intelligence.

However, a comment under Heath’s article shed some interesting light on the understanding of AI.


How does it work?

To quote the comment, “AI is a complex set of case statements run on a massive database. These case statements can update based on user feedback on how valid the results are”.

Such a definition would not be too far off from a technical definition of AI. Andrew Roell, a managing partner at Analytics Ventures, was quoted in Taulli’s article as describing AI as computers being fed with algorithms to process data leading to certain desired outcomes.

Obviously, two components are required for AI to work: algorithm and data. However, what makes AI different from an ordinary piece of software is the component of learning. SAS described AI’s working as “combining large amounts of data with fast, iterative processing and intelligent algorithms, allowing the software to learn automatically from patterns or features in the data”.
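The difference between ordinary software and software that learns can be sketched in a few lines. This is only an illustration under assumptions of my own: the fraud-flagging scenario, the numbers and the function names are all hypothetical, and the "learning" here is a deliberately simple search for the threshold that makes the fewest mistakes on labelled examples.

```python
def fixed_rule(amount):
    """Ordinary software: the programmer decides the threshold up front."""
    return amount > 1000


def learn_threshold(examples):
    """Learning: pick the threshold that makes the fewest mistakes on the data."""
    best_threshold, best_errors = None, len(examples) + 1
    for candidate in sorted(value for value, _ in examples):
        errors = sum((value >= candidate) != label for value, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical labelled data: (transaction amount, was it actually flagged as suspicious?)
examples = [(120, False), (300, False), (450, False), (700, True), (900, True), (1500, True)]

learned = learn_threshold(examples)
print(learned)           # 700
print(fixed_rule(700))   # False: the hand-written rule misses this case
print(700 >= learned)    # True: the learned rule catches it
```

Feed the second function different data and its behaviour changes without anyone rewriting the code, which is the "learning" component that SAS describes.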

Of course, the intricacies of AI are many, considering the various subfields within it, such as Machine Learning, Deep Learning, Cognitive Computing and Natural Language Processing. Some of these topics will be discussed in future posts. But for now, suffice it to say that these methods analyse a variety of data to achieve a certain goal.

There is also another way to categorise the research and development work in AI. Heath and Marr pointed out that there are two main branches of AI. Narrow AI, or as Marr put it, “applied/specialised AI” would be more common to ordinary people like us through its widespread application (think Apple’s Siri and the Internet of Things), since the AI simulates human thought to carry out specific tasks (having been learned or taught without being explicitly programmed).

The other branch of AI is “general AI”, or artificial general intelligence (AGI). General AI seeks to carry out a full simulation of the adaptable intellect found in humans – that is, being capable of learning how to carry out vastly different tasks and of reasoning on wide-ranging topics based on accumulated experience, as Heath and Marr pointed out. Such intelligence requires a high amount of processing power to match human cognitive performance, and AGI becoming a reality remains a story of the distant future. Others, however, would argue that given the evolution of processing technology, supported by further development in the integration of multiple Narrow AI systems, AGI may not be too far away, as indicated by IBM’s Watson.

How does it impact (in a good way)?

Even though AI may sound like a recent buzzword, its applications can be traced quite a while back. As an example, the Roomba (that circular robot vacuum cleaner that whizzes across the room) is an application of AI that leverages sensors and sufficient intelligence to carry out the specific task of cleaning a home – it was first introduced in 2002, 16 years ago (at the time of writing). Five years earlier, IBM’s Deep Blue machine defeated world chess champion Garry Kasparov.

As mentioned earlier, the application of narrow AI is widespread, since the scope here is to carry out specific tasks. Heath pointed out several use-cases such as interpreting video feeds from drones carrying out visual inspections of infrastructure, organising calendars, chatbots responding to simple queries from customers, and assisting radiologists in spotting potential tumours in X-rays. Brynjolfsson and McAfee, on the other hand, highlighted the advances in voice recognition (Siri, Alexa, Google Assistant) and image recognition (think about Facebook recognising your friends’ faces in your photos when suggesting tags).

As you may notice, I have left out some of the cognition-related applications, which I shall reserve for the Machine Learning article (and other articles) in the future.

In the world of business, AI may help businesses to deliver enhanced customer experience by customising offerings based on data of customer preference and behaviour, as indicated by McKinsey. AI may also help to provide smarter research and development through better error detection, and provide forecasting of supply and demand to optimise production, in manufacturing.

What are the issues?

Going back to the core components of AI, you will see that one of the main dependencies of AI is data. It goes without saying, then, that quality data produces quality AI, and inaccuracies in the data will be reflected accordingly in the results.

The other issue that current AI systems face is that many of them fall under the narrow AI category, which can only carry out specialised and clearly defined tasks. SAS gave the example that an AI system which detects healthcare fraud cannot also detect tax fraud or warranty claims fraud. The AI system is dependent on the defined task and scope that it was trained on.

Brynjolfsson and McAfee’s article identified three risks arising from the difficulty humans have in understanding how AI systems reach certain decisions: advanced AI systems like deep neural networks have a complex decision-making process, and they cannot articulate the rationale behind their decisions even when they have gathered a wealth of knowledge and data. The three risks are: hidden biases derived from the training data provided, reliance on statistical truths over literal truths which may lack verifiability, and difficulty in diagnosing and correcting errors when they occur.

In decision making, AI systems may fall short in contextualisation, that is, in understanding and taking into account the nuances of human culture. Such data would be rather difficult to derive, let alone to provide for training. That being said, Google Duplex is an indicator of headway being made in overcoming such a challenge.

Further into the future, AI systems may lead to high technological unemployment as jobs are made redundant, as Heath implies. This is deemed a more credible possibility than an existential threat posed by AI, a concern shared not merely by science-fiction movies, but also by famed and intelligent people like Stephen Hawking and Elon Musk.

In between the two possibilities lie various moral and ethical issues, such as machine rights, machine consciousness, the singularity and strong AI, and so on. But even closer to the present, we are already dealing with ethical issues in the design of autonomous vehicles (which employ AI systems), commonly framed as the “Trolley Problem”.

How do we respond?

There was a period known as the “AI winter”. It was the 1970s, and having seen few results from huge investments, public and private institutions pulled the plug on funding for AI research, specifically the AGI kind. It was in the 1980s that AI research was revived, thanks to business leaders like Ken Olsen who realised the commercial benefits of AI, and developed expert systems that focused on narrow tasks.

Fast forward to today, AI is pervasive. Unknowingly, we may have been users of AI technology already. Part of the future imagined in the past is here. And for the most part, life has changed for the better.

Still, there is much room for AI application in businesses to generate value (although much of the talk focused on the subfield of machine learning). Companies should realise that, like desktop computer technology, the resolution of current flaws and issues in AI technology and the subsequent evolution of the technology can be accelerated with the support of adoption.

However, we as a society may need to grapple with the grave and philosophical issues posed by AI, answering tough questions on the future of jobs and even the lives of people as AI gradually strengthens. And in the midst of this, ethical concerns continue to loom, waiting for us to address them. Perhaps Partnership on AI, a foundation founded by tech giants like Google, IBM, Microsoft and Facebook, is a good place to start.


What is AI? Everything you need to know about Artificial Intelligence – ZDNet:

What is Artificial Intelligence And How Will It Change Our World? – Bernard Marr:

Artificial Intelligence – What it is and why it matters – SAS:

The Business of Artificial Intelligence – Harvard Business Review:

What Entrepreneurs Need To Know About AI (Artificial Intelligence) – Forbes:

Artificial Intelligence: The Next Digital Frontier? – McKinsey Global Institute:

iWonder – AI: 15 key moments in the story of artificial intelligence – BBC: