Deep learning book

  • 5 Replies
  • 2757 Views

Zero

  • Eve
  • 1287
Deep learning book
« on: April 17, 2022, 03:12:27 pm »


Zero

  • Eve
  • 1287
Re: Deep learning book
« Reply #1 on: April 17, 2022, 04:52:43 pm »
It starts with linear algebra. If you're like me, you don't understand this kind of notation (DLB001.JPG) from page 34. I confess that at 44 (since yesterday), I had never taken the time to learn exactly how to read summation syntax. Now I know.

So the book explains vectors (1-dimensional arrays in Julia), matrices (2-dimensional arrays in Julia), and tensors (N-dimensional arrays in Julia). All of these are native types.

Code: julia
julia> m1 = [1 2; 1 1; 2 3]
3×2 Matrix{Int64}:
 1  2
 1  1
 2  3

julia> m2 = [1 2 3; 1 1 1]
2×3 Matrix{Int64}:
 1  2  3
 1  1  1

julia> m1 * m2
3×3 Matrix{Int64}:
 3  4  5
 2  3  4
 5  7  9

If m1 is 3×2, then m2 must have 2 rows for the product to be defined (here it's 2×3, so the result is 3×3). This page explains matrix multiplication better than the book does.
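
To make that summation notation concrete, here's a hand-rolled matrix multiplication that computes exactly the sum the book writes, C[i,j] = Σₖ A[i,k]·B[k,j] (my own sketch, not code from the book):

Code: julia
# Each entry of the product is a sum over the shared (inner) dimension:
# C[i,j] = Σₖ A[i,k] * B[k,j]
function matmul(A, B)
    n, m = size(A)
    m2, p = size(B)
    @assert m == m2 "inner dimensions must match"
    C = zeros(eltype(A), n, p)
    for i in 1:n, j in 1:p
        C[i, j] = sum(A[i, k] * B[k, j] for k in 1:m)
    end
    return C
end

matmul(m1, m2) == m1 * m2   # true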


ivan.moony

  • Trusty Member
  • Bishop
  • 1729
    • mind-child
Re: Deep learning book
« Reply #2 on: April 17, 2022, 05:36:06 pm »
Happy birthday  :dazzler:


Zero

  • Eve
  • 1287
Re: Deep learning book
« Reply #3 on: April 18, 2022, 11:03:40 am »
This is page 5.
Quote
The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.

The idea of learning the right representation for the data provides one perspective on deep learning. Another perspective on deep learning is that depth allows the computer to learn a multi-step computer program. Each layer of the representation can be thought of as the state of the computer’s memory after executing another set of instructions in parallel. Networks with greater depth can execute more instructions in sequence. Sequential instructions offer great power because later instructions can refer back to the results of earlier instructions. According to this view of deep learning, not all of the information in a layer’s activations necessarily encodes factors of variation that explain the input. The representation also stores state information that helps to execute a program that can make sense of the input. This state information could be analogous to a counter or pointer in a traditional computer program. It has nothing to do with the content of the input specifically, but it helps the model to organize its processing.

So this is an idea of perception as a one-way data flow, from low-level representation layers to high-level interpretation. But as long as it's one-way only, you get a categorizing parrot zombie. If you want your thing to be alive, it has to flow both ways: top-down for expectations, beliefs, and choices; bottom-up for representing sensory signals.
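
To make the "composing many simpler functions" idea concrete, here's a toy bottom-up pass in Julia (the layer sizes, random weights, and ReLU choice are my own arbitrary picks, not from the book):

Code: julia
# An MLP is just composed functions: mlp(x) = f2(f1(x)).
# Each application of a layer gives a new representation of the input.
relu(v) = max.(v, 0)

W1, b1 = randn(4, 3), zeros(4)   # layer 1: 3 inputs -> 4 hidden units
W2, b2 = randn(2, 4), zeros(2)   # layer 2: 4 hidden -> 2 outputs

f1(x) = relu(W1 * x .+ b1)       # first representation
f2(h) = W2 * h .+ b2             # second representation
mlp(x) = f2(f1(x))               # the whole network is the composition

mlp(randn(3))                    # 2-element output vector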


Zero

  • Eve
  • 1287
Re: Deep learning book
« Reply #4 on: April 18, 2022, 11:36:45 am »
Page 82.
The part on gradient-based optimization is very clear, so I leave it as is.
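
For anyone following along without the book, the core of that section fits in a few lines (a minimal sketch with an arbitrary step size and a hand-derived gradient, not the book's code):

Code: julia
# Plain gradient descent on f(x) = x^2, whose derivative is f'(x) = 2x.
function descend(grad, x; lr = 0.1, steps = 50)
    for _ in 1:steps
        x -= lr * grad(x)   # step against the gradient, i.e. downhill
    end
    return x
end

descend(x -> 2x, 5.0)   # ≈ 0.0, the minimum of x^2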


Zero

  • Eve
  • 1287
Re: Deep learning book
« Reply #5 on: April 18, 2022, 07:44:56 pm »
Page 116.

Quote
Learning theory claims that a machine learning algorithm can generalize well from a finite training set of examples. This seems to contradict some basic principles of logic. Inductive reasoning, or inferring general rules from a limited set of examples, is not logically valid. To logically infer a rule describing every member of a set, one must have information about every member of that set.

In part, machine learning avoids this problem by offering only probabilistic rules, rather than the entirely certain rules used in purely logical reasoning. Machine learning promises to find rules that are probably correct about most members of the set they concern.

Unfortunately, even this does not resolve the entire problem. The no free lunch theorem for machine learning (Wolpert, 1996) states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other. The most sophisticated algorithm we can conceive of has the same average performance (over all possible tasks) as merely predicting that every point belongs to the same class.

Fortunately, these results hold only when we average over all possible data generating distributions. If we make assumptions about the kinds of probability distributions we encounter in real-world applications, then we can design learning algorithms that perform well on these distributions.

This means that the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the “real world” that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data generating distributions we care about.

To me, this part rules out deep learning as the central backbone of an autonomous entity. These algorithms are still important, though, as long as we keep their real value in mind.
« Last Edit: April 18, 2022, 08:28:11 pm by Zero »

 

