I registered for a machine learning certificate program.

AndyGoode · « **on:** November 23, 2019, 01:16:43 am »

Maybe somebody here will be interested in the machine learning certificate program I entered this week...

https://extension.ucsd.edu/courses-and-programs/machine-learning-methods

For three online classes, each at about $700, you get a certificate in Machine Learning Methods, which for me is a pretty good deal. You can even take multiple classes at the same time [if you can afford it, which I can't] and get the certificate faster. This is my attempt to prove to hiring managers everywhere that I actually know something; otherwise they can't seem to figure that out, so they won't ever hire me to work in my field of AI. It's a heavy cost for me, though, since my current employer won't pay for that education, and I'm pretty poor without a professional job.

I looked at one online machine learning course at MIT and it was $3,200...

https://www.getsmarter.com/courses/us/mit-machine-learning-online-short-course

I can't even touch that. My guess is this UCSD online course is a relatively good deal, if somebody is looking for such a certificate, or a way of getting into the field of AI.

I can keep sort of an ongoing blog in this thread if anybody is interested, of how that set of courses is going. The four courses available, of which only two of the last three are required, are...

Linear Algebra for Machine Learning
Probability and Statistics for Deep Learning
Practicum for Deep Neural Networks
Deep Learning Using TensorFlow

The first course will likely be too easy for me, since I already have a degree in math, which covered linear algebra extensively, with the exception of tensors. 'Basics of Tensor Flow' is listed as one of its topics. I've tried to learn tensors on my own, but maybe the learning material I've seen has been lousy, because I just can't get moving on them. I understand the basics--to get rid of the subscripts for more flexibility--but I've been unable to follow a single theorem so far. I plan to read up on them before the class starts in January 2020. There are several YouTube videos on this topic. However, 'TensorFlow' [with no space] is in the title of another course, so it sounds like a piece of software, maybe not requiring knowledge of real tensors, and maybe the topic name in the first course's description is just misspelled.

Hit me up with questions over the course of next year, if you wonder how things are going, if I stuck it out, or whatever.

AndyGoode · « **Reply #1 on:** November 25, 2019, 09:15:22 am »

Rather than be bored with the linear algebra in the upcoming course, I've launched into understanding linear algebra at a deeper, more abstract, and more intuitive level than I have ever done before. At the level I mean, I believe many professors haven't even thought about the things I'm discovering, or if they have, they have never mentioned them in any lectures or books that I've heard or read. This is actually pretty typical of instructors at all levels, in my experience: they know the details well, but they don't have a thorough, intuitive understanding of the subject. If you challenge them by asking why they don't mention those types of insights, they typically retort in ego-protective fashion 'It should be obvious to the student!' or 'I expect students to work that out on their own.' Yeah, right. That deep level is where I like to go, since it is from that level you can detect oversights, which will allow you to come up with your own inventions that fix or beat the inventions that are currently being used.

Today I had a nifty little discovery about matrix and vector multiplication that I diagrammed in the attachment. In case you didn't know, linear algebra is all about vectors and matrices. Vectors look like bars that hold numbers, and matrices look like rectangles (often squares) that hold numbers. In computer science those bars and boxes are examples of what are called 'data structures,' and there exist all kinds of other common data structures that hold values in different arrangements--linked lists, trees, graphs, and so on.

Vectors and matrices can be combined, especially by addition or multiplication, only if their sizes are compatible. The check that must be made for multiplication is shown with the thin magenta bar in the diagram. My new way of remembering the directions of this check is to remember '- I, I -', pronounced like 'dash eye, eye dash'. You'll note that this little phrase describes the directions of the magenta bars in the diagrams, when read left to right. The dash is the horizontal magenta bar, the 'I' is the vertical magenta bar. The data structures being combined are compatible only if the '- I' have the same length: the row length of the first (its number of columns) must equal the column height of the second (its number of rows). The resulting data structure has the size of 'I -': as tall as the first and as wide as the second. My new phrase to help remember these two pairs is 'These equal, this sequel', which means if the - I lengths are equal then the result ['sequel'] has size I -. I used to use my own phrase for those directions, 'must-be, re-sult', but that had the drawback of not clearly showing which directions to use in which data structure. So that's just a nice little memory aid, but then I realized something more important...
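Here's a minimal sketch of the '- I, I -' check in Python, assuming a shape is written as a (rows, columns) pair. The function name `product_shape` is my own invention, not something from the course or a library: the '-' of the first shape is its number of columns, and the 'I' of the second is its number of rows.

```python
def product_shape(a_shape, b_shape):
    """Return the (rows, cols) shape of the product A times B, or None if
    the shapes are incompatible. Shapes are (rows, cols) pairs."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    # "These equal": A's horizontal length (columns) must equal
    # B's vertical length (rows).
    if a_cols != b_rows:
        return None
    # "This sequel": the result is as tall as A and as wide as B.
    return (a_rows, b_cols)


print(product_shape((2, 3), (3, 4)))  # (2, 4)
print(product_shape((2, 3), (2, 3)))  # None
print(product_shape((3, 1), (1, 5)))  # (3, 5)
```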

With two data structures--vectors and matrices--there exist only four possible orders in which to multiply them [order is important in linear algebra, since AB usually does not equal BA]. The most common order shown in math books is the matrix first and the vector second, like Ax, and the matrix is called a transformation of x. For example, the simple formula ax + b = 0 can be written with a matrix and vectors as Ax + b = 0, where the capital letter is a matrix and the small letters are vectors. I have seen the other orders, too, except the impossible one at the end of the diagram. Until today I didn't even know the impossible case existed! In other words, you can never multiply a (column) vector by a matrix with more than one row if the vector comes first. If you study the lengths, you can see why not: as you say 'these equal' you start by using a short bar on the vector as you say 'these' (a column vector is only one entry wide), then when you draw the same length line inside the matrix you say 'equal' and immediately realize that can never occur, because the matrix is taller than that. It works in the other direction, though, with the matrix first and the vector second, because the vector's length can always be chosen to match the matrix's width. (A row vector, usually written as the transpose x^T, is different: x^T A works whenever x has as many entries as A has rows.)
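A small pure-Python sketch of these orderings, assuming matrices and vectors are both stored as lists of rows, so a column vector is n x 1 and a row vector is 1 x n. The helper `matmul` is hypothetical (a real project would use a library such as NumPy); it raises an error when the '- I' lengths differ.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows; raise if the shapes clash."""
    if len(a[0]) != len(b):
        raise ValueError(f"incompatible: {len(a)}x{len(a[0])} times {len(b)}x{len(b[0])}")
    # Each entry is a sum of products of a row of `a` with a column of `b`.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2],
     [3, 4],
     [5, 6]]          # 3x2 matrix
x = [[1], [1]]        # 2x1 column vector
x_row = [[1, 1, 1]]   # 1x3 row vector

print(matmul(A, x))      # matrix first, 3x2 times 2x1: [[3], [7], [11]]
print(matmul(x_row, A))  # row vector first, 1x3 times 3x2: [[9, 12]]
try:
    matmul(x, A)         # column vector first, 2x1 times 3x2
except ValueError as err:
    print(err)           # incompatible: 2x1 times 3x2
```

The column vector fails in front of the 3x2 matrix because its '-' length is 1 while the matrix's 'I' length is 3; the row vector succeeds because its '-' length is 3.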

That's a small insight, but useful. For example, it will keep you from making an obvious mistake if you write matrix equations with variables in the wrong order. Also, as I mentioned, it makes it harder to forget which directions on which data structures need to be equal in length. I'm also pretty sure it contains a truth about the geometrical interpretations of vectors and matrices, but I haven't quite worked that out.

If people are interested in these little insights that I find useful, please let me know, and I'll continue this semi-tutorial on linear algebra, otherwise I'll fill out this thread with links to YouTube videos about TensorFlow, my predictions about how these classes will go, and other topics.

LOCKSUIT · « **Reply #2 on:** November 26, 2019, 03:04:12 am »

GPT-2 is just Word2Vec (dog=cat is learned from big data), except it's for sequences! Which sentence is which sentence.

I just want the gold - GPT-2 that thinks with sequence2sequence 'abstracting vectors'.....it's a matrix embedding

AndyGoode · « **Reply #3 on:** November 26, 2019, 04:59:34 am »

Quote from: LOCKSUIT on November 26, 2019, 03:04:12 am

GPT-2 that thinks with sequence2sequence 'abstracting vectors'.....it's a matrix embedding

If you can rephrase this so I can understand it, I could work on this. Don't forget--I know very little about GPT-2, so when you say 'sequence2sequence' I don't know if you're referring to some software, some variable, some technique, or if you're just abbreviating concepts.

Back to my insights...
Since vectors are simpler than matrices, both of which are fundamental to linear algebra, it would be good to completely understand vectors first. One thing that threw me off in my math education was a math teacher who said that vectors are arrows with length and direction, but are not 'anchored' anywhere. Most students assume that a vector has to have its foot at the origin, but that's simply not true, per the teacher. The teacher's description made sense, since wind can be represented by a vector, where the direction of the arrow represents the direction the wind is blowing, and the length of the arrow represents the strength of the wind, and it doesn't matter exactly where that vector is placed on a map, since everywhere on the map of a region has the same wind vector. However, how does that relate to the representation of a vector like [1, 1], which is a 2D coordinate, which obviously is a location, if location doesn't matter?

The following video answered this confusing matter excellently. The answer is that the arrow representation is how physicists think of a vector, but mathematicians and computer scientists have a somewhat different concept of what a vector is, with the math interpretation being the most general...

Vectors, what even are they? | Essence of linear algebra, chapter 1
Aug 5, 2016
3Blue1Brown
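To put the physicist's unanchored wind arrow and the coordinate [1, 1] side by side, here's a tiny sketch with points and vectors as plain (x, y) tuples. The function names `displacement` and `place` are my own, chosen just for illustration: the same arrow connects many different pairs of points, and reading [1, 1] as a location is the special case where the arrow's tail sits at the origin.

```python
def displacement(start, end):
    """The vector (arrow) that carries point `start` to point `end`."""
    return (end[0] - start[0], end[1] - start[1])


def place(point, vector):
    """Put the arrow's tail at `point` and return where its head lands."""
    return (point[0] + vector[0], point[1] + vector[1])


wind = (1, 1)
# The same wind arrow drawn at two different spots on the map...
print(displacement((0, 0), (1, 1)))   # (1, 1)
print(displacement((2, 3), (3, 4)))   # (1, 1)
# ...and the coordinate reading: tail at the origin, head at the point (1, 1).
print(place((0, 0), wind))            # (1, 1)
```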

LOCKSUIT · « **Reply #4 on:** November 26, 2019, 07:32:06 am »

I can explain, unlike korrelan (naughty Christmas boy, better be nice). GloVe and Word2Vec are really the same kind of text algorithm: they learn how much dog=cat, dog=piano, dog=dog, dog=pet, dog=kibble, e.g. 100%, or 0.015%. Imagine a huge ring with every dictionary word on it, each connected to the others like a web of relational connections; this is a heterarchy. They learn these relations between all words because the same words appear near both, e.g. they'll notice "her dogs eat kibble" and "her pets eat kibble" and assume dogs and pets both eat. They share the same surrounding words. In fact, GloVe and Word2Vec both too often think dogs=eat because they appear near each other in "her dogs eat kibble". Dogs=kibble, yes. These are patterns in Big Data. That's all Machine Learning is, folks: one big experience stash that lets you solve a wide range of unseen problems/questions by leveraging many analogies from many domains. More diverse data = the more you know it all, because each dictionary word explains every other dictionary word in the relation web (a small-world network). If you learn all the patterns in the Big Data, then you learn all domains and all analogies, and you almost need no external data. Sequence-to-sequence is the same algorithm as well, except it thinks a sentence = a sentence, e.g. "if we go outside then we may die" = "but he went in the room and he was shot". GPT-2 first translates its window (e.g. the last 10 words of the story made so far) so that it can understand (recognize) the sentence (vector). It must recognize the sequence: it'll clarify what 'it' means, what 'stick' means (put or tree), what 'bonjour' means, what 'jeep' means. It replaces the words in the unseen sentence with words from similar sequences or 'vectors', e.g. "i bought a car" becomes "i bought a book".
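The 'same surrounding words' idea can be shown with a toy count-based sketch. This is not how Word2Vec or GloVe actually train (they learn dense vectors, such as the 300-dimensional GloVe vectors); it only shows the underlying intuition that words whose neighbors overlap get a high cosine similarity. The corpus is the example sentences from the post plus one made-up sentence, and the window size of 2 is an arbitrary choice.

```python
import math
from collections import Counter

# Toy corpus: two sentences from the post above plus one made-up sentence.
sentences = ["her dogs eat kibble", "her pets eat kibble", "her piano is black"]


def context_vector(word, window=2):
    """Count the words that appear within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (1.0 = same direction)."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


print(round(cosine(context_vector("dogs"), context_vector("pets")), 3))   # 1.0
print(round(cosine(context_vector("dogs"), context_vector("piano")), 3))  # 0.333
```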
If a word in the sentence is 'dog' and 'it' appears somewhere as well, then it will be tempted to replace 'it' with 'dog', since GloVe thinks so, and that will be OK as long as the sequence vector permits it. Part of the score (and for learning sequence relations) is a location sequence vector as well (just like the Turing Tape Machine reads which bit it's on (1 or 0), which state it's at (ex. state4=if bit=1 write 0 move left goto state9), to decide which bit to write, which state to go to, and which way to move on the tape), because the word structure can be any way can't it haha, "if she ran to the store then i will go after her", "after her i will go then if she really really did run to that store". It goes through 6 layers to do this, the Decoder of the Transformer Architecture. It then does this process again for deciding which word to add to the end of its window at end of story. The word it chooses for the translation and entailment stages is based on how frequently it has seen that word entail a similar phrase, and on its relation to other words in the story, but not to frequent English words like 'the'. That's GPT-2. When it learns its GloVe/sequence matrix vectors, it actually learns many sizes: a, b, c, d, ad, ab, ism, ing, stand, standing, standings, the dog ran, the dog ran to the girl. It uses Byte Pair Encoding to learn the 'real' parts of the data by lowering the cost of segmenting the data correctly, based on frequency, relations, etc. It'll learn that 'ing' is reusable, and it also makes up new words/phrases. It'll learn that 'standing' is reusable as well. These are its vocab 'words'. Then it'll learn the relations between all these 'words' as I explained. The GloVe vectors I used in my first GPT-2 took some author a year to train. Sounds similar to GPT-2!
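The Byte Pair Encoding step can be sketched as the classic merge loop: start from single characters and repeatedly fuse the most frequent adjacent pair. This is a simplified, character-level illustration with made-up word counts; GPT-2's actual tokenizer is a byte-level variant with its own details, so treat this only as the basic idea.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


# Hypothetical word counts; each word starts as a tuple of single characters.
words = {tuple("standing"): 3, tuple("running"): 2, tuple("stand"): 2, tuple("sing"): 1}
for _ in range(6):
    pair = most_frequent_pair(words)
    if pair is None:
        break
    words = merge(words, pair)

for symbols in words:
    print(symbols)
```

With these counts, the six merges build 'ing' first and then 'stand', so 'standing' ends up segmented as ('stand', 'ing'), the kind of reusable pieces described above.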

Hahaha, HAHAHAHAHAHA... I literally just re-generated data I was missing.

Now if you guys are really on your ears you'll all realize I hit the jackpot finally all on my own. And will code it for me! :D If anyone has a question, shoot.

goaty · « **Reply #5 on:** November 26, 2019, 07:49:18 am »

I think you don't have to delve really deep to find something really profound to you in maths, and the less formal education you have, the simpler what you find will be, but it can still be quite useful to you. Starting from vectors and matrices: if you matrix multiply a row by a column, it's the same thing as the dot product, and it means so many things (for example, two things I know is it can give the distance of a point to a plane as well as the angular difference between 2 vectors!!! and I bet it means more too), and us 3D game makers got spoon-fed it because we all needed it so badly to do basic things. The dot and cross products are wonderful uses of linear algebra, and if you found something more yourself it would be cool; it heads off into the wild and secretive west of quantum computing. So good luck AndyGoode, see if you find something amazing for yourself to put on your chest of drawers, and maybe even deadly.
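A short sketch of the two dot-product uses mentioned above, in plain Python. The plane is assumed to be written as normal . x = d, and the function names are my own.

```python
import math


def dot(u, v):
    """Sum of products of matching components."""
    return sum(a * b for a, b in zip(u, v))


def angle_between(u, v):
    """Angle in degrees between two non-zero vectors: cos(theta) = u.v / (|u||v|)."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    # Clamp to [-1, 1] so rounding error can't push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def distance_to_plane(point, normal, d):
    """Signed distance from `point` to the plane normal . x = d."""
    return (dot(normal, point) - d) / math.sqrt(dot(normal, normal))


print(angle_between((1, 0, 0), (1, 1, 0)))        # about 45 degrees
print(distance_to_plane((0, 0, 5), (0, 0, 1), 2))  # 3.0
```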

LOCKSUIT · « **Reply #6 on:** November 26, 2019, 03:25:03 pm »

This is just GloVe's 300-dimensional space.

AndyGoode · « **Reply #7 on:** November 26, 2019, 10:17:10 pm »

Quote from: goaty on November 26, 2019, 07:49:18 am

So good luck AndyGoode, see if you find something amazing for yourself to put on your chest of drawers, and maybe even deadly.

Thanks, goaty. I probably should have clarified what I mean by 'insight' in this mathematical context. What I mean is that I am starting to fill in relationships and geometrical meanings that I never learned before, back when I was studying linear algebra in school, and that I suspect most students never learned before, either. Therefore when I called this thread a 'semi-tutorial' I meant that I want only to fill in a few gaps that the readers probably never realized before, small insights that help basic understanding, but aren't intended to be some great mathematical breakthrough, or a complete tutorial on linear algebra that can be gotten elsewhere. I'm hoping that other people who take one of these classes will get a slight advantage from these minor insights. If nothing else, maybe some linear algebra knowledge will rub off on people reading this who didn't even intend to learn it. Math has been around too many thousands of years for me to believe I can add anything new to it, except maybe in an applied or indirect way.

AndyGoode · « **Reply #8 on:** November 27, 2019, 12:23:13 am »

Quote from: LOCKSUIT on November 26, 2019, 07:32:06 am

Part of the score (and for learning sequence relations) is a location sequence vector as well (just like the Turing Tape Machine reads which bit it's on (1 or 0), which state it's at (ex. state4=if bit=1 write 0 move left goto state9), to decide which bit to write, which state to go to, and which way to move on the tape), because the word structure can be any way can't it haha, "if she ran to the store then i will go after her", "after her i will go then if she really really did run to that store". It goes through 6 layers to do this, the Decoder of the Transformer Architecture. It then does this process again for deciding which word to add to the end of its window at end of story.

Nice explanation. Now I understand why you were looking at Turing machines. Although Turing machines were intended for a much different purpose--to prove theorems about computability--they do several actions with a single instruction, which sounds similar to GPT-2. Turing machines are sort of like CISC computers...

https://en.wikipedia.org/wiki/Complex_instruction_set_computer

...in that way.
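To illustrate the 'several actions per instruction' point, here is a toy Turing machine loop. Only the first rule mirrors the state4/state9 example from the quote; the other rules, the tape, and the `run` helper are hypothetical.

```python
# Transition table: (state, symbol read) -> (symbol to write, head move, next state).
# The first rule mirrors "state4 = if bit=1 write 0 move left goto state9";
# the rest are made up so the example halts.
RULES = {
    ("state4", 1): (0, -1, "state9"),
    ("state4", 0): (1, +1, "state4"),
    ("state9", 1): (1, -1, "halt"),
    ("state9", 0): (0, -1, "halt"),
}


def run(tape, head, state, max_steps=100):
    """Run until the machine halts; blank cells read as 0."""
    tape = dict(enumerate(tape))          # sparse tape indexed by position
    for _ in range(max_steps):
        if state == "halt":
            break
        # One instruction: look up the rule, then write, move, and change state.
        write, move, state = RULES[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
    return [tape[i] for i in sorted(tape)], head, state


print(run([1, 0, 1], head=2, state="state4"))  # ([1, 0, 0], 0, 'halt')
```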

Anyway, can you draw a picture of a 'location sequence vector'? The following is how I originally envisioned what you were saying, but I realize now I was probably wrong... I thought GPT-2 was trying to guess the most likely direction in which to generate words by its stored statistical history, and that the location sequence vector stored possible directions for the sentence to go, and that it selected the direction that had the highest probability value. Whether or not this is what GPT-2 actually does, it seems like a reasonable approach, though I can come up with better approaches. Visually, it sounds sort of like what Kalman filtering does.

Back to my predictions about this first class I will take...
(1) I predict I will have problems with the first instructor because she's a woman. I have real trouble dealing with courses set up by women, and for some reason women don't like me, either. Somewhere there's a serious logic gap in thinking styles, I suppose. However, hopefully whatever sets women off about me will be sufficiently disguised by the more objective online class setting. I can only guess at why the above are true, but my experiences have been very consistent in that regard.
(2) I predict I will come up with at least one great idea by the end of the last/third class. I do that a lot, once I start to understand the material. I'm already bordering on a likely great idea in Deep Learning, even though I don't yet know how Deep Learning all fits together. [One discovery I read about, from Deep Learning, fired my imagination to use another idea on which I'd been working, to use with Deep Learning.] If I do come up with such ideas I'll likely mention them here, write an article about them, and after posting or publishing the article, I'll discuss it with anyone interested.

goaty · « **Reply #9 on:** November 27, 2019, 12:52:15 am »

Quote from: AndyGoode on November 26, 2019, 10:17:10 pm

Math has been around too many thousands of years for me to believe I can add anything new to it, except maybe in an applied or indirect way.

There's gaps, I think. Let me be arrogantum.
I thought you said you hated your useless high school teachers; there's more where they came from. P = NP not solved for 100 years? World War 2 lost to the Turks? I think the generations before us must have had mental problems, and our generation is going to put it all into overdrive.

Don't think I think that makes sense also as I believe it true, how the hell did we have such fuckhead ancestors, and us not be completely incompetent with them?

But it's not just mental visionary athleticism involved; it takes a full measure of guts and/or stupidity as well to stay sane during the process!

[edit]
But that's not what I think... Less arrogantly, I'll say things just don't make sense, history is false a lot, and who knows what actually happened during the war, 'cause it's got to be overdrive with miracles when all the shit's going down.

AndyGoode · « **Reply #10 on:** November 27, 2019, 01:08:25 am »

Quote from: goaty on November 27, 2019, 12:52:15 am

There's gaps, I think.

What we need are small, manageable gaps, not big gaps like the Riemann hypothesis, the 3-body problem, or a proof that Euler's constant (the Euler-Mascheroni constant γ, not e) is irrational.

https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_mathematics

That's partly why AGI is such an incredible draw for me--I love math and chess--but I've reached the point where to make progress I can clearly see I need an intelligent machine or computer program that deals with abstract concepts instead of numbers. That's partly what I meant by 'indirectly' making progress--if I can design a machine that will do the operations I want, then that machine will make the automated discoveries for me. Then my chess will get better, and I can answer some math questions that are tantalizing me (and the rest of the mathematical population).

I didn't hate all those misleading math teachers I had (except one, who was a professor only as long as he could get his stupid daughter to pass his own math classes at a university, then he quit the same year she graduated); I just think they could have been better at teaching and explanations.

goaty · « **Reply #11 on:** November 27, 2019, 01:15:28 am »

Yes, feeling misled, I feel the same. I feel like the world isn't trying, and we are just going to our rotting deaths 'cause none of the idiots warned us early enough about what it's like being in "*hell* o world"

And I can just hear them now - " oh you should have thought of that yourself. "

me-> "thanks a lot."

AndyGoode · « **Reply #12 on:** November 28, 2019, 09:13:43 pm »

Next minor, fill-in, mathematical insight...

I think what threw me off on this topic at an early age was that when I became obsessed with higher math in high school, I began looking at more advanced math books from the school library, but I didn't read all the explanatory and/or historical material that supported that math. When I came across the topic of matrices, which was never taught in my high school, I thought they were a cool idea. Basically matrices are a type of data structure, which programmers learn are basically empty slots arranged in different shapes, where you can fill in values in those slots. In the case of matrices, the structures are in the form of rectangles with rows of cells in them. However, I hadn't yet taken any programming classes, either. In any case, I thought matrices were a cool idea, since instead of just moving around numbers one at a time, you can move entire arrays of them at a time, and do useful things with them that way, basically parallel operations. All told, my perception of matrices was that they were just somebody's random idea that it would be cool to put numbers into a regular structure, then to find out if that structure had any useful mathematical properties, which it did, as is often the case. As a result, I never asked myself what a matrix really is.

I got more thrown off when I took math classes in college, and they taught us the method of Gaussian elimination, with the slightly extended version of that method called Gauss-Jordan elimination. Those methods are extremely useful in practice, easy to understand, and easy to program on a computer. However, those methods are atypical of matrix operations because they move the *rows* of a matrix around, not the columns. That gave me the erroneous impression that the essence of a matrix was a collection of stacked rows. You can see such row manipulations of the matrix in this video...

Algebra 55 - Gauss-Jordan Elimination
Apr 16, 2016
MyWhyU
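Here's a minimal Gauss-Jordan sketch that uses only the three row operations (swap two rows, scale a row, add a multiple of one row to another) on an augmented matrix [A | b]. It uses Python's `fractions.Fraction` to avoid rounding, and the example system is my own.

```python
from fractions import Fraction


def gauss_jordan(aug):
    """Reduce an augmented matrix [A | b] to reduced row echelon form
    using only row operations."""
    m = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(cols - 1):               # last column is the augmented b
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]      # swap rows
        p = m[pivot_row][col]
        m[pivot_row] = [x / p for x in m[pivot_row]]          # scale pivot to 1
        for r in range(rows):                                # clear the column
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return m


# 2x + y = 5 and x - y = 1, written as the augmented matrix [A | b].
result = gauss_jordan([[2, 1, 5], [1, -1, 1]])
print([str(row[-1]) for row in result])  # ['2', '1'], i.e. x = 2, y = 1
```

Note that the augmented column b just rides along: every row operation is applied to it exactly as to the columns of A.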

It wasn't until last week that I began to rethink this. A clue is in the above video--note that they extend the idea of the matrix by adding a *column* on the right-hand side of it, not a row. They call the result an 'augmented matrix.' This suggests that matrices can be extended naturally by adding columns. After seeing the following video I realized that a matrix is more typically a collection of *columns*, not rows...

Linear transformations and matrices | Essence of linear algebra, chapter 3
Aug 7, 2016
3Blue1Brown

That's a great video for many reasons, in my opinion. Note that they start out by showing a geometrical interpretation of a matrix, which is a transformation of vectors to vectors in 2D space, which is something else I wanted to know, then they show how those vectors [shown as *columns*, not rows] naturally fit together side-by-side in a compact form that is a matrix. Suddenly it becomes clear why vectors are formally always shown as columns in math books, not as rows. (Computer science books are the opposite, since it's easier to write text horizontally in a book and in a computer program, and in programming you don't necessarily use matrices in a mathematical way, which was emphasized in the first video I posted above.) Suddenly it also becomes clear why the augmented matrix of Gaussian elimination adds a *column*, not a row, also shown in the above video.

The main takeaway from all this: it is far more intuitive to think of matrices as collections of columns, not as collections of rows, especially when considering their geometrical analogies.
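A quick way to see the column view in code: multiplying a matrix by a vector is the same as adding up the matrix's columns, each scaled by one entry of the vector. The matrix below is an arbitrary example, not the one from the video, and the helper names are my own.

```python
def columns(matrix):
    """Split a matrix stored as a list of rows into its list of columns."""
    return [list(col) for col in zip(*matrix)]


def apply(matrix, x):
    """Compute matrix times x as x[0]*column0 + x[1]*column1 + ..."""
    result = [0] * len(matrix)
    for coeff, col in zip(x, columns(matrix)):
        result = [r + coeff * c for r, c in zip(result, col)]
    return result


# Each column records where a basis vector lands: i-hat -> (1, 1), j-hat -> (-1, 2).
M = [[1, -1],
     [1,  2]]
print(columns(M))        # [[1, 1], [-1, 2]]
print(apply(M, [3, 1]))  # 3*(1, 1) + 1*(-1, 2) = [2, 5]
```

The row-by-row rule gives the same answer (1*3 + (-1)*1 = 2 and 1*3 + 2*1 = 5); the column version just makes the geometry visible.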

That's not to say it is *always* the case that only columns are added instead of rows. One notable exception, also related to geometrical operations, is the augmented matrix for affine transformations, which adds a *row* at the bottom of the matrix (along with a translation column on the right) to make the math simpler...

https://en.wikipedia.org/wiki/Affine_transformation
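A sketch of that affine exception, assuming 2D points and a rotation followed by a translation: the 2x2 rotation gets a translation column on the right and the extra row [0, 0, 1] at the bottom, and each point gets a 1 appended. The helper names are my own.

```python
import math


def affine_matrix(angle_deg, tx, ty):
    """3x3 matrix that rotates by `angle_deg`, then translates by (tx, ty).
    The extra bottom row [0, 0, 1] is what lets translation be a matrix product."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [[c, -s, tx],
            [s,  c, ty],
            [0,  0,  1]]


def transform(matrix, point):
    """Append 1 to the 2D point, multiply, and drop the trailing 1 again."""
    x, y = point
    v = [x, y, 1]
    out = [sum(m * vi for m, vi in zip(row, v)) for row in matrix]
    return out[0], out[1]


T = affine_matrix(90, 5, 0)
print(transform(T, (1, 0)))  # roughly (5.0, 1.0): rotated to (0, 1), then shifted by 5
```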

However, such exceptions seem to me to be rare. In short, I'm going to start switching my perception of arrays from collections of rows to collections of columns, which will ease my understanding of how they relate to geometry.

Additional tip: Avoid sounding ignorant by pronouncing 'affine' correctly. It is pronounced like 'ah-FINE', like 'That's ah FINE transformation y'all got there.' (https://en.wiktionary.org/wiki/affine) I once pronounced it as 'AFF-fine' to a more advanced math student in college and felt like a dummy when he corrected me. That's a common problem with forging ahead on your own using only books. I made the same mistake in high school when I started talking to my classmates about lasers and 'ZEH-nun' flash tubes, whereupon they laughed and said 'You mean ZEE-non?' Now in this day and age we can learn pronunciations automatically from YouTube videos, assuming that we're not too lazy to watch technical videos, that the videotaped teachers are native English speakers without foreign accents, and that we aren't watching ignorant people talking about religion and politics with pronunciations like 'SPEE-seez' and 'eek-o-NOM-iks'.

goaty · « **Reply #13 on:** November 28, 2019, 10:51:51 pm »

That's cool. I'm actually hopeless at matrices and I avoid using them when I do my algebra; it's very bad practice. Matrices make everything a lot neater, everything comes down to a sum of products, they are great.

HS · « **Reply #14 on:** November 29, 2019, 09:09:37 am »

I thought I'd try to make some independent discoveries as well, so I applied my symmetry idea from the "Alternatives to Logic" thread, to some math I'm familiar with, and lo and behold, I think it worked. It's faster than the methods I was taught to use. This inventing your own methods idea is great fun.

As opposed to all this quadratic stuff...
