Questions about CNNs

  • 38 Replies
  • 7751 Views
*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Questions about CNNs
« on: October 11, 2016, 05:14:51 am »
I want to know some specific things about Convolutional Neural Networks that I can't seem to find out online.

All the images on Google show a stack of images on the far left as many copies of the SAME image. Why? Like, the first stack is 20 images of almost the same picture, one behind the other. I get that a local window scans the input, but why is the input saved as 50 slight variations stacked behind each other? Then the rest of the diagram shows conv, pool, conv, pool WITH this stack of images, not just applied to one... Are these assorted images from all of its life, and is that why they look similar? Man, if so, don't show such similar images; I'd show a hat, a pyramid, and then a triangle.

Next, when the scanning of the input happens, does the small window link the window to a neuron and hold the window's feature connected to that node as well?

Lastly, I'm not getting how it works. I get that the input is quickly searched for a feature and the result is pushed out fast through interconnections to the answer, but I'm not understanding it. Like, input enters, and then has a window scan it 4,000 times (wow, that takes long), then, continuing its search to the exit, it doesn't end up following its own line to the exit; it finds a feature that goes to the next stack of windows, but this is only 1 small window, what the...? :(

And how is it finding higher-level features when the image keeps SHRINKING?!

Explain to me, using the image below, how this thing searches when I stare at something in my room:
https://ujwlkarn.files.wordpress.com/2016/08/conv_all.png?w=748
My guess is the input enters at the front, then it picks one of the stored images from its life, say the 8 on the right. The way it picks the right image (i.e. bed, yes bed, not an 8; only 8s were shown) is that it links along those angled connections over to the next hidden layer, even far away, and all the small window matches select this path to the exit. At the exit are far-linked actions and senses from a moment in its life.
« Last Edit: October 11, 2016, 09:22:26 am by LOCKSUIT »
Emergent          https://openai.com/blog/

*

Korrelan

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1454
  • Look into my eyes! WOAH!
    • YouTube
Re: Questions about CNNs
« Reply #1 on: October 11, 2016, 11:29:46 am »
There are literally hundreds of different techniques and methodologies used by various systems to achieve varying degrees of accuracy in image recognition. This is a very brief explanation; it's an extremely involved and complex topic to cover in a short description.

The word ‘convolution’ (in this context) basically means ‘filter’.

There are usually two main stages to an image-recognition convolutional network. The first applies various 'convolutional' filters to the image; these are designed to enhance or extract the main features: angles, lines, gradients, shadows, etc.

One convolutional filter might highlight all the lines in the source image. The system has a pre-stored/learned collection of smaller images (feature maps) that show small lines at various degrees of rotation/angles. The original image is then scanned/searched with each of the smaller images (lines), and matches are noted with their relevant positions. A single feature map might be a horizontal line segment, so all the locations in the original image (with the convolution filter applied to highlight lines) where a horizontal line is found are recorded.
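The scanning/searching step described above is essentially template matching: slide each small feature map across the (filtered) image and record where it fits. A minimal sketch in Python/NumPy; the image and the feature values are made up for illustration:

```python
import numpy as np

def find_feature(image, feature):
    """Slide a small feature map over the image and record the
    (row, col) positions where the patch matches the feature exactly."""
    H, W = image.shape
    h, w = feature.shape
    hits = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            if np.array_equal(image[y:y+h, x:x+w], feature):
                hits.append((y, x))
    return hits

# 5x5 binary image with one horizontal line segment on row 2
img = np.zeros((5, 5), dtype=int)
img[2, 1:4] = 1
feat = np.array([[1, 1, 1]])       # "horizontal line" feature map
print(find_feature(img, feat))     # → [(2, 1)]
```

A real CNN uses a soft score (a dot product) instead of an exact match, but the sliding-window search over positions is the same idea.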

The same is done for colour boundaries, gradients, etc.

The resulting positions of the recognised smaller feature maps are then fed into the neural net for learning or recognition as a complete object feature. The neural net learns which collections of the smaller feature maps, and their relative locations to each other, make up an object feature; and which features make up an object.

In your example of the figure 8, the bottom row shows the original image after it has been run through the various convolutional filters. The pooling layers are the output results of running the convolved images against a neural network trained to generalise the features found in the convolution layer. Past pooling layer 2, the line represents the output of the recognition network.

Quote
All the images on Google show a stack of images on the far left as many of the SAME image, why?

This is the stack of images with various convolution filters applied.

Quote
I get that a local window scans the input, but, why is the input saved as 50 slightly variations stacked behind eachother?

This stack represents the small feature maps that are scanned/ searched for within the convolved images.

Quote
Are these *assorted images from *all of its life

These images are the results of a generalizing neural net, producing images that represent the collections of feature maps found and their relative positions to each other.

The basic idea is that an image is broken down into its basic parts, the parts are recognised from a set of pre-defined templates, and then the neural net guesses what's in the image based on the bits it recognises. lol.

If it looks like a duck... walks like a duck... quacks like a duck... then chances are it's a duck.

:)

Just for fun... an example, I will use you as the neural net in a convolutional network.

I’ve drawn a red shape (triangle, square or circle), and searched the image with thousands of feature maps… this is the output result… these were found in the image…



What shape is it?

 :)
« Last Edit: October 11, 2016, 11:54:33 am by korrelan »
It thunk... therefore it is!...    /    Project Page    /    KorrTecx Website

*

kei10

  • It's a honor to meet everyone!
  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 430
  • Just kidding.
Re: Questions about CNNs
« Reply #2 on: October 11, 2016, 11:48:16 am »
Well written, korrelan!

I, too, had trouble understanding the convolution neural network at first. But just as korrelan wrote, the convolution stage matches the input against a bunch of filters before it is fed further into the system for matching; this is to increase the accuracy, and it can be slow if too many filters are applied.

Pooling is the down-sampling of feature maps to a smaller scale, with the purpose of keeping only the most important features. The most often used is "max pooling", which means only the highest value within each window of the matrix is retained.
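Max pooling as described can be sketched in a few lines of NumPy; the feature-map values below are invented:

```python
import numpy as np

def max_pool(fmap, size=2):
    """2x2 max pooling: keep only the strongest response in each window."""
    H, W = fmap.shape
    out = np.zeros((H // size, W // size))
    for i in range(0, H - size + 1, size):
        for j in range(0, W - size + 1, size):
            out[i // size, j // size] = fmap[i:i+size, j:j+size].max()
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]])
print(max_pool(fmap))   # → [[4. 2.]
                        #    [2. 8.]]
```

Each 2x2 window collapses to its maximum, halving the map in both dimensions while keeping the strongest feature responses.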

With only the important features left, the result is fed into the neural network to be trained using back-propagation, adjusting the weights of each feature, which are linked to the output prediction layer.

Prediction occurs by finding out how many features from an input are matched, summing up the weights linked throughout the Fully Connected Layer. The class with the highest total weight at the prediction layer is the predicted answer.

While a CNN usually involves image recognition, it can be explained like this. Take two words: Cake and Lie. These contain different distinct letters: C, a, k, e for Cake; L, i, e for Lie. A CNN breaks the input word down into letters and counts which candidate has the most matching letters (features). Except a CNN involves convolution filters, which take the extra step of breaking the words down into letters in a few other ways, to increase accuracy.
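That Cake/Lie analogy as a tiny sketch: count which candidate word shares the most letters ("features") with the input. The words and helper name here are just for illustration:

```python
def match_score(word, candidate):
    """Count how many letters of `candidate` are found in `word`
    (the 'feature matches')."""
    return sum(1 for c in candidate.lower() if c in word.lower())

inp = "Cke"   # a noisy input word
for candidate in ("Cake", "Lie"):
    print(candidate, match_score(inp, candidate))
# "Cake" scores 3, "Lie" scores 1, so "Cake" is the prediction
```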

Edit: Simple string recognition can also be computed using Levenshtein distance, which bears some similarity to how a CNN works.

https://en.wikipedia.org/wiki/Levenshtein_distance
Greetings, signature.

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Questions about CNNs
« Reply #3 on: October 11, 2016, 10:22:55 pm »
..........

Take a look at this picture:
http://scarlet.stanford.edu/teach/index.php/File:Mylenet.png

1) Where did the greyscale picture on the left come from?
Is it:
Input about to search?
Or an already saved image to BE searched?

2) After the greyscale image there is a lineup/stack of images.
Are they the input split into 8 same/different copies?
Or are they already saved images that have been grouped together because they look related?

3) Why does the stack count increase from 4 to 6 in stack 3? Now there are 6 images lined up.

4) Does stack 1 hold faces or lines? If lines, then why is stack 1 so big compared to stack 4?

5) If the input image has to end up choosing one of the dots at the end, then when does the input image divert from one of the beginning lines? At what times, and why? (Each stack has images lined up, but the input can only be following one of the images in a stack at a time.)

6) Stack 2 shows that all of the images in stack 2 are linked to just one of the images in stack 3, yes or no? What is that?


*

kei10

  • It's a honor to meet everyone!
  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 430
  • Just kidding.
Re: Questions about CNNs
« Reply #4 on: October 11, 2016, 10:53:17 pm »
1) It depends on the machine state of the algorithm; the input can be used to train, or to search. The machine state can be as simple as a switch or flag telling the algorithm whether to read or write.

2) They're filtered image data. There are 4 images within stack 1, which means 4 filters were used.

3) Convolution and sub/down-sampling can be applied numerous times for increased accuracy -- I believe. Notice that there are two convolution and two sub-sampling layers in the illustration, named C1, S1 and C2, S2.

4) Because by the time it has gone to stack 4, it has been convolved and down-sampled twice, as illustrated, which shrinks it into smaller feature maps.

5) The feature maps begin to be fed into the neural network at the Fully Connected layer (after the S2 sub-sampling). That's where the magic happens: the dot at the end that receives the highest activation from the feature maps is selected.
The Fully Connected Layer is no longer image matrix data; it consists of neural network layers, and the dots at the end are the output classes.

6) I'll leave this to korrelan. Although my belief is that it probably uses some sort of algorithm that ends up returning 6 feature maps from the 4 inputs at the second convolution step, or it's just a simple illustration error.

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Questions about CNNs
« Reply #5 on: October 11, 2016, 11:28:13 pm »
Also, korrelan, how can the input choose at the finish line if it didn't go through ALL the stored images to compute the winning feature? That's why I'm thinking the input only chooses ONE stored image and then diverts to one of the finish balls; the CPU could only handle that anyhow. As for no diversion, then it would never select anything else at the end, meaning the stack HAS to be all of the stored images it has seen during its life, not all of the input. It's very important you read over this.

*

keghn

  • Trusty Member
  • *********
  • Terminator
  • *
  • 824
Re: Questions about CNNs
« Reply #6 on: October 12, 2016, 01:32:25 am »
One big CNN can detect thousands of different things, like 1000 different animals.
CNNs are trained on a million or more images containing the animals.
A CNN has many thousands of pixels for input and a thousand outputs; one output turns on for each animal.

It can take a week to train a very big CNN. Once trained, it takes a fraction of a second to detect an animal.

*

kei10

  • It's a honor to meet everyone!
  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 430
  • Just kidding.
Re: Questions about CNNs
« Reply #7 on: October 12, 2016, 01:47:23 am »
The convolved and subsampled images are not stored, so there is no choosing from the beginning to the end; all it does is break the input down into the important data, called feature maps, to be fed to the neural network.

Only the Fully Connected Layer (neural network nodes, synapses, and weights) has to be stored in memory. Think of it as the brain's plasticity.

The neural network's memory usage by itself is already big enough:
http://programmers.stackexchange.com/questions/324510/why-do-convolutional-neural-networks-use-so-much-memory
« Last Edit: October 12, 2016, 02:37:39 am by kei10 »

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Questions about CNNs
« Reply #8 on: October 12, 2016, 03:27:30 am »
But what I mean is, when an image enters the CNN as input, there are 8 possible choices at the end, and remember our old saying: "you can't try every image in memory; it would take months to recognise your mom." So some approach must be taken. I'm asking you: when an image enters as input, does every filter or "every stored image" get tested against the input image and convolved down it? How else can the points be added up if neither of these ways is done? Way 1: pass through all images and add up tokens. Way 2: don't; prance through an organized tree branch, i.e. divert left past 3 or 88 images and repeat until the end selection.

*

Korrelan

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1454
  • Look into my eyes! WOAH!
    • YouTube
Re: Questions about CNNs
« Reply #9 on: October 12, 2016, 09:27:08 am »
No two images received through your eyes or a video camera are ever the same, so it's impossible to store all the images you or a machine has ever seen and then compare them against an incoming image stream. The CNN applies filters to the input image to highlight features that can easily be matched against the millions of very small images it stores in memory. The small images in memory were cut out of the training images because they were relevant to the CNN recognising a particular object/scene.

Using the human visual system as an analogy…

When you read a written sentence, you view each word in turn and match each letter against a stored representation of the letters in your memory. You look at each letter and scan through all your stored images of letters until you find a close match (of just that letter). Once you have matched all the letters in the word and noted their positions relative to each other, your neural net can produce the meaning of the word based on the pattern of letter images selected.

The convolutional network is based on how the mammalian visual cortex was thought to work.
 
A simplified CNN…



The input image is not stored permanently by the system during recognition.

The ‘convolution’ part of the CNN just applies a filter to the input image. These filtered images are not stored permanently by the system during recognition. The filter reduces the colour depth and enhances features, making matching faster and simpler. (First stack.)

The millions of small stored feature maps that represent angles, gradients, etc. were all stored after the training input image had been altered by a convolution filter to highlight its features, so they already have a convolution applied.

These small feature maps were copied from the original input image and stored during training because they were relevant to the system recognising just that single individual input image.

The basic two steps here can be repeated, and extra stacks/functions (max pooling, etc.) can be added in between that further simplify/shrink the convolved images, further enhancing the details and speeding up processing.

The last bit is the neural net; this learns/recognises which combinations of the small feature maps best represent the input image, based on which of the feature maps were found in the convolved versions of the input image.

The number of final outputs depends on the number of objects the system has been trained to recognise.

This is a very complex topic; there are many different systems that use different filters, parameters, levels of pooling, etc. Some do use a hierarchical system to limit the number of feature-map searches required, based on the recognition progress.
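Putting the two stages above together, here is a bare-bones forward pass (convolve, rectify, pool, then a fully connected layer). The filters and weights are random stand-ins, not a trained network, so this only shows the plumbing:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kern):
    """Valid 2D convolution (really cross-correlation, as in most CNN libraries)."""
    H, W = img.shape
    h, w = kern.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (img[y:y+h, x:x+w] * kern).sum()
    return out

def max_pool(fmap, s=2):
    """Non-overlapping max pooling via a reshape trick."""
    H, W = fmap.shape
    return fmap[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

img = rng.random((8, 8))                                   # stand-in input image
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # 4 "learned" filters

# Stage 1: convolution stack (one filtered image per kernel), ReLU, then pooling
stack = [np.maximum(conv2d(img, k), 0) for k in kernels]   # four 6x6 maps
pooled = [max_pool(m) for m in stack]                      # four 3x3 maps

# Stage 2: flatten and feed the fully connected classifier
flat = np.concatenate([m.ravel() for m in pooled])         # 36 values
W_fc = rng.standard_normal((8, flat.size))                 # 8 output classes
print("predicted class:", (W_fc @ flat).argmax())
```

Repeating stage 1 (more conv/pool rounds) before the fully connected layer gives the deeper stacks seen in the diagrams.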

:)

*

keghn

  • Trusty Member
  • *********
  • Terminator
  • *
  • 824
Re: Questions about CNNs
« Reply #10 on: October 12, 2016, 03:39:38 pm »
@BF there are eight outputs because it is designed to detect eight things; a small network. Do you have a link to this small CNN you're describing?

The input to a CNN uses the trick of a very small scanning NN kernel of around 10 by 10 pixels, instead of 800 x 480:

http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

There are many different scanning NNs, or as the article says, sliding window functions, for the first layer.
There are sliding window functions/filters to detect lines or edges of a vertex, diagonal lines, left/right and up/down edge lines, and patches of shade and maybe colours.
These small sliding window functions are very small NNs, so they can hold only a few variations of the feature they are detecting.
Detections at this level are passed up to the next layer and are put together by the next layer to form the final detections:

http://www.nature.com/news/can-we-open-the-black-box-of-ai-1.20731

Also, the scanning window gets larger and detects bigger features: vertices, other types of lines, and patches of shade.



« Last Edit: October 12, 2016, 11:55:25 pm by keghn »

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Questions about CNNs
« Reply #11 on: October 12, 2016, 09:18:07 pm »
Ok I'm done understanding it. Billions of images are stored as full which can be remembered so yes fully stored too, and have been shrunken into multiplication numbers and those to their own too in the next stacks, while any input can then pass all and/or do the divert thing to select the final thing at the end i.e. girl>+reward or memory>mountain.

*

Korrelan

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1454
  • Look into my eyes! WOAH!
    • YouTube
Re: Questions about CNNs
« Reply #12 on: October 12, 2016, 11:36:12 pm »
Quote
Billions of images are stored as full

Erm… No.

Quote
so yes fully stored too

Erm… No.

Quote
and have been shrunken into multiplication numbers

Erm… No.

Quote
while any input can then pass all and/or do the divert thing

Erm… close but… No.

Quote
to select the final thing at the end i.e. girl>+reward or memory>mountain.

Erm… No… did you read any of the above? lol.

 :)

It's about the journey... :P


*

kei10

  • It's a honor to meet everyone!
  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 430
  • Just kidding.
Re: Questions about CNNs
« Reply #13 on: October 12, 2016, 11:48:27 pm »
@korrelan
It baffles me every time whether Locksuit can read or not.

Either way, I have a question regarding neural networks; I believe one of these issues is what kept me from understanding them. Just to confirm this...

In a neural network, how do they store memory? For example, two distinct data points, X and Y, are given to train the network. Feeding data X should cause the network to output V, and data Y should output W.

My question is: data X has trained the hidden layer's weights to output V.

Wouldn't training with data Y then overwrite those weights to output W?

Or does a neural network work in such a way that training on multiple data adjusts its weights to the extent that it can remember, and output, both V and W?

If so, how does that work?

Thanks!

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Questions about CNNs
« Reply #14 on: October 13, 2016, 02:55:59 am »
A full-size coloured image of my room/mom is stored in my brain. Each at the end of the CNN.

The convolution is not truly an image, but actually a smaller image made of multiplications of the pixels in the filter window.

At the end of all the images' feature maps is the NN, which has many feature maps connected to each soma; the more maps energized, the more the next layer is energized, and the more one of the end somas fires, and that's birdy.

2 feature maps hit and the dendrites strengthen.


 

