AI Dreams Forum

AI Dreams => New Users Please Post Here => Topic started by: Ben.F.Rayfield on February 13, 2014, 02:11:00 am

Title: Ben F Rayfield - My bizarre research toward networking minds together
Post by: Ben.F.Rayfield on February 13, 2014, 02:11:00 am
My research is open source (GNU GPL, at least, for each part) but is too early to have impressive prototypes, because I'm going for something that will change the world. Some parts are at http://sourceforge.net/users/benrayfield, especially BayesianCortex (a realtime flowing bayesian network), Physicsmata (various vibrating kinds of math I've found, and distance constraints), Audivolv (evolves musical instruments, as blocks of code that permutate their vars, which you play with the mouse), NatLangMouse (where I data mined Wikipedia), JSoundCard (access to speakers and microphone to define custom interactive audio effects), and GigaLineCompile (combines a compiler and interpreter for realtime optimization of code the AI evolves). But mostly I'm looking for something I've not been able to put into code yet, something related to bayesian networks, boltzmann machines, fourier math, Pascal's Triangle, bell curves, hyperspheres, neuromodulation, and networking minds together through the Internet.

What I'm going for is hard to explain. It started with single celled life grouping together and gradually reacting more like brain cells, signaling each other. As many-celled life, we are still grouping together, forming into businesses, countries, and decentralized structures like open source and social networking. Brains are decentralized, and we are becoming more like a big brain.

Why are wikis only for text and linking pictures? A wiki is a shared data space.

Wikipedia is the main wiki for text. No sentence is owned by any one person. Anyone can change a word or any part. And it works. It converges toward what many millions of people think and has become a useful set of the world's most unbiased knowledge about a variety of subjects.

But why only text? Are we incapable of painting a picture together? How would the versioning work if somebody wanted to undo somebody else's change, like if they vandalized it by drawing spam? It's not nearly as easy as text diff. But it's what we need to do next to expand the Internet: a wiki, a shared editing space, for more than text.

In recent years there have been major advances in object recognition by AI, and its opposite, which is imagining objects from words or a partial visual pattern. If we can build a visual interactive representation of thoughts and fit them together like puzzle pieces, this kind of wiki would result.

It should be as flowing and intuitive as possible. A text wiki has everything at a specific version number, but a picture wiki is continuous somehow. The basic building blocks of the graphics are bell curve voxels, in the way bell curves are made of other bell curves. At stdDev 0 it's simply a pixel. As stdDev increases you get bigger 2d blobs. The color interface is transparency, red, green, and blue, which can be calculated as 32 bit ints (8 bits per channel) or an array of 4 floats. These are the building blocks that can form any image, and with the right AI math they will form interesting patterns guided by user input.
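For example, here's a minimal Python sketch (my own toy code, not from any of the projects above) of one such building block: a bell curve blob with a given stdDev, drawn as an array of 4 floats per pixel in the transparency, red, green, blue order described above.

    import numpy as np

    def bell_blob(size, cx, cy, std_dev, trgb):
        # one bell curve voxel: a 2d bell curve centered at (cx, cy),
        # stored as 4 floats per pixel (transparency, red, green, blue)
        img = np.zeros((size, size, 4), dtype=np.float32)
        y, x = np.mgrid[0:size, 0:size]
        if std_dev == 0:
            # at stdDev 0 the blob degenerates to a single pixel
            weight = ((x == cx) & (y == cy)).astype(np.float32)
        else:
            weight = np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * std_dev**2))
        for channel in range(4):
            img[:, :, channel] = trgb[channel] * weight
        return img

    blob = bell_blob(64, cx=32, cy=32, std_dev=8.0, trgb=(1.0, 0.9, 0.2, 0.1))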

What I'm going for is based on Pascal's Triangle as the derivation of bell curves, circles, and their higher dimensional forms. Repeatedly flip 200 coins in 2 groups of 100. Each time, line up the first group with heads on east and tails on west, and the second group with heads on north and tails on south, inside the same square that's 100 coins on a side, with the 2 lines of coins meeting at their heads/tails crossing. Each flip of the 200 coins chooses a place, usually near the center of the square, and statistically the places form a 2d bell curve, which has near constant density around the circle at each radius. This square forms circles. Why do adding and subtracting, which look like the flattest things there are, form circles? Pascal's Triangle.
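You can check this by simulation. This Python sketch (coin counts and sample size are my own choices) counts how often the flips land at a point on the east axis versus a point on the diagonal at nearly the same radius, and the two chances come out near equal, which is the circle:

    import numpy as np

    rng = np.random.default_rng(0)
    flips = 1_000_000
    # each flip of 2 groups of 100 coins picks a point in the square:
    # east-west offset = heads in group 1 minus 50, north-south = group 2
    x = rng.binomial(100, 0.5, flips) - 50
    y = rng.binomial(100, 0.5, flips) - 50
    # density on the east axis at radius 10 versus on the diagonal at
    # radius sqrt(98), nearly the same radius: a 2d bell curve gives
    # near equal values, so the square of coins forms circles
    print(np.mean((x == 10) & (y == 0)))
    print(np.mean((x == 7) & (y == 7)))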

...BAYESIAN NETWORK...

Bayes Rule is the definition of conditional probability:
chance(X given Y) * chance(Y) = chance(Y given X) * chance(X)

Bayes Rule is used simply as arrays of chance numbers, where each array's size is 2 to the power of the number of bayes vars, and the array sums to 1 for total chance. Inference is done by setting the chance of some vars then reading the chance of other vars. To change the chance of a var, you adjust the total chance of half the numbers in the array (the half selected by a bit mask with a 1 at that var's index). You hold the proportions within each half constant while changing the totals of the halves to the chance that bayes var is true and the chance it's false.
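Here's a tiny Python sketch of that array representation, with 2 bayes vars X and Y (so the array holds 2^2 = 4 chance numbers; the example numbers and function names are mine):

    import numpy as np

    # joint chances over 2 bayes vars, indexed by bits: index = X*2 + Y
    joint = np.array([0.3, 0.2, 0.1, 0.4])  # sums to 1 for total chance

    def chance_of(joint, var_mask):
        # total chance of the half of the array where that var's bit is 1
        idx = np.arange(len(joint))
        return joint[(idx & var_mask) != 0].sum()

    def set_chance(joint, var_mask, new_chance):
        # rescale each half, holding proportions inside each half constant
        idx = np.arange(len(joint))
        on = (idx & var_mask) != 0
        out = joint.copy()
        out[on] *= new_chance / joint[on].sum()
        out[~on] *= (1 - new_chance) / joint[~on].sum()
        return out

    X, Y = 2, 1                        # bit masks of the 2 vars
    joint = set_chance(joint, Y, 1.0)  # observe Y is true
    print(chance_of(joint, X))         # reads chance(X given Y) = 2/3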

I started on bayesian networks, but then I found boltzmann machines, which are bidirectional neural nets that scale better and are still good at pattern matching.

...START: BOLTZMANN...

Boltzmann Machine is a kind of statistical AI that works this way between each 2 adjacent layers (Visible and Hidden).

Each node is a bit. Each edge between 2 nodes is a scalar.

Start with random edge weights between all pairs of nodes, 1 edge per node pair, in each pair of adjacent layers.

It learns bit vectors, 1 bit for each node, by example, and rebuilds them when you give it a partial pattern. I have it learning some simple ellipses in a window, and when I paint with the mouse it changes gradually to other ellipses in the training data, whichever literal training example (no movement, scaling, or turning, just parroting) it was closest to.

Repeat training on all data while a simulated temperature decreases. Temperature comes from an accurate analogy between boltzmann machine statistics and real physics: the metal shaping strategy called annealing.

Each temperature cycle, train again for all example bit vectors.

Set Visible node bits to the example data.

Set Hidden node bits statistically as weighted random observations of the Visible nodes, since edges only exist between layers but not in the same layer.

Learn positively from all node states.

Set Visible again.

Set Hidden again.

Learn negatively, because we want it to learn the data and unlearn its current behaviors; if they're the same, this approximately cancels out the positive learning and nothing changes.

Update edge weights based on those 2 learning cycles, 1 positive and 1 negative. Edge weights increase by the learning rate when both nodes are on. This very important property of Boltzmann Machines means they can calculate NAND, a minimalist Turing Complete math operator.

When setting a node statistically, running the layer it's in, you calculate a chance then set its bit to a weighted random observation of that chance.

There is an optimization where you can do this with an average of 2 random bits (consume random bits until you get the first 1, then go directly to that digit of the chance written as a binary fraction and return it), but it does the same as calculating a random fraction and setting the bit to whether that fraction is less than the chance.
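A Python sketch of that trick (sample_bit is my own name for it):

    import random

    def sample_bit(chance):
        # consume random bits until the first 1; if it lands at position k,
        # return the k-th digit of chance written as a binary fraction.
        # position k happens with chance 1/2^k, so the chance of returning 1
        # sums to exactly chance, using an average of 2 random bits.
        k = 1
        while random.getrandbits(1) == 0:
            k += 1
        return int(chance * (1 << k)) & 1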

sum(x) = for each other node that is on, sum the edge weights between it and x.

The chance of node x is sigmoid(sum(x)/temperature).

sigmoid(z) = 1/(1+e^-z), ranges 0 to 1, and increases exactly when z increases.

Do all these steps and your Restricted Boltzmann Machine will learn any bit vectors as training data and rebuild the closest pattern when it sees a partial pattern.

Run the network, Hidden, Visible, Hidden, Visible... continuously for an interactive system. All these steps fit together as in the sketch below.
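Here is a minimal numpy sketch of that whole loop. The layer sizes, learning rate, annealing schedule, and toy data are my own guesses, and I've left out the bias terms some Boltzmann Machine variants add:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class RBM:
        def __init__(self, n_visible, n_hidden):
            # random edge weights between all node pairs across the 2 layers
            self.w = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))

        def sample_hidden(self, visible, temperature):
            # chance = sigmoid(sum(x)/temperature), then a weighted random bit
            chance = sigmoid(visible @ self.w / temperature)
            return (rng.random(chance.shape) < chance).astype(float)

        def sample_visible(self, hidden, temperature):
            chance = sigmoid(hidden @ self.w.T / temperature)
            return (rng.random(chance.shape) < chance).astype(float)

        def train(self, data, temperature, rate=0.05):
            h0 = self.sample_hidden(data, temperature)   # set Hidden from data
            positive = data.T @ h0                       # learn positively
            v1 = self.sample_visible(h0, temperature)    # set Visible again
            h1 = self.sample_hidden(v1, temperature)     # set Hidden again
            negative = v1.T @ h1                         # learn negatively
            # if the 2 phases match, they approximately cancel out
            self.w += rate * (positive - negative) / len(data)

    data = rng.integers(0, 2, size=(20, 64)).astype(float)  # toy bit vectors
    rbm = RBM(n_visible=64, n_hidden=32)
    for temperature in np.linspace(3.0, 1.0, 200):  # annealing schedule
        rbm.train(data, temperature)

    # run Hidden, Visible, Hidden, Visible... from a partial pattern
    v = data[0].copy()
    v[32:] = 0  # erase half the bits
    for _ in range(10):
        h = rbm.sample_hidden(v, temperature=1.0)
        v = rbm.sample_visible(h, temperature=1.0)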

Add more layers for more abstract patterns, but it is still too literal, only parroting back variations and combinations of training data, with no movement, scaling, or rotation. There is an important part of intelligence in how we think of 2 pencils while only having 1 model of a pencil in our minds, so the same information moves on 2 paths from some point where they connect. I don't see that in Boltzmann or anywhere else except in very basic forms, and AI will not significantly advance without it. But what Boltzmann does is very advanced, and I've explained how its basic parts work, which can be assembled into more advanced things.

The important question...

http://en.wikipedia.org/wiki/Boltzmann_machine

    Although learning is impractical in general Boltzmann machines, it can be made quite efficient in an architecture called the "restricted Boltzmann machine" or "RBM" which does not allow intralayer connections between hidden units.

I've verified that. The same learning strategy does not work on a fully connected network shape. The various learnings don't align with each other as well, since they can take different length paths to the same node. It looks very random.

I imagine maybe we could simulate layers by putting nodes at points on a circle or hypersphere and only connecting them to near opposite points, so activity could drift between layers slowly, and maybe get some stretching and rotation working.
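Purely as speculation, the connectivity for the circle version might look like this sketch (the angular cutoff is an arbitrary guess of mine):

    import numpy as np

    n = 64
    angle = np.linspace(0, 2 * np.pi, n, endpoint=False)
    # angular distance between every pair of nodes on the circle
    diff = np.abs(angle[:, None] - angle[None, :])
    diff = np.minimum(diff, 2 * np.pi - diff)
    # only connect nodes to near opposite points, so "layers" become
    # soft neighborhoods that can drift instead of hard layer boundaries
    connected = diff > (np.pi - 0.5)
    weights = np.where(connected,
                       np.random.default_rng(0).normal(0, 0.1, (n, n)),
                       0.0)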

You can start with the SimpleRBM software, or I could give you this early version of my code.

Is there a way to make it work in a fully connected network shape (not Restricted)?

...END: BOLTZMANN...

I also got into fourier math because of one very important fact: the fourier of a bell curve is another bell curve, but not just a duplication. It's a kind of pattern matching. The fourier (by magnitude) of a bell curve anywhere in the input loop is a bell curve at the same specific place (with varying phase), the same place every time regardless of location in the input loop. I think this is a common operation in many parts of brains. There is a fractal property to thoughts that allows us to zoom in and out and combine them many ways. Bell curves are the simplest fractal. Coin flips form bell curves, which form circles/spheres/hyperspheres, which form waves, which thoughts and physics are made of.
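A small numpy check of that fact (the loop size and stdDev are my own choices): the fourier magnitude of a bell curve peaks at the same place with the same falloff no matter where the bell curve sits in the input loop.

    import numpy as np

    n = 256
    x = np.arange(n)

    def bell(center, std_dev):
        # a bell curve wrapped onto the input loop
        d = np.minimum((x - center) % n, (center - x) % n)
        return np.exp(-0.5 * (d / std_dev) ** 2)

    for center in (0, 64, 200):
        mag = np.abs(np.fft.fft(bell(center, std_dev=8.0)))
        # peak index and falloff are identical for every center,
        # only the phase (not printed) varies with location
        print(center, np.argmax(mag), round(mag[1] / mag[0], 4))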

I'm looking for something to unify it all in the style of Occam's Razor: the simplest explanation or theory or system design is preferred if it performs most or all of the important functions. I don't think the world is nearly as complex as most people think. They've given up on ever understanding how it all fits together, so they let others think for them. I continue searching for the theory of everything, starting in the context of statistics and wave based AI. The answer is out there.

Does anyone want to help? Or what do you think?