Efficient vectors ....

  • 8 Replies
  • 1308 Views
*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Efficient vectors ....
« on: September 14, 2019, 05:00:55 am »
OK, I've got a question. Anyone with an ANN project can answer. Artificial NNs usually have, say, a million nodes/connections. I know ANNs use vectors.

I'm going to give a possible answer: all nodes are known to be the same, you just spawn 500,000,000 of them. Then the weights in between must be stored in vectors. They begin randomly initialized, then get trained. The total storage cost for the connections is actually higher because there are more connections than nodes! However, each connection is still cheap!! This must be vector storage! A vector is therefore this > 0.4636 0.68879 0.35345 ..... so each connection is ~8 bytes, not a KB!

?
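A minimal sketch of that storage idea (my own illustration, assuming NumPy and a hypothetical fully connected 1000 x 1000 layer): the weights are just a flat block of floats, so each connection costs 8 bytes in float64 or 4 bytes in float32.

Code:
import numpy as np

# Hypothetical sizes: 1,000 nodes fully connected to 1,000 nodes
n_in, n_out = 1000, 1000

# One float per connection, randomly initialized before training
weights64 = np.random.randn(n_in, n_out)     # float64 weights
weights32 = weights64.astype(np.float32)     # float32 weights

print(weights64.nbytes // weights64.size)    # 8 bytes per connection
print(weights32.nbytes // weights32.size)    # 4 bytes per connection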
« Last Edit: September 14, 2019, 05:25:55 am by LOCKSUIT »
Emergent          https://openai.com/blog/

*

HS

  • Trusty Member
  • **********
  • Millennium Man
  • *
  • 1175
Re: Efficient vectors ....
« Reply #1 on: September 14, 2019, 09:36:50 am »
Don't you need extra memory to cycle the weights? That way you get a multiple-cylinder engine under your hood. The neurons would be driving the GTP just as much as the environment is driving the neurons. Otherwise, with cars you'd get stuck on hills, and with AI in behavioral patterns like repeatedly walking into walls.

*

Korrelan

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1454
  • Look into my eyes! WOAH!
    • YouTube
Re: Efficient vectors ....
« Reply #2 on: September 14, 2019, 10:16:16 am »
In a conventional ANN or CNN, the position of the neurons is not relevant, only the connection matrix.

The neurons are stored as an indexed array along with the thresholds, bias, sigmoid vars, etc., say 1 to 1000.

The connection matrix defines the structure and is also stored in an indexed array. The connection array stores the indexes for the (from, to) nodes, weights, etc. It's this array that is processed in the main program loop, reading/storing/accumulating the results in the node/neuron array via the node indexes. So process the connections, then the neurons... Loop.

Well... That's how I would do it.

 :)
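A rough sketch of that connection-then-neuron loop (my own illustration in plain Python; the variable names and the sigmoid activation are assumptions, not korrelan's actual code):

Code:
import math, random

N = 1000  # neurons, indexed 0..N-1

# Neuron array: accumulator, bias, and output per neuron
acc  = [0.0] * N
bias = [random.uniform(-1, 1) for _ in range(N)]
out  = [0.0] * N

# Connection array: (from_index, to_index, weight) per connection
connections = [(random.randrange(N), random.randrange(N), random.uniform(-1, 1))
               for _ in range(10 * N)]

def step():
    # Process the connections: accumulate weighted outputs into target neurons
    for frm, to, w in connections:
        acc[to] += out[frm] * w
    # Then process the neurons: squash the accumulated input and reset it
    for i in range(N):
        out[i] = 1.0 / (1.0 + math.exp(-(acc[i] + bias[i])))
        acc[i] = 0.0

for _ in range(10):  # main program loop
    step()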
It thunk... therefore it is!...    /    Project Page    /    KorrTecx Website

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Efficient vectors ....
« Reply #3 on: September 14, 2019, 10:27:37 am »
Are all ANNs just vectors of numbers multiplying together and propagating? That seems rather simplistically dull.... I know they get a lot done doing that. I still want to know what Transformers do. I wish our father korrelan could teach us and give us a whole article about it.

Something far beyond the ruthless articles found online. Not one of them does it for me.

I really want to start a GPU project and tune my own GPT-2, and simplify it or whatever...
Emergent          https://openai.com/blog/

*

Korrelan

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1454
  • Look into my eyes! WOAH!
    • YouTube
Re: Efficient vectors ....
« Reply #4 on: September 14, 2019, 11:39:23 am »
It thunk... therefore it is!...    /    Project Page    /    KorrTecx Website

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Efficient vectors ....
« Reply #5 on: September 14, 2019, 11:43:31 am »
Seen it already, but I'll check it out again.
Emergent          https://openai.com/blog/

*

goaty

  • Trusty Member
  • ********
  • Replicant
  • *
  • 552
Re: Efficient vectors ....
« Reply #6 on: September 14, 2019, 11:52:37 am »
Quote from: korrelan on September 14, 2019, 10:16:16 am

In a conventional ANN or CNN, the position of the neurons is not relevant, only the connection matrix.

The neurons are stored as an indexed array along with the thresholds, bias, sigmoid vars, etc., say 1 to 1000.

The connection matrix defines the structure and is also stored in an indexed array. The connection array stores the indexes for the (from, to) nodes, weights, etc. It's this array that is processed in the main program loop, reading/storing/accumulating the results in the node/neuron array via the node indexes. So process the connections, then the neurons... Loop.

Well... That's how I would do it.

 :)

I think keeping them as small as possible is the way to go. If you're getting needlessly mean about these artificial neural networks, it's not how quickly you get your output in a frame, it's how many outputs you get.

1 to 1000, ordered simplicity. I'm into that pretty bad myself too.
« Last Edit: September 14, 2019, 01:43:04 pm by goaty »

*

AndyGoode

  • Guest
Re: Efficient vectors ....
« Reply #7 on: September 15, 2019, 05:52:42 am »
Quote from: LOCKSUIT on September 14, 2019, 10:27:37 am

Are all ANNs just vectors of numbers multiplying together and propagating? That seems rather simplistically dull.... I know they get a lot done doing that. I still want to know what Transformers do.

For the most part, yes, that's all neural networks are--linear algebra. There is a minor exception, though: in the book 'Neural Networks: A Comprehensive Foundation' by Simon Haykin, he mentioned at the end that if neural networks are ever going to become something more than linear algebra, it will be because of their nonlinear transfer function, which is the only part that is too hard to model easily and therefore is not fully understood. Unfortunately, despite looking several times for that key quote, I have been unable to find it, maybe because it was in the first edition of his book and not in the second edition that the library has.

My recommendation is to stay away from neural networks as a foundation of research because they're hardware, so they are not the essence of an architecture. The essence of the architecture is the middle of the 3 layers of hardware, algorithm, and goal, as described in David Marr's book 'Vision'. First figure out how to solve the problem of general intelligence, then figure out which hardware will do what you want. If neural networks fit well with what you want to do, then go ahead and choose that hardware. I lost several years of my life because I didn't have that wisdom earlier, so maybe you can save yourself a few years with that advice.
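A minimal sketch (my own illustration with NumPy, not from Haykin) of the "mostly linear algebra" point: a feed-forward pass is just matrix-vector products, and the nonlinear transfer function is the one step that isn't linear algebra.

Code:
import numpy as np

def sigmoid(x):
    # The nonlinear transfer function, the only non-linear-algebra step
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    # Each layer is a (weight matrix, bias vector) pair; propagation is
    # matrix multiplication plus the nonlinearity, layer by layer.
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((64, 128)), rng.standard_normal(64)),
          (rng.standard_normal((10, 64)),  rng.standard_normal(10))]
print(forward(rng.standard_normal(128), layers).shape)  # (10,)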

----------

(p. 23)
   It is a mathematical theorem that these conditions define the operation of addition, which is therefore the appropriate computation to use.
   The whole argument is what I call the computational theory of the cash register. Its important features are (1) that it contains separate arguments about what is computed and why and (2) that the resulting operation is defined uniquely by the constraints it has to satisfy. In the theory of visual processes, the underlying task is to reliably derive properties of the world from images of it; the business of isolating constraints that are both powerful enough to allow a process to be defined and generally true of the world is a central theme of our inquiry.
   In order that a process shall actually run, however, one has to realize it in some way and therefore choose a representation for the entries that the process manipulates. The second level of the analysis of a process, therefore, involves choosing two things: (1) a representation for the input and for the output of the process and (2) an algorithm by which the transformation may actually be accomplished. For addition, of course, the input and output representations can both be the same, because they both consist of numbers. However, this is not true in general. In the case of a Fourier transform, for example, the input representation may be the time domain, and the output, the frequency domain. If the first of our levels specifies what and why, this second level specifies how.
   There are three important points here. First, there is usually a wide choice of representation. Second, the choice of algorithm often depends rather critically on the particular representation that is employed. And third, even for a given fixed representation, there are often several possible algorithms for carrying out the same process. Which one is chosen will usually depend on any particularly desirable or undesirable characteristics that the algorithms may have; for example, one algorithm may be
(p. 24)
much more efficient than another, or another may be slightly less efficient but more robust (that is, less sensitive to slight inaccuracies in the data on which it must run). Or again, one algorithm may be parallel, and another, serial. The choice, then, may depend on the type of hardware or machinery in which the algorithm is to be embodied physically.

   This brings us to the third level, that of the device in which the process is to be realized physically. The important point here is that, once again, the same algorithm may be implemented in quite different technologies. The child who methodically adds two numbers from right to left, carrying a digit when necessary, may be using the same algorithm that is implemented by the wires and transistors of the cash register in the neighborhood supermarket, but the physical realization of the algorithm is quite different in these two cases. Another example: Many people have written computer programs to play tic-tac-toe, and there is a more or less standard algorithm that cannot lose. This algorithm has in fact been implemented by W. D. Hillis and B. Silverman in a quite different technology, in a computer made out of Tinkertoys, a children's wooden building set. The whole monstrously ungainly engine, which actually works, currently resides in a museum at the University of Missouri in St. Louis.
   Some styles of algorithm will suit some physical substrates better than others. For example, in conventional digital computers, the number of connections is comparable to the number of gates, while in the brain, the number of connections is much larger (x 10^4) than the number of nerve cells. The underlying reason is that wires are rather cheap in biological architecture, because they grow individually and in three dimensions. In conventional technology, wire laying is more or less restricted to two dimensions, which quite severely restricts the scope for using parallel techniques and algorithms; the same operations are often better carried out serially.

   The Three Levels

We can summarize our discussion in something like the manner shown in Figure 1-4, which illustrates the different levels at which an information-processing device must be understood before one can be said to have understood it completely. At one extreme, the top level, is the abstract computational theory of the device, in which the performance of the device is characterized as a mapping from one kind of information to another, the abstract properties of this mapping are defined precisely, and its appropriateness and adequacy for the task at hand are demonstrated. In the center is the choice of representation for the input and output and the
(p. 25)
algorithm to be used to transform one into the other. And at the other extreme are the details of how the algorithm and representation are realized physically--the detailed computer architecture, so to speak. These three levels are coupled, but only loosely. The choice of an algorithm is influenced, for example, by what it has to do and by the hardware in which it must run. But there is a wide choice available at each level, and the explication of each level involves issues that are rather independent of the other two.

-----
   Computational theory: What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?

   Representation and algorithm: How can this computational theory be implemented? In particular, what is the representation for the input and output, and what is the algorithm for the transformation?

   Hardware implementation: How can the representation and algorithm be realized physically?

Figure 1-4. The three levels at which any machine carrying out an information-processing task must be understood.
-----

   Each of the three levels of description will have its place in the eventual understanding of perceptual information processing, and of course they are logically and causally related. But an important point to note is that since the three levels are only rather loosely related, some phenomena may be explained at only one or two of them. This means, for example, that a correct explanation of some psychophysical observation must be formulated at the appropriate level. In attempts to relate psychophysical problems to physiology, too often there is confusion about the level at which problems should be addressed. For instance, some are related mainly to the physical mechanisms of vision--such as afterimages (for example, the one you see after staring at a light bulb) or such as the fact that any color can be matched by a suitable mixture of the three primaries (a consequence principally of the fact that we humans have three types of cones). On the other hand, the ambiguity of the Necker cube (Figure 1-5) seems to demand a different kind of explanation. To be sure, part of the explanation of its perceptual reversal must have to do with a bistable neural network (that is, one with two distinct stable states) somewhere inside the
(p. 26)
brain, but few would feel satisfied by an account that failed to mention the existence of two different but perfectly plausible three-dimensional interpretations of this two-dimensional image.
   For some phenomena, the type of explanation required is fairly obvious. Neuroanatomy, for example, is clearly tied principally to the third level, the physical realization of the computation. The same holds for synaptic mechanisms, action potentials, inhibitory interactions, and so forth.
Marr, David. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Cambridge, Massachusetts: The MIT Press.

----------

(p. 4)
From the above discussion, it is apparent that a neural network derives its computing power through, first, its massively parallel distributed structure and, second, its ability to learn and therefore generalize; generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning). These two information-processing capabilities make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution working by themselves alone. Rather, they need to be integrated into a consistent system engineering approach. Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks (e.g., pattern recognition, associative memory, control) that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics a human brain.

   The use of neural networks offers the following useful properties and capabilities:

   1. Nonlinearity. A neuron is basically a nonlinear device. Consequently, a neural network, made up of an interconnection of neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network. Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for the generation of an input signal (e.g., speech signal) is inherently nonlinear.

   2. Input-Output Mapping.

Haykin, Simon. 1994. Neural Networks: A Comprehensive Foundation. New York, New York: Macmillan College Publishing Company.

« Last Edit: September 15, 2019, 07:42:56 pm by AndyGoode »

*

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • *******************
  • Prometheus
  • *
  • 4659
  • First it wiggles, then it is rewarded.
    • Main Project Thread
Re: Efficient vectors ....
« Reply #8 on: September 15, 2019, 06:09:03 am »
I agree, all hardware, hardware simulation (korrelan's), etc. are just running the algorithm. The algorithm can be anything; only then do you pick the best hardware and simulate/utilize it. I believe korrelan's virtual hardware is really the AGI algorithm - not hardware - and at the same time he is using his hardware for maximum efficiency. So he really has an algorithm plus a hardware tool to RUN it. The huge web in korrelan's AGI is not hardware; it learns using a lot of different data, this is relational resolving/cycling, and it only uses a (good) hardware - effectively.

Currently my theory is really far along and I don't know how much farther it can go. I still don't understand GPT-2, yet I have remade it in my own way and fully understand what it does. But the implementation/hardware they use, I am simply lost on. It takes so long on so many GPUs to train, and then it can run on my PC with little overhead. One could explain the training and/or the trained net. It will take a rethinking of how to teach Locky quickly; clearly no one is doing it correctly, else I would understand quickly. If one cannot explain, they do not know their own work :)
Emergent          https://openai.com/blog/

 

