About PURR-PUSS
(Purposeful Unprimed Real-world Robot with Predictors Using Short Segments)
PURR-PUSS was invented in two stages, first PUSS in 1972 and then PURR
in 1974.
PUSS was a predictor designed for my earlier STeLLA learning machine, but it was so successful that a multiple PUSS was devised to be the core of a new learning machine, named PURR-PUSS. STeLLA was built out of relays by Peter L. Joyce at the Standard Telecommunication Laboratories, Harlow, Essex, U.K., in 1962-3, but the reliability of relay technology proved inadequate.
The development of PURR-PUSS is described in "Man-Machine Studies" reports UC-DSE/1(1972) to UC-DSE/40(1991), Progress Reports to the Defence Scientific Establishment (New Zealand), ed. J.H.Andreae, ISSN 0110 1188.
These reports have been deposited in the Library of Congress, Washington D.C., USA; the National Technical Information Service, Virginia, USA; the British Lending Library, Boston Spa, UK; and many university libraries around the world. The first of these reports, which describes the development of PUSS as a predictor for STeLLA, can be downloaded as UC-DSE-1.zip and unzipped to the Microsoft Word document UC-DSE-1.doc from the Downloads Section.
A Brief Look at STeLLA
In 1961, when my research began in what was a new area for me, called Cybernetics, there wasn't much doubt that any working model of the brain would have to learn from scratch: the tabula rasa, or blank slate, was how a machine that would learn like the brain had to start. Learning was what it was all about. Babies didn't know very much at all, if anything. A few years later things would change dramatically and by 1972 in the heyday of Artificial Intelligence (AI) people would be amazed by the programmed creations of Shortliffe (MYCIN), Lenat (AM), Winograd (SHRDLU) and others. For a while, it seemed to many as though that was the way to reveal the mysteries of the brain. That view was strongly supported by the new linguistics driven by Noam Chomsky and his argument that we are born with a Universal Grammar. However, I was not alone in finding the AI approach unconvincing and many people studying child development continued to doubt the amount of innate structure being assumed. Thirty-one years later, things have mellowed and now I find myself somewhere in between the AI approach and connectionism, the new name for neural networks, but certainly closer to the latter. Research has shown that babies start with a lot more than was previously guessed and a major question is how to mesh learning with innate structures. My bias has always been to see learning as primary and this will become clearer in the design of my learning brain, called PURR-PUSS, or PP for short.
PP evolved from an earlier learning machine called STeLLA. For STeLLA, happenings in its world were of two types, patterns and actions. A pattern was divisible into features, and weights were attached to the features, as in Rosenblatt's famous "perceptron". Patterns were mainly seen as visual input and the task facing STeLLA was formulated in a visual form:
STeLLA's Task
You are sitting in a dark room. There is a horizontal row of white lights in front of you (input pattern), each light being ON or OFF at any instant. Below the lights a row of push-buttons (actions) lie within your reach and these may be pressed, one at a time, in any sequence. There is nothing else but a pair of coloured lights below the buttons. One of these is green (reward) and the other is red (punishment). Your task is to cause the green light to flash on as often as possible without the red light coming on.
The task is not well defined, but nor are ordinary human tasks, like
becoming good at something. We are not told how often the green light
could or should be made to come on or how serious the red light is. The
numbers of white lights and push buttons will be different for different
versions of the task. When we try out STeLLA with a task, we program
the way the white lights and the coloured lights change with the button
pushes. Different "world programs" give the task different degrees of
difficulty. Once we have shown that STeLLA can handle the task of a particular
difficulty of world program, we can devise a world program which introduces
a new level of difficulty and see what has to be done to the design of
STeLLA to enable it to tackle the task with the new world program, without
losing its ability with the older world programs. This process of increasing
task difficulty continues to the present day and it led to the major
change from STeLLA to PURR-PUSS, as we shall see.
STeLLA made simple associations between patterns (of the row of lights) and
actions (button presses), called pattern-action pairs. A pattern-action pair
recorded the fact that an action had been executed immediately following the
seeing of a pattern. Now, after the action of a pattern-action pair had been
executed (button pressed), a new pattern of lights would usually follow. First
a pattern, then an action chosen, then a pattern seen, then another action
chosen, and so on. We could write this sequence thus: p-a..p-a..p-a..p-a..p-a..p
...
In STeLLA's memory a network was formed of pattern-action pairs connected to patterns that had followed them. So, if pattern-action pair-1 was followed by pattern-A in STeLLA's world, then there would be a one-way connection from pattern-action pair-1 to pattern-A in STeLLA's memory. If the reward light came on when the action of a pattern-action pair was executed then the action was marked in memory as rewarded. So the memory of STeLLA consisted of a network of pattern-action pairs connected to patterns, where each pattern may be the pattern of a number of pattern-action pairs. A collection of pattern-action pairs with a common pattern is called a node. Some of the pattern-action pairs are marked as rewarded. It is important that the reader sees the relation between nodes and pattern-action pairs, so Figure 1 is given to illustrate the relation by showing a possible fragment of STeLLA's memory. Sometimes it will be more convenient to talk about pattern-action pairs and sometimes nodes will be preferable.
Figure 1 (left). Each node consists of a pattern P and one or more actions A. It can also be seen as one or more P-A pairs with the same P. Each connection from a node to a node goes from an action of the first node to the pattern of the second node. The figure could be a fragment of STeLLA's memory.
When STeLLA chose a particular action for a particular pattern, it was choosing an action of a particular node, or a particular pattern-action pair, in the network of its memory. From that pattern-action pair in its memory, paths led through other patterns and their actions to rewarded pattern-action pairs. STeLLA could then choose shorter, stronger paths to get back to rewarded pattern-action pairs. The strength of reward on a pattern-action pair was decreased if it occurred again without getting reward, and it was increased if it occurred again with reward. Connections between nodes also had variable strengths. To take a simple example, if pattern-action pair-1 could be followed by either the pattern of node-A or the pattern of node-B, then there would be two one-way connections from pattern-action pair-1, one to node-A and the other to node-B. The strengths of these two connections were varied to represent the probabilities that pattern-action pair-1 was followed by node-A and node-B respectively.
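The memory just described can be pictured with a short Python sketch. It is an illustration only, not the relay design of STeLLA: the class and method names are hypothetical, connection strengths are assumed to be estimated by counting how often each pattern followed a pair, and reward strength is assumed to move by a fixed step between 0 and 1.

```python
from collections import defaultdict

class Memory:
    """Illustrative network of pattern-action pairs (hypothetical names)."""

    def __init__(self, step=0.25):
        self.step = step                  # assumed size of each reward adjustment
        self.reward = defaultdict(float)  # (pattern, action) -> reward strength
        # (pattern, action) -> {next pattern: number of times it followed}
        self.follow = defaultdict(lambda: defaultdict(int))

    def record(self, pattern, action, next_pattern, rewarded):
        """Store one p-a..p step and adjust the reward strength of the pair."""
        pair = (pattern, action)
        self.follow[pair][next_pattern] += 1
        if rewarded:
            self.reward[pair] = min(1.0, self.reward[pair] + self.step)
        else:
            self.reward[pair] = max(0.0, self.reward[pair] - self.step)

    def node(self, pattern):
        """All actions recorded with this pattern, i.e. the node for the pattern."""
        return sorted({a for (p, a) in self.follow if p == pattern})

    def strengths(self, pattern, action):
        """Connection strengths as estimated probabilities of each next pattern."""
        counts = self.follow.get((pattern, action), {})
        total = sum(counts.values())
        return {nxt: n / total for nxt, n in counts.items()}


memory = Memory()
memory.record("P1", "A1", "P2", rewarded=False)
memory.record("P1", "A1", "P3", rewarded=False)
memory.record("P1", "A1", "P2", rewarded=False)
memory.record("P1", "A2", "P3", rewarded=True)
print(memory.node("P1"))             # ['A1', 'A2']
print(memory.strengths("P1", "A1"))  # P2 about 0.67, P3 about 0.33
```

Here the node for P1 holds two pattern-action pairs, and the two one-way connections from the pair P1-A1 carry strengths in proportion to how often each pattern followed it.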
When STeLLA's task included the red, punishment light, it was used to mark
actions on nodes in the same way as the reward light, but punished pattern-action
pairs inhibited STeLLA from going along paths instead of encouraging it.
STeLLA's memory told it the best way to go to get reward as soon as possible,
and to avoid punishment. This was done by a process I call leakback. This
is how it works. Imagine that STeLLA's memory is a network of interconnected
pipes. The pipe connecting the action of one node to a second node has a diameter
which varies according to the strength of that connection. A stronger connection
means a larger pipe. Here I have to introduce a slight complication.
Each node is a little container which lets liquid enter from pipes that connect
to nodes that follow that node, and lets liquid run out through pipes that
connect to nodes that came before that node. I should emphasise that if there
is a one-way connection from an action in one node to the pattern of another
node, then the pipe representing that connection only allows liquid to flow backwards through
it. A rewarded action of a node has an additional input from a tap that lets
liquid into the container according to the strength of its reward. A punished
action of a node has an outlet which lets liquid escape from the container
at a rate which increases with the strength of the punishment. In Figure 2,
I have redrawn Figure 1 with pipes instead of connections. Notice that the
allowed flows through the pipes in Figure 2 are in the opposite direction to
the arrows showing connections in Figure 1.
Figure 2 (left). This is the same as Figure 1 with pipes replacing connections and hollow arrows showing the direction of liquid flow for leakback.
Looking closely at Figure 2, we see that liquid can flow from the rewarded
action A3 in the top right node to the pattern of that node. Then it can
flow directly to action A1 of the node on the left, and also it can flow
via the node at the bottom right to the action A2 of the node on the left.
Let us suppose that STeLLA is seeing the pattern corresponding to P1 in
the node on the left of Figure 2. If the flow, i.e. leakback, from the top
node to the left node is greater than the flow via the bottom right node,
then STeLLA will choose the action A1 that receives the direct flow; but
if the flow to action A2 via the node on the bottom right is stronger, STeLLA
will choose that action, A2. Of course, when there are hundreds of nodes
and many actions of nodes are rewarded or punished, the flows will be very
complicated to calculate, but STeLLA will still choose the action which has
the strongest flow. If the pipe diameters used for the calculations are varied
correctly, then the choices of STeLLA can be governed by expectations of
reward derived from the best estimates of mathematical probabilities. For
various reasons, including the fact that the STeLLA tasks do not often correspond
to statistically stationary environments, the choices of action will often
be sub-optimal, and sometimes downright stupid!
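The flavour of leakback can be conveyed by a small Python sketch. It does not reproduce the pipe calculation used in STeLLA. It substitutes a simple repeated-sweep approximation in which a fixed fraction (`leak`) of the liquid in a following node passes back through each pipe, scaled by the pipe's strength, and in which the level of a node is taken to be the largest flow among its actions. The node labels ("left", "top", "bottom"), the action A4, the function names and all the numbers are hypothetical.

```python
def leakback(links, taps, leak=0.5, sweeps=20):
    """Return the flow reaching each (node, action) pair.

    links: {(node, action): {following node: pipe strength}}
    taps:  {(node, action): reward (positive) or punishment (negative)}
    """
    pairs = set(links) | set(taps)
    flow = {pair: 0.0 for pair in pairs}
    for _ in range(sweeps):
        # Assumption: the level in a node is the largest flow among its actions.
        level = {}
        for (node, _action), f in flow.items():
            level[node] = max(level[node], f) if node in level else f
        # Liquid runs backwards: from following nodes into the action that led to them.
        flow = {
            pair: taps.get(pair, 0.0) + leak * sum(
                strength * level.get(nxt, 0.0)
                for nxt, strength in links.get(pair, {}).items()
            )
            for pair in pairs
        }
    return flow


def choose(flow, node):
    """Pick the action of `node` that receives the strongest flow."""
    options = {a: f for (n, a), f in flow.items() if n == node}
    return max(options, key=options.get)


# A fragment shaped like the one discussed above (labels are hypothetical).
links = {
    ("left", "A1"): {"top": 1.0},     # direct route to the rewarded node
    ("left", "A2"): {"bottom": 1.0},  # indirect route via the bottom node
    ("bottom", "A4"): {"top": 1.0},
}
taps = {("top", "A3"): 1.0}           # A3 is the rewarded action

print(choose(leakback(links, taps), "left"))  # A1

# If the direct pipe is weak, the longer route wins.
links[("left", "A1")] = {"top": 0.2, "elsewhere": 0.8}
print(choose(leakback(links, taps), "left"))  # A2
```

With these numbers the direct flow to A1 is 0.5 and the flow to A2 via the bottom node is 0.25, so A1 is chosen. When the direct connection has strength 0.2, the flow to A1 falls to 0.1 and A2 is chosen instead.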
STeLLA planned future paths by means of special memory matrices, one for each
action, that predicted the next pattern, given the present pattern and action.
Plans are made in a different way in PURR-PUSS using its main memory, so I
will not mention STeLLA's planning method any further. The simple idea behind
planning is that when the system is about to choose an action, it explores
into the future with its memory and if this leads to a path to reward, then
it tries to keep to that path. The plan is made step by step: first the next
action is chosen, then it predicts the next pattern, chooses the next action,
predicts the next pattern, and so on until a rewarded action is found in its
memory. The trace of a successful path through its memory is used to bias its
decisions until the goal is reached or the predicted patterns fail to be realized.
Planning is important because it can lead to the discovery of new paths that
had not been taken before.
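The step-by-step character of planning can be sketched in Python as follows. This is a simplified illustration rather than STeLLA's matrix-based method: the predictor is assumed to be a plain lookup table from (pattern, action) to a single next pattern, the action chooser is passed in as a function, and the conditions for giving up (a step limit, a repeated pair, or a missing prediction) are assumptions. All names are hypothetical.

```python
def make_plan(start, choose_action, predict, rewarded, max_steps=20):
    """Explore forward from `start`; return the path to a rewarded pair, or None."""
    path, pattern, seen = [], start, set()
    for _ in range(max_steps):
        action = choose_action(pattern)
        if action is None or (pattern, action) in seen:
            return None                       # no action to choose, or the plan loops
        path.append((pattern, action))
        if (pattern, action) in rewarded:
            return path                       # a rewarded action has been reached
        seen.add((pattern, action))
        pattern = predict.get((pattern, action))
        if pattern is None:
            return None                       # no next pattern can be predicted
    return None


policy = {"P1": "A1", "P2": "A2", "P3": "A3"}
predict = {("P1", "A1"): "P2", ("P2", "A2"): "P3"}
rewarded = {("P3", "A3")}

print(make_plan("P1", policy.get, predict, rewarded))
# [('P1', 'A1'), ('P2', 'A2'), ('P3', 'A3')]
```

The returned path plays the part of the trace that biases later decisions; it would be dropped as soon as a predicted pattern failed to appear.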
The earliest description of STeLLA can be found in the Downloads section
as STELLA.doc. A later description of STeLLA, "A Learning Machine in the
Context of the General Control Problem", was published in the Proceedings
of the 3rd IFAC Congress, London (1966) pp.342-9 by Brian R. Gaines and me.
In this latter paper, STeLLA is given a car-steering problem on a road with
camber and slippage, and is shown to learn the task satisfactorily.
Limitations of STeLLA
The main limitation of STeLLA was lack of sequential strength. It couldn't
learn a sequence of events without there being a strong possibility that
a similar sequence would take over. It could hear "Two times two equals
four" a thousand times and have that well learned, but when you started
to teach it "Three times two equals six", the two sequences would become
hopelessly entangled. After "Two" and also after "Three" it expects "times",
which is all right. After "times" it is expecting "two" and then "equals".
So far so good, but now the trouble starts. After "equals" it expects
either "four" or "six", and its memory doesn't tell it whether it has
come along the "Two times two equals" sequence or the "Three times two
equals" sequence, so its memory isn't telling it whether the next word
should be "four" or "six". This is what I mean by not having sequential
strength.
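The entanglement can be reproduced with a few lines of Python. The sketch below is not STeLLA's mechanism. It simply records which word followed each run of preceding words, first with a run of one word (a predictor that sees only the last event) and then with a run of four words. The longer run is included only to show what information is missing, not as a description of how PUSS or PP works. The function name and the `span` parameter are hypothetical.

```python
from collections import defaultdict

def train(sentences, span):
    """Map each run of `span` preceding words to the set of words seen next."""
    table = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(span, len(words)):
            table[tuple(words[i - span:i])].add(words[i])
    return table


sentences = ["Two times two equals four", "Three times two equals six"]
last_word_only = train(sentences, span=1)
four_words = train(sentences, span=4)

print(sorted(last_word_only[("equals",)]))                    # ['four', 'six']
print(sorted(four_words[("two", "times", "two", "equals")]))  # ['four']
```

With only the last word to go on, "equals" leads to both answers. The two sequences are separated only when the memory reaches back as far as the first word.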
Language is full of sequences in the form of phrases and sentences so it is
not possible for STeLLA to learn even the simplest phrases without getting
them muddled up with others. STeLLA could do the car steering task, mentioned
above, because every decision could be taken on the basis of its last sensory
input. It didn't need to know what had happened just before. Real driving is
much more sequential. It is essential that a real driver of a car keeps the
context of what is happening in mind. If you have just passed a speed limit
sign, you must keep that in mind, along with a hundred other things.
We can divide up the STeLLA behaviour into two parts, what happens between
pattern and action, and what happens between action and pattern. When STeLLA
sees the pattern of lights, it has to choose a button to push. This choice
is decided by its control policy that uses leakback to calculate expectations
of reward for the different buttons. The choice of button is also influenced
by its plans and, to make a plan, STeLLA has to predict what the next pattern
of lights will be when it pushes a button. In the case of a real robot, what
it will see next after it does an action is determined by its world. So, to
predict what the next pattern of lights will be, STeLLA has to "guess" what
the world behind the buttons and lights will do. We can say that STeLLA needs
a model of the world which reacts to button pushes by changing the pattern
of lights. In the car-steering example mentioned above, the world is the car
and the road with turns and skids, all of which contribute to what the next
position and angle of the car will be, and hence what the pattern of lights
will be.
STeLLA's model of the world also lacked sequential strength, because it used
only the last pattern to predict the effect of a button push. Its control policy
did at least string together pattern-action pairs, even if that stringing together
was weak. The weakness of STeLLA's predictor was more serious so we tackled
that first. The struggles that I and my postgraduate students had with this
problem can be seen in the first of a series of Man-Machine Studies reports,
which is available as UC-DSE-1.doc from the Downloads section of this
Web Page. Our aim was to model a world that had the structure of a finite state
machine. Eventually, with crucial input from John G. Cleary, a predictor called
PUSS was devised for STeLLA. PUSS stood for "Predictor Using Slide and Strings", which
described its mechanism at that time.
To cut a long story short, PUSS was so successful that STeLLA was abandoned
and a new system was constructed, called PURR-PUSS. The PURR stood for "Purposeful
Unprimed Rewardable Robot" and the whole system was based on PUSSes. To our
surprise, PURR-PUSS turned out to have more sequential strength than we expected.
Eventually we would show that PURR-PUSS had the sequential strength of a Universal
Turing Machine, the most that was possible of any system. Nevertheless, PURR-PUSS
retained key features of STeLLA, including leakback.
PURR-PUSS (PP)
It would be tedious to continue the historical development of PURR-PUSS
over the next 30 years. It is well documented in the Biography and The
Book sections of this Web Page. Following the practice used in my
second book, Associative Learning for a Robot Intelligence, the
name PURR-PUSS will be abbreviated to PP.
The big difference between STeLLA and PP is in the nodes of their memory structures.
In Figure 1 we saw that each node in STeLLA's memory was a pattern, P, to which
was attached one or more actions, A. In PP each node will be a context, C,
to which is attached one or more actions, A. The change from pattern to context
gives PP the sequential strength that STeLLA lacked. PP will also have several
memories, each with its own connections and leakback, instead of just the one
that STeLLA had. This extension will justify our use of the name Multiple
Context Learning System for PP.
Another difference between STeLLA and PP is more cosmetic than fundamental.
Instead of talking about patterns and thinking of the pattern of lights, we
will talk about stimuli and we will not treat a stimulus as divisible into
separate lights (features). A stimulus will be an indivisible entity, unless
it is a complex stimulus composed of simple stimuli. So, instead of pattern
and action, we will talk about stimulus and action.
Cortical Areas (templates) and Context
Figure 3 (left). Cortical areas of a monkey brain. (Taken from Rodney Cotterill's Enchanted Looms, Cambridge University Press, 1998.)
There are at least three ways of arguing for the basic design of PP.
I mentioned above the attempts to give STeLLA's predictor 'sequential
strength'. That was the route which actually led to PP. Another way is
to take the theory of Newell and Simon (1972), in which they characterize
the human brain as an information processing system with a number of
basic properties. This route is taken in section 1.8 of my book, Associative
Learning for a Robot Intelligence, and it presents the PP design
as a very general structure. The third way is less technical and, therefore,
more reader-friendly. It considers quite broadly how neurons might be
connected in the brain.
The cerebral cortex of the human brain is a crumpled sheet of neurons and connections.
The simplest way of looking at a neuron is to say that it receives input impulses
(spikes) and when the inputs are strong enough it "fires", sending impulses
along its output connection (axon). It seems reasonable to treat events in
the brain as the firings of neurons. (Actually, in many cases, it is probably
the rate of firing of a neuron that is important, but we don't need to bother
with such details.) Neurologists label different areas of the cortex according
to where connections entering the area come from, and where connections leaving
the area go to. Figure 3, taken from Rodney Cotterill's delightful book Enchanted
Looms (Cambridge University Press, 1998) and due to Daniel Felleman and
David Van Essen, shows the labelling of the spread-out right hemisphere of
the monkey brain and is a good example of these cortical areas. Here I am not
concerned with the details of the areas but just that it is natural to think
of the cortex as divided into cortical areas (see Figure 4).
Figure 4 (left). A cortical area. There are a number of input bundles, some coming directly from sensors through their sensory processors and others coming from the outputs of other cortical areas. Similarly, the outputs from a cortical area go to select actions or to predict stimuli, according to whether they represent action events or stimulus events. In addition, the outputs of this cortical area may travel through the cortex to be inputs to other cortical areas.
In each cortical area there will be a large number of neurons. You can
think of each neuron in a cortical area as having one input (synapse)
from each of the input bundles in Figure 4. Its output (axon) will go
to fire another distant neuron that contributes to the activation of
a particular action, or it may go to fire another distant neuron that
contributes to the prediction of a stimulus event. It may also travel
to form an input of particular input bundles of other cortical areas.
My aim is solely to suggest that cortical areas will have this general structure,
even though its details must be much more complicated. A cortical area could
hardly lack the suggested connections! We are saying very little in claiming
this amount of structure. By starting with this minimal structure, we can discover
what it is capable of and then move on to add more structure as its limitations
become apparent. Actually, we will first assume additional simplifications
about the neurons in a cortical area. It is well accepted that neurons are
likely to be able to exhibit much more complex behaviour than is normally assumed
of them in artificial neural networks. Here, we will assume that they behave
like simple AND gates. That is, a neuron responds only when all its inputs
are active. A cortical area can now be shown with more detail, as in Figure
5.

Figure 5 (left). A cortical area with one of its many neurons. The input synapses are shown as black dots on the body of the neuron. The axon is the output.
Although Figure 5 is highly simplified, it introduces many of the features
of the PP model of the brain. First there is the dotted contour suggesting
a cortical area. Inside this just one neuron is shown instead of hundreds.
The neuron fires and sends impulses down its output axon only when all
its four synapses (black dots) are activated by inputs.
To make the cortical area of Figure 5 understandable, I have chosen input and
output events for it that can be related to a possible situation. Of course,
I am not suggesting I know that there would necessarily be a particular neuron
with these inputs and output in the brain of a person in such a situation.
Far too little is known about the brain to make such a claim for what is just
an illustration.
The situation I have in mind is of someone saying "Turn the switch on" while
the person with the cortical area in question is looking at and touching the
switch. The output of the neuron causes the finger to push the switch on. The
four inputs are of four event types. There is the visual event type See, the
touching event type Feel-finger and two hearing event types This-Word and Last-Word. All
the neurons in the cortical area receive input events of these four types,
not just the neuron shown. Similarly, all the neurons in the cortical area
have the same output event type Finger-action, which refers, let's say,
to the index finger of the right hand. The particular neuron shown responds
to the actual input events SWITCH-UP, TOUCHING-SWITCH, SWITCH and ON. It outputs
the event PUSH-DOWN.
None of these input events are likely to be raw sensory inputs, but rather the outputs of other cortical areas processing raw sensory inputs. SWITCH-UP will be the result of quite sophisticated recognition if the switch is being recognized as a switch. TOUCHING-SWITCH will need to be more than just a signal that the finger is touching something. SWITCH of event type Last-Word and ON of event type This-Word are the last two words heard. There will certainly have to be some preprocessing for sounds to be heard as word units. Remember that all these four events have to be occurring together for the neuron to fire, so the SWITCH event of Last-Word event type must be held in some form of memory to be occurring when the current word ON of event type This-Word is being heard.
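The AND-gate behaviour of the neuron in Figure 5 can be written down directly. The following Python sketch is only an illustration of the simplifying assumption made above. The function names and the use of a dictionary for the current events are hypothetical; the event types and events are those of the example.

```python
EVENT_TYPES = ("See", "Feel-finger", "Last-Word", "This-Word")

def context_neuron(required):
    """Return a function that fires only when every input event matches."""
    def fires(current):
        # AND gate: all four synapses must be active at the same time.
        return all(current.get(t) == required[t] for t in EVENT_TYPES)
    return fires


neuron = context_neuron({
    "See": "SWITCH-UP",
    "Feel-finger": "TOUCHING-SWITCH",
    "Last-Word": "SWITCH",
    "This-Word": "ON",
})

now = {"See": "SWITCH-UP", "Feel-finger": "TOUCHING-SWITCH",
       "Last-Word": "SWITCH", "This-Word": "ON"}
print(neuron(now))                           # True
print(neuron({**now, "This-Word": "OFF"}))   # False
```

Changing any one of the four events, here the current word, is enough to stop the neuron firing.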
Learning
How does the neuron in Figure 5 come to be connected as suggested? If
it were pre-wired, then we would have a brain that was pre-programmed
to do what it did. This is not the sort of brain that we or even animals
have. Certainly, evidence has been accumulating over the years which
indicates that a considerable amount of the brain's ability is built-in
or "innate". Even if the baby brain doesn't express some ability at birth
it may have the potential to implement that ability as part of its later
maturation. In the PP model of the brain, there is a balance between
what is assumed to be innate and what is expected to be learned. This
balance can be seen in the neuron of Figure 5.
The input connections (afferents) to the neuron are assumed to be innate. The
output is learned in a manner to be described shortly. In each cortical area,
all the neurons have one input connection (i.e. one synapse) from each of the
event types. In other words, a neuron has one connection from each of the input
bundles in Figure 4, because there is a bundle of connections for each event
type. An important feature of PP is that there is no neuron for a combination
of input events that hasn't yet occurred. We can imagine that real brains have
a clever process for connecting up new neurons as new combinations of events
occur; but it seems likely that, if the PP model is at all valid, the
processes in real brains will be more statistical than those in PP.
To explain how the output of the neuron in Figure 5 might be learned, I have to introduce another neuron into Figure 5 (as I did in Figure 2 in chapter 1 of Associative Learning for a Robot Intelligence). Figure 6 includes this complication and, in the same way as Figure 5 showed only one neuron, Figure 6 shows only one of the hundreds of these extra neurons. I will call the first neuron a context neuron, because its input events form a context for the event of the second neuron, which will be called an associated neuron.
Figure 6 (left). Each cortical area has hundreds of context neurons and hundreds of associated neurons. The output Finger-action PUSH-DOWN is first activated only when other (e.g. reflex) processes provide direct input to the associated neuron, but after learning it is also activated by the firing of the context neuron. The associated neuron fires if either of its synapses is activated.
The output, or axon, of the first neuron is shown dotted and connects
via the lower synapse to the associated neuron. This synapse is where
learning takes place.
The reader will notice that the associated neuron is labelled with the same
input and output "Finger-action: PUSH-DOWN". If the finger is pushed
down, then the associated neuron is fired by impulses to its upper synapse.
However, if the context neuron fires at the same time as the associated neuron,
the learning synapse increases in strength. After the synapse has been strengthened,
the associated neuron can be fired by the context neuron even when there is
no input to the upper synapse of the associated neuron. The cortical area has
learned to respond with the action PUSH-DOWN whenever the context "SWITCH-UP
+ TOUCHING-SWITCH + SWITCH + ON" occurs.
This is so important that I will repeat it a different way. The lower synapse
on the associated neuron is sensitive to the simultaneous firing of the context
neuron and the associated neuron. To begin with there is no connection between
the two. The synapse is a potential connection but it is not made. If the context
neuron and the associated neuron fire at the same time, then the synapse connection
is made and, from then on, if the context neuron fires by itself it will induce
the associated neuron to fire. The cortical area has learned to PUSH-DOWN finger
when it sees SWITCH-UP, feels TOUCHING-SWITCH and hears (in succession) the
two words SWITCH and ON.
Before any learning occurs, the cortical area has hundreds of context neurons
connected to different combinations of the input events from the four event
types, and it has hundreds of associated neurons with the different events
of the output event type, Finger-action. Each context neuron has potential
connections to associated neurons of all the different events. Learning makes
connections between contexts and associated events that actually occur together.
Through learning, the cortical area learns which contexts have occurred with
which associated events.
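A minimal Python sketch of this learning is given below. It assumes, following the PP practice mentioned earlier, that a context is stored only once its combination of events has occurred, and that a single co-occurrence is enough to make the connection. The class and method names are hypothetical and the timing problems mentioned in the next paragraph are ignored.

```python
class CorticalArea:
    """Illustrative cortical area: contexts linked to associated events."""

    def __init__(self, context_types, associated_type):
        self.context_types = tuple(context_types)
        self.associated_type = associated_type
        self.links = {}   # context (tuple of events) -> set of associated events

    def _context(self, events):
        return tuple(events[t] for t in self.context_types)

    def learn(self, events, associated_event):
        """Context and associated event fired together: make the connection."""
        self.links.setdefault(self._context(events), set()).add(associated_event)

    def respond(self, events):
        """Associated events that the current context can now fire by itself."""
        return set(self.links.get(self._context(events), ()))


area = CorticalArea(("See", "Feel-finger", "Last-Word", "This-Word"), "Finger-action")
now = {"See": "SWITCH-UP", "Feel-finger": "TOUCHING-SWITCH",
       "Last-Word": "SWITCH", "This-Word": "ON"}

print(area.respond(now))        # set(): nothing learned yet
area.learn(now, "PUSH-DOWN")    # the finger is pushed down while the context occurs
print(area.respond(now))        # {'PUSH-DOWN'}
```

After the one co-occurrence, the context alone is enough to produce PUSH-DOWN, while any other combination of events still produces nothing.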
There are various minor problems with implementation, and particularly with
timing, but since we can solve those with electronic circuits there is no reason
to doubt that real biological neuron circuits can do so too. Indeed, we can
expect biological neurons to be using all sorts of clever processes, which
we haven't even thought of yet. This biological suggestion for a cortical area
is no more than that: a suggestion. It is just one more reason for supposing
that the architecture of PP is a viable proposal for how the biological brain
works.
Before leaving the biological view of cortical areas, I need to take one more
step. It is not going to be convenient to draw Figure 6, or even Figure 5 every
time I want to talk about cortical areas, so a more compact notation is required.
Figure 7 will do the job.
Looking at part A of Figure 7, we see that the horizontal rectangle is divided into two main parts with a thick vertical line between them. The context event types and events are on the left with thin line dividers, while the associated event type and event are on the right. The rectangle corresponds to the whole of Figure 5 or 6, in which one or a pair of neurons are picked out as being an association. The context is associated with the associated event. If the event types are obvious, we can represent an association by the rectangle in part C, which only shows the events. Now, a cortical area is characterized by the event types which form contexts and the event type which forms associated events, so part B of Figure 7 describes the cortical area as a whole. Of course, at times we may need to list all the associations in a cortical area, but much of the time part B of Figure 7 will be what we need. It describes, if you like, a cortical area before it has learned any particular associations.
A good reason for leaving the biological version of PP behind and carrying on with the simplified view of a cortical area given in Figure 7 is that we can experiment with different varieties of cortical area without having to produce neuron diagrams to match. It is not that this couldn't be done, but it would be laborious, complicated and pointless. We don't have a clear enough picture of how neurons work (except in specific parts of the brain, like the cerebellum) so there would be nothing to compare our neuron diagrams with. Also, it makes more sense to try and compare the behaviour of PP with the brain at a higher level before attempting it at a lower detailed level. However, we will keep the name "cortical area" to remind us of this link with the structure of the brain. [In Associative Learning for a Robot Intelligence, the term 'template' was used instead of 'cortical area'.]
Elaborations of a Cortical Area
A significant part of the research with PP has been in discovering elaborations
of the basic idea of a cortical area. "Decision and prediction", "timing
and threading", "choice and replacement", "predicted input events", "context
of contexts", "auxiliary actions", "reward and novelty", and the broad
concept of "multiple context" are some of the directions in which this
elaboration has proceeded. Each direction is pointed to by a class of
problem which a brain will have to cope with.
Decision and prediction refers to a problem that we have
already discussed. A brain has to build up a control policy or
decision strategy to guide its choice of action, but it also
has to model the world that it is interacting with through its
body. This immediately divides cortical areas into those with
actions for associated events and those with stimuli for associated
events. Cortical areas forming part of the control policy have
actions for associated events, while cortical areas forming part
of the model of the world have stimuli for associated events.
In making a plan, PP first chooses an action with the control
policy cortical areas and then predicts stimuli with the model
of the world cortical areas, then chooses an action, then predicts
stimuli, ... until either a goal is found, the plan goes into
a loop, or it fails to choose an action or predict a stimulus.
Timing and threading are two ways of dealing with the
succession of events. A cortical area may have both timing
and threading event types, but it must have at least one timing
event type if its associations are going to say when the
associated event will happen. Events from timing event types
occur at regular intervals, like the tick of a clock. In the
case of threading event types, it is only the order or sequence
of events that matters. In Figures 5-7, the See and Feel-finger event
types are probably timing, because seeing and touching are
continuous, even if at times nothing is seen or nothing is
felt. The This-Word and Last-Word event types
refer to a word that has just been heard and the one that was
heard before that. These could be timing event types if we
were using very rigid talking with a word (or no word) during
each fixed interval, but it would be more natural to use threading
event types for them. A gap between words is more likely to
indicate slower talking than a "null word". Threading event
types can give the context of an association an extended span
over time, which may be useful for linking earlier events to
later events.
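One way to picture the difference is as two ways of reading the same stream of heard words. The sketch below is an assumption-laden illustration, not PP's implementation: the functions timing_history and threading_history, the use of None for an interval in which no word was heard, and the span parameter are all hypothetical.

```python
def timing_history(samples, span):
    """Timing: one sample per clock tick, empty ticks (None) included."""
    return samples[-span:]


def threading_history(samples, span):
    """Threading: only the order of actual events matters, so gaps are ignored."""
    events = [s for s in samples if s is not None]
    return events[-span:]


# One entry per fixed interval; None marks an interval with no word.
heard = ["the", None, None, "cat", None, "sat"]

timed = timing_history(heard, 2)        # the last two ticks
threaded = threading_history(heard, 2)  # Last-Word and This-Word
```

With the threading reading, the two most recent words are recovered however slowly they were spoken, whereas the timing reading of the same span is partly filled by a gap.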
Choice and replacement distinguish two different kinds
of knowledge and they are both necessary. In a "choice" cortical
area, each context neuron can be linked to many associated
neurons, because each context can be followed by different
associated events on different occasions. The "choice" association
remembers all the associated events that have followed its
context. This is particularly important for control policy cortical
areas because the choice associations hold the different alternatives
from which selection is made by the leakback process and the
highest expectation of getting to a goal. "Replacement" cortical
areas are more likely to be used in the model of the world
for predicting stimuli. A "replacement" association remembers
only the most recent associated event. Learning a new link
to an associated event wipes out all previous ones. Choice
and replacement associations store two different kinds of knowledge,
which answer two different kinds of question: "What
pieces are there to move in a game of chess?" as against "Where is the
opponent's king?", or "Who might be the next prime minister of New
Zealand?" as against "Who is the present prime minister?" I will explain
later how PP's computational power depends upon having both
choice and replacement cortical areas.
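The two kinds of storage can be sketched as two small Python classes. This is a minimal illustration, assuming contexts and events are simple hashable values. The class names ChoiceArea and ReplacementArea and their learn and recall methods are hypothetical, and the leakback selection among the stored alternatives is left out.

```python
from collections import defaultdict


class ChoiceArea:
    """Remembers every associated event that has followed each context."""

    def __init__(self):
        self.links = defaultdict(list)

    def learn(self, context, event):
        if event not in self.links[context]:
            self.links[context].append(event)

    def recall(self, context):
        # Return a copy so callers cannot alter the stored alternatives.
        return list(self.links.get(context, []))


class ReplacementArea:
    """Remembers only the most recent associated event for each context."""

    def __init__(self):
        self.links = {}

    def learn(self, context, event):
        self.links[context] = event  # wipes out any previous link

    def recall(self, context):
        return self.links.get(context)


candidates = ChoiceArea()
current = ReplacementArea()
for name in ["Alice", "Bob"]:  # made-up names for illustration
    candidates.learn("prime minister", name)
    current.learn("prime minister", name)

possible = candidates.recall("prime minister")
present = current.recall("prime minister")
```

After the same two learning events, the choice area still holds both alternatives while the replacement area holds only the latest one.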
Predicted input events. The importance of having "predicted
input events" in a context can be appreciated if one recalls
how much of one's thought is about things that one is not seeing,
feeling or hearing. Later we will see that predicted event
types in a cortical area can enable special kinds of computation
to be achieved and they also enable PP to extract changing
information from replacement associations.
Context of contexts. The idea behind "context of contexts" is
that the occurrence of an association can be an input event
in another association. This is clearly a way of building up
a hierarchy of cortical areas, but I have done very little
in this direction yet. This could be a way of implementing "chunking" of
information, which is something we seem to be able to do, but
I have not yet found adequate conditions for treating an association
as an event. The possibilities and difficulties will become
more apparent when I discuss a little experiment described
in section 3.7 of Associative Learning for a Robot Intelligence.
Auxiliary actions. Probably the main reason for our
being surprised when we first tried PP was that we hadn't anticipated
the power of "auxiliary actions". It is one thing to consider
actions as defining what a robot could do to the world. If
you give a robot legs and feet, it can walk, with hands it
can pick up things and throw them, with a voice it can make
sounds, with muscles in its face it can make grimaces, and
so on. An action is called "auxiliary" when it is not needed
for doing the job in hand. If a robot has plenty of actions,
then it is likely to have spare actions most of the time. These
actions can be used by the robot to represent things to itself.
At the simplest level, I might tie a knot in my handkerchief
to remind myself to do something, or I might keep repeating
to myself "I must post that letter" so that I don't walk past
the mail box without doing so. Counting is such an important
process that it can become a mental task in its own right,
but often we use it just to distinguish things. If I am performing
a task with 5 steps, I may say "one", "two", "three", "four", "five" as
I do the steps just to prevent myself from making a mistake.
When we use speech not for communication but as part of our
thinking, it can be seen as an auxiliary activity. Even
more pervasive is the "inner speech" with which we constantly
talk to ourselves about what we are doing or might do or what
might happen. But language itself will need auxiliary actions
to hold the structure of the sentences we speak or read. I
expect the so-called "non-terminals" of grammar, like noun
and noun phrase, will need to be handled with auxiliary actions.
A detailed example of the use of auxiliary actions for holding
the structure of a hierarchical task is given in chapter 8
of Associative Learning for a Robot Intelligence.
Reward and Novelty. Some of the associations in a cortical
area will be marked as goals, just as in the network of STeLLA
some pattern-actions were marked as rewarded or punished. But
in PP there is an additional category of goal, called "novelty",
which makes all the difference to PP's teachability. Indeed,
we can argue that novelty goals give PP free will because they
are not given to PP by a designer or programmer and yet they
determine many of its actions. Every new association is marked
as a "novelty goal" until it occurs again. While an association
is new to PP, the leakback system is trying to make PP repeat
it. Leakback operates from novelty goals in the same way as it
operates from ordinary reward goals, which keep their reward
status once activated.
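The bookkeeping for novelty goals can be sketched as follows. This is a simplified illustration, not PP's code: the class GoalMemory and its method names are hypothetical, associations are assumed to be hashable values such as tuples, and leakback itself is not modelled.

```python
class GoalMemory:
    """Tracks which associations currently count as goals."""

    def __init__(self):
        self.seen = set()
        self.novelty_goals = set()
        self.reward_goals = set()

    def observe(self, association, rewarded=False):
        if association not in self.seen:
            # First occurrence: the association is new, so mark it as a novelty goal.
            self.seen.add(association)
            self.novelty_goals.add(association)
        else:
            # It has occurred again, so it is no longer novel.
            self.novelty_goals.discard(association)
        if rewarded:
            # Ordinary reward goals keep their status once activated.
            self.reward_goals.add(association)

    def goals(self):
        return self.novelty_goals | self.reward_goals


memory = GoalMemory()
memory.observe(("see ball", "reach"))
first = set(memory.goals())
memory.observe(("see ball", "reach"))
second = set(memory.goals())
memory.observe(("touch ball", "grasp"), rewarded=True)
memory.observe(("touch ball", "grasp"))
third = set(memory.goals())
```

The first association is a goal only until it recurs, while the rewarded one remains a goal after its novelty has gone.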
It is difficult to teach PP with rewards because, once something has been learned,
we want to stop rewarding it, and that is not what PP expects. I suspect
that the same problem arises from using rewards in teaching children. But PP,
and children, like learning things that are new. With PP, once something has
been learned the relevant novelty goals disappear and it is ready for new novelty
goals. "Volatile" reward goals, which disappear if reward is not repeated,
are also good for teaching and are effective if one is trying to persuade PP
to go over something that has been learned and has become "boring"! [Novelty
goals were introduced originally as a kind of random search to be carried out
when PP had nothing else to do, but we discovered to our surprise that it made
PP much more teachable.] Another bonus of novelty goals is that they work to
integrate new knowledge into existing knowledge. Once a path has been established
from a new association to itself through the rest of a memory network, that
association becomes more accessible.
Novelty is a key feature of PP because it makes PP more teachable, integrates
new knowledge into memory, and, arguably, gives PP free will.
Multiple context. Giving a brain a multiple context
is like giving someone a well-equipped workshop. Someone reading
my book Associative Learning for a Robot Intelligence might
be forgiven for thinking that multiple context is a method
of programming because each problem is tackled with the minimum
multiple context. This was necessary because the presentation
had to be brief and also because much of the research was done
with far more limited computing equipment than is now
available in a desktop machine. A multiple context should be
designed for the whole input space of the body available to
the brain, so that it can make use of all its sensory information
and its whole armoury of actions. An essay on multiple context
with examples of the effects of enlarging it can be downloaded
as the Word file MultipleContext.doc in the Downloads section.
There are at least two ways of looking at the concept of a "multiple context".
There is detailed processing and also the more general "thinking". We saw earlier
that each cortical area holds associations with contexts drawn from particular
event types. Multiple context refers to the activity of many cortical areas
at the same time. We know that activity is spread across large areas of the
cortex of our brains and that it is constantly changing with some areas active
and then others, like the bubbling in a pot simmering on the stove. Not only
do our thoughts move around from topic to topic, but we can be doing several
things at once, such as driving a car, noticing a bird in a tree, and continuing
with a conversation. For such thinking to occur, we can imagine clouds of active
cortical areas combining and separating as the different activities make different
demands on the overall resources. At the more detailed level of processing,
a few cortical areas may have to work together to achieve some specific mechanism
or task. Associative Learning for a Robot Intelligence is full of examples
of the way specific combinations of cortical areas can enable a robot to perform
different kinds of task. To enable cortical areas to work together for a specific
task, they are put into "clusters", each with its own organization of interconnections.
More general and more fluid groupings will take place with clusters cooperating
and then separating as activity focuses and spreads.
The processing and thinking levels may not exhaust the ways in which we can see cortical areas working together in groups. For example, our brains are capable of conscious and subconscious activity. Some people have suggested that thinking becomes conscious when a higher level refers to a lower level, such as when one makes a plan and then reconsiders and possibly alters it. If this is the case, as I believe it may well be, we will need to find ways for groups of cortical areas to monitor or control other groups of cortical areas. There may also be cortical areas organized as maps and other innate structures, but we are then getting away from PP, which is intended to model only how the highest levels operate. Learning is unlikely to proceed at several levels at once because a lower level has to reach a stable state before the next higher level can begin using its products. For example, phonemes must be stable before words are learned, and words must be stable before sentences can be learned. However, this example itself points to some flexibility. We can move into an area where a different dialect of our language is spoken and adapt to different phonemes forming the same words. Also, we are continually adding new words to our vocabulary without having to learn new grammar because the new words fit into the existing grammatical structure of our sentences.
No Program in the Driving Seat
Many people have argued that the brain can't be like a computer because computers are "formal systems" which are told what to do by a program. Two famous, and very different, versions of this argument are John Searle's "Man in the Chinese Room doesn't understand" argument and the Lucas-Penrose "Gödel Theorem leads to paradox" argument. In the former the program is a fixed "book of rules" and in the latter it is an "algorithm". I don't think that one needs such deep arguments to accept the premise that there must be no top-level program (or fixed book of rules, or algorithm) driving a brain, if it is to have real intelligence and creativity. So what is PP driven by? The PP brain is driven by a changing and growing collection of associations which are generated by the interaction of PP and the world via its body. PP has no top-level program, fixed book of rules or algorithm. I emphasised the importance of the body as the means for a brain to interact with the world in my 1977 book Thinking with the Teachable Machine as follows (page 6):
Sometimes people think of their bodies as cages within which they live, but bodies are quite the opposite of cages. My body connects me to the world around me in such an intimate and interactive way that I become part of that world. I can participate in the dynamic society and culture of humanity. My cage is wide open. Only if we enable our machines to interact freely with the world around them can we release them from the cage of paradox. ... PURR-PUSS is unique, to my knowledge, in being a system that could be given a body-in-the-world.