Electrical and Computer Engineering

About PURR-PUSS

(Purposeful Unprimed Real-world Robot with Predictors Using Short Segments)

PURR-PUSS was invented in two stages, first PUSS in 1972 and then PURR in 1974.

PUSS was a predictor designed for my earlier STeLLA learning machine, but it was so successful that a multiple PUSS was devised to be the core of a new learning machine, named PURR-PUSS. STeLLA was built out of relays by Peter L. Joyce at the Standard Telecommunication Laboratories, Harlow, Essex, U.K., in 1962-3, but the reliability of relay technology proved inadequate.

The development of PURR-PUSS is described in "Man-Machine Studies" reports UC-DSE/1(1972) to UC-DSE/40(1991), Progress Reports to the Defence Scientific Establishment (New Zealand), ed. J.H.Andreae, ISSN 0110 1188.

These reports have been deposited in the Library of Congress, Washington D.C., USA; the National Technical Information Service, Virginia, USA; the British Lending Library, Boston Spa, UK; and many university libraries around the world. The first of these reports, which describes the development of PUSS as a predictor for STeLLA, can be downloaded as UC-DSE-1.zip and unzipped to the Microsoft Word document UC-DSE-1.doc from the Downloads Section.

A Brief Look at STeLLA

In 1961, when my research began in what was a new area for me, called Cybernetics, there wasn't much doubt that any working model of the brain would have to learn from scratch: the tabula rasa, or blank slate, was how a machine that would learn like the brain had to start. Learning was what it was all about. Babies didn't know very much at all, if anything. A few years later things would change dramatically and by 1972 in the heyday of Artificial Intelligence (AI) people would be amazed by the programmed creations of Shortliffe (MYCIN), Lenat (AM), Winograd (SHRDLU) and others. For a while, it seemed to many as though that was the way to reveal the mysteries of the brain. That view was strongly supported by the new linguistics driven by Noam Chomsky and his argument that we are born with a Universal Grammar. However, I was not alone in finding the AI approach unconvincing and many people studying child development continued to doubt the amount of innate structure being assumed. Thirty-one years later, things have mellowed and now I find myself somewhere in between the AI approach and connectionism, the new name for neural networks, but certainly closer to the latter. Research has shown that babies start with a lot more than was previously guessed and a major question is how to mesh learning with innate structures. My bias has always been to see learning as primary and this will become clearer in the design of my learning brain, called PURR-PUSS, or PP for short.

PP evolved from an earlier learning machine called STeLLA. For STeLLA, happenings in its world were of two types, patterns and actions. A pattern was divisible into features, and weights were attached to the features, as in Rosenblatt's famous "perceptron". Patterns were mainly seen as visual input and the task facing STeLLA was formulated in a visual form:

STeLLA's Task

You are sitting in a dark room. There is a horizontal row of white lights in front of you (input pattern), each light being ON or OFF at any instant. Below the lights a row of push-buttons (actions) lie within your reach and these may be pressed, one at a time, in any sequence. There is nothing else but a pair of coloured lights below the buttons. One of these is green (reward) and the other is red (punishment). Your task is to cause the green light to flash on as often as possible without the red light coming on.

The task is not well defined, but nor are ordinary human tasks, like becoming good at something. We are not told how often the green light could or should be made to come on or how serious the red light is. The numbers of white lights and push buttons will be different for different versions of the task. When we try out STeLLA with a task, we program the way the white lights and the coloured lights change with the button pushes. Different "world programs" give the task different degrees of difficulty. Once we have shown that STeLLA can handle the task of a particular difficulty of world program, we can devise a world program which introduces a new level of difficulty and see what has to be done to the design of STeLLA to enable it to tackle the task with the new world program, without losing its ability with the older world programs. This process of increasing task difficulty continues to the present day and it led to the major change from STeLLA to PURR-PUSS, as we shall see.

STeLLA made simple associations between patterns (of the row of lights) and actions (button presses), called pattern-action pairs. A pattern-action pair recorded the fact that an action had been executed immediately following the seeing of a pattern. Now, after the action of a pattern-action pair had been executed (button pressed), a new pattern of lights would usually follow. First a pattern, then an action chosen, then a pattern seen, then another action chosen, and so on. We could write this sequence thus: p-a..p-a..p-a..p-a..p-a..p ...

In STeLLA's memory a network was formed of pattern-action pairs connected to patterns that had followed them. So, if pattern-action pair-1 was followed by pattern-A in STeLLA's world, then there would be a one-way connection from pattern-action pair-1 to pattern-A in STeLLA's memory. If the reward light came on when the action of a pattern-action pair was executed then the action was marked in memory as rewarded. So the memory of STeLLA consisted of a network of pattern-action pairs connected to patterns, where each pattern may be the pattern of a number of pattern-action pairs. A collection of pattern-action pairs with a common pattern is called a node. Some of the pattern-action pairs are marked as rewarded. It is important that the reader sees the relation between nodes and pattern-action pairs, so Figure 1 is given to illustrate the relation by showing a possible fragment of STeLLA's memory. Sometimes it will be more convenient to talk about pattern-action pairs and sometimes nodes will be preferable.
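The relation between pattern-action pairs and nodes can be sketched as a small data structure. This is only an illustration, not STeLLA's original relay or program design: the class name `StellaMemory`, its methods, and the pattern labels `P2` and `P3` are assumptions introduced for the example.

```python
from collections import defaultdict


class StellaMemory:
    """Toy network of pattern-action pairs linked to the patterns that followed them."""

    def __init__(self):
        # (pattern, action) -> set of patterns observed to follow that pair
        self.successors = defaultdict(set)
        # pattern-action pairs that have been marked as rewarded
        self.rewarded = set()

    def record(self, pattern, action, next_pattern, reward=False):
        """Store a one-way connection from a pattern-action pair to the next pattern."""
        self.successors[(pattern, action)].add(next_pattern)
        if reward:
            self.rewarded.add((pattern, action))

    def node(self, pattern):
        """A node is the collection of pattern-action pairs sharing one pattern."""
        return sorted(a for (p, a) in self.successors if p == pattern)


# A hypothetical fragment: labels are illustrative only.
memory = StellaMemory()
memory.record("P1", "A1", "P3")
memory.record("P1", "A2", "P2")
memory.record("P2", "A1", "P3")
memory.record("P3", "A3", "P1", reward=True)

print(memory.node("P1"))  # the actions attached to pattern P1
```

The same stored connections can be read either pair by pair (`successors`) or node by node (`node`), which is the double view the text asks the reader to keep in mind.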

Figure 1 (left). Each node consists of a pattern P and one or more actions A. It can also be seen as one or more P-A pairs with the same P. Each connection from a node to a node goes from an action of the first node to the pattern of the second node. The figure could be a fragment of STeLLA's memory.

When STeLLA chose a particular action for a particular pattern, it was choosing an action of a particular node, or a particular pattern-action pair, in the network of its memory. From that pattern-action pair in its memory, paths led through other patterns and their actions to rewarded pattern-action pairs. STeLLA could then choose shorter, stronger paths to get back to rewarded pattern-action pairs. The strength of reward on a pattern-action pair was decreased if it occurred again without getting reward, and it was increased if it occurred again with reward. Connections between nodes also had variable strengths. To take a simple example, if pattern-action pair-1 could be followed by either the pattern of node-A or the pattern of node-B, then there would be two one-way connections from pattern-action pair-1, one to node-A and the other to node-B. The strengths of these two connections were varied to represent the probabilities that pattern-action pair-1 was followed by node-A and node-B respectively.
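The text does not give the formulas STeLLA used to vary these strengths, so the following is a sketch under stated assumptions: reward strength is nudged toward 1 or 0 by a fixed `rate` (an assumed rule), and connection strengths are estimated as simple relative frequencies of what followed a pattern-action pair.

```python
from collections import Counter


def update_reward_strength(strength, rewarded, rate=0.2):
    """Raise the strength when reward recurs, lower it when it does not.

    The moving-average form and the `rate` parameter are assumptions.
    """
    target = 1.0 if rewarded else 0.0
    return strength + rate * (target - strength)


def connection_strengths(followers):
    """Estimate the probability of each following node from observed counts."""
    counts = Counter(followers)
    total = sum(counts.values())
    return {node: n / total for node, n in counts.items()}


s = 0.5
s_up = update_reward_strength(s, rewarded=True)     # pair occurred again with reward
s_down = update_reward_strength(s, rewarded=False)  # pair occurred again without reward

# Pattern-action pair-1 was followed by node-A three times and node-B once.
probs = connection_strengths(["A", "A", "B", "A"])
print(s_up, s_down, probs)
```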

When STeLLA's task included the red, punishment light, it was used to mark actions on nodes in the same way as the reward light, but punished pattern-action pairs inhibited STeLLA from going along paths instead of encouraging it.

STeLLA's memory told it the best way to go to get reward as soon as possible, and to avoid punishment. This was done by a process I call leakback. This is how it works. Imagine that STeLLA's memory is a network of interconnected pipes. The pipe connecting the action of one node to a second node has a diameter which varies according to the strength of that connection. A stronger connection means a larger pipe. Here I have to introduce a slight complication.

Each node is a little container which lets liquid enter from pipes that connect to nodes that follow that node, and lets liquid run out through pipes that connect to nodes that came before that node. I should emphasise that if there is a one-way connection from an action in one node to the pattern of another node, then the pipe representing that connection only allows liquid to flow backwards through it. A rewarded action of a node has an additional input from a tap that lets liquid into the container according to the strength of its reward. A punished action of a node has an outlet which lets liquid escape from the container at a rate which increases with the strength of the punishment. In Figure 2, I have redrawn Figure 1 with pipes instead of connections. Notice that the allowed flow through the pipes in Figure 2 is in the opposite direction to the arrows showing connections in Figure 1.

Figure 2 (left). This is the same as Figure 1 with pipes replacing connections and hollow arrows showing the direction of liquid flow for leakback.

Looking closely at Figure 2, we see that liquid can flow from the rewarded action A3 in the top right node to the pattern of that node. Then it can flow directly to action A1 of the node on the left, and also it can flow via the node at the bottom right to the action A2 of the node on the left. Let us suppose that STeLLA is seeing the pattern corresponding to P1 in the node on the left of Figure 2. If the flow, i.e. leakback, from the top node to the left node is greater than the flow via the bottom right node, then STeLLA will choose the action A1 that receives the direct flow; but if the flow to action A2 via the node on the bottom right is stronger, STeLLA will choose that action, A2. Of course, when there are hundreds of nodes and many actions of nodes are rewarded or punished, the flows will be very complicated to calculate, but STeLLA will still choose the action which has the strongest flow. If the pipe diameters used for the calculations are varied correctly, then the choices of STeLLA can be governed by expectations of reward derived from the best estimates of mathematical probabilities. For various reasons, including the fact that the STeLLA tasks do not often correspond to statistically stationary environments, the choices of action will often be sub-optimal, and sometimes downright stupid!
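The pipe picture can be turned into a small calculation. The sketch below is not the published leakback algorithm: the `leak` factor that loses some liquid at every pipe, the repeated sweeps, the rule that a node's level is the largest flow among its actions, and the labels `P2` and `P3` for the two right-hand nodes are all assumptions made for illustration. Punishment outlets are left out.

```python
def leakback(connections, taps, leak=0.5, sweeps=50):
    """Propagate 'liquid' backwards from rewarded actions.

    connections: {(pattern, action): {next_pattern: pipe_strength}}
    taps:        {(pattern, action): reward_strength}
    Returns {(pattern, action): flow reaching that action}.
    """
    flow = {pair: taps.get(pair, 0.0) for pair in connections}
    for _ in range(sweeps):
        # Assumed rule: the level of a node is the largest flow among its actions.
        level = {}
        for (pattern, _action), f in flow.items():
            level[pattern] = max(level.get(pattern, 0.0), f)
        # Each action receives its own tap plus liquid leaking back from
        # the nodes that follow it, scaled by the pipe strengths.
        flow = {
            pair: taps.get(pair, 0.0)
            + leak * sum(w * level.get(nxt, 0.0) for nxt, w in nexts.items())
            for pair, nexts in connections.items()
        }
    return flow


def choose_action(flow, pattern):
    """Pick the action of the node for `pattern` that has the strongest flow."""
    candidates = {a: f for (p, a), f in flow.items() if p == pattern}
    return max(candidates, key=candidates.get)


def fragment(direct_strength):
    """A hypothetical fragment shaped like the one described for Figure 2."""
    return {
        ("P1", "A1"): {"P3": direct_strength},  # direct pipe to the rewarded node
        ("P1", "A2"): {"P2": 1.0},              # indirect route via the other node
        ("P2", "A1"): {"P3": 1.0},
        ("P3", "A3"): {},                       # rewarded action, no successor recorded
    }


taps = {("P3", "A3"): 1.0}
strong_direct = choose_action(leakback(fragment(1.0), taps), "P1")
weak_direct = choose_action(leakback(fragment(0.2), taps), "P1")
print(strong_direct, weak_direct)
```

With a wide direct pipe the direct action A1 wins; when the direct pipe is narrow enough, the longer route through the second node carries more liquid and A2 is chosen instead.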

STeLLA planned future paths by means of special memory matrices, one for each action, that predicted the next pattern, given the present pattern and action. Plans are made in a different way in PURR-PUSS using its main memory, so I will not mention STeLLA's planning method any further. The simple idea behind planning is that when the system is about to choose an action, it explores into the future with its memory and if this leads to a path to reward, then it tries to keep to that path. The plan is made step by step: first the next action is chosen, then it predicts the next pattern, chooses the next action, predicts the next pattern, and so on until a rewarded action is found in its memory. The trace of a successful path through its memory is used to bias its decisions until the goal is reached or the predicted patterns fail to be realized. Planning is important because it can lead to the discovery of new paths that had not been taken before.

The earliest description of STeLLA can be found in the Downloads section as STELLA.doc. A later description of STeLLA, "A Learning Machine in the Context of the General Control Problem", was published by Brian R. Gaines and me in the Proceedings of the 3rd IFAC Congress, London (1966) pp.342-9. In this latter paper, STeLLA is given a car-steering problem on a road with camber and slippage, and is shown to learn the task satisfactorily.

Limitations of STeLLA

The main limitation of STeLLA was lack of sequential strength. It couldn't learn a sequence of events without there being a strong possibility that a similar sequence would take over. It could hear "Two times two equals four" a thousand times and have that well learned, but when you started to teach it "Three times two equals six", the two sequences would become hopelessly entangled. After "Two" and also after "Three" it expects "times", which is all right. After "times" it is expecting "two" and then "equals". So far so good, but now the trouble starts. After "equals" it expects either "four" or "six", and its memory doesn't tell it whether it has come along the "Two times two equals" sequence or the "Three times two equals" sequence, so its memory isn't telling it whether the next word should be "four" or "six". This is what I mean by not having sequential strength.
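The entanglement can be reproduced with a toy next-word table. This is not STeLLA's memory, only an illustration of the same effect: the function `learn` and its `span` parameter (how many preceding words form the key) are assumptions introduced here.

```python
from collections import defaultdict


def learn(sentences, span):
    """Map each run of `span` preceding words to the set of words that followed it."""
    table = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i in range(span, len(words)):
            table[tuple(words[i - span:i])].add(words[i])
    return table


sentences = ["Two times two equals four", "Three times two equals six"]

short = learn(sentences, span=1)  # only the last word is remembered
long = learn(sentences, span=4)   # the whole preceding phrase is remembered

print(short[("equals",)])                        # both answers are expected
print(long[("Two", "times", "two", "equals")])   # only one answer is expected
```

With only the last word as the key, "equals" leads to both "four" and "six". Only a key that reaches back to the first word keeps the two sequences apart.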

Language is full of sequences in the form of phrases and sentences, so it is not possible for STeLLA to learn even the simplest phrases without getting them muddled up with others. STeLLA could do the car steering task, mentioned above, because every decision could be taken on the basis of its last sensory input. It didn't need to know what had happened just before. Real driving is much more sequential. It is essential that a real driver of a car keeps the context of what is happening in mind. If you have just passed a speed limit sign, you must keep that in mind, along with a hundred other things.

We can divide up the STeLLA behaviour into two parts, what happens between pattern and action, and what happens between action and pattern. When STeLLA sees the pattern of lights, it has to choose a button to push. This choice is decided by its control policy that uses leakback to calculate expectations of reward for the different buttons. The choice of button is also influenced by its plans and, to make a plan, STeLLA has to predict what the next pattern of lights will be when it pushes a button. In the case of a real robot, what it will see next after it does an action is determined by its world. So, to predict what the next pattern of lights will be, STeLLA has to "guess" what the world behind the buttons and lights will do. We can say that STeLLA needs a model of the world which reacts to button pushes by changing the pattern of lights. In the car-steering example mentioned above, the world is the car and the road with turns and skids, all of which contribute to what the next position and angle of the car will be, and hence what the pattern of lights will be.

STeLLA's model of the world also lacked sequential strength, because it used only the last pattern to predict the effect of a button push. Its control policy did at least string together pattern-action pairs, even if that stringing together was weak. The weakness of STeLLA's predictor was more serious so we tackled that first. The struggles that I and my postgraduate students had with this problem can be seen in the first of a series of Man-Machine Studies reports, which is available as UC-DSE-1.doc from the Downloads section of this Web Page. Our aim was to model a world that had the structure of a finite state machine. Eventually, with crucial input from John G. Cleary, a predictor called PUSS was devised for STeLLA. PUSS stood for "Predictor Using Slide and Strings" which described its mechanism at that time.

To cut a long story short, PUSS was so successful that STeLLA was abandoned and a new system was constructed, called PURR-PUSS. The PURR stood for "Purposeful Unprimed Rewardable Robot" and the whole system was based on PUSSes. To our surprise, PURR-PUSS turned out to have more sequential strength than we expected. Eventually we would show that PURR-PUSS had the sequential strength of a Universal Turing Machine, the most that was possible of any system. Nevertheless, PURR-PUSS retained key features of STeLLA, including leakback.

PURR-PUSS (PP)

It would be tedious to continue the historical development of PURR-PUSS over the next 30 years. It is well documented in the Biography and The Book sections of this Web Page. Following the practice used in my second book, Associative Learning for a Robot Intelligence, the name PURR-PUSS will be abbreviated to PP.

The big difference between STeLLA and PP is in the nodes of their memory structures. In Figure 1 we saw that each node in STeLLA's memory was a pattern, P, to which was attached one or more actions, A. In PP each node will be a context, C, to which is attached one or more actions, A. The change from pattern to context gives PP the sequential strength that STeLLA lacked. PP will also have several memories, each with its own connections and leakback, instead of just the one that STeLLA had. This extension will justify our use of the name Multiple Context Learning System for PP.

Another difference between STeLLA and PP is more cosmetic than fundamental. Instead of talking about patterns and thinking of the pattern of lights, we will talk about stimuli and we will not treat a stimulus as divisible into separate lights (features). A stimulus will be an indivisible entity, unless it is a complex stimulus composed of simple stimuli. So, instead of pattern and action, we will talk about stimulus and action.

Cortical Areas (templates) and Context

Figure 3 (left). Cortical areas of a monkey brain. (Taken from Rodney Cotterill's Enchanted Looms, Cambridge University Press, 1998.)

There are at least three ways of arguing for the basic design of PP. I mentioned above the attempts to give STeLLA's predictor 'sequential strength'. That was the route which actually led to PP. Another way is to take the theory of Newell and Simon (1972), in which they characterize the human brain as an information processing system with a number of basic properties. This route is taken in section 1.8 of my book, Associative Learning for a Robot Intelligence, and it presents the PP design as a very general structure. The third way is less technical and, therefore, more reader-friendly. It considers quite broadly how neurons might be connected in the brain.

The cerebral cortex of the human brain is a crumpled sheet of neurons and connections. The simplest way of looking at a neuron is to say that it receives input impulses (spikes) and when the inputs are strong enough it "fires", sending impulses along its output connection (axon). It seems reasonable to treat events in the brain as the firings of neurons. (Actually, in many cases, it is probably the rate of firing of a neuron that is important, but we don't need to bother with such details.) Neurologists label different areas of the cortex according to where connections entering the area come from, and where connections leaving the area go to. Figure 3, taken from Rodney Cotterill's delightful book Enchanted Looms (Cambridge University Press, 1998) and due to Daniel Felleman and David Van Essen, shows the labelling of the spread-out right hemisphere of the monkey brain and is a good example of these cortical areas. Here I am not concerned with the details of the areas but just that it is natural to think of the cortex as divided into cortical areas (see Figure 4).

Figure 4 (left). A cortical area. There are a number of input bundles, some coming directly from sensors through their sensory processors and others coming from the outputs of other cortical areas. Similarly, the outputs from a cortical area go to select actions or to predict stimuli, according to whether they represent action events or stimulus events. In addition, the outputs of this cortical area may travel through the cortex to be inputs to other cortical areas.

In each cortical area there will be a large number of neurons. You can think of each neuron in a cortical area as having one input (synapse) from each of the input bundles in Figure 4. Its output (axon) will go to fire another distant neuron that contributes to the activation of a particular action, or it may go to fire another distant neuron that contributes to the prediction of a stimulus event. It may also travel to form an input of particular input bundles of other cortical areas.

My aim is solely to suggest that cortical areas will have this general structure, even though its details must be much more complicated. A cortical area could hardly lack the suggested connections! We are saying very little in claiming this amount of structure. By starting with this minimal structure, we can discover what it is capable of and then move on to add more structure as its limitations become apparent. Actually, we will first assume additional simplifications about the neurons in a cortical area. It is well accepted that neurons are likely to be able to exhibit much more complex behaviour than is normally assumed of them in artificial neural networks. Here, we will assume that they behave like simple AND gates. That is, a neuron responds only when all its inputs are active. A cortical area can now be shown with more detail, as in Figure 5.

Figure 5 (left). A cortical area with one of its many neurons. The input synapses are shown as black dots on the body of the neuron. The axon is the output.

Although Figure 5 is highly simplified, it introduces many of the features of the PP model of the brain. First there is the dotted contour suggesting a cortical area. Inside this just one neuron is shown instead of hundreds. The neuron fires and sends impulses down its output axon only when all its four synapses (black dots) are activated by inputs.

To make the cortical area of Figure 5 understandable, I have chosen input and output events for it that can be related to a possible situation. Of course, I am not suggesting I know that there would necessarily be a particular neuron with these inputs and output in the brain of a person in such a situation. Far too little is known about the brain to make such a claim for what is just an illustration.

The situation I have in mind is of someone saying "Turn the switch on" while the person with the cortical area in question is looking at and touching the switch. The output of the neuron causes the finger to push the switch on. The four inputs are of four event types. There is the visual event type See, the touching event type Feel-finger and two hearing event types This-Word and Last-Word. All the neurons in the cortical area receive input events of these four types, not just the neuron shown. Similarly, all the neurons in the cortical area have the same output event type Finger-action, which refers, let's say, to the index finger of the right hand. The particular neuron shown responds to the actual input events SWITCH-UP, TOUCHING-SWITCH, SWITCH and ON. It outputs the event PUSH-DOWN.

None of these input events are likely to be raw sensory inputs, but rather the outputs of other cortical areas processing raw sensory inputs. SWITCH-UP will be the result of quite sophisticated recognition if the switch is being recognized as a switch. TOUCHING-SWITCH will need to be more than just a signal that the finger is touching something. SWITCH of event type Last-Word and ON of event type This-Word are the last two words heard. There will certainly have to be some preprocessing for sounds to be heard as word units. Remember that all these four events have to be occurring together for the neuron to fire, so the SWITCH event of Last-Word event type must be held in some form of memory to be occurring when the current word ON of event type This-Word is being heard.
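The AND-gate neuron of Figure 5 can be written down in a few lines. This is a minimal sketch: the function name `context_fires` and the dictionary representation of events are assumptions, while the event types and events are those of the switch example.

```python
EVENT_TYPES = ("See", "Feel-finger", "Last-Word", "This-Word")


def context_fires(required, current):
    """Model a neuron as an AND gate: it fires only if every input matches."""
    return all(current.get(t) == required[t] for t in EVENT_TYPES)


# The one neuron picked out in Figure 5.
neuron = {
    "See": "SWITCH-UP",
    "Feel-finger": "TOUCHING-SWITCH",
    "Last-Word": "SWITCH",   # held in memory from the previous word
    "This-Word": "ON",
}

heard_on = dict(neuron)                      # all four events occur together
heard_off = {**neuron, "This-Word": "OFF"}   # one input differs

print(context_fires(neuron, heard_on))   # True
print(context_fires(neuron, heard_off))  # False
```

Note that the Last-Word input has to be supplied alongside the current word, which is why the text requires it to be held in some form of memory.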

Learning

How does the neuron in Figure 5 come to be connected as suggested? If it were pre-wired, then we would have a brain that was pre-programmed to do what it did. This is not the sort of brain that we or even animals have. Certainly, evidence has been accumulating over the years which indicates that a considerable amount of the brain's ability is built-in or "innate". Even if the baby brain doesn't express some ability at birth it may have the potential to implement that ability as part of its later maturation. In the PP model of the brain, there is a balance between what is assumed to be innate and what is expected to be learned. This balance can be seen in the neuron of Figure 5.

The input connections (afferents) to the neuron are assumed to be innate. The output is learned in a manner to be described shortly. In each cortical area, all the neurons have one input connection (i.e. one synapse) from each of the event types. In other words, a neuron has one connection from each of the input bundles in Figure 4, because there is a bundle of connections for each event type. An important feature of PP is that there is no neuron for a combination of input events that hasn't yet occurred. We can imagine that real brains have a clever process for connecting up new neurons as new combinations of events occur; but it seems likely that, if the PP model is at all valid, the processes in real brains will be more statistical than those in PP.

To explain how the output of the neuron in Figure 5 might be learned, I have to introduce another neuron into Figure 5 (as I did in Figure 2 in chapter 1 of Associative Learning for a Robot Intelligence). Figure 6 includes this complication and, in the same way as Figure 5 showed only one neuron, Figure 6 shows only one of the hundreds of these extra neurons. I will call the first neuron a context neuron, because its input events form a context for the event of the second neuron, which will be called an associated neuron.

Figure 6 (left). Each cortical area has hundreds of context neurons and hundreds of associated neurons. The output Finger-action PUSH-DOWN is first activated only when other (e.g. reflex) processes provide direct input to the associated neuron, but after learning it is also activated by the firing of the context neuron. The associated neuron fires if either of its synapses is activated.

The output, or axon, of the first neuron is shown dotted and connects via the lower synapse to the associated neuron. This synapse is where learning takes place.

The reader will notice that the associated neuron is labelled with the same input and output "Finger-action: PUSH-DOWN". If the finger is pushed down, then the associated neuron is fired by impulses to its upper synapse. However, if the context neuron fires at the same time as the associated neuron, the learning synapse increases in strength. After the synapse has been strengthened, the associated neuron can be fired by the context neuron even when there is no input to the upper synapse of the associated neuron. The cortical area has learned to respond with the action PUSH-DOWN whenever the context "SWITCH-UP + TOUCHING-SWITCH + SWITCH + ON" occurs.

This is so important that I will repeat it a different way. The lower synapse on the associated neuron is sensitive to the simultaneous firing of the context neuron and the associated neuron. To begin with there is no connection between the two. The synapse is a potential connection but it is not made. If the context neuron and the associated neuron fire at the same time, then the synapse connection is made and, from then on, if the context neuron fires by itself it will induce the associated neuron to fire. The cortical area has learned to PUSH-DOWN finger when it sees SWITCH-UP, feels TOUCHING-SWITCH and hears (in succession) the two words SWITCH and ON.

Before any learning occurs, the cortical area has hundreds of context neurons connected to different combinations of the input events from the four event types, and it has hundreds of associated neurons with the different events of the output event type, Finger-action. Each context neuron has potential connections to associated neurons of all the different events. Learning makes connections between contexts and associated events that actually occur together. Through learning, the cortical area learns which contexts have occurred with which associated events.
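The learning step can be sketched as a table of links that are made when a context and an associated event occur together. The class `CorticalArea` and its method names are assumptions for illustration. The sketch also simplifies the account above in one respect: a context entry is created only when that combination of events first occurs, rather than existing beforehand as a potential connection.

```python
class CorticalArea:
    """Toy cortical area: contexts become linked to events that occur with them."""

    def __init__(self, context_types, associated_type):
        self.context_types = tuple(context_types)
        self.associated_type = associated_type
        self.links = {}  # context (tuple of events) -> set of associated events

    def _context(self, events):
        """Pick out one event per context event type, in a fixed order."""
        return tuple(events[t] for t in self.context_types)

    def learn(self, events, associated_event):
        """Make the link when the context and the associated event fire together."""
        self.links.setdefault(self._context(events), set()).add(associated_event)

    def respond(self, events):
        """Return the associated events this context can now fire by itself."""
        return self.links.get(self._context(events), set())


area = CorticalArea(("See", "Feel-finger", "Last-Word", "This-Word"), "Finger-action")
situation = {
    "See": "SWITCH-UP",
    "Feel-finger": "TOUCHING-SWITCH",
    "Last-Word": "SWITCH",
    "This-Word": "ON",
}

before = area.respond(situation)      # nothing learned yet
area.learn(situation, "PUSH-DOWN")    # finger pushed down while the context was active
after = area.respond(situation)       # the context alone now produces the action
print(before, after)
```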

There are various minor problems with implementation, and particularly with timing, but since we can solve those with electronic circuits there is no reason to doubt that real biological neuron circuits can do so too. Indeed, we can expect biological neurons to be using all sorts of clever processes, which we haven't even thought of yet. This biological suggestion for a cortical area is no more than that: a suggestion. It is just one more reason for supposing that the architecture of PP is a viable proposal for how the biological brain works.

Before leaving the biological view of cortical areas, I need to take one more step. It is not going to be convenient to draw Figure 6, or even Figure 5 every time I want to talk about cortical areas, so a more compact notation is required. Figure 7 will do the job.

Looking at part A of Figure 7, we see that the horizontal rectangle is divided into two main parts with a thick vertical line between them. The context event types and events are on the left with thin line dividers, while the associated event type and event are on the right. The rectangle corresponds to the whole of Figure 5 or 6, in which one or a pair of neurons are picked out as being an association. The context is associated with the associated event. If the event types are obvious, we can represent an association by the rectangle in part C, which only shows the events. Now, a cortical area is characterized by the event types which form contexts and the event type which forms associated events, so part B of Figure 7 describes the cortical area as a whole. Of course, at times we may need to list all the associations in a cortical area, but much of the time part B of Figure 7 will be what we need. It describes, if you like, a cortical area before it has learned any particular associations.

A good reason for leaving the biological version of PP behind and carrying on with the simplified view of a cortical area given in Figure 7 is that we can experiment with different varieties of cortical area without having to produce neuron diagrams to match. It is not that this couldn't be done, but it would be laborious, complicated and pointless. We don't have a clear enough picture of how neurons work (except in specific parts of the brain, like the cerebellum) so there would be nothing to compare our neuron diagrams with. Also, it makes more sense to try and compare the behaviour of PP with the brain at a higher level before attempting it at a lower detailed level. However, we will keep the name "cortical area" to remind us of this link with the structure of the brain. [In Associative Learning for a Robot Intelligence, the term 'template' was used instead of 'cortical area'.]

Elaborations of a Cortical Area

A significant part of the research with PP has been in discovering elaborations of the basic idea of a cortical area. "Decision and prediction", "timing and threading", "choice and replacement", "predicted input events", "context of contexts", "auxiliary actions", "reward and novelty", and the broad concept of "multiple context" are some of the directions in which this elaboration has proceeded. Each direction is pointed to by a class of problem which a brain will have to cope with.

Decision and prediction refers to a problem that we have already discussed. A brain has to build up a control policy or decision strategy to guide its choice of action, but it also has to model the world that it is interacting with through its body. This immediately divides cortical areas into those with actions for associated events and those with stimuli for associated events. Cortical areas forming part of the control policy have actions for associated events, while cortical areas forming part of the model of the world have stimuli for associated events. In making a plan, PP first chooses an action with the control policy cortical areas and then predicts stimuli with the model of the world cortical areas, then chooses an action, then predicts stimuli, ... until either a goal is found, the plan goes into a loop, or it fails to choose an action or predict a stimulus.
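The alternation of choosing and predicting, with its stopping conditions, can be sketched as a loop. This is a simplified illustration, not PP's implementation: the control policy and the model of the world are reduced to plain lookup tables keyed on the current stimulus, and the names `make_plan`, `policy`, `model` and `goals` are assumptions.

```python
def make_plan(start, policy, model, goals, max_steps=100):
    """Alternate action choice and stimulus prediction until the plan stops.

    policy: {stimulus: action}                  (stands in for control policy areas)
    model:  {(stimulus, action): next_stimulus} (stands in for model of the world areas)
    goals:  set of (stimulus, action) pairs treated as goals
    Returns (steps, outcome), where outcome is "goal", "loop" or "fail".
    """
    steps, seen, stimulus = [], {start}, start
    for _ in range(max_steps):
        action = policy.get(stimulus)
        if action is None:              # fails to choose an action
            return steps, "fail"
        steps.append((stimulus, action))
        if (stimulus, action) in goals:  # a goal is found
            return steps, "goal"
        nxt = model.get((stimulus, action))
        if nxt is None:                 # fails to predict a stimulus
            return steps, "fail"
        if nxt in seen:                 # the plan goes into a loop
            return steps, "loop"
        seen.add(nxt)
        stimulus = nxt
    return steps, "fail"


policy = {"S1": "a", "S2": "b", "S3": "c"}
good = make_plan("S1", policy, {("S1", "a"): "S2", ("S2", "b"): "S3"}, {("S3", "c")})
loop = make_plan("S1", policy, {("S1", "a"): "S2", ("S2", "b"): "S1"}, set())
stuck = make_plan("S9", policy, {}, set())
print(good, loop, stuck)
```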

Timing and threading are two ways of dealing with the succession of events. A cortical area may have both timing and threading event types, but it must have at least one timing event type if its associations are going to say when the associated event will happen. Events from timing event types occur at regular intervals, like the tick of a clock. In the case of threading event types, it is only the order or sequence of events that matters. In Figures 5-7, the See and Feel-finger event types are probably timing, because seeing and touching are continuous, even if at times nothing is seen or nothing is felt. The This-Word and Last-Word event types refer to the word that has just been heard and the one that was heard before it. These could be timing event types if we were using very rigid talking with a word (or no word) during each fixed interval, but it would be more natural to use threading event types for them. A gap between words is more likely to indicate slower talking than a "null word". Threading event types can give the context of an association an extended span over time, which may be useful for linking earlier events to later events.
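The difference between the two kinds of event type can be shown with a short Python sketch. It assumes, purely for illustration, that input arrives as one observation per clock tick, with None standing for "no word during this interval", and that a context is simply the last two events of one event type. The function names and the span parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Word = Optional[str]  # None is an assumed marker for "nothing heard this tick"


def timing_contexts(ticks: List[Word], span: int = 2) -> List[Tuple[Word, ...]]:
    """Timing view: one event per tick, so gaps appear in the context."""
    return [
        tuple(ticks[i - span + 1 : i + 1])
        for i in range(span - 1, len(ticks))
    ]


def threading_contexts(ticks: List[Word], span: int = 2) -> List[Tuple[Word, ...]]:
    """Threading view: only the order of actual events matters."""
    events = [word for word in ticks if word is not None]
    return [
        tuple(events[i - span + 1 : i + 1])
        for i in range(span - 1, len(events))
    ]


# "the", then a pause of two ticks, then "cat" and "sat".
heard = ["the", None, None, "cat", "sat"]
timed = timing_contexts(heard)
threaded = threading_contexts(heard)
# timed    == [("the", None), (None, None), (None, "cat"), ("cat", "sat")]
# threaded == [("the", "cat"), ("cat", "sat")]
```

In the threaded view the pause after "the" makes no difference to the (Last-Word, This-Word) pairs, which is why slower talking does not disturb contexts built from threading event types.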

Choice and replacement distinguish two different kinds of knowledge and they are both necessary. In a "choice" cortical area, each context neuron can be linked to many associated neurons, because each context can be followed by different associated events on different occasions. The "choice" association remembers all the associated events that have followed its context. This is particularly important for control policy cortical areas because the choice associations hold the different alternatives from which selection is made by the leakback process and the highest expectation of getting to a goal. "Replacement" cortical areas are more likely to be used in the model of the world for predicting stimuli. A "replacement" association remembers only the most recent associated event. Learning a new link to an associated event wipes out all previous ones. Choice and replacement associations store two different kinds of knowledge, which are answers to two different kinds of question: What pieces are there to move in a game of chess? and Where is the opponent's king? Who might be the next prime minister of New Zealand? and Who is the present prime minister? I will explain later how PP's computational power depends upon having both choice and replacement cortical areas.
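The two kinds of memory can be contrasted in a few lines of Python. This is a minimal sketch, assuming that a context and an associated event can each be represented by any hashable value; the class names ChoiceArea and ReplacementArea and their methods are hypothetical, and selection among alternatives by leakback is not modelled.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set


class ChoiceArea:
    """Remembers every associated event that has followed a context."""

    def __init__(self) -> None:
        self._links: Dict[Hashable, Set[Hashable]] = defaultdict(set)

    def learn(self, context: Hashable, event: Hashable) -> None:
        self._links[context].add(event)  # keep all alternatives

    def recall(self, context: Hashable) -> Set[Hashable]:
        return set(self._links.get(context, set()))


class ReplacementArea:
    """Remembers only the most recent associated event for a context."""

    def __init__(self) -> None:
        self._links: Dict[Hashable, Hashable] = {}

    def learn(self, context: Hashable, event: Hashable) -> None:
        self._links[context] = event  # a new link wipes out the old one

    def recall(self, context: Hashable) -> Optional[Hashable]:
        return self._links.get(context)


# "What pieces are there to move?" versus "Where is the opponent's king?"
pieces = ChoiceArea()
pieces.learn("my move", "knight")
pieces.learn("my move", "bishop")

king = ReplacementArea()
king.learn("opponent's king", "e8")
king.learn("opponent's king", "g8")

# pieces.recall("my move") == {"knight", "bishop"}
# king.recall("opponent's king") == "g8"
```

The only difference between the two classes is the learn step: one accumulates, the other overwrites, which is the distinction the chess and prime minister questions are meant to bring out.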

Predicted input events. The importance of having "predicted input events" in a context can be appreciated if one recalls how much of one's thought is about things that one is not seeing, feeling or hearing. Later we will see that predicted event types in a cortical area can enable special kinds of computation to be achieved and they also enable PP to extract changing information from replacement associations.

Context of contexts. The idea behind "context of contexts" is that the occurrence of an association can be an input event in another association. This is clearly a way of building up a hierarchy of cortical areas, but I have done very little in this direction yet. This could be a way of implementing "chunking" of information, which is something we seem to be able to do, but I have not yet found adequate conditions for treating an association as an event. The possibilities and difficulties will become more apparent when I discuss a little experiment described in section 3.7 of Associative Learning for a Robot Intelligence.

Auxiliary actions. Probably the main reason for our being surprised when we first tried PP was that we hadn't anticipated the power of "auxiliary actions". It is one thing to consider actions as defining what a robot could do to the world. If you give a robot legs and feet, it can walk, with hands it can pick up things and throw them, with a voice it can make sounds, with muscles in its face it can make grimaces, and so on. An action is called "auxiliary" when it is not needed for doing the job in hand. If a robot has plenty of actions, then it is likely to have spare actions most of the time. These actions can be used by the robot to represent things to itself. At the simplest level, I might tie a knot in my handkerchief to remind myself to do something, or I might keep repeating to myself "I must post that letter" so that I don't walk past the mail box without doing so. Counting is such an important process, that it can become a mental task in its own right, but often we use it just to distinguish things. If I am performing a task with 5 steps, I may say "one", "two", "three", "four", "five" as I do the steps just to prevent myself from making a mistake. When we use speech not for communication but as part of our thinking, it can be seen as an auxiliary activity and even more pervasive is the "inner speech" with which we constantly talk to ourselves about what we are doing or might do or what might happen. But language itself will need auxiliary actions to hold the structure of the sentences we speak or read. I expect the so-called "non-terminals" of grammar, like noun and noun phrase, will need to be handled with auxiliary actions. A detailed example of the use of auxiliary actions for holding the structure of a hierarchical task is given in chapter 8 of Associative Learning for a Robot Intelligence.

Reward and Novelty. Some of the associations in a cortical area will be marked as goals, just as in the network of STeLLA some pattern-actions were marked as rewarded or punished. But in PP there is an additional category of goal, called "novelty", which makes all the difference to PP's teachability. Indeed, we can argue that novelty goals give PP free will because they are not given to PP by a designer or programmer and yet they determine many of its actions. Every new association is marked as a "novelty goal" until it occurs again. While an association is new to PP, the leakback system is trying to make PP repeat it. Leakback operates from novelty goals, the same way as it operates from ordinary reward goals that keep their reward status once activated.
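The life cycle of a novelty goal, marked when an association first occurs and cleared when it occurs again, can be sketched as follows. This is an illustrative assumption-laden sketch: the class NoveltyMemory and its methods are hypothetical names, an association is reduced to a (context, event) pair, and the leakback process that steers PP towards these goals is left out entirely.

```python
from typing import Dict, Hashable, Set, Tuple

Association = Tuple[Hashable, Hashable]  # assumed form: (context, event)


class NoveltyMemory:
    """Tracks which stored associations are still novelty goals."""

    def __init__(self) -> None:
        # Maps each known association to True while it is a novelty goal.
        self._is_novel: Dict[Association, bool] = {}

    def observe(self, context: Hashable, event: Hashable) -> bool:
        """Record an occurrence; return True if it is now a novelty goal."""
        key = (context, event)
        if key not in self._is_novel:
            self._is_novel[key] = True  # first occurrence: mark as novelty goal
            return True
        self._is_novel[key] = False  # it has occurred again: goal disappears
        return False

    def novelty_goals(self) -> Set[Association]:
        return {key for key, novel in self._is_novel.items() if novel}


memory = NoveltyMemory()
first = memory.observe("see ball", "reach")
goals_after_first = memory.novelty_goals()
second = memory.observe("see ball", "reach")
goals_after_second = memory.novelty_goals()
# first is True, goals_after_first == {("see ball", "reach")}
# second is False, goals_after_second == set()
```

Nothing outside the memory sets these goals: they arise from whatever associations happen to be new, which is the sense in which they are not given to PP by a designer.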

It is difficult to teach PP with rewards because when something is learned we want to drop the rewarding and that is not what PP is expecting. I suspect that the same problem arises from using rewards in teaching children. But PP, and children, like learning things that are new. With PP, once something has been learned the relevant novelty goals disappear and it is ready for new novelty goals. "Volatile" reward goals, which disappear if reward is not repeated, are also good for teaching and are effective if one is trying to persuade PP to go over something that has been learned and has become "boring"! [Novelty goals were introduced originally as a kind of random search to be carried out when PP had nothing else to do, but we discovered to our surprise that they made PP much more teachable.] Another bonus of novelty goals is that they work to integrate new knowledge into existing knowledge. Once a path has been established from a new association to itself through the rest of a memory network, that association becomes more accessible.

Novelty is a key feature of PP because it makes PP more teachable, integrates new knowledge into memory, and induces free will.

Multiple context. Giving a brain a multiple context is like giving someone a well-equipped workshop. Someone reading my book Associative Learning for a Robot Intelligence might be forgiven for thinking that multiple context is a method of programming because each problem is tackled with the minimum multiple context. This was necessary because the presentation had to be brief and also because much of the research was done with much more limited computing equipment than what is now available in a desktop machine. A multiple context should be designed for the whole input space of the body available to the brain, so that it can make use of all its sensory information and its whole armoury of actions. An essay on multiple context with examples of the effects of enlarging it can be downloaded as the Word file MultipleContext.doc in the Downloads section.

There are at least two ways of looking at the concept of a "multiple context". There is detailed processing and also the more general "thinking". We saw earlier that each cortical area holds associations with contexts drawn from particular event types. Multiple context refers to the activity of many cortical areas at the same time. We know that activity is spread across large areas of the cortex of our brains and that it is constantly changing with some areas active and then others, like the bubbling in a pot simmering on the stove. Not only do our thoughts move around from topic to topic, but we can be doing several things at once, such as driving a car, noticing a bird in a tree, and continuing with a conversation. For such thinking to occur, we can imagine clouds of active cortical areas combining and separating as the different activities make different demands on the overall resources. At the more detailed level of processing, a few cortical areas may have to work together to achieve some specific mechanism or task. Associative Learning for a Robot Intelligence is full of examples of the way specific combinations of cortical areas can enable a robot to perform different kinds of task. To enable cortical areas to work together for a specific task, they are put into "clusters", each with its own organization of interconnections. More general and more fluid groupings will take place with clusters cooperating and then separating as activity focuses and spreads.

The processing and thinking levels may not exhaust the ways in which we can see cortical areas working together in groups. For example, our brains are capable of conscious and subconscious activity. Some people have suggested that thinking becomes conscious when a higher level refers to a lower level, such as when one makes a plan and then reconsiders and possibly alters it. If this is the case, as I believe it may well be, we will need to find ways for groups of cortical areas to monitor or control other groups of cortical areas. There may also be cortical areas organized as maps and other innate structures, but we are then getting away from PP, which is intended to model only how the highest levels operate. Learning is unlikely to proceed over several levels at once because a lower level has to reach a stable state before the next higher level can begin using its products. For example, phonemes must be stable before words are learned, and words must be stable before sentences can be learned. However, this example itself points to some flexibility. We can move into an area where a different dialect of our language is spoken and be able to adapt to different phonemes forming the same words. Also, we are continually adding new words to our vocabulary without having to learn new grammar because the new words fit into the existing grammatical structure of our sentences.


No Program in the Driving Seat

Many people have argued that the brain can't be like a computer because computers are "formal systems" which are told what to do by a program. Two famous, and very different, versions of this argument are John Searle's "Man in the Chinese Room doesn't understand" argument and the Lucas-Penrose "Gödel Theorem leads to paradox" argument. In the former the program is a fixed "book of rules" and in the latter it is an "algorithm". I don't think that one needs such deep arguments to accept the premise that there must be no top-level program (or fixed book of rules, or algorithm) driving a brain, if it is to have real intelligence and creativity. So what is PP driven by? The PP brain is driven by a changing and growing collection of associations which are generated by the interaction of PP and the world via its body. PP has no top-level program, fixed book of rules or algorithm. I emphasised the importance of the body as the means for a brain to interact with the world in my 1977 book Thinking with the Teachable Machine as follows (page 6):

Sometimes people think of their bodies as cages within which they live, but bodies are quite the opposite of cages. My body connects me to the world around me in such an intimate and interactive way that I become part of that world. I can participate in the dynamic society and culture of humanity. My cage is wide open. Only if we enable our machines to interact freely with the world around them can we release them from the cage of paradox. ... PURR-PUSS is unique, to my knowledge, in being a system that could be given a body-in-the-world.