David W. Croft
CNS 124B
Pattern Recognition
California Institute of Technology
1994 June 03
The purpose of this half of the class project was to construct the framework for a finite state machine that would learn temporal patterns. The specific patterns to be modeled were supplied by my partner from his designs of self-timing circuits using threshold (neural) devices. State transitions for his circuits were supplied to me via a connectionist diagram of his model or by a sample graph of time-varying states. The goal of my efforts was to construct a computer algorithm that would learn to emulate his models by sampling their states throughout time.
The architecture for this device was a fully-interconnected neural network (including self-connections) with a constant, user-selected propagation delay. The neuron was a non-linear, non-stationary model similar to the "integrate-and-fire" circuit. The modification to this model was that the neuron, instead of undergoing a refractory period after spiking, would "tire" by becoming negative (hyperpolarized) with respect to its normal (resting) output. See Figure 1. The learning rule, a Hebbian variant, used the hyperpolarization of the neuron after firing as phase information sufficient to avoid excitatory saturation and to learn inhibitory (negative) synaptic connections.
[Figure 1. The Integrate and "Tire" Neuron Model. The original ASCII figure showed a circuit schematic (current source I, switch S, capacitor C, a slow leak, battery E, and resistor R) beside a membrane voltage trace rising to +1 at firing and falling to -1 during the "tired" phase.]
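To make the model concrete, the following sketch gives one plausible reading of the integrate-and-"tire" neuron in Python; the time step, leak constant, and voltage levels are illustrative assumptions rather than values taken from the project.

    # A minimal integrate-and-"tire" neuron: instead of entering a
    # refractory period, the membrane is driven below rest
    # (hyperpolarized) after each spike.
    DT        = 1.0    # integration time step (assumed)
    TAU_LEAK  = 20.0   # slow leak time constant toward rest at 0 (assumed)
    THRESHOLD = 1.0    # firing threshold (assumed)
    TIRED_VM  = -1.0   # post-spike hyperpolarized ("tired") level

    def step(vm, i_syn):
        """Advance the membrane voltage one time step; return (vm, fired)."""
        if vm >= THRESHOLD:                   # spiked on the previous step:
            return TIRED_VM, False            # "tire" by hyperpolarizing
        vm += DT * (-vm / TAU_LEAK + i_syn)   # leaky integration of input
        return vm, vm >= THRESHOLD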
Synaptic weights between neurons had both an excitatory and an inhibitory component that were trained independently according to the following learning rules:
dW_Dep/dt = -Learn_Rate * ( I_Dep / W_Dep ) * ( Vm - Threshold )
dW_Shu/dt = +Learn_Rate * ( I_Shu / W_Shu ) * ( Vm - Threshold ).
The term "(Vm -Threshold)" indicates that learning is a function of membrane voltage (Vm) of the soma shifted by a learning threshold. Generally, for excitatory weights, the weight became potentiated (increased) if the synapse was active while the voltage was above the learning threshold and became depressed (decreased) when the voltage was below. I tried setting the learning threshold to zero, small values, and high values. I found that a zero value caused undesirable learning for small perturbations of Vm around its resting value. For a high value, the weights tended to be depressed. I found that the best performance was given for a small, positive threshold.
I also tried replacing the term ( Vm - Threshold ) with a non-linear term such as ( exp( Vm - Threshold ) - exp( -( Vm - Threshold ) ) ) to get a learning rate that would be small when the post-synaptic membrane voltage hovered around the learning threshold but large when it was depolarized (firing) or hyperpolarized. Although the decrease in learning around threshold helped, the exponentially large increases at the tail ends caused undesirable saturation effects. Another function, combining the advantage of small learning near the center with linear, rather than exponential, learning in the wide tails, may be more suitable.
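For reference, the linear term, the exponential (sinh-like) variant, and one hypothetical compromise with a flat center and linear tails could be written as follows; the dead-zone function is my own suggestion, not something tested in the project.

    import math

    def f_linear(vm, thr):                # the original linear term
        return vm - thr

    def f_sinh(vm, thr):                  # the exponential variant tried above
        return math.exp(vm - thr) - math.exp(-(vm - thr))

    def f_deadzone(vm, thr, width=0.2):   # hypothetical: flat center, linear tails
        x = vm - thr
        if abs(x) < width:
            return 0.0
        return x - math.copysign(width, x)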
"Dep" stands for a depolarizing, excitatory synaptic weight and "Shu" stands for a shunting ("silent") inhibitory synaptic weight. Since the shunting, or "silent", inhibition does not bring the neuron below its resting value but rather "shorts" it to rest (ground) from either depolarized or hyperpolarized levels, the only time a neuron using shunting, as opposed to hyperpolarizing, inhibition would be hyperpolarized (negative) is right after it fired an action potential (spike). This provides temporal information as to whether a synaptic input is arriving at the soma (neuron body) immediately after an action potential usually indicating the arrival of an input that is anti-correlated with respect to the inputs that caused the neuron to fire.
I had problems with numerical integration: a hyperpolarized soma receiving a strong shunting input and a weak excitatory input would tend to overshoot the shunt (resting) value and cause the neuron to fire. I therefore switched to hyperpolarizing inhibition, for which overshoot is not a problem. Since the learning rule dictates that inhibitory weights become larger in response to a hyperpolarized soma, this move proved disastrous: small hyperpolarizations would eventually saturate the inhibitory weights. While this was acceptable for synapses that should be strongly inhibitory (inhibition tended to be an all-or-nothing affair), many synapses that should have had weights of zero became inhibitory instead. I then switched back to shunting inhibition and modified my integration routine for synaptic currents.
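One way to make such a step overshoot-proof, and a plausible reading of the modified integration routine (the actual routine is not detailed here), is an exponential-Euler update that decays Vm toward the shunt value rather than stepping across it:

    import math

    def shunt_step(vm, g_shu, dt, v_rest=0.0):
        """Exponential-Euler update for a shunting conductance.

        Explicit Euler, vm += dt * g_shu * (v_rest - vm), overshoots
        v_rest whenever dt * g_shu > 1. The closed-form decay below
        always lands between vm and v_rest, so it cannot overshoot.
        """
        return v_rest + (vm - v_rest) * math.exp(-g_shu * dt)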
Weights (conductances) were not allowed to fade to zero but instead were maintained at or above some minimum value. An effective zero weight could still be obtained, though, since the synaptic current from a pre-synaptic neuron was the result of both the excitatory weight and the inhibitory weight. Shunting inhibitory weights tended to "divide" the excitatory weights, while hyperpolarizing inhibitory weights tended to subtract from them.
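In conductance terms, the contrast between the two forms of inhibition might look like the sketch below; the driving-force form and the reversal potentials are assumptions.

    def i_syn_shunting(vm, w_dep, w_shu, e_dep=1.0, v_rest=0.0):
        # Shunting inhibition adds a conductance toward rest; at steady
        # state Vm ~ w_dep * e_dep / (w_dep + w_shu + leak), so the
        # shunt effectively "divides" the excitatory drive.
        return w_dep * (e_dep - vm) + w_shu * (v_rest - vm)

    def i_syn_hyperpolarizing(vm, w_dep, w_hyp, e_dep=1.0, e_hyp=-1.0):
        # Hyperpolarizing inhibition pulls toward a negative reversal
        # value, effectively subtracting from the excitatory drive.
        return w_dep * (e_dep - vm) + w_hyp * (e_hyp - vm)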
The term "( Weight.I_Shu / Weight.Shu )" shows that the weight change is a function of the synaptic current flowing out of the soma through that particular synapse. Thus, when the synaptic current was zero, no learning occurred for that particular weight. That is, when the pre-synaptic neuron was inactive and the post-synaptic neuron was active, no learning would occur.
Synaptic current was a function of the conductance, whose value varied in time from 0 to a maximum given by the learned weight. Neurotransmitter input was modeled as a delta pulse driving a synaptic conductance waveform, either an exponential decay function or an alpha (damped ramp) function. While I tried many different functions for the time course of the synaptic conductance, I found that the exponential decay function delivered the most reliable results with regard to the desired avoidance of excitatory saturation. The time course of the conductance was crucial. If it were a wide function, the pre-synaptic input would be smeared over a long period of time and could cause the weights to become correlated with temporally distant inputs. If it were too short, learning would only detect inputs that arrived within a narrow window around the firing of the soma. I found that the mass of the conductance needed to encompass both the depolarization and hyperpolarization of the soma during an action potential but should trail off rapidly afterwards so as not to become correlated with subsequent action potentials.
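The two conductance time courses can be written as kernels following a delta pulse at t = 0; the time constant and peak normalization here are assumptions.

    import math

    def g_exp(t, w, tau=2.0):
        # Exponential decay kernel: jumps to w at the pulse, then decays.
        # This shape gave the most reliable results above.
        return w * math.exp(-t / tau) if t >= 0.0 else 0.0

    def g_alpha(t, w, tau=2.0):
        # Alpha (damped ramp) kernel: rises to its peak w at t = tau,
        # then decays; a wider, delayed alternative.
        return w * (t / tau) * math.exp(1.0 - t / tau) if t >= 0.0 else 0.0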
During the time course of an action potential with a stable weight, the weight is potentiated during the depolarization phase and depressed during the hyperpolarization phase. The learning rule prevents excitatory saturation in that excessively large weights cause the soma to depolarize rapidly, shifting the balance of potentiation and depression toward the now lengthened period of depression during the hyperpolarization phase. Thus, excitatory weights tend to increase to the point where the neuron just fires, and then each weight hovers about that point. This seems similar to my recent reading of Oja's Rule for Hebbian learning of linear neurodes, wherein the learning rate is something proportional to ( Vm - Vm**2 ): when Vm is less than one, the weight increases; when it is greater than one, the weight decreases. Note that this anti-saturation effect for spiking neurons requires that the synaptic conductance be active for portions of both the depolarization and the hyperpolarization. Thus, short-lived synaptic conductance functions that catch only the depolarization phase would cause excitatory saturation.
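A toy calculation makes the balance visible: the net weight drive over one action potential is the overlap of the conductance kernel with ( Vm - Threshold ). The waveform and constants below are assumptions chosen only to show the sign change between a short kernel and one spanning both phases.

    import math

    THR = 0.1
    def vm(t):          # depolarized for 2 steps, "tired" for 4, then rest
        return 1.0 if t < 2 else (-1.0 if t < 6 else 0.0)

    def net_drive(tau):
        return sum(math.exp(-t / tau) * (vm(t) - THR) for t in range(12))

    print(net_drive(0.5))  # short kernel: positive, potentiation only -> saturation
    print(net_drive(4.0))  # kernel spanning both phases: depression counterbalances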
Additionally, using excitatory weights alone, I was able to observe resistance to timing skew in a soma firing regular action potentials for a given periodic input. Since potentiation of the weight tended to increase firing frequency and depression tended to decrease it, an increase in the weight past the stable value would trigger the anti-saturation effect and bring the weight back down. Thus, for a stable periodic input, firing frequency stabilized. There may be some relation here to Klopf's Drive Reinforcement learning rules, wherein weights adjust themselves to stabilize the output firing frequency.
Using shunting inhibition, inhibitory saturation was avoided since large values of inhibition tended to bring Vm away from hyperpolarized values back to the resting potential where learning was minimal. In general, I noted that inhibitory weights did not increase until they began to compete against a strong excitatory input.
While Oja's Rule for linear neurons seems similar to mine in that saturation is avoided, it may not encompass the phase information given in a non-stationary spiking neuron. Phase information was used to detect correlations and anti-correlations in time of the firings of action potentials. For example, when an excitatory synaptic conductance was active during depolarization of the soma, the weight was increased. However, if neurotransmitter from the pre-synaptic neuron arrived during the hyperpolarization phase of the soma, the excitatory weights were decreased and the inhibitory weights were increased to provide a measure of the anti-correlation of the inputs.
For this project, I clamped the neurons to values specified by the training examples. Using a process I have nicknamed Inhibition and Clamped Excitation (ICE), I clamped the neurons to depolarized (positive) values when the training file indicated that the corresponding state in the system to be emulated was "on". Thus, any other neurons, or states, that were firing one propagation delay prior would cause a synaptic conductance coincident with the forced excitation, triggering excitatory potentiation. When the corresponding state was "off", I left the neuron floating at its present value unless it attempted to fire under the influence of learned excitatory weights.
If one or more inputs tended to excite a neuron that, as indicated by the training file, occasionally should not fire, there must be some information available to the neuron via an inhibitory input that allows it to distinguish the different desired outputs. For this purpose, when a neuron fired but should not have, I immediately inhibited it to a resting or hyperpolarized value. This caused the excitatory weights from all active pre-synaptic neurons to decrease and the inhibitory weights to increase. For example, if a neuron is depolarized when the first of two inputs is active, that input's weight will become more excitatory. However, if the neuron is always artificially inhibited when both inputs are present, the weights will become more inhibitory. The long-run effect is that the weight from the first input will swing both up and down while the weight from the second input will only go down (or negative). Thus, the second weight will become inhibitory and suppress firing whenever both inputs are present, but the neuron will still fire under the excitatory influence of the first weight alone. I observed that inhibitory weights did not form until the excitatory weights were of sufficient strength to cause the soma to fire. After that point, inhibition tended to pace the excitatory weights, stabilizing when they did. The excitatory weights, in turn, tended to experience slow growth under the influence of occasional "punishing" artificial inhibition until the inhibitory weights were of sufficient strength to counteract excitation correctly and suppress a mis-timed action potential.
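A sketch of one ICE step for a single neuron appears below; the symbols match the spike-raster format described next, but the clamp levels and the punish branch are my reconstruction of the scheme.

    V_CLAMP  = 1.0    # depolarized clamp for an "on" training entry
    V_PUNISH = -1.0   # immediate inhibition for a mis-timed spike

    def ice_step(symbol, vm, fired):
        if symbol == 'D':              # clamped excitation: force on
            return V_CLAMP
        if symbol == ' ' and fired:    # fired when it should be off:
            return V_PUNISH            # punish with artificial inhibition
        return vm                      # otherwise leave floating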
For the training data, I constructed a "spike raster" file which showed the values of the states and their transitions over time. See Figure 2.
[Figure 2. An Example "Spike Raster": five rows of 'D' and ' ' symbols with periodic 'I' re-initializations, one row per neuron, with time running left to right.]
In the figure, time moves from left to right and each line represents a different state. The symbol 'D' indicates the neuron is on at that time, a space indicates that it is off, and 'I' re-initializes the neuron. In the bottom three lines (neurons 3, 4, and 5), a "wave" can be seen in that the neurons are firing in a recognizable pattern. Neuron 3 causes neuron 4 to fire, which causes neuron 5 to fire, then 4, 3, 4, 5, 4, 3, etc. Note that while neuron 3 always excites neuron 4 and neuron 5 always excites neuron 4 as well, neuron 4 causes neuron 3 to fire only half of the time and neuron 5 the other half. This is due to well-timed inhibition of neurons 3 and 5 during every other excitatory input from neuron 4. As can be seen, the information about when to become inhibited is available from neurons 1 and 2, which have been conveniently trained to inhibit a neuron two time steps after that same neuron excited them. Thus, through clamped excitation and punishing inhibition, I was able to build this state-holding machine of excitation and inhibition.
Using similar spike rasters constructed from the emulated self-timing circuits, I was able to achieve limited success for small circuits. This, however, required a clear understanding of the modeled system. For example, if sufficient information was not present in the spike raster, the neural network learned false correlations that would not have been present had a larger sample been taken. To counteract these effects, I would occasionally clamp neurons to rest (an 'R' in the training file) so as to effectively disconnect them from other neurons. For example, input neurons, which are normally controlled by the outside world, would undesirably learn to excite each other if they tended to be triggered simultaneously often.
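For completeness, a reader for such raster files might look like the sketch below; the one-row-per-neuron layout is inferred from Figure 2, and the file name is hypothetical.

    def read_raster(path):
        """Return rows of symbols: 'D' on, ' ' off, 'I' re-init, 'R' rest."""
        with open(path) as f:
            rows = [line.rstrip('\n') for line in f]
        width = max(len(r) for r in rows)
        return [r.ljust(width) for r in rows]   # pad so columns align

    # raster = read_raster('wave.raster')       # hypothetical file name
    # symbol for neuron n at time t: raster[n][t]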
While I had some success for small circuits and other interesting temporal patterns of my own design using ICE, I began to consider the possibility of learning temporal patterns where the hidden, or inner, states are not given in the training file. In the example in Figure 2 for the wave, we might consider neurons 3, 4, and 5 as both input and output neurons with neurons 1 and 2 being state-holding "hidden", "middle", or interneurons. Thus, the inputs and outputs would be clamped while the hidden neurons would turn on and off freely.
Initial efforts to allow the hidden neurons to float were unsuccessful in that free neurons tended to simply learn to fire synchronously with the first input. As I desired to use an unsupervised Hebbian learning rule for these neurons, I was unable to provide direct information as to the error in their spiking patterns.
I have since considered using variable propagation delays for neurons. From my observations, the phase property of a hyperpolarizing neuron usually led the synaptic connection of a neuron to itself (an "autapse") to become negative (or zero, in the case of excitatory weights alone): a neurotransmitter pulse released during the depolarization phase usually returned, one propagation delay later, during the hyperpolarization phase, causing the weight to become more self-inhibitory. This self-inhibition, by the way, is an alternate method of establishing a refractory period or, in models with a slightly different learning rule and hyperpolarizing inhibition, of generating the post-action-potential hyperpolarization. However, if the propagation delay were sufficiently long, the neurotransmitter pulse would arrive when the neuron was firing a second action potential. This would cause the weight to become more excitatory and eventually lead to a neuron which, once initially excited, would continue to fire until inhibited. I used this feature to model those states in the circuits which would turn on and then stay on for some length of time.
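A toy sketch of the long-delay case: a single excitatory autapse turns a one-time kick into sustained periodic firing. The delay, decay, and weight values are illustrative assumptions.

    from collections import deque

    DELAY, W_SELF, THR = 8, 1.2, 1.0

    def run(steps, kick_at=3):
        line = deque([0.0] * DELAY, maxlen=DELAY)   # delayed self-input
        vm, spikes = 0.0, []
        for t in range(steps):
            vm = 0.8 * vm + line[0] + (THR if t == kick_at else 0.0)
            fired = vm >= THR
            if fired:
                spikes.append(t)
                vm = -1.0                # "tire" after the spike
            line.append(W_SELF if fired else 0.0)
        return spikes                    # fires at 3, 11, 19, ... until stopped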
I then theorized that, by using a range of propagation delays between neurons, one could generate a range of weights due to the phase-shifted effects of timing correlations. This would require a neural network model in which each neuron has several connections of different lengths to every other neuron and to itself. Functions generated by the few weights that were well correlated with the desired (clamped) firing of the output neuron would become established, whereas poorly correlated functions would have their generating weights bounce semi-randomly about the minimum value. This type of unsupervised phase-delay learning may be biologically similar to the formation of ocular dominance columns during early development.
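The proposed connectivity might be organized as below; the shapes, delay set, and initial values are assumptions.

    import numpy as np

    N_NEURONS = 5
    DELAYS    = [1, 2, 4, 8]     # candidate propagation delays (assumed)
    W_MIN     = 1e-3             # minimum weight floor

    # weights[d][i, j]: weight from neuron j to neuron i at delay d
    weights = {d: np.full((N_NEURONS, N_NEURONS), W_MIN) for d in DELAYS}

    def total_input(i, spike_history, t):
        """Sum delayed pre-synaptic activity into neuron i at time t."""
        return sum(weights[d][i] @ spike_history[t - d]
                   for d in DELAYS if t - d >= 0)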
Another idea for this type of hyperpolarizing-neuron learning rule is to use random noise to train the interneurons, similar to a Boltzmann machine. About a year ago, I used random noise to train a fully-interconnected network consisting of spiking tri-state neurodes (firing, resting, tired = hyperpolarized) to learn the XOR function with only the inputs and outputs clamped to the desired values. When the hidden weights hit upon values which generated the desired outputs, the noise was reduced. This may be similar to certain forms of behavioral operant conditioning, wherein some destabilizing "random noise" such as a drive (pain, hunger) is reduced after a successfully rewarded behavior. In fact, one might go further by saying that the above network demonstrated "exploratory" behavior until it chanced upon the successful XOR function. This concept could be extended to the analog, spiking, integrate and "tire" neurons that I used in this project.
In conclusion, in attempting to build a neural network architecture capable of emulating some arbitrary time-varying state-machine, I discovered and considered a number of difficulties in using non-linear, non-stationary neurons with a Hebbian learning rule. For simple boolean state-machines where the input, output, and hidden states can be observed, ICE can be used to clamp and coerce a neural network to model that system. For state-machines with hidden states, further development is needed.