What is Positive Training, and Why Should We Use It?
by Lisbeth Plant KPACTP
In pet dog training, teaching owners positive training (that is, to deliver rewards for desirable behaviour) is significantly easier and safer than trying to teach them to administer punishment for undesirable behaviour. In either case, mistakes are bound to occur, but while an untimely reward is simply an extra reward that will make little difference to the final result, an untimely punishment can cause severe setbacks to the training process and may also affect the animal’s overall performance, behaviour, and attitude to training and to his owner/handler.
Everyday Terminology
In daily use, “positive” training is often used as an abbreviation for “positive reinforcement” training and it is regarded as the opposite of “correction-based” training. In between the two, we sometimes also talk about “balanced” training.
In positive training, we catch the animal doing something right and reward that action, in order to improve the chances that the behaviour happens again (positive reinforcement). When a mistake happens, we ignore the action itself and then make sure to remove any inherent reward, for example by clearing the kitchen counters to cure counter-surfing (negative punishment).
Correction-based training is exactly what it sounds like. When the animal makes a mistake, we correct (punish) the mistake. The intention is that by teaching the animal what is wrong, the animal will eventually learn to do the right thing. Sometimes, an aversive stimulus (“punishment”) is inflicted on the animal even before the animal has done anything at all, such as in the example of the force-based retrieve.
In “balanced training”, the principle is a combination of the two; when the animal does right, we reward, and when the animal makes a mistake, we punish (“correct”). To most of us, that sounds like quite a reasonable course of action at first. However, it can be fraught with problems.
It is doubtful that reinforcement and punishment are perceived as equal by the learners. A teacher who punishes a student, even by a simple sneer, for providing a wrong answer, is making that student less likely to want to offer answers in the future. Compare that to a teacher who concentrates on rewarding not only correct answers, but also attempts. That teacher will have much more willing learners, and in fact will also encourage learning in a climate that is free of the risk of “failure”.
When we concentrate on rewarding correct behaviours and simply remove the rewards associated with incorrect behaviours, that is, when we avoid any and all aversive (unpleasant) consequences, the laws of learning(1) tell us that the strength of the rewarded behaviours will increase, while unrewarded behaviours will weaken naturally. The challenge for the average dog owner is of course to identify what the reward might be in each case of undesirable behaviour.
A pleasant side-effect of reward-based training is the strong bond of mutual respect, understanding and affection that develops between trainer and learner. The learner associates all those rewards not only with his own behaviour, but with the presence of the trainer.
Training without aversives can be a powerful tool for training complicated behaviours and behaviour chains, because the cue for one behaviour, if trained strictly without aversive stimuli of any kind, will function as a reward for the previous behaviour.(2,3) In this way, an obedience competitor can build up an entire behaviour chain of the trial sequence, continuously rewarding the dog in the ring with one cue after another until the final heel-out-of-the-ring, where the reward is given. The competitor has in fact continuously ‘rewarded the dog in the ring’ during the trial!
This only works if the cue for the subsequent behaviour has no association with any kind of aversive. This is because, unlike the “commands” used in correction-based or balanced training, which imply “do this or else”, the “cues” used in positive reinforcement training constitute opportunities for reinforcement. This makes the cue itself a valuable commodity that the animal learns to pay attention to for his own interests.
By using purely positive reinforcement, without any kind of aversive, you will therefore teach your animal not only to do what you want, but also to be attentive and to long for you to ask for the behaviours.
Reliability of Performance (Stimulus Control)
You often hear statements like “positive training is all very well for pets, but for competition, I need my dog to be 100% reliable, so I have to correct him so he knows not to blow me off.” This statement illustrates two misconceptions:
First, it has actually been shown that, all other things being equal, you tend to get higher reliability if you train purely with positive methods(4,5), while if you use any kind of aversive (physical, or even simple no-reward-markers such as a “no!”), you will create what is called a ‘poisoned cue’.
Karen Pryor describes the process of the creation of a poisoned cue in her 2002 article “The Poisoned Cue: Positive and Negative Discriminative Stimuli” as follows [“stimulus” refers to a cue, or a command]:
“Even if the behavior was trained entirely with positive reinforcement, if one now clicks for correct behavior following a discriminator (a cue, command, or signal) but also gives aversive correction (leash pop, verbal reprimand, etc.) for incorrect behavior following that same stimulus, the stimulus immediately loses its value as a positive reinforcer. It is, at best, ambiguous in terms of reinforcement. It is not a click. It no longer automatically triggers the positive emotions associated with conditioned positive reinforcers. It can no longer be predictably used inside a chain to reinforce previous behavior.
Even if primary reinforcers, such as approval, toys, and treats are supplied in abundance during or after training or performance, the discriminative stimuli themselves-the commands-are now threats as well as promises. Behavior tends to break down, interestingly, both preceding and following these ambivalent stimuli: preceding, because the preceding behavior may begin to extinguish due to lack of a positive conditioned reinforcer consisting of the now-aversive stimulus, and following, because the behavior that might be punished tends to be avoided. The shift becomes visible in the learner’s attitude, which switches from attentive eagerness to reluctance, often with visible manifestations of stress. Even though successful response to a given discriminative stimulus is still followed by reward, if failure is now followed by punishment, you have made that discriminative stimulus ambiguous in terms of predictable outcome. It is no longer ‘safe.’ You have poisoned your cue.”(6)