Operant Conditioning
Operant conditioning is a form of learning in which behavior is modified by its consequences. Two types of consequences play a role in modifying behavior: reinforcement and punishment. Behaviors followed by reinforcement tend to be strengthened, while those followed by punishment tend to be weakened.
Real-life examples of operant conditioning are not very hard to come by. If a child gets a lollipop for doing her homework, she will likely be motivated to do her homework again. If you receive a shock from sticking your finger in a socket, you will no doubt avoid making the same mistake twice. If you tell a joke and everyone doubles over in laughter, you will be inclined to tell the same joke again. But if no one laughs, or if your listeners actually frown, you will likely think twice before repeating that joke.
As the above examples show, the theory of operant conditioning helps to explain many aspects of day-to-day life. It has also proven to be a powerful tool for modifying the behavior of humans and animals. Before we delve into the core aspects of the theory, let's take a moment to consider how it first developed.
Operant conditioning theory was developed by B. F. Skinner, an ardent proponent of the behaviorist movement in psychology. Behaviorism was pioneered by John B. Watson, who insisted that the focus of psychology should be on overt, observable behaviors rather than private events (e.g., thoughts and emotions), which he believed could not be objectively studied. Watson was greatly influenced by the work of Ivan Pavlov and believed that all behaviors are learned through interaction with the environment and can be explained by the theory of classical conditioning.
Although Skinner shared Watson's emphasis on observable behavior and environmental variables, he diverged from some of the ideas inherent in Watson's version of behaviorism (methodological behaviorism). Skinner advanced an alternative approach, which he called radical behaviorism. While methodological behaviorism focuses almost exclusively on observable stimuli and behavior, radical behaviorism takes a person's thoughts and emotions into account in addition to observable events.
As a radical behaviorist, Skinner felt that classical conditioning was limited in its ability to explain complex human behavior. In classical conditioning, learning occurs passively—a response is simply elicited by a particular stimulus. Skinner proposed, however, that organisms generally play a more active role in the learning process by exploring and influencing their environment and then evaluating the consequences of their behavior.
Skinner developed the theory of operant conditioning by building on the work of another behaviorist, E. L. Thorndike. Thorndike's Law of Effect states that behaviors that are followed by favorable results are more likely to be repeated than those that are followed by unfavorable consequences. Skinner adopted this idea and introduced the terms 'reinforcement' and 'punishment' to refer to favorable and unfavorable consequences respectively.
Beginning in the 1920s, Skinner conducted numerous experiments in order to explore and develop his theory. Many of his experiments involved the use of small animals such as rats and pigeons, as well as an enclosed chamber which came to be known as the 'Skinner box.' One version of the Skinner box was equipped with a lever that the animal could press in order to obtain food.
In a typical experiment, a hungry rat would be placed inside the box and would wander around randomly until it accidentally pressed the lever. A food pellet would immediately be delivered to the animal. After a few trials, the random exploration would decrease while the rat's tendency to press the lever would increase. Skinner reasoned that the food pellet served as a reinforcer of this behavior, causing it to be repeated. He called this form of learning operant conditioning since the organism directly influences, or operates on, the environment.
If you've sat through at least one lecture on operant conditioning, you'll know by now that reinforcement and punishment are the basic building blocks of this theory. The likelihood of a behavior recurring depends on which of these consequences follows.
Behavioral consequences can be broken down into four categories:
1. Positive Reinforcement
The frequency of a behavior is increased by the addition of a pleasant stimulus. Example: If after packing away all his toys (behavior), Tommy's mother gives him a scoop of his favorite ice cream (pleasant stimulus), Tommy will likely repeat this behavior in the future.
2. Negative Reinforcement
The frequency of a behavior is increased by the removal of an unpleasant stimulus. Example: If after packing away all his toys, Tommy's mother stops nagging him about it (unpleasant stimulus), he will be more inclined to pack away his toys in the future.
3. Positive Punishment
The frequency of a behavior is decreased by the addition of an unpleasant stimulus. Example: If Tommy deliberately goes to sleep without packing away his toys and his mother drags him out of bed in the middle of the night to do so (unpleasant stimulus), it is less likely that Tommy will repeat this behavior in the future.
4. Negative Punishment
The frequency of a behavior is decreased by the removal of a pleasant stimulus. Example: If Tommy deliberately goes to bed without packing away his toys and his mother then takes away his video game (pleasant stimulus), it is less likely that Tommy will do the same thing again.
It can sometimes be a bit tricky to differentiate between each of these consequences (especially when you're in the middle of a major exam and keenly aware that time is running out!). It gets easier, though, if you take note of two important facts:
- The term 'positive' implies the addition or presentation of a stimulus following the behavior, while the term 'negative' implies the removal or withdrawal of a stimulus. (Here's a tip: If you associate the word 'positive' with the addition sign (+) and the word 'negative' with the subtraction sign (-), it's quite easy to remember whether the consequence involves the addition or removal of a stimulus.)
- Reinforcement, whether positive or negative, always increases the frequency of a behavior. Punishment, whether positive or negative, always decreases it. (Don't fall into the trap of confusing negative reinforcement with punishment as many students do. The former strengthens behavior while the latter weakens it.)
| | Pleasant Stimulus | Aversive Stimulus |
|---|---|---|
| Add (+) | Positive Reinforcement (strengthens behavior): Jill gets a lollipop for completing her homework | Positive Punishment (weakens behavior): Jill gets a slap on the hand for pulling Abby's hair |
| Take Away (-) | Negative Punishment (weakens behavior): Jill does not receive her usual allowance because she started a fight at school | Negative Reinforcement (strengthens behavior): Jill does not have to do any chores because she did well on her test |
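If you like to see the logic spelled out step by step, here is a minimal sketch in Python (the function name and labels are ours, purely for illustration) that captures the same 2 x 2 grid: whether a stimulus is added or removed, and whether it is pleasant or aversive, determines which of the four consequences you are dealing with.

```python
def classify_consequence(stimulus_is_pleasant: bool, stimulus_is_added: bool) -> str:
    """Map a consequence onto one of the four operant categories.

    'Positive' / 'negative' track whether a stimulus is added or removed;
    'reinforcement' / 'punishment' track whether the behavior is
    strengthened or weakened as a result.
    """
    if stimulus_is_added:
        # Adding a pleasant stimulus strengthens behavior;
        # adding an aversive stimulus weakens it.
        return "positive reinforcement" if stimulus_is_pleasant else "positive punishment"
    else:
        # Removing a pleasant stimulus weakens behavior;
        # removing an aversive stimulus strengthens it.
        return "negative punishment" if stimulus_is_pleasant else "negative reinforcement"


# Jill gets a lollipop for completing her homework:
print(classify_consequence(stimulus_is_pleasant=True, stimulus_is_added=True))
# -> positive reinforcement

# The annoying seatbelt beeping stops once you buckle up:
print(classify_consequence(stimulus_is_pleasant=False, stimulus_is_added=False))
# -> negative reinforcement
```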
For those of you who like a challenge, here are a few more examples of operant conditioning. See if you can figure out the type of consequence highlighted in each case. (The answers can be found at the end of the article).
- Betty was caught sneaking out one night so her dad took her car keys away.
- You stop making dinner for your roommate because she always criticizes your cooking.
- Your professor announces that you can skip the dreaded final exam if you attend all the upcoming tutorials.
- Lucy found a beautiful silver ring in the bag of popcorn she purchased.
- You finally fasten your seatbelt to stop the annoying beeping sound.
- You unfortunately lost your scholarship because you failed one of your final exams (not the psych exam of course!).
There are several important concepts in operant conditioning besides reinforcement and punishment. You will notice that some of these also apply to classical conditioning (e.g., stimulus generalization and extinction) although the explanations are somewhat different.
Acquisition
This refers to the initial stages of the learning process when a new response or behavior is being developed. In operant conditioning, complex behaviors are often acquired gradually through the process of shaping.
Shaping
This involves training an organism to perform a behavior by reinforcing responses that increasingly resemble the target behavior. Let's say you want to teach a young child to color within the boundaries of a circle. At first, you might praise the child simply for making a mark on the paper, even if that mark is not made within the circle. As this response becomes more frequent, you might withhold praise until the child actually starts to color within the circle, even if he also colors outside of it. Finally, you wait until the child neatly colors the circle without going outside the lines before you offer praise.
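To make the idea of successive approximation a little more concrete, here is a small, hypothetical Python sketch (the scoring function and the threshold values are invented for illustration, not taken from any study) in which the criterion for earning praise is gradually tightened until only the target behavior is reinforced.

```python
import random


def shape(attempt_quality, criteria):
    """Reinforce attempts that meet the current criterion, then raise the bar.

    attempt_quality: callable returning a score from 0 (no progress) to 1
                     (target behavior), e.g. how neatly the child colors
                     inside the circle.
    criteria:        increasing thresholds that an attempt must meet.
    """
    for criterion in criteria:
        while True:
            score = attempt_quality()
            if score >= criterion:
                print(f"Reinforce (score {score:.2f} meets criterion {criterion})")
                break  # tighten the criterion on the next pass
            print(f"Withhold praise (score {score:.2f} below criterion {criterion})")


# A toy stand-in for the child's attempts: quality drifts upward over time.
progress = {"level": 0.1}

def attempt():
    progress["level"] = min(1.0, progress["level"] + random.uniform(0.0, 0.15))
    return progress["level"]


# mark on paper -> coloring inside the circle -> neat coloring within the lines
shape(attempt, criteria=[0.2, 0.5, 0.9])
```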
Discriminative stimulus
In operant conditioning, this refers to a cue which indicates that a behavior is likely to be reinforced. For example, you might learn that your request for money is most likely to be met when your dad is in a good mood. Or a rat might learn that pressing a lever only produces food when a red light inside the box is on. The good mood of your father and the red light in the box serve as cues that the behavior in question is likely to be reinforced.
Stimulus discrimination
This occurs when an organism learns to produce a certain response in the presence of one stimulus but not another. For example, a rat might learn that pressing the lever results in food when a red light is on but not when a green light is on. Similarly, a child might throw a tantrum in order to get candy when he's in the presence of his mother but would not dare pull the same stunt in the presence of his dad.
Stimulus generalization
This occurs when an organism produces the same response in the presence of similar stimuli. For example, if a child is reinforced at school for saying "please" when making a request, and the child also starts saying "please" at home, the behavior has been generalized. Likewise, if a rat learned to press a lever when a red light is on, it might exhibit the same behavior when an amber light is turned on instead.
Extinction
In operant conditioning, this refers to the tendency for responses that are no longer reinforced to gradually weaken or be extinguished. If a rat is no longer given food pellets for pressing a lever, this response will gradually die out over time. If a child's tantrums are consistently ignored, this behavior might also cease.
Spontaneous recovery
This refers to the return of a response that was previously extinguished. For example, if after extinction a rat is given a period of rest and then returned to the Skinner box, it will spontaneously start pressing the lever again in an effort to obtain food (poor fellow).
In operant conditioning, reinforcement may be delivered according to different rules or schedules. Skinner identified two broad categories of reinforcement schedules: continuous and intermittent.
Continuous reinforcement
Reinforcement is provided every single time the desired behavior is performed. Examples include giving your dog a treat every time he fetches a ball, giving a child a sticker every time she scores 100% on a test, or becoming less thirsty every time you drink a glass of water. New behaviors are acquired most quickly through continuous reinforcement.
A major problem with this schedule of reinforcement is that the subject may eventually become bored or lose interest in the reinforcer. To prevent this from happening, different reinforcers can be introduced during the acquisition stage to keep the subject motivated. Once the target behavior has been learned, continuous reinforcement can be replaced with intermittent reinforcement since the latter has been shown to be more effective at maintaining behavior over an extended period of time.
Intermittent or partial reinforcement
Intermittent schedules provide reinforcement some of the time rather than every time the desired behavior is performed. There are four schedules for delivering intermittent reinforcement: fixed-ratio, fixed-interval, variable-ratio and variable-interval (a brief simulation of each appears after this list). Ratio schedules are based on the number of responses, while interval schedules are based on time.
- Fixed-ratio schedule - Reinforcement is provided after a fixed number of correct responses, for example, giving your dog a treat every 4th time he fetches a ball.
- Variable-ratio schedule - The number of correct responses required for reinforcement varies around an average. For example, you might reward your dog on average every 4th time he fetches a ball, first after two responses, then after four and finally after six (the average of two, four and six is four).
- Fixed-interval schedule - Reinforcement is provided for the first correct response after a fixed interval of time has passed. Let's say you decide on a 60-second fixed-interval schedule for reinforcing your dog's behavior. You would reward the dog the first time it fetches a ball, but no further reinforcement would be available for the next 60 seconds, even if he responds correctly during that time. The first correct response after the 60 seconds have expired would be rewarded, and then another 60 seconds must pass before reinforcement becomes available again.
- Variable-interval schedule - The amount of time that must elapse before a correct response is reinforced varies around an average. For example, you might give your dog a treat for responding appropriately after an interval of 60 seconds on average, first after 40 seconds then after 50, 70 and 80 seconds.
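Because each of these schedules is really just a rule for deciding when a response earns a reward, they are easy to express in code. The sketch below is a rough simulation (the class names and parameters are ours, not standard terminology): a ratio schedule counts responses, while an interval schedule watches the clock.

```python
import random


class RatioSchedule:
    """Reinforce after a required number of responses (fixed or variable)."""

    def __init__(self, ratio, variable=False):
        self.ratio = ratio          # average number of responses per reward
        self.variable = variable
        self.count = 0
        self._set_requirement()

    def _set_requirement(self):
        # Variable ratio: the requirement fluctuates around the average;
        # fixed ratio: the requirement is always the same.
        self.required = random.randint(1, 2 * self.ratio - 1) if self.variable else self.ratio

    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self._set_requirement()
            return True   # deliver the treat
        return False


class IntervalSchedule:
    """Reinforce the first response after an interval (fixed or variable) elapses."""

    def __init__(self, interval, variable=False):
        self.interval = interval    # average seconds before reward becomes available
        self.variable = variable
        self.last_reward = 0.0
        self._set_wait()

    def _set_wait(self):
        self.wait = random.uniform(0.5, 1.5) * self.interval if self.variable else self.interval

    def respond(self, now):
        # 'now' is the time (in seconds) at which the response occurs.
        if now - self.last_reward >= self.wait:
            self.last_reward = now
            self._set_wait()
            return True
        return False


# Fixed-ratio 4: every 4th fetch earns a treat.
fr4 = RatioSchedule(ratio=4)
print([fr4.respond() for _ in range(8)])
# -> [False, False, False, True, False, False, False, True]

# Fixed-interval 60 s: the first fetch after each 60-second wait earns a treat.
fi60 = IntervalSchedule(interval=60)
print([fi60.respond(t) for t in (10, 45, 61, 70, 125)])
# -> [False, False, True, False, True]
```

Notice that under the variable versions the requirement is redrawn after every reward, which is exactly what makes those schedules unpredictable.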
Behaviors that are reinforced on an intermittent schedule are more resistant to extinction than those that are reinforced on a continuous basis. The reason for this is clear. If you are used to being rewarded on an irregular basis it will take a longer time for you to discern when the reinforcements have stopped completely than if you are accustomed to being rewarded for every single response. You will therefore persist in the behavior for a longer period of time.
Variable schedules are also more resistant to extinction than fixed schedules simply because they are unpredictable. Since you never know when the next reinforcement is due you will continue to respond for a longer period of time in the hope of receiving another reward. Behaviors that are maintained on a variable ratio schedule are the most resistant to extinction.
The benefits of variable schedules over fixed schedules can be seen in a number of different settings. Slot machines, for example, provide reinforcement on a variable ratio schedule—you never know how many pulls of the handle are required before there is a payout. If people were able to predict how many pulls are needed for them to hit the jackpot (fixed ratio), casino owners would probably go out of business quite quickly. Similarly, if workers knew the exact date when their supervisor would be coming to inspect their work and issue bonus payments (fixed interval), they would no doubt work harder as the inspection time approached but might very well slack off at other times during the year. If bonus payments were made on a variable schedule, workers would produce more consistent work since their supervisor could show up at any time.
Punishment can be quite effective in discouraging inappropriate behavior—the first time a child touches a hot iron might very well be the last. Nevertheless, psychologists often raise several concerns regarding the use of punishment:
- Punishment does not eliminate inappropriate behavior; it merely suppresses it. When the punishment (or threat of it) is removed, the behavior often returns. For example, you might slow down along a section of the highway where you previously received a speeding ticket, but still drive above the speed limit along other sections of the roadway.
- Punishment tells you what not to do (e.g., don't shout when you want the teacher's attention) but does not necessarily suggest a more acceptable form of behavior (e.g., raise your hand if you want the teacher's attention).
- Punishment, particularly corporal punishment, may lead to aggression, fear and anxiety on the part of the recipient.
- Punishment is most effective when it quickly follows undesirable behavior, but it is not always possible to deliver punishment immediately.
Behavior modification
This is a form of behavioral therapy designed to increase or strengthen desirable behaviors and decrease or eliminate undesirable ones. Many behavior modification techniques are based on the principles of operant conditioning. One such technique is the use of a token economy. This is a reward system in which tokens (e.g. poker chips, points, gold stars) are awarded when the desired behavior is performed and can later be exchanged for rewards. Token economies and the technique of shaping are frequently employed in educational and psychiatric settings to teach appropriate behaviors.
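As a rough illustration of the bookkeeping involved, here is a tiny, hypothetical Python sketch (the behaviors, token values and rewards are made up) in which tokens are earned for target behaviors and later exchanged for a backup reward.

```python
class TokenEconomy:
    """Award tokens for target behaviors; exchange them later for rewards."""

    def __init__(self, token_values, reward_costs):
        self.token_values = token_values   # tokens earned per behavior
        self.reward_costs = reward_costs   # tokens required per reward
        self.balance = 0

    def record(self, behavior):
        earned = self.token_values.get(behavior, 0)
        self.balance += earned
        return earned

    def exchange(self, reward):
        cost = self.reward_costs[reward]
        if self.balance >= cost:
            self.balance -= cost
            return True
        return False


economy = TokenEconomy(
    token_values={"completed homework": 2, "raised hand": 1},
    reward_costs={"extra recess": 5},
)
economy.record("completed homework")
economy.record("completed homework")
economy.record("raised hand")
print(economy.exchange("extra recess"))  # True: 5 tokens have been earned
```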
Animal training
If you've ever seen a monkey riding a bicycle, a dog skillfully maneuvering an obstacle course, or a dolphin jumping through a hoop, you would have witnessed firsthand the amazing power of operant conditioning. Animal trainers rely heavily on the principles of this theory (especially reinforcement and shaping) to teach their animal friends impressive tricks.
For example, as smart as dolphins are, no sensible trainer would expect them to jump through a hoop right off the bat. That behavior would have to be shaped. The trainer might start by first reinforcing a movement toward the hoop. Later, he might reinforce jumping out of the water, and finally jumping through the hoop.
Superstitions
Superstitious behaviors are often acquired and maintained through operant conditioning. If a gambler blows on a set of dice and then rolls a winning number, his behavior is reinforced even though it has absolutely nothing to do with the outcome of the roll. If by chance this occurs a few more times, he might conclude that his behavior (blowing on the dice) actually produces the desired consequence (a winning throw) when in fact the behavior is unnecessary and ineffectual.
Addiction
The principles of operant conditioning can also account for various forms of addiction, including drug and gambling addiction. Drug use is positively reinforced by the pleasant feelings it produces, motivating the individual to keep repeating the behavior. Negative reinforcement also plays a role since drug use provides an escape from the stresses of life and from the withdrawal symptoms associated with prolonged use.
Gambling can also be quite addictive since it is maintained on a variable ratio schedule—the gambler expects some of his efforts to be rewarded but he never knows when the reward will come. As we learned before, variable ratio schedules are the most resistant to extinction so they can maintain behaviors for a long period of time. A gambler who experiences an early win might therefore continue to repeat this behavior day after day, week after week, month after month, always in anticipation of the next elusive win.
Operant conditioning accounts for many aspects of human behavior but like classical conditioning it does not provide a complete picture of the learning process. For one thing, Skinner discounted the role of mental and cognitive processes in learning. He also suggested that reinforcement is essential for learning to occur. However, other studies suggest that in some cases, learning is primarily a cognitive exercise and can take place even in the absence of reinforcement.
For example, both humans and animals are known to learn by insight—a sudden understanding of how to solve a problem. (If you've ever felt like a light bulb just went off in your head, you'll know what insight learning involves.) We also learn quite a lot by simply observing and imitating others. In psychology, this is known as social learning.
Another limitation of operant conditioning is that it is largely based on the results of animal studies which some critics argue cannot be generalized to humans. In addition to the fact that animals differ from humans in their biological makeup and cognitive abilities, animal studies ignore the social context in which human behavior and learning occur.
Answers to self test: 1) negative punishment, 2) positive punishment, 3) negative reinforcement, 4) positive reinforcement, 5) negative reinforcement, 6) negative punishment