Friday, 29 March 2013

Operant Conditioning

Operant conditioning can be defined as a type of learning in which voluntary (controllable; non-reflexive) behavior is strengthened if it is reinforced and weakened if it is punished (or not reinforced). 

Note: Operant conditioning is also known as Instrumental Conditioning/Learning.

A. The most prominent figure in the development and study of Operant Conditioning was B. F. Skinner.

 History: 

a) As an undergraduate he was an English major; he later decided to study Psychology in graduate school.

b) Early in his career he came to believe that much of behavior could be studied in a single, controlled environment (he created the Skinner box, discussed below). Instead of observing behavior in the natural world, he studied it in a closed, controlled unit. This kept factors not under study from interfering with the results, so Skinner could isolate the specific factors that influence behavior.

c) During the "cognitive revolution" that swept Psychology (discussed later), Skinner held to the position that behavior is not guided by inner forces or cognition. This made him a "radical behaviorist".

d) As his theories of Operant Conditioning developed, Skinner became passionate about social issues, such as free will, and about how social behavior develops, why it develops, and how it is propagated.


Skinner's views of Operant Conditioning 

a) Operant Conditioning differs from Classical Conditioning in the kind of behavior it studies. The behaviors studied in Classical Conditioning are reflexive (for example, salivating), whereas the behaviors governed by the principles of Operant Conditioning are non-reflexive (for example, gambling). So, compared to Classical Conditioning, Operant Conditioning attempts to predict more complex, non-reflexive behaviors and the conditions under which they will occur. In addition, Operant Conditioning deals with behaviors that are performed so that the organism can obtain reinforcement.

b) There are many factors involved in determining whether an organism will engage in a behavior - just because there is food doesn't mean an organism will eat (time of day, time since the last meal, etc.). This points to a key difference from classical conditioning, described in "c" below.

c) In Operant Conditioning, the organism has a lot of control. Just because a stimulus is presented does not necessarily mean that an organism will react in any specific way. Instead, reinforcement depends on the organism's behavior: in order to receive reinforcement, the organism must behave in a specific manner. For example, you can't win at a slot machine unless several things happen, most importantly, you pull the lever. Pulling the lever is a voluntary, non-reflexive behavior that must occur before reinforcement (hopefully a jackpot) can be delivered.

d) In classical conditioning, the controlling stimulus comes before the behavior, but in Operant Conditioning, the controlling stimulus comes after the behavior. Recall Pavlov's meat powder example: the sound occurred (controlling stimulus), the dog salivated, and then the meat powder was delivered whether or not the dog salivated. With Operant Conditioning, the sound would occur, and then the dog would have to perform some behavior in order to get the meat powder as reinforcement (like making a dog sit to receive a bone).

e) Skinner Box - a chamber in which Skinner placed animals such as rats and pigeons for study. The chamber contains a lever (for rats) or a key (for pigeons) that can be pressed or pecked to receive reinforcers such as food and water.

* The Skinner Box made possible the Free Operant Procedure - responses can be made and recorded continuously, without stopping the experiment for the experimenter to record each response the animal makes.

f) Shaping - an operant conditioning method for creating an entirely new behavior by using rewards to guide an organism toward a desired behavior through Successive Approximations. The organism is rewarded for each small advance in the right direction. Once one approximation is made and rewarded, the organism is not reinforced again until it makes a further advance, then another and another, until it is rewarded only when the entire behavior is performed.


For example, to teach a rat to press a lever, the experimenter gives small rewards after each behavior that brings the rat closer to pressing the lever. The rat is placed in the box. When it takes a step toward the lever, the experimenter reinforces the behavior by presenting food or water in the dish (located next to or under the lever). Then, when the rat makes any further behavior toward the lever, such as standing in front of it, it is given reinforcement (note that the rat no longer gets a reward for just taking a single step toward the lever). This continues until the rat reliably goes to the lever and presses it to receive a reward.
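The shaping procedure described above can be sketched as a loop in which the criterion for reinforcement moves one step closer to the target each time the current approximation is met. This is a toy illustration, not Skinner's actual protocol: the numeric "progress toward the lever" scores, the criteria list, and the function name `shape` are all hypothetical choices made for the example.

```python
from typing import Iterable, List


def shape(observed: Iterable[int], criteria: List[int]) -> List[bool]:
    """Toy model of shaping by successive approximations.

    observed: hypothetical behaviors, each scored by how close it comes to the
        target (the scoring scale is an assumption for illustration).
    criteria: increasing scores, one per approximation; the last is the full behavior.
    Returns, for each behavior, whether it was reinforced.
    """
    if not criteria:
        raise ValueError("need at least one approximation")
    level = 0  # index of the approximation currently being reinforced
    reinforced = []
    for score in observed:
        if score >= criteria[level]:
            reinforced.append(True)
            # Raise the bar: the previous approximation no longer earns food.
            # Once the final criterion is reached, it stays in force.
            level = min(level + 1, len(criteria) - 1)
        else:
            reinforced.append(False)
    return reinforced


if __name__ == "__main__":
    # Hypothetical scale: 1 = step toward lever, 2 = stands in front of lever,
    # 3 = presses lever.
    print(shape([1, 1, 2, 0, 3, 3], [1, 2, 3]))
    # [True, False, True, False, True, True]
```

Note the second behavior: a single step toward the lever is rewarded the first time but not the second, matching the point that earlier approximations stop being reinforced once the rat has moved on.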


Principles of Reinforcement 

Skinner identified two types of reinforcing events - those in which a reward is given; and those in which something bad is removed. In either case, the point of reinforcement is to increase the frequency or probability of a response occurring again. 

1) positive reinforcement - give an organism a pleasant stimulus when the operant response is made. For example, a rat presses the lever (operant response) and receives a treat (positive reinforcement).

2) negative reinforcement - take away an unpleasant stimulus when the operant response is made. For example, stop shocking a rat when it presses the lever (yikes!) 


 Skinner also identified two types of reinforcers 

1) primary reinforcer - a stimulus that naturally strengthens any response that precedes it (e.g., food, water, sex), without the need for any learning on the part of the organism.

2) secondary/conditioned reinforcer - a previously neutral stimulus that acquires the ability to strengthen responses because it has been paired with a primary reinforcer. For example, an organism may become conditioned to the sound of the food dispenser, which occurs after the operant response is made. Thus, the sound of the food dispenser itself becomes reinforcing. Notice the similarity to Classical Conditioning, with the exception that the behavior is voluntary and occurs before the presentation of a reinforcer.


Schedules of Reinforcement 
There are two types of reinforcement schedules: continuous (every response is reinforced) and partial/intermittent (only some responses are reinforced). There are four basic partial schedules:

a) Fixed Ratio (FR) - reinforcement is given after every Nth response, where N is the size of the ratio (i.e., a set number of responses must occur before reinforcement is given).

For example - many factory workers are paid according to the number of some product they produce. A worker may get paid $10.00 for every 100 widgets he makes. This would be an example of an FR100 schedule. 

b) Variable Ratio (VR) - the variable ratio schedule is the same as the FR schedule, except that the ratio varies rather than staying fixed. Reinforcement is given after a varying number of responses whose average is N.

For example - slot machines in casinos function on VR schedules (despite what many people believe about their "systems"). A slot machine is programmed to provide a "winner" every Nth response on average, such as every 75th lever pull on average. So, the slot machine may give a winner after 1 pull, then on the 190th pull, then on the 33rd pull, etc., just so long as it averages out to a winner every 75th pull.
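The two ratio schedules can be sketched as response counters. In this hypothetical simulation, `FixedRatio(n)` reinforces exactly every nth response, while `VariableRatio(mean)` draws each new requirement at random so that requirements average out to `mean`. The class names are made up for the example, and the uniform draw from 1 to 2*mean - 1 is an assumption; real VR procedures (and real slot machines) may use other distributions.

```python
import random


class FixedRatio:
    """FR-n schedule: reinforce every nth response (e.g., FR100 = pay per 100 widgets)."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0

    def respond(self) -> bool:
        """Record one response; return True if it is reinforced."""
        self.count += 1
        if self.count == self.n:
            self.count = 0  # start counting toward the next reinforcer
            return True
        return False


class VariableRatio:
    """VR-mean schedule: the number of responses required varies but averages `mean`."""

    def __init__(self, mean: int, rng: random.Random):
        self.mean = mean
        self.rng = rng
        self.count = 0
        self.required = self._next_requirement()

    def _next_requirement(self) -> int:
        # Assumption: uniform draw from 1..2*mean-1, whose average is `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self) -> bool:
        """Record one response (e.g., one lever pull); return True on a 'win'."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()
            return True
        return False


if __name__ == "__main__":
    factory = FixedRatio(100)
    paid = sum(factory.respond() for _ in range(1000))
    print("FR100, 1000 widgets ->", paid, "payments")  # 10

    slot = VariableRatio(75, random.Random(0))
    wins = sum(slot.respond() for _ in range(75_000))
    print("VR75, 75000 pulls ->", wins, "wins (roughly 1000)")
```

The FR count is fully predictable, whereas under VR the gap between individual wins varies widely even though the long-run rate settles near one win per 75 pulls.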

c) Fixed Interval (FI) - a designated amount of time must pass, and then a certain response must be made in order to get reinforcement. 

For example - waiting for a bus. The bus may run on a fixed schedule, stopping at the location nearest you every 20 minutes. After one bus has stopped and left your stop, the timer resets, so the next one will arrive in 20 minutes. You must wait that amount of time for the bus to arrive before you can get on it.

d) Variable Interval (VI) - same as FI but now the time interval varies. 

For example - when you wait to get your mail. Your mail carrier may come to your house at approximately the same time each day. So, you go out and check at the approximate time the mail usually arrives, but there is no mail. You wait a little while and check, but no mail. This continues until some time has passed (a varied amount of time) and then you go out, check, and to your delight, there is mail. 
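The interval schedules can be sketched the same way, except that what matters is elapsed time rather than the number of responses: the first response made after the interval has passed is reinforced, and earlier responses earn nothing. In this hypothetical simulation, `FixedInterval` and `VariableInterval` are made-up names, time is measured in minutes from the start, and the uniform wait between 0 and 2*mean for the VI schedule is an assumption chosen only so that waits average `mean`.

```python
import random


class FixedInterval:
    """FI-t schedule: the first response made at least `interval` time units after
    the last reinforcer (or the start) is reinforced; earlier responses earn nothing."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = interval  # time when reinforcement next becomes available

    def respond(self, now: float) -> bool:
        if now >= self.available_at:
            self.available_at = now + self.interval  # the timer restarts
            return True
        return False


class VariableInterval:
    """VI-t schedule: like FI, but each waiting time is drawn at random around `mean`."""

    def __init__(self, mean: float, rng: random.Random):
        self.mean = mean
        self.rng = rng
        self.available_at = self._next_wait()

    def _next_wait(self) -> float:
        # Assumption: uniform wait between 0 and 2*mean, which averages `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, now: float) -> bool:
        if now >= self.available_at:
            self.available_at = now + self._next_wait()
            return True
        return False


if __name__ == "__main__":
    bus = FixedInterval(20)
    # Check the bus stop once a minute for an hour; only some checks pay off.
    rides = [t for t in range(61) if bus.respond(t)]
    print("FI20 reinforced at minutes:", rides)  # [20, 40, 60]

    mail = VariableInterval(10, random.Random(0))
    found = [t for t in range(61) if mail.respond(t)]
    print("VI10 reinforced at minutes:", found)
```

Checking more often does not make the bus or the mail come sooner under either schedule; it only reduces how long the reinforcer sits waiting before it is collected.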


Punishment
Whereas reinforcement increases the probability of a response occurring again, the premise of punishment is to decrease the frequency or probability of a response occurring again. 

  • Skinner did not believe that punishment was as powerful a form of control as reinforcement, even though it is so commonly used. In his view, punishment is not truly the opposite of reinforcement, as had originally been assumed, and its effects are normally short-lived.
  • There are two types of punishment:

1) Positive - presentation of an aversive stimulus to decrease the probability of an operant response occurring again. For example, a child reaches for a cookie before dinner, and you slap his hand. 

2) Negative - the removal of a pleasant stimulus to decrease the probability of an operant response occurring again. For example, each time a child says a curse word, you remove one dollar from their piggy bank. 
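The four kinds of consequences above (two types of reinforcement, two types of punishment) come from two independent questions: was a stimulus presented ("positive") or removed ("negative"), and is the behavior made more likely (reinforcement) or less likely (punishment)? The small sketch below encodes that grid; the `Change`/`Effect` names and the `classify` function are hypothetical helpers made up for the illustration, and the example situations are the ones used in this post.

```python
from enum import Enum


class Change(Enum):
    ADDED = "stimulus presented"
    REMOVED = "stimulus removed"


class Effect(Enum):
    INCREASES = "behavior becomes more likely"
    DECREASES = "behavior becomes less likely"


def classify(change: Change, effect: Effect) -> str:
    """Positive/negative = stimulus added/removed;
    reinforcement/punishment = behavior increases/decreases."""
    sign = "positive" if change is Change.ADDED else "negative"
    kind = "reinforcement" if effect is Effect.INCREASES else "punishment"
    return f"{sign} {kind}"


examples = {
    "rat presses lever, receives a treat": (Change.ADDED, Effect.INCREASES),
    "rat presses lever, shock stops": (Change.REMOVED, Effect.INCREASES),
    "child reaches for cookie, hand is slapped": (Change.ADDED, Effect.DECREASES),
    "child curses, loses a dollar": (Change.REMOVED, Effect.DECREASES),
}

for situation, (change, effect) in examples.items():
    print(f"{situation}: {classify(change, effect)}")
# rat presses lever, receives a treat: positive reinforcement
# rat presses lever, shock stops: negative reinforcement
# child reaches for cookie, hand is slapped: positive punishment
# child curses, loses a dollar: negative punishment
```

Keeping the two questions separate avoids the common mix-up of treating "negative reinforcement" as a synonym for punishment.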


 
Applications of Operant Conditioning 

a) In the Classroom 

Skinner thought that our education system was ineffective. He argued that one teacher could not adequately teach a whole classroom of students when each child learns at a different rate. He proposed using teaching machines (a forerunner of today's computer-based instruction) that would let each student move at their own pace. The teaching machine would provide self-paced learning with immediate feedback, immediate reinforcement, identification of problem areas, etc., which a single teacher could not possibly provide to every student at once.


b) In the Workplace 

Study by Pedalino & Gamboa (1974) - To reduce employee absenteeism, the researchers set up a game-like lottery for employees who arrived at work on time. Each day an employee arrived on time, they were allowed to draw a card, so over a 5-day workweek an employee with perfect attendance would hold a full poker hand. At the end of the week, the best hand won $20. This simple method significantly reduced absenteeism and demonstrated the effectiveness of operant conditioning with humans.
