Slot Machines Operate On A ____ Reinforcement Schedule

  • The variable ratio schedule is the most powerful partial reinforcement schedule, and gambling is the classic example. Imagine that Sarah, generally a smart, thrifty woman, visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another.
  • Suppose instead that a slot machine pays out every time you put a quarter in. You win four times in a row (continuous reinforcement), so you keep playing. On the 5th, 6th, and 7th tries, nothing happens. You conclude that the machine has stopped paying out, so you move to a new machine.

With the slot machine, we never know when we're going to win, but we know we won't win if we stop pulling that handle. That's what keeps us playing in the hopes of hitting the jackpot. Fixed interval reinforcement is like your paycheck because you go to work every day, and on a schedule, you're rewarded with a sum of money; whereas a variable.

Positive reinforcement, using food rewards to increase the likelihood a dog will repeat a desirable behavior, is widely regarded as the most reliable method for teaching commands. While the basic concepts of reward-based training are easy to understand, people sometimes inadvertently inhibit progress by using too many treats, or too few.

Let’s say you’re in Las Vegas playing a slot machine, but every time you deposit a quarter and pull the arm, you get your one quarter in return. This wouldn’t keep your attention for long, and you’d probably opt for a different machine.

Now, what if you started feeding your hard-earned quarters into the next machine, but for hours on end got none back? Chances are you’d become equally frustrated and end your short gambling career.

Applied to dog training, both of these extremes—continuous reinforcement or none at all—can lead to lower command compliance.

“My dog will only sit if I have a treat.” Over the years, I have heard this refrain many times, and it almost always indicates that the dog was rewarded with treats for sitting on cue too often and for too long. Essentially, the dog had learned two things had to be true for him to comply: the sit cue plus a treat. If either were not true, he’d find something more interesting to do.

When initially teaching a new command, “Continuous Reinforcement” (CR in the geeky learning-theory world) is the most effective approach. For instance, when first teaching a puppy to sit, rewarding each successful completion (or “trial”) makes sense because your focus is on clearly pairing the verbal cue and hand gesture with the behavior: put the quarter in (your puppy sits on cue) and the reward appears (treat!).

But acting as your puppy’s loose slot machine for too long causes him to stop working so hard. Why bother sitting quickly, or at all, when a treat invariably appears? CR for too long also causes the dog to become dependent on the food reward: he will refuse to work unless food is presented. Before you get to that point (usually within a few days of teaching a new cue), it’s time to move to a less predictable reinforcement schedule.

Back to the gambling analogy. Once you’re sure your dog has a grasp on what you’re teaching him, it’s time to become a fair and honest slot machine, dispensing small food rewards less frequently for successful trials. (This is also a good time to find soft treats that won’t easily crumble to bits, and to always have a few hidden in your pocket.)

The psychology behind slots, which entice folks to pump coins into machines for hours on end, is that the probability of winning remains constant, even though the number of plays it takes to recoup your money, or better yet, hit the jackpot, changes. The unpredictability makes doing the same mundane activity, over and over, interesting and exciting. You can take advantage of this same psychology to train your dog faster.

When teaching your dog a new command, once you’ve determined that he knows what you’re expecting from him, begin randomly rewarding successful trials using “Variable Ratio” (VR) reinforcement. Start with a low ratio, rewarding roughly one out of every three trials, then increase the ratio over the course of several training sessions.

For example, when teaching your puppy to sit, provide a small treat for (successful) trials 2, 7, 9, 15, 18, 19, 20, 23 and 25. Notice that during 25 trials, sometimes he gets three rewards in a row, but sometimes, there’s a longer lag between treats. The idea is to keep him guessing—and working!
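If you would rather not pick the rewarded trials by hand, a rough sketch like the one below can pick them for you. It assumes each successful trial is rewarded independently with probability 1 / mean_ratio (strictly a "random ratio" schedule, one simple way to get a variable ratio). The function name, parameters, and seed are hypothetical and do not come from the article.

```python
import random

def variable_ratio_rewards(num_trials, mean_ratio, seed=None):
    """Return the 1-based trial numbers that earn a treat.

    Assumption: each successful trial is rewarded independently with
    probability 1 / mean_ratio, so treats arrive on average once every
    `mean_ratio` trials, with unpredictable gaps between them.
    """
    rng = random.Random(seed)
    p = 1.0 / mean_ratio
    return [trial for trial in range(1, num_trials + 1) if rng.random() < p]

# Roughly one treat per three successful sits over a 25-trial session.
print(variable_ratio_rewards(num_trials=25, mean_ratio=3, seed=1))
```

Because the draws are random, some sessions will produce runs of back-to-back treats and others long dry spells, which is the point of the schedule.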

Over the course of twice-daily training sessions (two to five minutes each), increase the ratio until he is rewarded for roughly one out of every ten successful trials. The behavior should become a happy habit by then, although, to keep commands fresh, continue to occasionally reward your dog for life. In other words, don’t become the slot machine that never pays a jackpot!
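The move from one-in-three to one-in-ten can be planned ahead. The sketch below assumes a simple linear progression over eight sessions; the session count and the function name are illustrative assumptions, since the article only says to increase the ratio gradually.

```python
def thinning_plan(start_ratio=3, end_ratio=10, sessions=8):
    """Return the target trials-per-treat ratio for each session.

    Assumption: the ratio rises in equal steps from start_ratio to
    end_ratio. A real plan should follow the dog, not the arithmetic.
    """
    if sessions < 2:
        return [end_ratio]
    step = (end_ratio - start_ratio) / (sessions - 1)
    return [round(start_ratio + i * step) for i in range(sessions)]

print(thinning_plan())  # [3, 4, 5, 6, 7, 8, 9, 10]
```

If the dog's responses get sloppy at some step, repeat the previous session's ratio before moving on.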

There are other types of reinforcement schedules too involved for our purposes here, but one to take advantage of is “Differential Reinforcement of Excellent Behavior” or DRE. This is just a fancy way of saying “better performance earns bigger rewards.” Once you’ve worked through Continuous Reinforcement (treating every time to teach the command) and Variable Ratio (treating randomly to hone the behavior), you can polish the command by handsomely rewarding only the best trials.

Let’s think about DRE in terms of teaching recalls. Once your dog is largely responding to your “come” command, and you’ve worked through Variable Ratio reinforcement (sometimes treating and sometimes not), start rewarding with higher-value treats, or more of what you have, only when your dog immediately and enthusiastically answers your call. If he stops and smells the roses (or whatever that was) en route, no reward is given.

Advancing through these levels is not rigid, and you may combine aspects of more than one as you progress. Be ready to back up a step if you’ve moved too fast—your dog will let you know!

The Rat in Your Slot Machine: Reinforcement Schedules

When gamblers tug at the lever of a slot machine, it is programmed to reward them just often enough and in just the right amount so as to reinforce the lever-pulling behavior, that is, to keep them putting money in. Its effect is so powerful that it even overrides the conscious knowledge most players have that in the long run, the machines are programmed to make net profit off of customers, not give money out.

Slot machine designers know a lot about human behavior, and how it is influenced by experience (learning). They are required by law to give out on average a certain percentage of the amount put in over time (say 90% payout), but the schedule on which a slot machine's reinforcement is delivered is very carefully programmed in and planned (mainly small and somewhat randomly interspersed payoffs). Interestingly, this effective type of reinforcement schedule originally comes from studies with non-human animals.
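To make the payout percentage concrete, here is a small worked example using the 90% figure above. The quarter bet and the 400 plays are illustrative assumptions, and the function name is hypothetical.

```python
def expected_loss(bet, payout_rate, plays):
    """Average amount a player loses over `plays` bets of size `bet`
    when the machine returns `payout_rate` of the money put in."""
    return bet * (1 - payout_rate) * plays

# 400 quarter plays on a machine that pays back 90% on average.
print(round(expected_loss(0.25, 0.90, 400), 2))  # 10.0
```

The player feeds in $100 and on average gets $90 back. The small, scattered payoffs along the way are what keep the $10 loss from being obvious.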

When you put rats in a box with a lever, you can set up various contingencies such that pressing the lever releases food to them. You could release food based on a fixed ratio of lever presses (every 10 presses drops some food), or a fixed interval (fifteen seconds must elapse since the last reward before a new lever press will release food). Alternatively, you could do it based on a variable ratio of presses (on average, it will take 10 presses to get food, sometimes more, sometimes less), or a variable interval (on average, food is available for pressing a lever every 15 seconds, but sometimes you have to wait longer, sometimes not as long).
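The four contingencies can be sketched as small Python classes. This is only an illustration under stated assumptions: the class names are made up, intervals are timed from the last reward delivered, and the variable schedules draw their requirements from uniform distributions chosen so that the averages come out to 10 presses and 15 seconds.

```python
import random

class FixedRatio:
    """Reward every n-th press."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def press(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """Reward after a random number of presses averaging mean_n."""
    def __init__(self, mean_n, rng):
        self.mean_n, self.rng = mean_n, rng
        self._draw()

    def _draw(self):
        self.count = 0
        # Uniform on 1..(2*mean_n - 1) has mean mean_n (illustrative choice).
        self.required = self.rng.randint(1, 2 * self.mean_n - 1)

    def press(self, t):
        self.count += 1
        if self.count >= self.required:
            self._draw()
            return True
        return False

class FixedInterval:
    """Reward the first press made at least `seconds` after the last reward."""
    def __init__(self, seconds):
        self.seconds, self.last_reward = seconds, 0.0

    def press(self, t):
        if t - self.last_reward >= self.seconds:
            self.last_reward = t
            return True
        return False

class VariableInterval:
    """Like FixedInterval, but the required wait is redrawn after each reward."""
    def __init__(self, mean_seconds, rng):
        self.mean_seconds, self.rng, self.last_reward = mean_seconds, rng, 0.0
        self._draw()

    def _draw(self):
        # Uniform on 0..2*mean has mean mean_seconds (illustrative choice).
        self.wait = self.rng.uniform(0, 2 * self.mean_seconds)

    def press(self, t):
        if t - self.last_reward >= self.wait:
            self.last_reward = t
            self._draw()
            return True
        return False

def count_rewards(schedule, presses=60):
    """Press once per second for `presses` seconds and count the rewards."""
    return sum(schedule.press(float(t)) for t in range(1, presses + 1))

rng = random.Random(0)
for s in (FixedRatio(10), VariableRatio(10, rng),
          FixedInterval(15), VariableInterval(15, rng)):
    print(type(s).__name__, count_rewards(s))
```

With one press per second for a minute, the fixed schedules always pay 6 and 4 times respectively, while the variable ones pay a similar amount on average but at moments the presser cannot predict.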

A variable ratio schedule is perhaps the most interesting for the example of slot machines. If you make food available on a variable ratio, you can make sure food is given out often enough that the task remains interesting (i.e. the rat doesn't totally give up on pressing the lever), and you can also make it impossible for the rat to guess exactly when reward is coming (so it won't sit there and count to 10 lever presses and expect food; or it won't sit and wait 15 seconds before pressing the lever). Indeed, since the rat only knows it is somewhere in the range of when a reward might come, but doesn't know exactly on which press it is coming, the rat ends up pressing the lever over and over quite steadily. Other reinforcement schedules do not produce as consistent a pattern of behavior (the response curve is not nearly as steep or consistent).

Slot machine designers learned that lesson well and applied it to humans, for whom the same responses appear given a particular reward contingency. By providing payoffs on a variable ratio schedule, they give out money just often enough that people keep playing, and because it happens on average every X times, rather than exactly every X times, the players cannot anticipate when reward is coming (if they could, they would not bother playing when it was not coming). It is possible that any response could be reinforced, so they are less likely to give up. It keeps them in the seat the longest, tugging that lever repeatedly because it always feels like they are on the verge of getting paid off.
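One way to see why the next pull always looks worth making is a simplified model, offered here as an assumption rather than a description of any real machine: every play pays off independently with the same probability p = 1/X. Writing K for the play on which the first payoff arrives:

```latex
\begin{aligned}
P(K = k) &= (1 - p)^{k-1}\, p, \qquad k = 1, 2, 3, \dots \\
E[K] &= \frac{1}{p} = X \\
P(K = n + 1 \mid K > n) &= p \quad \text{for every } n \ge 0
\end{aligned}
```

The last line says that under this model a losing streak carries no information: after any number of losses, the chance that the very next play pays is still p, never lower, so there is never a point at which quitting looks obviously right. Under an exact every-X-times schedule, by contrast, the player would know that the first X - 1 plays after a win cannot pay.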

The lesson here is not just meant for gamblers. Our modern life is so full of coercive techniques aimed at controlling our behavior (based on principles of learning and conditioning like those mentioned above) that we have come to expect no less. We recognize that television commercials use tricks to convince us to buy products. We recognize that speech writers and marketing and PR firms perform careful studies to determine how language and word choice contribute to supporting or extinguishing a behavior. These things still affect our behavior, but recognizing coercive techniques is one of our few defenses against their invisible pull. And so it is worth it for all of us to pick up a little knowledge about the field of learning and behavior analysis, to better understand how our own behavior is conditioned, so that we might take back as much control as possible.

Originally Written: 01-25-07
Last Updated: 01-25-07
