baylor Guest
Posted: Wed Oct 29, 2003 3:38 am Post subject: Matching Law - what happens during learning?
The matching law says that we distribute our responses among alternatives
according to the relative rate of reinforcement each one produces. The few
studies I've read on this talk about skills that are fully formed, like
pressing a lever or a trained athlete shooting a basket. From what I've read,
the matching law holds up over some period of time well after the
training/learning phase.
Does anyone know what happens to action selection and the matching law
while someone is improving an ability (i.e., while response rate and
relative response rate are increasing)?
-baylor
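For readers coming from game AI: in the strict form usually credited to Herrnstein, the matching law says that the proportion of responses allocated to an alternative equals the proportion of reinforcers obtained from it. A minimal sketch; the function name `matching_allocation` and the example rates are hypothetical:

```python
def matching_allocation(reinforcement_rates):
    """Strict matching: B_i / sum(B) = R_i / sum(R).

    reinforcement_rates: reinforcers obtained per unit time on each alternative.
    Returns the predicted share of responses for each alternative.
    """
    total = sum(reinforcement_rates)
    if total <= 0:
        raise ValueError("at least one alternative must produce reinforcement")
    return [r / total for r in reinforcement_rates]


# Two alternatives paying off 3 and 1 times per minute:
print(matching_allocation([3.0, 1.0]))  # [0.75, 0.25]
```

Note that this describes only where responding settles, not how it gets there, which is exactly what the question above is asking about.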
Glen M. Sizemore Guest
Posted: Wed Oct 29, 2003 5:28 pm Post subject: Re: Matching Law - what happens during learning?
Answers to this question may be difficult to find in the literature, since
most of the experimental analysis of behavior is concerned with
steady-state, or at least stable-state, responding. You might try
emailing someone who has worked in this area for a long time. They may have
noticed whether there is anything reliable about the approach to the
stable state.
BTW, why are you interested in this topic?
Cordially,
Glen
baylor Guest
Posted: Wed Oct 29, 2003 10:07 pm Post subject: Re: Matching Law - what happens during learning?
"Glen M. Sizemore" wrote:
[quote]BTW, why are you interested in this topic?
[/quote]
Video games :)
Flipping through my book on behavior and learning, I ran across a
study of how basketball players decide between taking two-point and
three-point shots. It turns out that, when shot statistics for a
season were reviewed, shot selection obeyed the matching law.
In video games, most agents pick one tactic and stick with it. An
exception was Baldur's Gate, where the designers plucked random numbers
out of the air to decide what percentage of the time an agent would
use spell #1 and what percentage of the time they'd use spell #2. Note
that this assumes, of course, that there are two options that work
roughly equally well in the given context.
It seemed to me that I could make more realistic agents if they
selected among actions based on the matching law. The issue comes with
skill acquisition: as an agent gets better at something (increased
relative rate of reinforcement), surely its willingness to use that
option should increase (increased relative rate of responding). But if
an option never worked when the agent began (because it hadn't practiced),
it would be squeezed out. Even if the agent later practiced that skill, the
matching law wouldn't allow it to use it. I think.
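The squeeze-out problem described above can be reproduced with a naive agent that picks each option with probability proportional to the reinforcers it has earned so far. This is a hypothetical sketch, not baylor's actual code; `simulate`, its parameters, and the hit rates are made up for illustration. An option that starts with zero reinforcers is never chosen again, however good it has become:

```python
import random


def simulate(success_prob, initial_counts, trials, seed=0):
    """Naive 'matching' agent.

    On every trial, option i is chosen with probability
    counts[i] / sum(counts), where counts[i] is the number of reinforcers
    option i has produced so far. A success increments that count.
    """
    rng = random.Random(seed)
    counts = list(initial_counts)
    choices = [0] * len(counts)
    for _ in range(trials):
        i = rng.choices(range(len(counts)), weights=counts)[0]
        choices[i] += 1
        if rng.random() < success_prob[i]:
            counts[i] += 1
    return choices, counts


# Option 0: the two-pointer, with 5 early successes and a 40% hit rate.
# Option 1: the three-pointer, never made before the summer (0 successes),
# now made 90% of the time after practice.
choices, counts = simulate([0.4, 0.9], [5, 0], trials=1000)
print(choices)  # [1000, 0]: option 1 is never chosen
```

Any fix has to let the unpracticed option be sampled occasionally; Glen's reply below explains how concurrent variable-interval schedules arrange exactly that.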
To give an example, say you play the coach of a basketball team and
one player never makes a three-point shot. You have the option of
telling him to spend the summer practicing that shot. At the end, that
player is pretty good at it. My expectation is that, when the newly
confident player starts the next season, he would shoot a much higher
percentage of three-pointers than is predicted by the matching law.
But the code I wrote, in its current form, doesn't allow for that. I
can force certain things to happen because, hey, it's my code, but I'm
trying to find a more psychologically appealing way of doing this,
especially since this is more for research purposes than actual game
purposes.
-baylor
Glen M. Sizemore Guest
Posted: Thu Oct 30, 2003 6:25 pm Post subject: Re: Matching Law - what happens during learning?
[quote]BTW, why are you interested in this topic?
[/quote]
B: Video games :)
Flipping through my book on behavior and learning, I ran across a
study of how basketball players decide between taking two-point and
three-point shots. It turns out that, when shot statistics for a
season were reviewed, shot selection obeyed the matching law.
In video games, most agents pick one tactic and stick with it. An
exception was Baldur's Gate, where the designers plucked random numbers
out of the air to decide what percentage of the time an agent would
use spell #1 and what percentage of the time they'd use spell #2. Note
that this assumes, of course, that there are two options that work
roughly equally well in the given context.
GS: Does it?
B: It seemed to me that I could make more realistic agents if they
selected among actions based on the matching law.
GS: Very clever.
B: The issue comes with
skill acquisition: as an agent gets better at something (increased
relative rate of reinforcement), surely its willingness to use that
option should increase (increased relative rate of responding). But if
an option never worked when the agent began (because it hadn't practiced),
it would be squeezed out. Even if the agent later practiced that skill, the
matching law wouldn't allow it to use it. I think.
GS: Once again, this is good thinking. And it is also exactly what happens
when alternatives are available on concurrent variable-ratio schedules. The
vast majority of papers on the matching law, however, involve concurrent
variable-interval schedules. Under such schedules, the probability of
reinforcement on one alternative increases while the animal is responding on
the other alternative. Thus, if the animal makes occasional switches, the
first response after a switch is likely to be reinforced. This ensures that
any change in preference isn't immediately amplified in the positive feedback
loop you described.
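A minimal simulation of the difference Glen describes, under assumptions made for this sketch rather than taken from the post: time runs in 1-second steps, the subject makes one response per second, a variable-interval alternative sets up a reinforcer with constant probability each second and then holds it until the next response on that alternative, and the variable-ratio alternative is approximated as a random ratio (each response pays off with probability 1/n). The function name and parameter values are hypothetical:

```python
import random


def concurrent(schedule, p_alt2, means, seconds=100_000, seed=1):
    """Reinforcers earned on two concurrently available alternatives.

    schedule: "VI" (constant-probability interval; once set up, a reinforcer
              waits for the next response on that alternative) or
              "VR" (random ratio; each response pays off with p = 1 / mean).
    p_alt2:   probability that a given response goes to alternative 2.
    means:    mean interval in seconds (VI) or mean ratio (VR), per alternative.
    """
    if schedule not in ("VI", "VR"):
        raise ValueError("schedule must be 'VI' or 'VR'")
    rng = random.Random(seed)
    armed = [False, False]
    earned = [0, 0]
    for _ in range(seconds):
        if schedule == "VI":
            # Both VI timers run whether or not their alternative is being pecked.
            for k in (0, 1):
                if not armed[k] and rng.random() < 1.0 / means[k]:
                    armed[k] = True
        i = 1 if rng.random() < p_alt2 else 0  # one response per second
        if schedule == "VI":
            if armed[i]:
                armed[i] = False
                earned[i] += 1
        elif rng.random() < 1.0 / means[i]:
            earned[i] += 1
    return earned


# Equal schedules on both sides; the subject puts only 10% of responses on side 2.
for sched in ("VI", "VR"):
    e = concurrent(sched, p_alt2=0.1, means=(30, 30))
    print(sched, e, "share of reinforcers on side 2:", round(e[1] / sum(e), 2))
```

With these made-up numbers, the VI version still collects a large share of its reinforcers on the rarely visited side (a visit usually finds a reinforcer waiting), while the VR version gives that side only about the 10% it is responded on, which is the positive feedback loop baylor described.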
Still, much of what you describe is realistic. How many behaviors are there
that we never learn because what we have already learned is moderately
effective? But, of course, if the environment changes and one's behavior
results in relatively sparse reinforcement, the response class begins to
undergo extinction, and other responses begin to be emitted.
B: To give an example, say you play the coach of a basketball team and
one player never makes a three-point shot. You have the option of
telling him to spend the summer practicing that shot. At the end, that
player is pretty good at it. My expectation is that, when the newly
confident player starts the next season, he would shoot a much higher
percentage of three-pointers than is predicted by the matching law.
But the code I wrote, in its current form, doesn't allow for that. I
can force certain things to happen because, hey, it's my code, but I'm
trying to find a more psychologically appealing way of doing this,
especially since this is more for research purposes than actual game
purposes.
GS: Well, the matching law doesn't really make a prediction here because you
have added something to the situation (i.e., a period of time when the
3-pointer is the only shot available; of course this assumes that the
reinforcer is making the shot, in practice as well as in the game). Remember,
all the matching law states is that behavior will stabilize according to the
formula r1/r2 = R1/R2, or one of the various other forms of the matching law
(that allow for deviations from matching!). You are asking questions about the
dynamics of schedule-controlled behavior that have been addressed since the
mid-1930s, when Skinner discovered the effects of intermittent
reinforcement. Now, 70 years later, we are still struggling with these
issues. See, no one knows why schedules of reinforcement (concurrent and
otherwise) produce the stable states that they do. We do not know which
variables interact in what ways such that behavior reaches some reasonably
stable state. I am writing a very, very long paper on this topic. But,
basically, we have not been able to come up with a good mathematical model
of schedule-controlled behavior in general because (1) the subject matter is
incredibly complex, and (2) behavior analysts have naively tried to
determine the functions relating some variable like rate of reinforcement to
rate of responding in a directly empirical manner. One learns a great deal
about behavior when one does this, and one gets a good idea about what the
important variables are, but one cannot obtain the functions in question as
phenomenological laws - they must be deduced and tested. Anyway, like I
said, I'm writing a very long paper on this.
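One widely used form that allows for deviations is the generalized matching law, usually credited to Baum (1974): B1/B2 = b(R1/R2)^a, where B is responding, R is obtained reinforcement, a is sensitivity (a < 1 is undermatching, a > 1 overmatching) and b is bias; strict matching is a = b = 1. The B/R notation here is chosen for this sketch and is not the r1/r2 = R1/R2 notation used in the post. Because the law is linear in log coordinates, a and b can be estimated by ordinary least squares on log ratios. Function names are hypothetical and the data are synthetic:

```python
import math


def generalized_matching(reinf_ratio, sensitivity=1.0, bias=1.0):
    """Predicted response ratio B1/B2 = bias * (R1/R2) ** sensitivity."""
    return bias * reinf_ratio ** sensitivity


def fit_generalized_matching(reinf_ratios, resp_ratios):
    """Least-squares fit of log(B1/B2) = a * log(R1/R2) + log(b).

    Returns (a, b). Needs at least two distinct reinforcement ratios.
    """
    xs = [math.log(r) for r in reinf_ratios]
    ys = [math.log(b) for b in resp_ratios]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, math.exp(my - a * mx)


# Synthetic, noise-free data generated with undermatching (a = 0.8) and bias 1.2:
ratios = [0.25, 0.5, 1.0, 2.0, 4.0]
observed = [generalized_matching(r, 0.8, 1.2) for r in ratios]
print(fit_generalized_matching(ratios, observed))  # approximately (0.8, 1.2)
```

As the post says, this describes the stable state only; it is silent about the path behavior takes to get there.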
Good luck. I suggest that you start by reading the entire history of the
experimental analysis of behavior from the '50s on :) Just Google "jeab" and
follow the link to the Journal of the Experimental Analysis of Behavior.
There should be one or two papers on schedule-controlled behavior ;)
Michael Olea Guest
Posted: Wed Nov 05, 2003 4:12 am Post subject: Re: Matching Law - what happens during learning?
in article n6otpv40fg7elng93i6lai11aehj5h3n7c@4ax.com, baylor at
baylor@no_spam.ihatebaylor.com wrote on 10/28/03 1:38 PM:
[quote]The matching law says that we distribute our responses among alternatives
according to the relative rate of reinforcement each one produces. The few
studies I've read on this talk about skills that are fully formed, like
pressing a lever or a trained athlete shooting a basket. From what I've read,
the matching law holds up over some period of time well after the
training/learning phase.
Does anyone know what happens to action selection and the matching law
while someone is improving an ability (i.e., while response rate and
relative response rate are increasing)?
-baylor
[/quote]
There is a very readable article, "Bayesian Modeling of Visual Perception",
by Mamassian, Landy, and Maloney, in the book "Probabilistic Models of the
Brain: Perception and Neural Function", edited by Rao, Olshausen, and
Lewicki, that is germane. They are talking primarily about visual
psychophysics, but the framework they apply, Bayesian Decision Theory, has
broader applicability, and they summarize it quite clearly. Here is an
excerpt:
The Bayesian framework allows us to model the consequences of gains and
losses on behavior and to represent performance in different tasks
in a common framework so long as the effect of a change of task can
be modeled as a change in possible gains and losses....
One important property of the decision rules we have discussed so
far is that the same action will be chosen whenever the same stimulus
S is seen. This must be contrasted with the variable behaviour of any
biological organism. There are at least two approaches to modelling
this variability. The first is to recognize that we have neglected
to model sources of noise in the visual system...
The second approach is to abandon the Bayes rule as a realistic
decision rule. For instance, Mamassian & Landy [11] have modeled
human response variability with what they termed a "non-committing"
rule. According to this non-deterministic rule, an action is
chosen with a probability that matches its posterior probability.
Actions with high posterior probabilities are selected more often
than those with low posterior probabilities, but any action may
potentially be chosen. This decision is also known as "probability
matching" [13] and is often observed in human and animal choice behavior
when the gain function is the Delta gain function described above. It
is important to note that this rule is not optimal (the MAP rule
always leads to higher gain). Even though the MAP rule is optimal
in this case, humans and other animals persist in probability
matching to a remarkable degree. In fact, it might be a better
strategy for an animal since it allows exploration of the state
space for learning (for an introduction to learning by exploration,
see [16]).
[16] Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning:
An Introduction. Cambridge, MA: MIT Press.
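The excerpt's claim that probability matching is not optimal under a Delta gain function (gain 1 for choosing the true state, 0 otherwise) can be checked directly. If the posterior over states is p, the MAP rule is correct with probability max(p), while choosing state i with probability p_i is correct with probability sum(p_i^2), which never exceeds max(p); the two coincide only when all states with nonzero probability are equally likely. A small sketch with hypothetical function names and a made-up posterior:

```python
def expected_gain_map(posterior):
    """Delta gain: always pick the most probable state; right with prob max(p)."""
    return max(posterior)


def expected_gain_matching(posterior):
    """Delta gain: pick state i with probability p_i; right with prob sum(p_i**2)."""
    return sum(p * p for p in posterior)


posterior = [0.7, 0.2, 0.1]
print(expected_gain_map(posterior))       # 0.7
print(expected_gain_matching(posterior))  # about 0.54
```

The gap is the price matching pays for the exploration the excerpt mentions: the less likely states keep getting sampled.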
Unfortunately I have not read "Reinforcement Learning" - I just don't have
the money to buy it right now, but I looked it over on amazon.com and would
buy it in a minute if I could. It looks like just what you want....
Glen M. Sizemore Guest
Posted: Wed Nov 05, 2003 8:40 am Post subject: Re: Matching Law - what happens during learning?
<snip>
MO: The Bayesian framework allows us to model the consequences of gains and
losses on behavior and to represent performance in different tasks
in a common framework so long as the effect of a change of task can
be modeled as a change in possible gains and losses....
<snip>
GS: If I maintain steady-state (stable-state?) responding (pigeons,
keypecks) under a variable-interval (VI) 120 s schedule, in a standard
operant chamber, and then switch the schedule to a variable-ratio (VR)
schedule in which the average number of responses required per reinforcer is
equivalent to the number of responses per reinforcer obtained under the VI
120 s schedule, rate of responding will increase considerably (perhaps
180-290%). Does your "Bayesian framework allow...us to model the
consequences of gains and losses on behavior and to represent performance in
different tasks in a common framework so long as the effect of a change of
task can be modeled as a change in possible gains and losses...." have
anything to say here?
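One standard way to see why the VI-to-VR switch matters is to compare feedback functions, i.e., how obtained reinforcement rate depends on response rate. A sketch under assumptions that are not in the post: responses occur at random in time at rate B, the VI is a constant-probability schedule with mean interval T, and the VR is treated as a random ratio with mean n. Then reinforcement rate is B/n on the VR and 1/(T + 1/B) on the VI (mean time to set up, plus mean wait for the next response). The 60 responses per minute is a made-up figure and the function names are hypothetical:

```python
def vr_rate(b, n):
    """Reinforcers per minute on a random-ratio schedule with mean ratio n."""
    return b / n


def vi_rate(b, t):
    """Reinforcers per minute on a constant-probability VI (mean t minutes),
    assuming responses occur at random at b per minute."""
    return 1.0 / (t + 1.0 / b)


def vr_marginal(n):
    """d(reinforcement rate) / d(response rate) on the VR."""
    return 1.0 / n


def vi_marginal(b, t):
    """d(reinforcement rate) / d(response rate) on the VI: 1 / (b*t + 1)**2."""
    return 1.0 / (b * t + 1.0) ** 2


b, t = 60.0, 2.0          # hypothetical: 60 pecks/min on VI 120 s
n = b / vi_rate(b, t)     # responses per reinforcer actually obtained
print(round(n, 6))        # 121.0 -> the "equivalent" VR
print(vr_marginal(n) / vi_marginal(b, t))  # about 121
```

At the matched point both schedules pay the same, but on the VR each extra response buys roughly n times more reinforcement than on the VI. This shows only the direction of the effect; it is not a model that predicts the 180-290% figure.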
Glen
Michael Olea Guest
Posted: Wed Nov 05, 2003 10:22 am Post subject: Re: Matching Law - what happens during learning?
in article 5eec36fdb4f1b8d55d1ab2493c143b7b@news.teranews.com, Glen M.
Sizemore at gmsizemore2@yahoo.com wrote on 11/4/03 6:40 PM:
[quote]snip
MO: The Bayesian framework allows us to model the consequences of gains and
losses on behavior and to represent performance in different tasks
in a common framework so long as the effect of a change of task can
be modeled as a change in possible gains and losses....
snip
GS: If I maintain steady-state (stable-state?) responding (pigeons,
keypecks) under a variable-interval (VI) 120 s schedule, in a standard
operant chamber, and then switch the schedule to a variable-ratio (VR)
schedule in which the average number of responses required per reinforcer is
equivalent to the number of responses per reinforcer obtained under the VI
120 s schedule, rate of responding will increase considerably (perhaps
180-290%). Does your "Bayesian framework allow...us to model the
consequences of gains and losses on behavior and to represent performance in
different tasks in a common framework so long as the effect of a change of
task can be modeled as a change in possible gains and losses...." have
anything to say here?
[/quote]
RTFM
[quote]
Glen
[/quote]
Cheers - Moron Olea
Michael Olea Guest
Posted: Thu Nov 06, 2003 3:56 am Post subject: Re: Matching Law - what happens during learning?
in article 5eec36fdb4f1b8d55d1ab2493c143b7b@news.teranews.com, Glen M.
Sizemore at gmsizemore2@yahoo.com wrote on 11/4/03 6:40 PM:
[quote]
snip
GS: If I maintain steady-state (stable-state?) responding (pigeons,
keypecks) under a variable-interval (VI) 120 s schedule, in a standard
operant chamber, and then switch the schedule to a variable-ratio (VR)
schedule in which the average number of responses required per reinforcer is
equivalent to the number of responses per reinforcer obtained under the VI
120 s schedule, rate of responding will increase considerably (perhaps
180-290%). Does your "Bayesian framework allow...us to model the
consequences of gains and losses on behavior and to represent performance in
different tasks in a common framework so long as the effect of a change of
task can be modeled as a change in possible gains and losses...." have
anything to say here?
[/quote]
OK, I'll take a stab at it. I am assuming that the variable-interval
schedule is one in which reinforcers are doled out independently of the
pigeon's behavior. In brief, then, under the first scenario, a
variable-interval schedule, any correlation between keypecking and
reinforcement is at chance levels - but not zero. Therefore the
posterior probability that a keypeck will result in a reinforcement
event (sufficiently close in time to the pecking event to allow
the formation of an association) is low. Given that a keypeck incurs
a cost, and is a poor predictor of a gain, the keypecking rate,
probability matching predicts, would be low.
In the second scenario, a variable-ratio schedule, there is a relatively
high correlation between keypecks and reinforcement events; even though
"the average number of responses required per reinforcer is equivalent to
the number of responses per reinforcer obtained under the VI 120 s schedule,"
the expected gain per keypeck has gone up (information theory 101), hence
the "rate of responding will increase considerably".
It's not "my" Bayesian framework, but yes, it can model the situation and
predict the results.
Now, let's get quasi-quantitative.
Would you mind describing the VI 120 s schedule in more detail? For now
I will assume that it is a Gaussian distribution with 120 s mean, but
we still need a variance. Let's suppose we have that info, and we can
model the distribution as N(mu, sigma). Can we now model the first
stage of the experiment, and make quantitative predictions about the
distribution of keypeck events? Not yet. We need a cost term and a
gain term - what does a keypeck cost a pigeon, and what is a "reinforcer"
worth to it? But I have more basic questions - what basis do we have
to predict, in the VI stage of the experiment, that pigeons will emit
keypecks at all? Were these pigeons engaged in their first-ever encounter
with a standard operant chamber? Why did you define keypecks as
constituting a "response", and not, say, eye-blinks or somersaults?
None of these behaviors had anything to do with reinforcement, so
all of them are equally worthy of being labeled responses, and equally
useless predictors of reinforcement events. I'll engage in a little
wild speculation here and guess that eye-blinks cost the pigeons less
than keypecks, which in turn cost less than somersaults. On that
basis I might be tempted to predict that if in the VI experiment you
tabulated the distributions of all 3 "responses" you would find they
all followed a distribution of the same shape as the VI schedule, but
that the eye-blink distribution had a higher mean than the keypeck
distribution, which had a higher mean than the somersault distribution.
I would be tempted, that is, if it did not neglect the "prior" term in
the Bayesian model. So I am going to reformulate the model a little bit
to include terms for the prior distribution of the probabilities that
a) blinking, b) pecking, c) somersaulting, or d) none-of-the-above behavior
elicits a reinforcer from the pigeon's world. Now, we don't know what
these pigeons have been through prior to this experiment. Say they are
unsullied, unjaded newborns. The blank-slate hypothesis would amount to a
uniform prior distribution, with a, b, c, and d equally likely. A "nativist"
hypothesis would amount to a non-uniform prior. A little "just-so"
story-telling might run like this: Not only is the shape of the pigeon's
beak subject, via natural selection, to adaptive pressures, but so is
its behavioral repertoire - in the natural environments in which pigeons
have evolved, pecking behavior has a higher probability of being associated
with obtaining food than do blinking or somersaulting behaviors - therefore
a pigeon is more likely, a priori, to explore its environment in search of
food via pecking rather than blinking, somersaulting, cooing, or
flame-warring. Further, a pigeon will more readily form associations
between pecking and obtaining food than between a host of other behaviors
and obtaining food.
Maybe "cooing" was not such a good choice - it could be that in the nest
cooing does have a positive correlation with eliciting food from the
pigeon's world.
Back to the VI experiment. A uniform prior with equal cost terms, and
random correlations between behavior and reinforcement, predicts, under the
probability matching model, equal, relatively low response rates for each of
blinking, pecking, somersaulting, cooing, and flame-warring. Wild
speculation: that's not what happens. Extravagant prediction: the rates of
eye-blinking, somersaulting, cooing, and flame-warring are not correlated
with the mean and standard deviation of the reinforcement rate; the
rate of key-pecking is correlated with the mean and standard deviation of
the reinforcement rate. Moronic basis of the prediction: the priors are not
uniform (nor the costs equal) - pigeons are born peckers; they are born
cooers too, but they are (hypothesis, here) genetically predisposed to be
more likely to form associations between pecking and eating, and to use
pecking to explore their culinary world (a form of hypothesis testing on the
pigeon's part).
Do you have distribution data for the pecking response to the VI schedule?
If so, is the distribution well fit by a distribution of the same form as,
though perhaps with different parameters than, the distribution of the VI
schedule? What about blinking and cooing distributions?
So, on to the variable-ratio phase. I don't doubt that pigeons can be
trained via such a schedule of reinforcement to coo, blink, or peck for
food. But, according to the probability matching model, if the prior
probabilities of pigeons to associate cooing, blinking, and pecking with
food differ, then the number of trials before equilibrium distributions
of the response rates set in will also differ. What actually happens?
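The closing prediction (different priors, different time to reach equilibrium) can be illustrated with Thompson sampling, a standard technique that emits each behavior with the posterior probability that it is the best one. The choice of technique is made for this sketch, not proposed in the thread, and everything in it is hypothetical: the four behaviors, the 0.5 reinforcement probability for pecking, the Beta priors, and the fact that reinforcement here is response-dependent (unlike the response-independent schedule assumed above):

```python
import random

BEHAVIORS = ["blink", "peck", "somersault", "coo"]
PECK = BEHAVIORS.index("peck")
REWARD_PROB = [0.0, 0.5, 0.0, 0.0]  # hypothetical: only pecking is ever reinforced


def peck_fraction(prior, trials=50, runs=200):
    """Average fraction of the first `trials` choices that are pecks.

    prior: one (alpha, beta) Beta prior per behavior on its probability of
    producing a reinforcer. Each trial, Thompson sampling draws one value per
    behavior from its posterior and emits the behavior with the largest draw,
    so each behavior is chosen with the posterior probability that it is best.
    """
    total = 0.0
    for seed in range(runs):
        rng = random.Random(seed)
        post = [list(ab) for ab in prior]  # [alpha, beta] per behavior
        pecks = 0
        for _ in range(trials):
            draws = [rng.betavariate(a, b) for a, b in post]
            i = draws.index(max(draws))
            if i == PECK:
                pecks += 1
            if rng.random() < REWARD_PROB[i]:
                post[i][0] += 1  # reinforced
            else:
                post[i][1] += 1  # not reinforced
        total += pecks / trials
    return total / runs


blank_slate = [(1, 1)] * len(BEHAVIORS)       # uniform prior
nativist = [(1, 1), (4, 1), (1, 1), (1, 1)]    # pecking favored a priori
print(peck_fraction(blank_slate), peck_fraction(nativist))
```

With these made-up numbers, the peck-favoring prior produces a higher share of pecks early in training; whether real pigeons behave this way is the empirical question being asked.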
[quote]
Glen
---------[/quote]
Michael
Glen M. Sizemore Guest
Posted: Thu Nov 06, 2003 3:59 am Post subject: Re: Matching Law - what happens during learning?
That's what I thought.
G.
Glen M. Sizemore Guest
Posted: Thu Nov 06, 2003 5:15 am Post subject: Re: Matching Law - what happens during learning?
[quote]
snip
GS: If I maintain steady-state (stable-state?) responding (pigeons,
keypecks) under a variable-interval (VI) 120 s schedule, in a standard
operant chamber, and then switch the schedule to a variable-ratio (VR)
schedule in which the average number of responses required per reinforcer is
equivalent to the number of responses per reinforcer obtained under the VI
120 s schedule, rate of responding will increase considerably (perhaps
180-290%). Does your "Bayesian framework allow...us to model the
consequences of gains and losses on behavior and to represent performance in
different tasks in a common framework so long as the effect of a change of
task can be modeled as a change in possible gains and losses...." have
anything to say here?
[/quote]
MO: OK, I'll take a stab at it. I am assuming that the variable-interval
schedule is one in which reinforcers are doled out independently of the
pigeon's behavior.
GS: Wrong. But let's see if anything you say makes sense anyway. BTW, an
interval schedule is one in which reinforcer delivery depends on the
occurrence of a response (more precisely, a member of a particular response
class) after a period of time has passed since the last reinforcer. If
the time varies from reinforcer delivery to reinforcer delivery, the
schedule is a variable-interval schedule (otherwise it is a fixed-interval
schedule).
MO: In brief, then, under the first scenario, a
variable-interval schedule, any correlation between keypecking and
reinforcement is at chance levels - but not zero. Therefore the
posterior probability that a keypeck will result in a reinforcement
event (sufficiently close in time to the pecking event to allow
the formation of an association) is low. Given that a keypeck incurs
a cost, and is a poor predictor of a gain, the keypecking rate,
probability matching predicts, would be low.
GS: Exactly. But this is 35 year old stuff.....at least. Plus, if you give
it a "temporal conjunction" spin, then you could say that the prediction of
low rates on a response-independent schedule would be implicit in Skinner>s
original view of dependency as an "arranger of temporal contiguity." I am
glad, though, that your......innovative approach is such a mathematical boon
to the quantitative analysis of schedule-controlled behavior.
MO: In the second scenario, a variable-ratio schedule, there is a relatively
high correlation between keypecks and reinforcement events; even though
"the average number of responses required per reinforcer is equivalent to
the number of responses per reinforcer obtained under the VI 120 s schedule,"
the expected gain per keypeck has gone up (information theory 101), hence
the "rate of responding will increase considerably".
It's not "my" Bayesian framework, but yes, it can model the situation and
predict the results.
Now, let's get quasi-quantitative.
GS: No...let's take a moment to point out that, so far, your description is
at least 35 years old and is equivalent to a position outlined 35 years ago.
MO: Would you mind describing the VI 120 s schedule in more detail? For now
I will assume that it is a Gaussian distribution with 120 s mean, but
we still need a variance.
GS: The VI I have in mind (you can get more details by consulting JEAB)
is a constant-probability schedule. That is, it closely approximates a
schedule in which a probability generator is checked every n sec.
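The schedule described here implies an interval distribution quite unlike the Gaussian assumed above: if a generator is checked every n seconds and sets up a reinforcer with a fixed probability on each check, the intervals are geometric (approximately exponential), so their standard deviation is about as large as their mean, and the chance of a setup on the next check never depends on how long it has been. A sketch assuming a hypothetical 1-s check period (the post leaves n unspecified) and a 120-s mean; per the definition given above, a setup only makes the reinforcer available, and it is delivered on the next peck:

```python
import random
import statistics


def constant_probability_intervals(mean_s, step_s=1.0, n=5000, seed=0):
    """Intervals (s) from a probability generator checked every step_s seconds
    that sets up a reinforcer with probability step_s / mean_s on each check."""
    rng = random.Random(seed)
    p = step_s / mean_s
    intervals = []
    for _ in range(n):
        t = step_s
        while rng.random() >= p:  # no setup on this check; wait for the next one
            t += step_s
        intervals.append(t)
    return intervals


iv = constant_probability_intervals(120.0)
print(statistics.mean(iv), statistics.stdev(iv))  # both close to 120
```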
MO: Let's suppose we have that info, and we can
model the distribution as N(mu, sigma). Can we now model the first
stage of the experiment, and make quantitative predictions about the
distribution of keypeck events?
<snip>
GS: At this point, I'll break off and allow you to reformulate your
mathematics based on the definition of variable-interval schedules (i.e.,
they are not response independent). But that's probably not necessary; most
of what you say appears to be little more than a recognition that the
relation between behavior and its consequences is important and can be given
a particular mathematical form. What you miss, of course, is that a lot of
guys that are a lot smarter than you have been working on this issue and
pretty much said everything that you can say.
Cordially,
G.
"Michael Olea" <oleaj@sbcglobal.net> wrote in message
news:BBCEB208.F25%oleaj@sbcglobal.net...
[quote]in article 5eec36fdb4f1b8d55d1ab2493c143b7b@news.teranews.com, Glen M.
Sizemore at gmsizemore2@yahoo.com wrote on 11/4/03 6:40 PM:
snip
GS: If I maintain steady-state (stable-state?) responding (pigeons,
keypecks) under a variable-interval (VI) 120 s schedule, in a standard
operant chamber, and then switch the schedule to a variable-ratio (VR)
schedule in which the average number of responses required per
reinforcer is
equivalent to the number of response per reinforcer obtained under the
VI
120 s schedule, rate of responding will increase considerably (perhaps
180-290%). Does your "Bayesian framework allow...us to model the
consequences of gains losses on behavior and to represent performance in
different tasks in a common framework so long as the effect of a change
of
task can be modeled as a change in possible gains and losses...." have
anything to say here?
OK, I'll take a stab at it. I am assuming that the variable-interval
schedule is one in which reinforcers are doled out independently of the
pigeon's behavior. In brief, then, under the first scenario, a
variable-interval schedule, any correlation between keypecking and
reinforcement is at chance levels - but not zero. Therefore the
posterior probability that a keypeck will result in a reinforcement
event (sufficiently close in time to the pecking event to allow
the formation of an association) is low. Given that a keypeck incurs
a cost, and is a poor predictor of a gain, the keypecking rate,
probability matching predicts, would be low.
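This argument rests on reinforcement being delivered independently of pecking, an assumption Glen disputes in his reply. One standard way to put a number on "correlation at chance levels" is the contingency delta-P = P(food | peck) - P(food | no peck). Below is a minimal discrete-time sketch with hypothetical parameters. In each time bin the pigeon pecks with probability p_peck. Food then arrives either in any bin with probability p_food (response-independent) or only after a peck, with probability p_food (response-dependent).

```python
import random


def delta_p(n_bins, p_peck, p_food, response_dependent, seed=0):
    """Estimate P(food | peck) - P(food | no peck) in a discrete-time simulation.

    Each bin, the pigeon pecks with probability p_peck. If response_dependent,
    food can follow only a peck (probability p_food per peck); otherwise food
    arrives with probability p_food in every bin, regardless of pecking.
    Assumes both pecking and non-pecking bins occur (no division by zero).
    """
    rng = random.Random(seed)
    pecks = food_after_peck = 0
    no_pecks = food_without_peck = 0
    for _ in range(n_bins):
        peck = rng.random() < p_peck
        if response_dependent:
            food = peck and rng.random() < p_food
        else:
            food = rng.random() < p_food
        if peck:
            pecks += 1
            food_after_peck += food
        else:
            no_pecks += 1
            food_without_peck += food
    return food_after_peck / pecks - food_without_peck / no_pecks


print(delta_p(200_000, 0.3, 0.01, response_dependent=False))  # close to 0
print(delta_p(200_000, 0.3, 0.01, response_dependent=True))   # close to 0.01
```

Under response-independent delivery the estimate hovers around zero. When food can only follow a peck, delta-P approaches p_food.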
In the second scenario, a variable-ratio schedule, there is a relatively
high correlation between keypecks and reinforcement events; even though
"the average number of responses required per reinforcer is equivalent to
the number of responses per reinforcer obtained under the VI 120 s
schedule," the expected gain per keypeck has gone up (information theory
101), hence the "rate of responding will increase considerably".
It's not "my" Bayesian framework, but yes, it can model the situation and
predict the results.
Now, let's get quasi-quantitative.
Would you mind describing the VI 120 s schedule in more detail? For now
I will assume that it is a Gaussian distribution with 120 s mean, but
we still need a variance. Let's suppose we have that info, and we can
model the distribution as N(mu, sigma). Can we now model the first
stage of the experiment, and make a quantitative prediction about the
distribution of keypeck events? Not yet. We need a cost term, and a
gain term - what does a keypeck cost a pigeon, and what is a "reinforcer"
worth to it? But I have more basic questions - what basis do we have
to predict, in the VI stage of the experiment, that pigeons will emit
keypecks at all? Were these pigeons engaged in their first-ever encounter
with a standard operant chamber? Why did you define keypecks as
constituting a "response", and not, say, eye-blinks, or somersaults?
None of these behaviors had anything to do with reinforcement, so
all of them are equally worthy of being labeled responses, and equally
useless predictors of reinforcement events. I'll engage in a little
wild speculation here and guess that eye-blinks cost the pigeons less
than keypecks, which in turn cost less than somersaults. On that
basis I might be tempted to predict that if in the VI experiment you
tabulated the distributions of all 3 "responses" you would find they
all followed a distribution of the same shape as the VI schedule, but
that the eye-blink distribution had a higher mean than the keypeck
distribution, which had a higher mean than the somersault distribution.
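The missing pieces in this paragraph can be filled with placeholders to see what kind of prediction falls out. The sketch below is hypothetical throughout. Intervals are drawn from N(mu, sigma), redrawn if non-positive, and each is timed from the previous reinforcer. Following Glen's point that VI reinforcers must be collected by a response, the set-up reinforcer then waits for the next peck. Pecking is Poisson at `response_rate`. Each reinforcer is worth `gain` and each peck costs `cost`, in arbitrary units. Cost is charged on the expected number of pecks rather than on sampled pecks.

```python
import random


def net_gain_rate(response_rate, mu, sigma, gain, cost, n_cycles=20_000, seed=0):
    """Net gain per second on an interval schedule with N(mu, sigma) intervals.

    Each interval is timed from the previous reinforcer and redrawn if it comes
    out non-positive; the set-up reinforcer then waits for the next peck.
    Pecking is Poisson at response_rate (> 0) pecks/s, so that wait is
    exponential with mean 1/response_rate. Cost is charged on the expected
    number of pecks (response_rate * elapsed time).
    """
    rng = random.Random(seed)
    elapsed = 0.0
    for _ in range(n_cycles):
        interval = rng.gauss(mu, sigma)
        while interval <= 0:
            interval = rng.gauss(mu, sigma)
        wait_for_peck = rng.expovariate(response_rate)
        elapsed += interval + wait_for_peck
    reinforcers = n_cycles  # exactly one reinforcer per cycle
    expected_pecks = response_rate * elapsed
    return (gain * reinforcers - cost * expected_pecks) / elapsed


# Placeholder values: gain 100 per reinforcer, cost 0.01 per peck, arbitrary units.
for rate in (0.05, 0.8, 10.0):
    print(rate, net_gain_rate(rate, mu=120.0, sigma=30.0, gain=100.0, cost=0.01))
```

With these placeholder values the net rate is highest at an intermediate pecking rate: the reinforcement rate saturates near 1/mu while the cost of pecking keeps growing linearly. In this particular model sigma matters little, since the reinforcement rate depends on the mean interval.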
I would be tempted, that is, if it did not neglect the "prior" term in
the Bayesian model. So I am going to reformulate the model a little bit
to include terms for the prior distribution of the probabilities that
a) blinking, b) pecking, c) somersaulting, or d) none of the above
elicits a reinforcer from the pigeon's world. Now, we don't know what
these pigeons have been through prior to this experiment. Say they are
unsullied, unjaded newborns. The blank slate hypothesis would amount to a
uniform prior distribution, a, b, c, and d equally likely. A "nativist"
hypothesis would amount to a non-uniform prior. A little "just-so"
story-telling might run like this: Not only is the shape of the pigeon's
beak subject, via natural selection, to adaptive pressures, but so is
its behavioral repertoire - in the natural environments under which pigeons
have evolved, pecking behavior has a higher probability of being associated
with obtaining food than do blinking or somersaulting behaviors - therefore
a pigeon is more likely, a priori, to explore its environment in search of
food via pecking rather than blinking, somersaulting, cooing, or
flame-warring. Further, a pigeon will more readily form associations
between pecking and obtaining food than between food and a host of other
behaviors.
Maybe "cooing" was not such a good choice - it could be that in the nest
cooing does have a positive correlation with eliciting food from the
pigeon's world.
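The prior described here can be written as a distribution over four hypotheses about which behavior, if any, produces food, and then updated by Bayes' rule. A minimal sketch, assuming a hypothetical likelihood: under the hypothesis that behavior h produces food, emitting h is followed by food with probability p_hit, and any other behavior with probability p_base. The "nativist" numbers and the evidence list are invented for illustration.

```python
def posterior(prior, observations, p_hit=0.5, p_base=0.05):
    """Bayes' rule over hypotheses 'behavior h is the one that produces food'.

    prior: dict mapping hypothesis -> prior probability ('none' = none of the above).
    observations: list of (behavior, food_followed) pairs.
    """
    post = dict(prior)
    for behavior, food in observations:
        for h in post:
            p_food = p_hit if behavior == h else p_base
            post[h] *= p_food if food else 1.0 - p_food
        total = sum(post.values())
        post = {h: v / total for h, v in post.items()}
    return post


blank_slate = {"blink": 0.25, "peck": 0.25, "somersault": 0.25, "none": 0.25}
nativist = {"blink": 0.1, "peck": 0.7, "somersault": 0.1, "none": 0.1}  # invented numbers
evidence = [("peck", True), ("blink", False), ("peck", False), ("somersault", False)]

print(posterior(blank_slate, evidence)["peck"])  # about 0.72
print(posterior(nativist, evidence)["peck"])     # about 0.95
```

The same evidence leaves the two priors in different places, which is the sense in which the prior term changes the prediction.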
Back to the VI experiment. A uniform prior with equal cost terms, and
random correlations between behavior and reinforcement, predicts, under the
probability matching model, equal, relatively low response rates for each of
blinking, pecking, somersaulting, cooing, and flame-warring. Wild
speculation: that's not what happens. Extravagant prediction: the rates of
eye-blinking, somersaulting, cooing, and flame-warring are not correlated
with the mean and standard deviation of the reinforcement rate; the
rate of key-pecking is correlated with the mean and standard deviation of
the reinforcement rate. Moronic basis of the prediction: the priors are not
uniform (nor the costs equal) - pigeons are born peckers; they are born
cooers too, but they are (hypothesis, here) genetically predisposed to be
more likely to form associations between pecking and eating, and to use
pecking to explore their culinary world (a form of hypothesis testing on the
pigeon's part).
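One simple reading of "probability matching" for this paragraph, offered as an assumption and not necessarily the exact model meant here: each behavior's share of emitted responses equals its probability of being the food-producing behavior. Shares are renormalized over the behaviors being counted, and costs are taken as equal so they drop out. If chance-level correlations leave those probabilities roughly where the prior put them, a uniform prior predicts equal shares and a pecking-biased prior predicts that pecking dominates. The names and numbers below are hypothetical.

```python
def matching_shares(probabilities, behaviors):
    """Share of responses allotted to each behavior under a simple
    probability-matching rule (equal costs assumed, so they drop out)."""
    total = sum(probabilities[b] for b in behaviors)
    return {b: probabilities[b] / total for b in behaviors}


behaviors = ["blink", "peck", "somersault", "coo", "flame_war"]
uniform = {b: 0.2 for b in behaviors}
# Invented numbers: pecking favored, cooing given some weight (the nest example).
peck_biased = {"blink": 0.05, "peck": 0.6, "somersault": 0.05, "coo": 0.25, "flame_war": 0.05}

print(matching_shares(uniform, behaviors))      # equal shares of 0.2
print(matching_shares(peck_biased, behaviors))  # pecking gets the largest share
```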
Do you have distribution data for the pecking response to the VI schedule?
If so, is the distribution well fit by a distribution of the same form as,
though perhaps with different parameters than, the distribution of the VI
schedule? What about blinking and cooing distributions?
So, on to the variable-ratio phase. I don't doubt that pigeons can be
trained via such a schedule of reinforcement to coo, blink, or peck for
food. But, according to the probability matching model, if the prior
probabilities that pigeons will associate cooing, blinking, and pecking with
food differ, then the number of trials before equilibrium distributions
of the response rates set in will also differ. What actually happens?
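The claim that different priors imply different times to equilibrium can be checked with a back-of-the-envelope calculation. Treat "this behavior produces food" versus "it does not" as two hypotheses. When the first is true, each trial adds, on average, the Kullback-Leibler divergence between the two outcome distributions to the log posterior odds. The sketch divides the distance from the prior log-odds to a target log-odds by that average drift. The likelihoods p_hit and p_base, the 0.95 target, and the priors are hypothetical. The result is an expected-value approximation, not the trial count in any particular run.

```python
import math


def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1.0 - p))


def trials_to_criterion(prior, p_hit=0.5, p_base=0.05, target=0.95):
    """Approximate trials of one behavior before P(it produces food) reaches target.

    Assumes the behavior really is reinforced with probability p_hit (versus
    p_base under the alternative), so each trial moves the log posterior odds
    by KL(Bernoulli(p_hit) || Bernoulli(p_base)) on average.
    """
    drift = (p_hit * math.log(p_hit / p_base)
             + (1.0 - p_hit) * math.log((1.0 - p_hit) / (1.0 - p_base)))
    needed = (logit(target) - logit(prior)) / drift
    return max(0, math.ceil(needed))


for label, prior in [("peck (nativist)", 0.7), ("blank slate", 0.25), ("coo", 0.1)]:
    print(label, trials_to_criterion(prior))  # 3, 5 and 7 trials respectively
```

The lower the prior, the more trials the same evidence stream needs before the posterior settles.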
Glen
---------
Michael
[/quote]