TimDaly Guest
Posted: Thu Jul 24, 2008 6:33 am Post subject: Literate Programming and Reproducible Results
The Ecole Polytechnique Federale De Lausanne has introduced
an online journal for reproducible research at:
<http://rr.epfl.ch/17/>
The introductory headline reads:
Have you ever tried to reproduce the results presented in a research
paper? For many of our current publications, this would unfortunately
be a challenging task. For a computational algorithm, details such as
the exact dataset, initialization or termination procedures and precise
parameter values are often omitted in the publication for various
reasons. This makes it difficult, if not impossible, for someone else to
obtain the same results. To address the problem, we have started
making our research reproducible. Instead of only describing the
developed algorithms to ‘sufficient’ precision in an article, we give
readers access to all the information (code, data, schemes, etc.) that
was used to produce the presented results as first advocated by Knuth
and Claerbout. We are convinced that making research reproducible is
not only a matter of good practice, but also increases the impact of our
publications and makes it easier to build upon each other’s work. It is a
clear win-win situation for our community: we will have access to more
and more algorithms and can spend time inventing new things rather
than recreating existing ones.
I am a firm believer in Knuth's Literate Programming and in the need
to publish the full program along with the research paper in the
field of computational mathematics (CM).
Too often in the CM area the only surviving artifact of an algorithm is
a five-page paper in a conference publication. There is no need to limit
publications to the constraints of paper. Conference proceedings can
be published electronically. With the adoption of some reasonable,
minimal standards it should be possible to "drag-and-drop" a paper
onto a running system and have it installed immediately. Indeed,
this could even happen during the conference talk itself, allowing
the audience to run the algorithm immediately.
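This is an editorial sketch and was not part of the original post. In Knuth's literate programming, and in tools such as Norman Ramsey's noweb, code lives in named chunks inside the prose. A "tangle" step then extracts the runnable program. The Python below assumes a noweb-like syntax:

- `<<name>>=` opens a code chunk.
- A line starting with `@` returns to prose.
- A line containing only `<<name>>` includes another chunk.

It also follows noweb's convention of `*` as the default root chunk. The function names are hypothetical. This is a simplified stand-in, not the real noweb tool, but it suggests how a minimal, agreed format could let a system pull the code out of a paper automatically.

```python
import re

CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")    # "<<name>>=" opens a code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # "<<name>>" alone on a line includes a chunk


def parse_chunks(text: str) -> dict[str, list[str]]:
    """Collect code chunks by name; chunks sharing a name are concatenated."""
    chunks: dict[str, list[str]] = {}
    current = None  # name of the chunk being read, or None while in prose
    for line in text.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])
        elif line == "@" or line.startswith(("@ ", "@\t")):
            current = None  # "@" ends the code chunk; prose resumes
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(name: str, chunks: dict[str, list[str]],
           indent: str = "", seen: tuple[str, ...] = ()) -> list[str]:
    """Recursively replace reference lines with the referenced chunk's lines."""
    if name in seen:
        raise ValueError(f"recursive chunk reference: {name}")
    lines = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            lines.extend(expand(ref.group(2), chunks,
                                indent + ref.group(1), seen + (name,)))
        else:
            lines.append(indent + line)
    return lines


def tangle(text: str, root: str = "*") -> str:
    """Extract the program rooted at chunk `root` from a literate document."""
    return "\n".join(expand(root, parse_chunks(text)))


paper = """The main routine just prints a greeting.
<<*>>=
def main():
    <<body>>
@ The body is defined separately, after more prose.
<<body>>=
print("hello")
@
"""
print(tangle(paper))
```

The chunks can appear in whatever order suits the explanation. Tangling reassembles them in the order the program needs, which is the property that lets the paper and the code be a single document.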
Combining the research paper with the actual code will raise the
standard of publication and encourage standardization of CM
algorithms. If, in addition, the publication license allowed others
to make derivative algorithms, we could see the whole development
of important results (say, new Groebner basis enhancements) with a
clear record of prior art.
Tim Daly
Dave Guest
Posted: Thu Jul 24, 2008 12:42 pm Post subject: Re: Literate Programming and Reproducible Results
TimDaly wrote:
[quote]The Ecole Polytechnique Federale De Lausanne has introduced
an online journal for reproducible research at:
http://rr.epfl.ch/17/
The introductory headline reads:
Have you ever tried to reproduce the results presented in a research
paper? For many of our current publications, this would unfortunately
be a challenging task. For a computational algorithm, details such as
the exact dataset, initialization or termination procedures and precise
parameter values are often omitted in the publication for various
reasons. This makes it difficult, if not impossible, for someone else to
obtain the same results. To address the problem, we have started
making our research reproducible.
[/quote]
<SNIP>
[quote]
Tim Daly
[/quote]
Nice idea, I agree. Some things, though, are never going to be
reproducible, since they contain random processes.
Take the well-known example of determining Pi with a Monte Carlo method
that checks whether random x,y coordinates fall inside a circle or not and
uses the ratio of the areas of the circle and the square to arrive at Pi.
If two people do that, they will get different results, since by its
nature a random number generator will give different results. Or do you
instead give someone the 'random' numbers you used? In that case, those
numbers are no longer random.
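This editorial sketch offers one common answer to the question about handing over the 'random' numbers; nobody in the thread proposed it. A pseudo-random generator is deterministic once seeded. Publishing the generator and the seed therefore lets anyone regenerate exactly the same stream without shipping the numbers themselves. The Python below uses the quarter-circle variant of the experiment: points in the unit square, with the same area ratio. The function name and the seed values are arbitrary choices for illustration.

```python
import random


def estimate_pi(n_points: int, seed: int) -> float:
    """Monte Carlo estimate of pi from points in the unit square.

    A point (x, y) with x, y uniform on [0, 1) lies inside the quarter
    circle x^2 + y^2 <= 1 with probability pi/4, so 4 * (fraction inside)
    estimates pi.
    """
    rng = random.Random(seed)  # private generator: the seed fixes the whole stream
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points


# Same seed: identical pseudo-random stream, hence an identical estimate.
a = estimate_pi(100_000, seed=2008)
b = estimate_pi(100_000, seed=2008)
# Different seed: a different, but statistically equivalent, estimate.
c = estimate_pi(100_000, seed=2009)
print(a, b, c)
```

Python documents that `random()` keeps producing the same sequence for the same seed across versions. Other libraries may give weaker guarantees, so the generator would need to be named alongside the seed.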
Or let me do experimental research on the Sun. I build some kit to look
at light output from the Sun and get some data (intensity over the time
period of a week in June 2008, for example). How can anyone hope to
reproduce the same result exactly? If I simply give them the data I
collected, I don't really consider that to have achieved your goal. I
could, for example, have made it up as a fake. But since that data was
collected some time back, you can't collect the same data yourself.
I applaud the principle, but in practice I think there are a lot of
cases where it is simply impossible.
Nasser Abbasi Guest
Posted: Fri Jul 25, 2008 7:10 am Post subject: Re: Literate Programming and Reproducible Results
"Dave" <foo@coo.com> wrote in message news:48883278@212.67.96.135...
[quote]
Nice idea I agree. Some things though are never going to be reproducible,
since they contain random processes.
[/quote]
But when talking about random processes, results are usually given in
statistical terms (averages, standard deviations, etc.), so the numbers
generated from one run do not have to match those from another exactly.
What is important is the probability distribution of the result itself
and its statistics.
[quote]Take the well known example of determining Pi from a Monte Carlo method
that works out if the x,y coordinate are inside a circle or not and uses
ratio of areas of circle and square to arrive at Pi. If two people do
that, they will get different results, since by its nature a random number
generator will give different results.
[/quote]
Yes, but by repeating the experiment and generating many samples, one
should obtain a distribution whose statistics agree, within some
tolerance, with the results published in the paper.
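This is an editorial sketch of the kind of comparison described above. It assumes the same quarter-circle estimator, and the run counts and sizes are arbitrary. The idea is to repeat the estimate many times and report the mean and the run-to-run spread. A reader repeating this with a different seed should not expect to match the individual values. They should, however, find a mean within a few standard errors of pi and a similar spread. For reference, the binomial variance gives a per-run standard deviation of about 4·sqrt((π/4)(1 − π/4)/n) ≈ 0.016 for n = 10,000.

```python
import random
import statistics


def estimate_pi(n_points: int, rng: random.Random) -> float:
    """One run: 4 * fraction of unit-square points inside the quarter circle."""
    inside = sum(1 for _ in range(n_points)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_points


def repeated_runs(n_runs: int, n_points: int, seed: int) -> list[float]:
    """Independent replications drawn from one seeded generator."""
    rng = random.Random(seed)
    return [estimate_pi(n_points, rng) for _ in range(n_runs)]


runs = repeated_runs(n_runs=100, n_points=10_000, seed=1)
mean = statistics.fmean(runs)
spread = statistics.stdev(runs)  # run-to-run standard deviation
print(f"mean = {mean:.4f}, std = {spread:.4f} over {len(runs)} runs")
```

What a paper would report, and what a reader would check, is the pair (mean, spread), not any single run's value.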
[quote]
Or let me do experimental research on the sun. I build some kit to look at
light output from the Sun and get some data (intensity over the time
period of a week in June 2008 for example). How can anyone hope to
reproduce the same result exactly? If I simply give them the data I
collected, I don't really consider that has achieved your goal. I could
for example have made it up as a fake. But since that data was collected
some time back, you can't collect the same data yourself.
I applaud the principle, but in practice I think there are a lot of cases
where it is simply impossible.
[/quote]
I do not think so. If the computation is deterministic, then we agree there
is no problem. If it is not, then the statistics is what is important, not
the actual sample values from just one run.
Nasser
TimDaly Guest
Posted: Fri Jul 25, 2008 2:16 pm Post subject: Re: Literate Programming and Reproducible Results
On Jul 25, 1:17 am, "Nasser Abbasi" <n...@12000.org> wrote:
[quote]<SNIP>
I do not think so. If the computation is deterministic, then we agree there
is no problem. If it is not, then the statistics is what is important, not
the actual sample values from just one run.
Nasser
[/quote]
Nasser,
An excellent point.
I've discussed the idea of literate papers before with many
people at conferences. One key issue that is always raised
is that pre-tenure professors get NO credit for code, only
for published papers, so no effort is put into writing code
for publication.
I don't think we can attack this kind of problem until
we raise the standard of publication in Computational
Mathematics to the point where publishing code is either
expected or required. If tenure committees gave added
weight to a paper with code, it would change behavior
overnight.
A second issue is that most people write "ad-hoc" code
which they use to get the result but don't feel is worthy
of publication, especially since there are no standards
to follow. Such code probably will not handle boundary cases
and will likely be badly organized.
This, again, speaks to the standard of publication issue.
What are the criteria for valid review criticism? Can we
complain about missing boundary cases? Failures in areas
not addressed by the paper? Bad coding style? I think
these criteria will evolve over time but we need early
examples. Open source projects help here.
There are many other issues that arise. Despite that, I
believe that it is important to the long term growth of
Computational Mathematics that we raise the standards
of publication to the point where results can be
reproduced by anyone other than the original authors.
Other sciences, such as Math, Physics, and Chemistry,
require this level of publication and expect nothing
less.
A five page conference paper may be sufficient for a
mathematical proof but is clearly missing fundamental
information for a computational mathematics "proof".
Tim
Herman Rubin Guest
Posted: Sat Jul 26, 2008 5:00 pm Post subject: Re: Literate Programming and Reproducible Results
In article <ba0217f7-7d82-4ce5-a722-39dfc089480c@j7g2000prm.googlegroups.com>,
TimDaly <daly@axiom-developer.org> wrote:
[quote]On Jul 25, 1:17 am, "Nasser Abbasi" <n...@12000.org> wrote:
"Dave" <f...@coo.com> wrote in message news:48883278@212.67.96.135...
Nice idea I agree. Some things though are never going to be reproducible,
since they contain random processes.
But when talking about random processes, results are usually given in
statistical terms (averages, std. etc...) and so, the numbers generated from
one run do not have to be exactly the same. What is important, is the
probability distribution of the result itself and its statistics.
[/quote]
It is, but it can be difficult to get much of a handle on it.
Also, conclusions are often drawn from a single simulation.
[quote]Take the well known example of determining Pi from a Monte Carlo method
that works out if the x,y coordinate are inside a circle or not and uses
ratio of areas of circle and square to arrive at Pi. If two people do
that, they will get different results, since by its nature a random number
generator will give different results.
Yes, but by repeating the experiment, and generating many samples, the
distribution should have statistics that should agree to some extent with
the published results in the paper.
[/quote]
This is NOT usually done. In straight Monte Carlo, as in
the above example, one does get a good idea about the
variability from a single run, but this is not the case
for the now heavily used MCMC (Markov chain Monte Carlo).
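This editorial sketch illustrates the distinction drawn above. The toy target, step size, and batch count are assumptions chosen for illustration, and the function names are hypothetical. Straight Monte Carlo draws are independent, so one run yields its own binomial standard error. An MCMC chain, here a random-walk Metropolis sampler targeting a standard normal, produces correlated draws. As a result, the naive formula sd/sqrt(n) understates the error. Batch means, one standard remedy among several, estimates the error from the averages of long consecutive blocks.

```python
import math
import random
import statistics


def straight_mc_pi(n: int, seed: int) -> tuple[float, float]:
    """Straight Monte Carlo: a pi estimate and its standard error from ONE run.

    The draws are independent, so the binomial formula gives the error directly.
    """
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    p = inside / n
    return 4.0 * p, 4.0 * math.sqrt(p * (1.0 - p) / n)


def metropolis_normal(n: int, step: float, seed: int) -> list[float]:
    """Random-walk Metropolis chain targeting N(0, 1); successive draws are correlated."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n):
        proposal = x + rng.uniform(-step, step)
        # Accept with probability min(1, density(proposal) / density(x)).
        if rng.random() < math.exp(min(0.0, (x * x - proposal * proposal) / 2.0)):
            x = proposal
        chain.append(x)  # a rejected move repeats the current state
    return chain


def naive_se(xs: list[float]) -> float:
    """Assumes independent draws: fine for straight MC, too small for MCMC."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def batch_means_se(xs: list[float], n_batches: int = 50) -> float:
    """Means of long consecutive blocks are nearly independent, even if draws are not."""
    size = len(xs) // n_batches
    means = [statistics.fmean(xs[i * size:(i + 1) * size]) for i in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)


est, se = straight_mc_pi(100_000, seed=7)
chain = metropolis_normal(100_000, step=0.5, seed=7)
print(f"straight MC: pi ~ {est:.4f} +/- {se:.4f} (one run)")
print(f"MCMC mean of N(0,1): naive SE {naive_se(chain):.4f}, "
      f"batch-means SE {batch_means_se(chain):.4f}")
```

The step size is deliberately small, so the batch-means error comes out noticeably larger than the naive one. Quoting the naive figure from a single chain is exactly the single-run trap described in the post.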
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558