Lecture 05 - The Distribution of Random Variables
Meta
Key Topics
Binomial distribution, Gaussian distribution, Hypothesis testing, ggplot2, moments, Normality testing, nortest, Poisson distribution, R, Random variables, Statistical significance, stats
Resources
View Lecture, Equations, Functions, Lab 04, Lab 04 Replication, Problem Set 02
Lecture Slides
Grappling with p-values
Here are a couple of links for digging further into what p-values are and how we explain them:
- American Statistical Association’s Statement on p-values
- “Mission Improbable: A Concise and Precise Definition of P-Value” - Science Magazine interview with Victor De Gruttola
- “Science Isn’t Broken” - FiveThirtyEight article on p-values
- “Not Even Scientists Can Easily Explain P-values” - FiveThirtyEight article on p-values, includes the video I mentioned in class
- Andrew Gelman’s views on p-values
- “Abandon Statistical Significance” - Andrew Gelman and others
- Gelman and John Carlin suggest some solutions for the “p-value communication problem”
- “It will be much harder to call new findings ‘significant’ if this team gets its way” - Science Magazine article on changing culture around p-values
Histograms with the Normal Distribution Overlaid in ggplot2
I introduced the idea of a histogram with a normal distribution overlaid during this week’s lecture, but purposely did not include the syntax on our quick reference sheet for R functions. If you are interested in playing with the syntax, it is included below. We make one key change to the initial ggplot() call: the aesthetic mapping is included in the initial ggplot() function rather than in a specific geom. We also convert the hwy variable’s representation from frequency to density, and layer our normal distribution on top of this using the stat_function() function. Make sure to update the arguments for stat_function() if you want to adjust this example, and note that inside args we have to specify the data frame and the variable name separated by a $.
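library(ggplot2)
# map hwy to x once, in ggplot(), so every layer inherits it
ggplot(data = mpg, mapping = aes(x = hwy)) +
  # plot density rather than counts; after_stat(density) is the current
  # spelling of ..density.., which ggplot2 3.4.0 and later deprecates
  geom_histogram(mapping = aes(y = after_stat(density))) +
  # overlay a normal curve with the same mean and sd as hwy
  stat_function(fun = dnorm, color = "red",
                args = list(mean = mean(mpg$hwy), sd = sd(mpg$hwy)))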
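If you want to adapt the plot to a different variable, the variable name has to change in two places: the aesthetic mapping and the arguments passed to stat_function(). Here is a minimal sketch using cty (city mileage) from the same mpg data. The object name p and the binwidth of 2 are my own choices for illustration, and after_stat(density) is the newer way of writing ..density.. (it assumes ggplot2 3.3.0 or later).

```r
library(ggplot2)

# Same technique as the hwy plot, applied to city mileage (cty).
# The variable changes in aes() AND in the mean/sd passed to dnorm.
p <- ggplot(data = mpg, mapping = aes(x = cty)) +
  # binwidth = 2 is an arbitrary choice; adjust it to taste
  geom_histogram(mapping = aes(y = after_stat(density)), binwidth = 2) +
  stat_function(fun = dnorm, color = "red",
                args = list(mean = mean(mpg$cty), sd = sd(mpg$cty)))

# Print the plot
p
```

If you update aes() but forget args, the red curve will still be centered on the hwy mean and will not line up with the histogram.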
Performing Calculations
During class this week, I briefly described how to perform calculations and save results in R. You can use the standard mathematical operators +, -, /, and * to add, subtract, divide, and multiply (respectively). These are reviewed in the week-02-lecture-03-rQuickref.pdf file in the Week-02 repo on GitHub. Quickly, we can calculate a value like lambda within R, save it as an object, and reference the object in later calculations:
> lambda <- 100000 * .00004
> dpois(6, lambda = lambda)
[1] 0.1041956
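To see what dpois() is doing, you can reproduce the same value with the Poisson probability formula, P(X = k) = e^(-lambda) * lambda^k / k!. This is only a sketch for checking your work; the object names k, by_hand, and built_in are mine, not part of the lecture materials.

```r
# Same lambda as in the example above: 100000 * .00004 = 4
lambda <- 100000 * 0.00004
k <- 6

# Poisson probability of exactly k events, written out by hand
by_hand <- exp(-lambda) * lambda^k / factorial(k)

# The built-in density function should agree
built_in <- dpois(k, lambda = lambda)
all.equal(by_hand, built_in)

# Probability of k or fewer events: sum the individual probabilities,
# or use the cumulative function ppois()
at_most_k <- sum(dpois(0:k, lambda = lambda))
all.equal(at_most_k, ppois(k, lambda = lambda))
```

Both all.equal() calls should return TRUE, which confirms that the saved lambda object is being reused correctly in each calculation.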
Reading in Arbitrary Vectors to R
Problem Set 04 asks you to find the variance of an arbitrary set of values in a vector (list) of numbers. These can easily be read into R so that you do not have to calculate the variance by hand. You can use the same technique to check your work as you calculate skew and kurtosis by hand.
To begin: Let x = 2, 4, 6, 8, 10
x <- c(2, 4, 6, 8, 10)
Once you add these values into R’s memory, you will be able to see them in your Environment tab and will be able to include them in calculations:
> mean(x)
[1] 6
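The same vector can be used to check variance, skew, and kurtosis. The sketch below uses var() for the sample variance (dividing by n - 1) and the plain moment-based formulas for skew and kurtosis (dividing by n). That choice of formulas is an assumption on my part; if the problem set uses sample-adjusted versions, your hand calculations will differ slightly. The object names are also just for illustration.

```r
x <- c(2, 4, 6, 8, 10)
n <- length(x)

# Deviations from the mean: -4, -2, 0, 2, 4
dev <- x - mean(x)

# Sample variance by hand (divide by n - 1); this matches var(x)
var_by_hand <- sum(dev^2) / (n - 1)
var(x)                      # 10

# Central moments (divide by n)
m2 <- mean(dev^2)
m3 <- mean(dev^3)
m4 <- mean(dev^4)

# Moment-based skew and kurtosis
skew <- m3 / m2^(3/2)       # 0, because the values are symmetric around 6
kurt <- m4 / m2^2           # 1.7; subtract 3 for excess kurtosis
```

A skew of 0 is what we expect here, since the values are evenly spaced around the mean.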
Extra Information
This week, I mentioned a number of important statisticians. If you want more information, you can check out these Wikipedia pages:
- Jacob Bernoulli
- Ladislaus Bortkiewicz
- Abraham de Moivre
- Ronald Fisher
- Francis Galton
- Carl Friedrich Gauss
- Pierre-Simon Laplace
- Blaise Pascal
- Karl Pearson
- Siméon Denis Poisson
Additionally, here is the link to the quincunx simulation we talked about in class!