5. Introduction to Advanced Analytical & Inferential Statistics

5.1 formal definition of alpha levels and P values

In this chapter the idea we want to consider is this: say we have two data sets – call one the control, or historical, data and call the other the experiment – and we want to test, formally, whether there is a statistically significant difference between them. The inference is that whatever experiment, or effect, we applied worked, and we want to conclude that it caused the difference. However, since this is a maths foundations textbook & course, we do not dive into the details of such matters. It is worthy to mention that one should never draw a conclusion from a single experimental data set; rather, one should look for trends in the data across multiple data sets.

Definition 5.1.1 – Statistically significant difference

We say that an observed difference between two data sets is statistically significant if it is unlikely to have occurred by chance alone (often referred to as unlikely to have occurred given the null hypothesis).

Definition 5.1.2 – Critical Value Z

We define the critical value of an experiment, conducted at the [latex]\left(1-\alpha\right)\cdot100\%[/latex] confidence level, to be the solution, Z, of the equation

[latex]P\left(x<Z\right)=\int_{-\infty}^{Z}f\left(x\right)dx=1-\alpha+\frac{\alpha}{2}=1-\frac{\alpha}{2}[/latex]

where the integration to compute the probability uses the probability density function, f(x), associated with the data. It is worthy to note that in many textbooks or educational websites one will see this definition written in the classical "two tailed format," but here we have rewritten it, using the argument of symmetry, by adding half of the alpha, [latex]\frac{\alpha}{2}[/latex], from the left tail to put the equation in a cumulative format, which is useful as many coding languages provide the cumulative distribution function directly. For example, NORMCDF(Z) in MATLAB will return the probability for a standard normal up to the value of the random variable equal to Z.
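To make this concrete, here is a minimal sketch in Python (assuming SciPy is available; scipy.stats.norm.cdf plays the role of NORMCDF):

```python
# Sketch: find the critical value Z in cumulative format, P(X < Z) = 1 - alpha/2,
# for a standard normal density.
from scipy.stats import norm

alpha = 0.05                   # 95% confidence
Z = norm.ppf(1 - alpha / 2)    # inverse CDF ("percent point function")
print(Z)                       # ~1.9600

# Check: the cumulative probability up to Z recovers 1 - alpha/2.
print(norm.cdf(Z))             # ~0.9750
```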

Example 5.1.1

Find the critical value for a 95% confidence level, assuming the density function is a standard normal.

To begin we note here that alpha is 0.05. Now, the solution ensuring that 95% of the area lies within the interval satisfies

[latex]P(x \lt Z)=1-0.025[/latex]

Or rewriting this in terms of the density function, it becomes

[latex]\int_{-\infty}^{Z}{\frac{1}{\sqrt{2\pi}}e^{-0.5x^2}dx}=0.975[/latex]

and while this solution cannot be obtained explicitly (the antiderivative of [latex]e^{-u^2}[/latex]

is not known in exact closed form) it is possible to create and run a numerical code which will yield the well known solution

[latex]Z_\alpha=1.96[/latex]

where the common notation, [latex]Z_\alpha[/latex], of the solution is used here and henceforth.

An important comment to note here is on the rounding: say, for example, the program yielded the solution to the above as 1.951; we could not round that down to 1.95. While the solution tells us that 1.951 is the value of our random variable so that 95% of the area is below, the value 1.95 would be to the left on the axis, so there would be less area. The value 1.95 might only cover 94.9%, which is not 95% as we claim the level of confidence to be. It is always essential in statistical analysis to round in a way called "statistically conservative," that is, rounding so as to always ensure we meet or exceed the claimed level of confidence in area under the curve; namely, round up critical values and round down test statistic values obtained from experimental data.
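As a small illustration of conservative rounding, here is a minimal sketch in Python (the helper name is our own):

```python
# Sketch: "statistically conservative" rounding - critical values are rounded
# up, test statistics are rounded down, so the claimed confidence is met.
import math

def round_conservative(value, decimals, critical=True):
    scale = 10 ** decimals
    fn = math.ceil if critical else math.floor
    return fn(value * scale) / scale

print(round_conservative(1.951, 2, critical=True))   # 1.96, never 1.95
print(round_conservative(2.579, 2, critical=False))  # 2.57 for a test stat
```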

Definition 5.1.3 – P value

When given the test statistic, TS, of a statistical experiment governed by the probability density function, f(x), we define the P value as

[latex]P=\int_{TS}^{\infty}f\left(x\right)dx[/latex]

where the calculation is essentially the area (or doubt) that remains in the tail beyond the test statistic. Hence, when comparing two models the one with the lower P value is preferred!

Example 5.1.2

Find the P value given a test stat of 2.5, assuming this experiment was done using a standard normal density.

To begin we note here that TS = 2.5 and the density is a standard normal,

[latex]\frac{1}{\sqrt{2\pi}}e^{-0.5x^2}[/latex]

so our definition of P value would become

[latex]P=\int_{2.5}^{\infty}{\frac{1}{\sqrt{2\pi}}e^{-0.5x^2}dx}[/latex]

which has the solution 0.0062 (or about 0.62%)

Example 5.1.3

Find the P value, using the same test stat from the prior example, TS = 2.5, but assuming this experiment was done using a T density, with five degrees of freedom.

To begin we note here that TS = 2.5 and the density is a T with v=5, [latex]constant\ast\left(1+\frac{x^2}{5}\right)^{-3}[/latex], so our definition & computation of the P value would be

[latex]P=\int_{TS}^{\infty}f\left(x\right)dx[/latex]

with the density formula noted above, and has a solution of approximately 0.0272 (or about 2.7%).
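Both P value computations above can be reproduced numerically; a minimal sketch in Python, where SciPy's survival function sf(x) = 1 − cdf(x) is exactly the upper tail integral of Definition 5.1.3:

```python
# Sketch: the P value is the upper-tail area beyond TS under the given density.
from scipy.stats import norm, t

TS = 2.5
print(norm.sf(TS))      # ~0.0062  (Example 5.1.2, standard normal)
print(t.sf(TS, df=5))   # ~0.0272  (Example 5.1.3, T density with v = 5)
```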

Now, it is important to note here that the P value is different for the same test stat when using a different distribution; this is because the P value already accounts for the underlying distribution, which is why it is the simplest stat to look at. Think: the lower the P value the better.

It is very important to remember, as a data analyst, that reporting results showing two data sets are significantly different is not the same as, nor does it prove, Cause and Effect! Moreover, even if a longitudinal trend is shown within the data this does not prove a scientific theory. Data Science may be used to investigate solutions and/or governing equations, but such things – especially within fields like engineering – can only truly be developed utilizing the classic mathematical framework, e.g. modeling through things such as Newton’s Law.

Definition 5.1.4 – Confidence Interval

[latex]MEAN-Z_\alpha\left(StError\right)\ \ to \ \ MEAN+Z_\alpha\left(StError\right)[/latex]

The term Standard Error is most commonly taken to be the standard deviation divided by the square root of the sample size.

The big idea, or definition, to understand in the applications of the confidence interval is this: suppose a confidence interval is built using historical data, then an experiment is conducted and the mean of this experimental data set is obtained. If the experimental mean is outside of the confidence interval, we define the difference to be statistically significant; hence, we can infer, but not prove, that whatever treatment was applied to the experimental data set worked.
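As a small illustration of this decision rule, here is a minimal sketch in Python; the historical and experimental numbers are illustrative placeholders, not data from the text:

```python
# Sketch: build the 95% confidence interval from historical data and flag the
# experimental mean as statistically significant if it falls outside.
import math

hist_mean, hist_sd, n = 10.0, 1.0, 31      # illustrative historical values
z_alpha = 1.96                             # 95% two tailed critical value
st_error = hist_sd / math.sqrt(n)

low = hist_mean - z_alpha * st_error
high = hist_mean + z_alpha * st_error
experimental_mean = 10.8                   # illustrative experimental result
significant = not (low < experimental_mean < high)
print((round(low, 3), round(high, 3)), significant)
```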

5.2 mathematical development of confidence intervals and introduction to hypothesis testing

In the prior chapter the general theory of probability was summarized along with the main concepts of probability density functions, cumulative distributions, and moment generating functions. Now, in this chapter we will embark on the study of one of the most powerful and important real-world applications of mathematics: the theory of hypothesis testing! The general idea of hypothesis testing can be summarized as the process of obtaining some data from an experiment and then using probability theory to attempt to validate a claim. Moving forward, we will refer to the claim as the hypothesis, and generally speaking the experiment will involve the implementation of something that wasn’t utilized in the past, which we will refer to as the treatment. While the idea will not be discussed in detail here, many instructors effectively teach hypothesis testing through the parallel logic of a court case. Namely, in a court case the defendant is assumed innocent until proven guilty beyond a reasonable amount of doubt as decided by a jury. Likewise, in hypothesis testing we desire to validate our claim, the hypothesis, but we will take the stance that it is not valid (AKA assumed innocent) until proven otherwise beyond a mathematical amount of certainty (AKA beyond reasonable doubt).

In general, the hypothesis procedure will consist of 4 steps:

First, the hypothesis is made as a mathematical statement.

Second, the so called “critical value” and “rejection region” are defined.

Third, calculation of the test statistic.

Fourth, conclusions are stated.

In the following derivation, we will assume that the hypothesis being studied is the simple difference of a population mean after the application of a treatment. Namely, we will consider the so called “null hypothesis” as µ = population mean value as given. The idea of this hypothesis statement is that the symbol µ is, in a sense, representing the population mean moving forward in time with the treatment applied consistently in the future (i.e. this statement is saying that the mean does not change when the treatment is applied). In the same manner that a defendant is assumed innocent in a court case until proven otherwise, we will assume this null hypothesis is truthful until proven otherwise.

The null hypothesis will be rejected if our soon to be defined test statistic falls outside the mathematical region which is defined from our chosen level of statistical certainty. Namely, if we define our statistical certainty to be at a level of [latex]\left(1-\alpha\right)\cdot100\%[/latex], then the critical value [latex]z_\alpha[/latex] (AKA the endpoints) of our mathematical region can be found from the probability statement: [latex]P(-z_\alpha< X < z_\alpha)=1-\alpha[/latex]. For example, if we take a 95% confidence level the critical value will solve the equation

[latex]\int_{-z_\alpha}^{z_\alpha}{\frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx=1-\alpha=0.95}[/latex]

This example will yield the solution of [latex]z_\alpha=1.96[/latex] which is a very important, if not “famous,” critical value and very much worth remembering! It is worthy to note that many authors will use the notation

[latex]z_{\alpha/2}[/latex] due to the fact that this example is an illustration of a so called “two tailed test.” A two tailed test is one where the allowable error is permitted either above the critical value or below the negative of the critical value, hence the error is split in half. A one tailed test is one where the error is not split in half, hence it lies beyond the critical value in “one tail” only. For now, we will restrict our study to the two tailed examples for simplification.

Looking back to the original claim, we see that we have defined two regions: the range within the region, where we expect things to be, and the range outside of the region, which is to be viewed as an oddity. Thus, we can define the region outside to be the region in which to reject the null hypothesis. Namely, if our soon to be defined test statistic is either greater than [latex]z_\alpha[/latex] (or less than –[latex]z_\alpha[/latex]) we will reject the null hypothesis. Or, in a cleaner mathematical statement, we can say:

Reject the null if || test stat || >[latex]z_\alpha[/latex].

Now, the only missing point is the so-called test statistic. We will formally define and prove where this value comes from shortly but let us first accept the definition so that we can view a few examples to illustrate this process of hypothesis testing.

Definition 5.2.1 – The Test statistic for a single sample hypothesis test of differences of mean

TS = [latex]\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}[/latex]
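In code this definition is a one-liner; a minimal sketch in Python (the function name is our own):

```python
import math

def test_statistic(sample_mean, pop_mean, pop_sd, n):
    """TS = (sample mean - population mean) / (sigma / sqrt(n)), per Definition 5.2.1."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
```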

Example 5.2.1

A researcher knows that the population mean speed is 73, with a standard deviation of 21, and wants to test whether a new drug has an effect on that speed. A sample of 81 subjects who took the drug has an average speed of 71. Perform an appropriate hypothesis test.

To begin solving, we recall that a full solution to a hypothesis test problem has four steps:

First, the hypothesis is made as a mathematical statement.

Second, the so called “critical value” and “rejection region” are defined.

Third, calculation of the test statistic.

Fourth, conclusions are stated.

For our present example, we will assume that the level of confidence is 95% and the test is a two tailed test (which would make sense, as the researcher wanted to unbiasedly test for an effect on speed rather than specifically test for an increase). So, we already know the second step: the critical value is 1.96. Hence, all we really need to compute are the 1st and 3rd steps. To begin, we must define the desired hypothesis, which is what we really want to show and is often referred to as the alternate hypothesis. In this example, the researcher knows that the population mean is 73 and they are attempting to see if the drug has an effect on that speed. Thus, we set the alternate hypothesis to state that µ is different than 73. Then, we also must construct the null hypothesis (the logical opposite of the alternate hypothesis), which in this case will state that µ is equal to 73. Now, all that remains is to compute the test statistic from our formula and then use our results to conclude.

In doing so, we obtain the four step solution as:

First, null H: µ = 73.

Alt H: µ ≠ 73.

Second, assume the null is truthful and reject if || TS || > 1.96.

Third, TS =[latex]\frac{71-73}{\frac{21}{\sqrt{81}}}=-0.857.[/latex]

Fourth, since the TS does not fall in the rejection region, we fail to reject the null.

It is very important to note in this example that the result is just simply a failure to reject the null hypothesis. This wording is very important, and it is essential to understand that this conclusion does not disprove anything, nor do we accept anything; rather, we have just failed to reject the null hypothesis. Perhaps one will find it useful to think that we have attempted to do something and failed to do so. Hence, our conclusion is that we did not do anything, or perhaps a more sophisticated way is to say we have “no conclusion!” Analogously, when a jury is tasked to find a defendant guilty beyond a reasonable doubt, if they do not find the evidence, then their formal result is to say “not guilty” or “no, we did not find sufficient evidence.”
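For illustration, the full four step procedure of this example can be scripted; a minimal sketch in Python:

```python
# Sketch: the four-step, two tailed test of Example 5.2.1 in code.
import math

# Step 1: null H: mu = 73; alt H: mu != 73 (two tailed).
pop_mean, pop_sd, n, sample_mean = 73, 21, 81, 71
# Step 2: at 95% confidence, reject if |TS| > 1.96.
z_alpha = 1.96
# Step 3: the test statistic of Definition 5.2.1.
TS = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
print(round(TS, 3))  # -0.857
# Step 4: conclude.
print("reject the null" if abs(TS) > z_alpha else "fail to reject the null")
```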

Example 5.2.2

An instructor wants to see if group activity work increases test scores. Currently the school’s average math score is 85 with a standard deviation of 4. A sample of 36 students is assigned to do group work in class. Their average is 90.

Perform an appropriate hypothesis test.

Again, we note that a full solution to a hypothesis test problem has four steps, and to begin our present example we observe that the desired hypothesis is specifically to increase the test scores. It is known that the population mean score is 85, so the logical choice for the Alt H is : µ > 85. As in our last example, we will assume that the level of confidence is 95%, but this is a one tailed test. Therefore, the prior critical value of 1.96 would not be the correct critical value. To find the correct value we would need to go back to our probability density theory. In doing so, we obtain the desired equation to solve

[latex]\int_{-\infty}^{z_\alpha}{\frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx=1-\alpha=0.95}[/latex]

which will yield the solution of

[latex]z_\alpha=1.65[/latex]

We are now prepared to fully develop our hypothesis testing procedure:

First, null H: µ = 85.

Alt H: µ > 85.

Second, assume the null is truthful, and reject if: TS > 1.65.

Third, TS = [latex]\frac{90-85}{\frac{4}{\sqrt{36}}}=7.5.[/latex]

Fourth, since the TS does fall in the rejection region we reject the null.

We previously presented the definition of the test statistic formula without development or proof. Let us now formally define and prove where this formula comes from. We assume that the population problem we are studying is modeled by a normal distribution with mean µ and standard deviation σ, hence X~N(µ,σ). Now, in regards to the sample we will need to utilize two Lemmas related to a theorem of advanced probability theory known as the Central Limit Theorem. Namely, we will take as definition the following:

Definition 5.2.2 – The mean and variance of a sampling distribution

If X1,X2,…, Xn are random variables with

[latex]\bar X=\frac{1}{n} \displaystyle\sum_{i=1}^nX_i[/latex]
[latex]S^2=\frac{1}{n-1}\displaystyle\sum_{i=1}^n(X_i-\bar X )^2[/latex]

and if [latex]\bar X[/latex] is the random variable of the sample means of all the simple random samples of size n from a population with expected value E(X) and variance Var(X), then

[latex]E(\bar X)=E(X)[/latex]

[latex]Var(\bar X)=\frac{1}{n}Var(X)[/latex]

We now need to prove our main foundational result, which is stated in the following theorem.

Definition 5.2.3 – Central Limit Theorem

If X1,X2,…, Xn are normally distributed random variables with mean µ and standard deviation σ, then

[latex]\frac{(\bar X -μ)}{σ/\sqrt{n}}∼N(0,1)[/latex]

Proof: Let us begin by recalling a fact about random variables, namely if

[latex]X∼N\left(\mu,\sigma\right)[/latex]

and we consider the RV =aX, where a is a fixed constant, then we can show by some routine algebra on the cumulative distribution function that this

[latex]RV∼N\left(a\mu,a\sigma\right).[/latex]

A similar well known result is that if we have two independent random variables

[latex]X∼N\left(\mu,\sigma\right)[/latex]

and

[latex]Y∼N\left(\nu,\varphi\right)[/latex]

and if we consider the RV = X + Y, then we will find

[latex]X+Y∼N\left(\mu+\nu,\sqrt{\sigma^2+\varphi^2}\right).[/latex]

Thus, since our X1,X2,…, Xn are normally distributed random variables, we can draw the conclusion that

[latex]\bar X∼N\left(\mu,\frac{\sigma}{\sqrt{n}}\right).[/latex]

Now, we shall look at the expression

[latex]\frac{\bar X -μ}{σ/\sqrt{n}}=\frac{\sqrt{n}}{σ}\bar X -\frac{\sqrt{n}}{σ}μ.[/latex]

From the results above we will see that this has a mean 0 and standard deviation 1, hence we have proved that

[latex]\frac{\bar X -μ}{σ/\sqrt{n}}∼N(0,1)[/latex]

which means the use of the standard normal distribution for our critical values is validated.
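As an illustrative check of this result, one can also simulate it numerically; a minimal sketch in Python (assuming NumPy is available), reusing the numbers from Example 5.2.1 as the population:

```python
# Sketch: standardized sample means of normal data look standard normal.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 73.0, 21.0, 81        # population values from Example 5.2.1
samples = rng.normal(mu, sigma, size=(100_000, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))
print(z.mean(), z.std())             # approximately 0 and 1, as the theorem claims
```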

5.3 moment generating function

The Moment Generating Function, as its name implies, is a function that we will create which can be used to find the moments of the probability distribution, and these moments are very useful in applications to find things such as the mean and variance. The good news is that once we have obtained the MGF, we will be able to obtain these moments without computing lengthy integrations such as we encountered in prior sections when attempting to directly compute the variance. However, as we will see shortly, the definition of the MGF is itself an integration, but in most applications one will be able to start from a known MGF rather than needing to compute it, phew!

The formal definition of the Moment Generating Function is the expected value of the function [latex]e^{tx}[/latex], where x is the random variable and t is a new variable which is not related to x; hence t can be treated as a constant in operations such as the expected value integration, which is with respect to x. Thus, we have arrived at our main definition for this section:

Definition 5.3.1 – The moment generating function for density f(x)

[latex]MGF=E(e^{tx})=∫_Ωe^{tx}∙f(x)dx[/latex]

where the result is a function of t.

Example 5.3.1

Find the moment generating function for the uniform density with L =1 and R = 5, i.e. the density

[latex]f\left(x\right)=\frac{1}{4}.[/latex]

To begin our solution we note that we have the density defined as [latex]f\left(x\right)=\frac{1}{4},[/latex] and from our prior knowledge we recall that this function is defined on the sample space of 1<x<5. Hence, we can now compute its MGF as

[latex]MGF=E\left(e^{tx}\right)=∫_Ωe^{tx}∙f(x)dx=∫_1^5e^{tx}∙\frac{1}{4}dx.[/latex]

Now, to compute this integration it is first noted that the value of t, while it is officially a variable, can be treated as a constant within this integration, hence we can compute the integration as

[latex]MGF=\frac{1}{4}\left[\frac{e^{tx}}{t}\right]_{x=1}^5=\frac{1}{4t}\left(e^{5t}-e^t\right)[/latex]
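This computation can also be verified symbolically; a minimal sketch using the SymPy library (declaring t positive simply sidesteps the separate t = 0 case of the integration):

```python
# Sketch: verify the uniform MGF of Example 5.3.1 symbolically.
import sympy as sp

x = sp.symbols('x')
t = sp.symbols('t', positive=True)   # t != 0, so no separate piecewise case

mgf = sp.integrate(sp.exp(t * x) * sp.Rational(1, 4), (x, 1, 5))
print(sp.simplify(mgf))              # equal to (exp(5*t) - exp(t))/(4*t)
```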

Example 5.3.2

Find the moment generating function for the exponential density with A=2, i.e. the density

[latex]f\left(x\right)=2e^{-2x}.[/latex]

To begin our solution we note that we have the density defined as [latex]f\left(x\right)=2e^{-2x},[/latex] and from our prior knowledge we recall that this function is defined on the sample space of x >0. Hence, we can now compute its MGF as

[latex]MGF=E\left(e^{tx}\right)=∫_Ωe^{tx}∙f(x)dx=∫_0^∞e^{tx}∙2e^{-2x}dx.[/latex]

Now, to compute this integration, again it is important to recall that while the value of t is officially a variable, here it can be treated as a constant within the integration. Also, the useful property of exponential functions, [latex]e^A\bullet e^B=e^{A+B}[/latex], is applied and doing so we find

[latex]MGF=2\int_{0}^{\infty}{e^{tx-2x}dx=2\int_{0}^{\infty}{e^{x(t-2)}dx}}[/latex]

[latex]=\frac{2}{t-2}\left[e^{x\left(t-2\right)}\right]_{x=0}^\infty=\frac{2}{t-2}\left(e^{-\infty}-e^0\right)=\frac{2}{2-t}[/latex]

It was assumed here that the value of t was chosen so that t-2<0, hence ensuring that our integration would converge.

From a pure mathematical point of view one may state the conclusion from the last example that we have found the moment generating function to be [latex]\phi\left(t\right)=\frac{2}{2-t}[/latex] which is only defined for t < 2. However, for our purposes in this textbook we will not include such details, as we will only be working with well-known Moment Generating Functions which are stable and defined at the values of t needed in applications (generally one needs to evaluate these functions at t = 0). Still, the abstract concept of convergence is important to be aware of, as not all density functions have a convergent MGF, as the next example will illustrate.

Example 5.3.3

Find the moment generating function for the T density with v=2, i.e. the density

[latex]f\left(x\right)=\frac{\mathrm{\Gamma}\left(\frac{3}{2}\right)}{\sqrt{2\pi}\mathrm{\Gamma}\left(1\right)}\left(1+\frac{x^2}{2}\right)^{-\frac{3}{2}}[/latex]

To begin our solution we note that we have the density defined as [latex]f\left(x\right)=\frac{\mathrm{\Gamma}\left(\frac{3}{2}\right)}{\sqrt{2\pi}\mathrm{\Gamma}\left(1\right)}\left(1+\frac{x^2}{2}\right)^{-\frac{3}{2}}[/latex] and from our prior knowledge we recall that this function is defined on the sample space of all real x. Hence, we can now compute its MGF as

[latex]MGF=E\left(e^{tx}\right)=∫_Ωe^{tx}∙f(x)dx=∫_{-∞}^∞e^{tx}∙\frac{Γ(\frac{3}{2})}{\sqrt{2π}Γ(1)}(1+\frac{x^2}{2})^{-\frac{3}{2}}dx.[/latex]

Prior to computing this integration it is worthy to recall the known particular values of the gamma, namely that [latex]\mathrm{\Gamma}\left(\frac{3}{2}\right)=\frac{1}{2}\sqrt\pi[/latex] and [latex]\mathrm{\Gamma}\left(1\right)=1,[/latex] hence our integration simplifies to

[latex]=\frac{1}{2\sqrt{2}}\int_{-\infty}^\infty e^{tx}\left(1+\frac{x^2}{2}\right)^{-\frac{3}{2}}dx[/latex]

Now, while no exact “closed form” of this integral (i.e. no elementary antiderivative) is known, it is possible to attempt the integration using series methods; however, doing so would result in a solution that involves all positive power terms of the form [latex]\left(tx\right)^n.[/latex] These would then need to be evaluated at positive infinity, which leads to a divergent result. Hence, the conclusion we obtain is that the MGF integration for the T density does not converge, so we conclude that the T density does not have an MGF.

Prior to continuing our development of Moment Generating Functions, along with their applications, it is worthy to revisit our list of common density functions that are frequently used in examples, and to now also state their Moment Generating Functions. The examples you are most likely to encounter are:

The exponential density is [latex]f\left(x\right)=Ae^{-Ax}[/latex] which is defined for x > 0, and its MGF is [latex]\phi\left(t\right)=\frac{A}{A-t}[/latex]

The uniform density is [latex]f(x)=\frac{1}{(R-L)}[/latex] which is defined for L < x < R, and its MGF is [latex]\phi(t)=\frac{1}{t(R-L)}(e^{Rt}-e^{Lt})[/latex]

The normal density is [latex]f(x)=\frac{1}{\sigma\sqrt{2π}}e^{\frac{-(x-μ)^2}{2σ^2}}[/latex] which is defined for all x, and its MGF is [latex]\phi\left(t\right)=e^{\mu t+\frac{1}{2}\sigma^2t^2}[/latex]

The T density is [latex]f(x)=\frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\ \Gamma(\frac{v}{2})}\left(1+\frac{x^2}{v}\right)^{-\frac{v+1}{2}}[/latex] which is defined for all real x, with v being the degrees of freedom, and as noted in the last example it does not have an MGF.

The chi squared density is [latex]f\left(x\right)=\frac{1}{2^\frac{v}{2}\mathrm{\Gamma}\left(\frac{v}{2}\right)}x^{\left(\frac{v}{2}\right)-1}e^{-\frac{x}{2}}[/latex] which is defined for x > 0, with v being the degrees of freedom, and its MGF is [latex]\phi\left(t\right)=\left(1-2t\right)^{-v/2}[/latex]

As the proof outlined below will develop, we can alternately find the nth moment,

[latex]\mu_n=E\left(x^n\right),[/latex]

of a probability distribution without needing to actually compute the expectation integration.

Definition 5.3.2 – The nth moment of a density f(x)

[latex]\mu_n=E(x^n)=\frac{d^n}{dt^n}\left[MGF\right]_{t=0}.[/latex]

This result can be extremely useful in applications, such as finding the variance which one can show is equal to [latex]µ_2-(µ_1)^2.[/latex]

Prior to moving forward with examples, let us quickly outline the proof of the above result. To begin, the nth moment is formally defined as

[latex]E(x^n)[/latex]

Now, the moment generating function is defined as

[latex]MGF=E\left(e^{tx}\right)[/latex].

And, expanding the exponential function from this definition in a power series, we obtain that the moment generating function can alternately be written as

[latex]MGF=E\left(\sum_{n=0}^{\infty}\frac{\left(tx\right)^n}{n!}\right)=1+t\bullet E\left(x\right)+\frac{t^2}{2}E\left(x^2\right)+...[/latex]

Then, if one takes the first derivative and evaluates it at t equal to zero, all terms except the expected value of x will vanish; hence, we have established that

[latex]\frac{d}{dt}\left[MGF\right]_{t=0}=E\left(x\right)[/latex]

Likewise, if one takes two derivatives and evaluates them at t equal to zero, all terms except the expected value of x squared will vanish; hence, we have established that

[latex]\frac{d^2}{dt^2}\left[MGF\right]_{t=0}=E\left(x^2\right)[/latex]

And, this pattern can be continued indefinitely to prove the result provided in Definition 5.3.2. Moreover, if one does a little algebra – recalling that the mean “μ” is a constant so it can be taken outside of the integration – with the prior definition of variance

[latex]VAR=E\left(x-\mu\right)^2=\int_{\mathrm{\Omega}}{\left(x-\mu\right)^2f\left(x\right)dx.}[/latex]

It can be established that

[latex]VAR=\int_{\Omega}{x^2f\left(x\right)dx}-\mu^2[/latex]

or in terms of moments

[latex]VAR=\mu_2-\left(\mu_1\right)^2[/latex].

 

Then, recalling that μ1 can be computed as the first derivative of the MGF, while μ2 can be computed as the second derivative of the MGF, we can see it is possible to obtain both the mean & variance through this new method as

[latex]MEAN=\mu_1\ \ \ \ \ \ VAR=\mu_2-\left(\mu_1\right)^2[/latex]

The method developed above can be used to compute the expectation and variance without computing any of the integrals, such as done in the prior examples. Let us now look to some examples for illustration.

Example 5.3.4

For the exponential density [latex]f\left(x\right)=2e^{-2x}[/latex], which is defined on the sample space x>0 and has the MGF [latex]\phi\left(t\right)=\frac{2}{2-t}[/latex], find the mean and variance, firstly by the classical “integration” method, then secondly by the MGF method.

To begin our solution we note that we have the density defined as [latex]f\left(x\right)=2e^{-2x}[/latex] and from our prior knowledge we recall that this function is defined on the sample space of x >0. Hence, we can now compute its expectation as

[latex]E\left(x\right)=∫_Ωx∙f(x)dx=∫_0^∞2xe^{-2x}dx[/latex]

The solution to this integration is found to be = 0.5. Now, on the other hand we can compute the 1st moment as

[latex]\frac{d}{dt}[MGF]_{t=0}=\frac{d}{dt}\left[\frac{2}{2-t}\right]_{t=0}=\frac{1}{2}[/latex]

Likewise, we can now compute the variance as the expectation

[latex]VAR=E(x-µ)^2=∫_Ω(x-µ)^2∙f(x)dx=∫_0^∞2\left(x-\frac{1}{2}\right)^2e^{-2x}dx[/latex]

The solution to this integration is found to be = 0.25. Now, on the other hand we can compute the variance as

[latex]\frac{d^2}{dt^2}[MGF]_{t=0}-µ^2=\frac{d^2}{dt^2}\left[\frac{2}{2-t}\right]_{t=0}-\left(\frac{1}{2}\right)^2=\frac{1}{2}-\frac{1}{4}=\frac{1}{4}.[/latex]
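Both methods in this example can be verified symbolically; a minimal sketch using SymPy:

```python
# Sketch: Example 5.3.4 both ways with SymPy.
import sympy as sp

x, t = sp.symbols('x t')
f = 2 * sp.exp(-2 * x)                                  # the density
mean = sp.integrate(x * f, (x, 0, sp.oo))               # 1/2 by integration
var = sp.integrate((x - mean)**2 * f, (x, 0, sp.oo))    # 1/4 by integration

phi = 2 / (2 - t)                                       # the known MGF
m1 = sp.diff(phi, t).subs(t, 0)                         # 1/2 by the MGF method
m2 = sp.diff(phi, t, 2).subs(t, 0)                      # second moment: 1/2
print(mean, var, m1, m2 - m1**2)                        # 1/2 1/4 1/2 1/4
```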

Example 5.3.5

Find the mean of the chi squared density, with v=2 degrees of freedom, firstly by the classical “integration” method, then secondly by the MGF method,

i.e. the density

[latex]f\left(x\right)=\frac{1}{2\mathrm{\Gamma}\left(1\right)}x^{\left(1\right)-1}e^{-\frac{x}{2}}=\frac{1}{2}e^{-\frac{x}{2}}[/latex]

with the associated MGF [latex]\phi\left(t\right)=\left(1-2t\right)^{-1}[/latex]

To begin our solution we note that we have the density defined as [latex]f\left(x\right)=\frac{1}{2}e^{-\frac{x}{2}}[/latex]

and from our prior knowledge we recall that this function is defined on the sample space of x >0. Hence, we can now compute its expectation as

[latex]E\left(x\right)=∫_Ωx∙f(x)dx=∫_0^∞\frac{x}{2}e^{-\frac{x}{2}}dx.[/latex]

The solution to this integration is found to be =2. Now, on the other hand we can compute the 1st moment as

[latex]\frac{d}{dt}[MGF]_{t=0}=\frac{d}{dt}[(1-2t)^{-1}]_{t=0}=2[/latex]

Example 5.3.6

A probability density function is under investigation, but it is not known explicitly, and using a set of 100 data points it is found to have the MGF [latex]e^{t+t^2}.[/latex]

Use this function to find the mean and variance, and then use those results to find the 95% two tailed confidence interval. Also, comment on whether it is possible to use those results and/or the confidence interval results to find the actual probability density function.

Now, to begin we can compute the 1st moment as

[latex]\frac{d}{dt}[MGF]_{t=0}=\frac{d}{dt}\left[e^{t+t^2}\right]_{t=0}=1[/latex]

Then, we can compute the variance as

[latex]\frac{d^2}{dt^2}[MGF]_{t=0}-µ^2=\frac{d^2}{dt^2}\left[e^{t+t^2}\right]_{t=0}-(1)^2=3-1=2[/latex]

And, we can then set up the 95% confidence interval

[latex]MEAN-1.96\bullet\sqrt{\frac{VAR}{n}}[/latex] to [latex]MEAN+1.96\bullet\sqrt{\frac{VAR}{n}}[/latex],

by simply plugging in the value of the mean as 1, the value of the variance “VAR” as 2, and n as 100; doing so yields the solution that we are “95% confident that the population mean is between 0.72 and 1.28.” Lastly, to address the question as to whether any of this information could be used to determine the actual probability density function, the answer is a bit of a grey area. If we knew this was from a normal data set and the results we obtained were accurate representations of the population mean μ and variance σ2, then we could define the density as [latex]{\frac{1}{\sigma\sqrt{2\pi}}e}^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}={\frac{1}{\sqrt{4\pi}}e}^{-\frac{\left(x-1\right)^2}{4}}.[/latex] However, it is not clear from the information given – albeit that the given MGF does look familiar – that it would be safe to conclude it is a normal distribution. Moreover, it is important to remember that in practice one should never attempt to draw conclusions from a single data set! If we had multiple data sets, and we saw a trend in the results, and we had some real world context to suggest the phenomenon was modeled by a normal distribution, then we could move closer to a conclusion, but it does not appear that can be done with the information given. It is important to remember that many results, such as the famous central limit theorem, are only valid for a sample of samples!
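A minimal sketch of this example’s computation, using SymPy for the derivatives:

```python
# Sketch: Example 5.3.6 - mean and variance from the MGF, then the 95% CI.
import sympy as sp

t = sp.symbols('t')
phi = sp.exp(t + t**2)                     # the given MGF
m1 = sp.diff(phi, t).subs(t, 0)            # first moment: mean = 1
m2 = sp.diff(phi, t, 2).subs(t, 0)         # second moment = 3
var = m2 - m1**2                           # variance = 2

n = 100
half_width = 1.96 * sp.sqrt(var / sp.Integer(n))
print(float(m1 - half_width), float(m1 + half_width))   # ~0.72 to ~1.28
```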

Chapter 5 Exercises
  1. From the central limit theorem the term [latex]\frac{\sigma}{\sqrt{n}}[/latex] is often called the standard error term. What happens to this term as the sample size grows without bound? (HINT: perhaps fix [latex]\sigma[/latex], say standard normal with [latex]\sigma=1[/latex], and then try progressively bigger samples: n=10, then n=100, then [latex]n=1000\ldots[/latex])
  2. For a sample of n=31, governed by the normal density, set up (not compute) the integral equation to find the Z critical value for the following two tailed confidence levels, using mean = 10 & variance.
    • [latex]\frac{\alpha}{2}=0.1[/latex]
    • 95% confidence
    • 99% confidence
  3. An experimental study is done to improve the cost to drive an electric car; it is found the average gas price is $2.69, which leads to a ten cent fuel cost per mile in cars, with a standard deviation of one cent; use this information as the population. Now, in your experiment of 31 cars you worked with Elon Musk and created both a new battery & interstate charging system. This led to the average cost going down to nine cents per mile. At the 95% confidence level do you feel confident to say your experiment had an effect and/or is a statistically significant difference? NOTE: use [latex]Z_\frac{\alpha}{2}=1.96[/latex]
  4. Compute E(x) for N~(3,1) {e.g. a normal with mean [latex]\mu=3[/latex] and st dev [latex]\sigma=1[/latex]}
    • by integral definition (set up & simplify integral and use integration software)
    • by MGF
  5. Compute P(0<x<1) for T2 (set up & simplify integral and use integration software)
  6. Compute E(x) for χ2
    • by integral definition (do by hand)
    • by MGF
  7. Use the exponential PDF with A=7, i.e. [latex]f\left(x\right)=7e^{-7x}[/latex], for x>0
    • Use the traditional definition [latex]E\left(x\right)=\int_{\mathrm{\Omega}}{x\bullet f\left(x\right)dx}[/latex] to compute the expectation. (do by hand (IBP))
    • Use the traditional definition [latex]\sigma^2=\int_{\mathrm{\Omega}}{\left(x-\mu\right)^2\bullet f\left(x\right)dx}[/latex] to compute variance. (do by hand (IBP))
    • Use table provided in class to write MGF for this PDF & use it to compute the 1st & 2nd moments.
    • Compute the variance as [latex]\sigma^2=\mu_2-\left(\mu\right)^2[/latex] and verify that it yields the same solution as part b.
    • Use the traditional definition [latex]E\left(x^2\right)=\int_{\mathrm{\Omega}}{x^2\bullet f\left(x\right)dx}[/latex] to verify your part c solution for [latex]\mu_2[/latex]
  8. Use the uniform PDF with L=0 and R=3, i.e. [latex]f\left(x\right)=\frac{1}{3}[/latex], for 0<x<3
    • Use the traditional definition [latex]\sigma^2=\int_{\mathrm{\Omega}}{\left(x-\mu\right)^2\bullet f\left(x\right)dx}[/latex] to compute variance (do by hand)
    • Use table provided to write MGF for this PDF & use it to compute the 1st & 2nd moments. Hint: you will need to expand [latex]e^{3t}[/latex] using a Taylor series. Hint: Taylor series for
      [latex]e^x=\sum_{n=0}^{\infty}\left(\frac{x^n}{n!}\right)=1+x+\frac{1}{2!}x^2+\frac{1}{3!}x^3+\ldots[/latex]
    • Compute the variance as [latex]\sigma^2=\mu_2-\left(\mu\right)^2[/latex] and verify that it yields the same solution as part a.

License


A Self-Contained Course in Mathematical Theory of Probability Copyright © 2024 by Tim Smith and Shannon Levesque is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.