How Do You Know if a T Test Is Significant

Previously we have considered how to test the null hypothesis that there is no difference between the mean of a sample and the population mean, and no difference between the means of two samples. We obtained the difference between the means by subtraction, and then divided this difference by the standard error of the difference. If the difference is 1.96 times its standard error, or more, it is likely to occur by chance with a frequency of only 1 in 20, or less.

With small samples, where more chance variation must be allowed for, these ratios are not entirely accurate because the uncertainty in estimating the standard error has been ignored. Some modification of the procedure of dividing the difference by its standard error is needed, and the technique to use is the t test. Its foundations were laid by WS Gosset, writing under the pseudonym "Student", so that it is sometimes known as Student's t test. The procedure does not differ greatly from the one used for large samples, but is preferable when the number of observations is less than 60, and certainly when they amount to 30 or less.

The application of the t distribution to the following four types of problem will now be considered.

The calculation of a confidence interval for a sample mean.
The mean and standard deviation of a sample are calculated and a value is postulated for the mean of the population. How significantly does the sample mean differ from the postulated population mean?
The means and standard deviations of two samples are calculated. Could both samples have been taken from the same population?
Paired observations are made on two samples (or in succession on one sample). What is the significance of the difference between the means of the two sets of observations?

In each case the problem is essentially the same – namely, to establish multiples of standard errors to which probabilities can be attached. These multiples are the number of times a difference can be divided by its standard error. We have seen that with large samples 1.96 times the standard error has a probability of 5% or less, and 2.576 times the standard error a probability of 1% or less (Appendix table A). With small samples these multiples are larger, and the smaller the sample the larger they become.
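The growth of these multiples as the sample shrinks can be checked numerically. The sketch below is an illustration and not part of the original chapter: it integrates the t density with Simpson's rule and then searches by bisection for the multiple that leaves 5% in the two tails. The function names `t_pdf`, `two_sided_p` and `t_critical` are made up for this example, and only the Python standard library is assumed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two sided tail probability P(|T| >= |t|), by Simpson's rule on [0, |t|]."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

def t_critical(p, df):
    """Multiple of the standard error whose two sided tail probability is p."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > p:
            lo = mid   # tails still too heavy: move outwards
        else:
            hi = mid
    return (lo + hi) / 2

# The 5% multiple for a range of degrees of freedom.
for df in (5, 10, 17, 30, 60, 1000):
    print(df, round(t_critical(0.05, df), 3))
```

With many degrees of freedom the printed multiple should settle close to the large sample value of 1.96, while for small samples it is noticeably larger.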

Confidence interval for the mean from a small sample

A rare congenital disease, Everley's syndrome, generally causes a reduction in concentration of blood sodium. This is thought to provide a useful diagnostic sign as well as a clue to the efficacy of treatment. Little is known about the subject, but the director of a dermatological department in a London teaching hospital is known to be interested in the disease and has seen more cases than anyone else. Even so, he has seen only 18. The patients were all aged between 20 and 44.

The mean blood sodium concentration of these 18 cases was 115 mmol/l, with standard deviation of 12 mmol/l. Assuming that blood sodium concentration is Normally distributed, what is the 95% confidence interval within which the mean of the total population of such cases may be expected to lie?

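The data are set out as follows:

Number of observations: 18
Mean blood sodium concentration: 115 mmol/l
Standard deviation: 12 mmol/l
Standard error of mean: SD/√n = 12/√18 = 2.83 mmol/l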

To find the 95% confidence interval above and below the mean we now have to find a multiple of the standard error. In large samples we have seen that the multiple is 1.96 (Chapter 4). For small samples we use the table of t given in Appendix Table B. As the sample becomes smaller t becomes larger for any particular level of probability. Conversely, as the sample becomes larger t becomes smaller and approaches the values given in table A, reaching them for infinitely large samples.

Since the size of the sample influences the value of t, the size of the sample is taken into account in relating the value of t to probabilities in the table. Some useful parts of the full t table appear in table B. The left hand column is headed d.f. for "degrees of freedom". The use of these was noted in the calculation of the standard deviation (Chapter 2). In practice the degrees of freedom amount in these circumstances to one less than the number of observations in the sample. With these data we have 18 – 1 = 17 d.f. This is because only 17 observations plus the total of all the observations (or, equivalently, their mean) are needed to specify the sample, the 18th being determined by subtraction.

To find the number by which we must multiply the standard error to give the 95% confidence interval we enter table B at 17 in the left hand column and read across to the column headed 0.05 to find the number 2.110. The 95% confidence interval of the mean is now set as follows:

Mean – 2.110 SE to Mean + 2.110 SE

which gives us:

115 – (2.110 x 2.83) to 115 + (2.110 x 2.83) or 109.03 to 120.97 mmol/l.

We may then say, with a 95% chance of being right, that the range 109.03 to 120.97 mmol/l includes the population mean.

Likewise from Appendix Table B the 99% confidence interval of the mean is as follows:

Mean – 2.898 SE to Mean + 2.898 SE

which gives:

115 – (2.898 x 2.83) to 115 + (2.898 x 2.83) or 106.80 to 123.20 mmol/l.
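The two intervals can be reproduced in a few lines. This is a minimal sketch using the figures quoted above (18 patients, mean 115 mmol/l, standard deviation 12 mmol/l) and the multiples 2.110 and 2.898 read from table B at 17 degrees of freedom; the variable and function names are chosen only for this illustration.

```python
import math

n = 18          # number of patients
mean = 115.0    # mean blood sodium concentration, mmol/l
sd = 12.0       # standard deviation, mmol/l

se = sd / math.sqrt(n)   # standard error of the mean, about 2.83 mmol/l

def interval(multiple):
    """Mean minus and plus the given multiple of the standard error."""
    return mean - multiple * se, mean + multiple * se

ci95 = interval(2.110)   # table B, 17 d.f., column headed 0.05
ci99 = interval(2.898)   # table B, 17 d.f., column headed 0.01

print("95%%: %.2f to %.2f mmol/l" % ci95)
print("99%%: %.2f to %.2f mmol/l" % ci99)
```

The only sample specific inputs are n, the mean and the standard deviation; the multiple is the one quantity that has to come from the t table (or an equivalent routine) for the appropriate degrees of freedom.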

Difference of sample mean from population mean (one sample t test)

Estimations of plasma calcium concentration in the 18 patients with Everley's syndrome gave a mean of 3.2 mmol/l, with standard deviation 1.1. Previous experience from a number of investigations and published reports had shown that the mean was usually close to 2.5 mmol/l in healthy people aged 20-44, the age range of the patients. Is the mean in these patients abnormally high?

We set the figures out as follows:

Mean of general population, μ: 2.5 mmol/l
Mean of sample, x̄: 3.2 mmol/l
Standard deviation of sample, SD: 1.1 mmol/l
Standard error of sample mean: SD/√n = 1.1/√18 = 0.26 mmol/l
Difference between means: μ – x̄ = 2.5 – 3.2 = –0.7 mmol/l

t = difference between means divided by standard error of sample mean = –0.7/0.26 = –2.69. Ignoring the sign of the t value, and entering table B at 17 degrees of freedom, we find that 2.69 comes between probability values of 0.02 and 0.01, in other words between 2% and 1%, and so 0.01 < P < 0.02. It is therefore unlikely that the sample with mean 3.2 came from the population with mean 2.5, and we may conclude that the sample mean is, at least statistically, unusually high. Whether it should be regarded clinically as abnormally high is something that needs to be considered separately by the physician in charge of that case.
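As a check on the table lookup, the one sample calculation can be sketched in code. The helper functions `t_pdf` and `two_sided_p` are hypothetical names written for this illustration; they obtain the two sided P value by numerical integration of the t density rather than from table B. Note that carrying the unrounded standard error gives t of about –2.70, whereas rounding the standard error to 0.26 first gives the –2.69 used in the text.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two sided tail probability P(|T| >= |t|), by Simpson's rule on [0, |t|]."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

mu = 2.5            # postulated population mean, mmol/l
sample_mean = 3.2   # mean plasma calcium in the sample, mmol/l
sd = 1.1            # sample standard deviation
n = 18              # number of patients

se = sd / math.sqrt(n)        # standard error of the sample mean
t = (mu - sample_mean) / se   # same sign convention as the text
p = two_sided_p(t, n - 1)     # 17 degrees of freedom

print("SE = %.3f, t = %.2f, two sided P = %.3f" % (se, t, p))
```

The computed P value should fall between 0.01 and 0.02, in agreement with the range read from the table.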

Difference between means of two samples

Here we apply a modified procedure for finding the standard error of the difference between two means and testing the size of the difference by this standard error (see Chapter 5 for large samples). For large samples we used the standard deviation of each sample, computed separately, to calculate the standard error of the difference between the means. For small samples we calculate a combined standard deviation for the two samples.

The assumptions are:

that the data are quantitative and plausibly Normal
that the two samples come from distributions that may differ in their mean value, but not in the standard deviation
that the observations are independent of each other.
The third assumption is the most important. In general, repeated measurements on the same individual are not independent. If we had 20 leg ulcers on 15 patients, then we have only 15 independent observations.

The following example illustrates the procedure.

The addition of bran to the diet has been reported to benefit patients with diverticulosis. Several different bran preparations are available, and a clinician wants to test the efficacy of two of them on patients, since favourable claims have been made for each. Among the consequences of administering bran that require testing is the transit time through the alimentary canal. Does it differ in the two groups of patients taking these two preparations?

The null hypothesis is that the two groups come from the same population. By random allocation the clinician selects two groups of patients aged 40-64 with diverticulosis of comparable severity. Sample 1 contains 15 patients who are given treatment A, and sample 2 contains 12 patients who are given treatment B. The transit times of food through the gut are measured by a standard technique with marked pellets and the results are recorded, in order of increasing time, in Table 7.1.

Table 7.1

These data are shown in Figure 7.1. The assumptions of approximate Normality and equality of variance are satisfied. The design suggests that the observations are indeed independent. Since it is possible for the difference in mean transit times for A-B to be positive or negative, we will employ a two sided test.

Figure 7.1

With treatment A the mean transit time was 68.40 h and with treatment B 83.42 h. What is the significance of the difference, 15.02 h?

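The procedure is as follows:

Obtain the standard deviation in sample 1: $s_1$

Obtain the standard deviation in sample 2: $s_2$

Multiply the square of the standard deviation of sample 1 by the degrees of freedom, which is the number of subjects minus one: $(n_1 - 1)s_1^2$

Repeat for sample 2: $(n_2 - 1)s_2^2$

Add the two together and divide by the total degrees of freedom to give the pooled variance:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The standard error of the difference between the means is

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

which can be written

$$SE(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

When the difference between the means is divided by this standard error the result is t. Thus,

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The table of the t distribution, Table B (Appendix), which gives two sided P values, is entered at $n_1 + n_2 - 2$ degrees of freedom.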
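The pooled procedure can be sketched as a short function. This is an illustration under stated assumptions rather than the chapter's own method of working: the function name `pooled_t` is made up, and the two small data sets are hypothetical values invented only to exercise the code (they are not the transit times of Table 7.1).

```python
import math
from statistics import mean, stdev

def pooled_t(sample1, sample2):
    """Equal variance two sample t statistic, degrees of freedom and standard error."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)   # sample standard deviations
    df = n1 + n2 - 2                          # total degrees of freedom
    # Weight each variance by its own degrees of freedom, then pool.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / se
    return t, df, se

# Hypothetical measurements, invented purely for illustration.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t, df, se = pooled_t(group_a, group_b)
print("t = %.3f on %d d.f. (SE of difference %.3f)" % (t, df, se))
```

The resulting t is then referred to the t table at the returned degrees of freedom, ignoring its sign for a two sided test.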

For the transit times of Table 7.1, the standard error of the difference between the means is 6.582 h, so that t = 15.02/6.582 = 2.282.

Table B shows that at 25 degrees of freedom (that is, (15 – 1) + (12 – 1)), t = 2.282 lies between 2.060 and 2.485. Consequently, 0.02 < P < 0.05. This degree of probability is smaller than the conventional level of 5%. The null hypothesis that there is no difference between the means is therefore somewhat unlikely.

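A 95% confidence interval is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t_{0.05} \times SE(\bar{x}_1 - \bar{x}_2)$$

where $t_{0.05}$ is the value in table B at the 5% level for $n_1 + n_2 - 2$ degrees of freedom. Taking the difference as treatment B minus treatment A, this becomes

83.42 – 68.40 ± 2.06 x 6.582

15.02 – 13.56 to 15.02 + 13.56 or 1.46 to 28.58 h.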
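The arithmetic for the transit time example can be checked directly from the figures quoted in the text: the two means, the standard error of the difference (6.582 h) and the table B multiple of 2.060 at 25 degrees of freedom. The variable names below are chosen only for this sketch.

```python
mean_a = 68.40    # mean transit time with treatment A, hours
mean_b = 83.42    # mean transit time with treatment B, hours
se_diff = 6.582   # standard error of the difference, as quoted in the text
t_table = 2.060   # table B, 25 d.f., two sided 5% level

df = (15 - 1) + (12 - 1)   # degrees of freedom for the two samples
diff = mean_b - mean_a     # difference between the means
t = diff / se_diff         # t statistic

half_width = t_table * se_diff
lower, upper = diff - half_width, diff + half_width

print("t = %.3f on %d d.f." % (t, df))
print("95%% CI: %.2f to %.2f h" % (lower, upper))
```

Because the interval excludes zero, it tells the same story as the significance test at the 5% level, while also showing how imprecisely the difference is estimated.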

Unequal standard deviations

If the standard deviations in the two groups are markedly different, for example if the ratio of the larger to the smaller is greater than two, then one of the assumptions of the t test (that the two samples come from populations with the same standard deviation) is unlikely to hold. An approximate test, due to Satterthwaite, and described by Armitage and Berry, (1) which allows for unequal standard deviations, is as follows.

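Rather than use the pooled estimate of variance, compute

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

This is analogous to calculating the standard error of the difference in two proportions under the alternative hypothesis as described in Chapter 6.

We now compute

$$d = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$$

We then test this using a t statistic, in which the degrees of freedom are:

$$\text{d.f.} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$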
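A minimal sketch of the unequal variance calculation follows, using the usual Satterthwaite form: the standard error is the square root of $s_1^2/n_1 + s_2^2/n_2$, and the degrees of freedom are the square of that sum divided by $(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)$. The function name `unequal_variance_t` and the summary figures passed to it are hypothetical, chosen only to show a case where one standard deviation is three times the other.

```python
import math

def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal variance t statistic with Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # variance of the first mean
    v2 = sd2 ** 2 / n2   # variance of the second mean
    se = math.sqrt(v1 + v2)
    d = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return d, df, se

# Hypothetical summary statistics, invented for illustration only.
d, df, se = unequal_variance_t(10.0, 2.0, 10, 14.0, 6.0, 20)

print("d = %.3f, d.f. = %.1f (use %d in the tables)" % (d, df, round(df)))
```

The degrees of freedom are generally not a whole number, so for use with printed tables they are rounded to the nearest integer.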

Although this may look very complicated, it can be evaluated very easily on a calculator without having to write down intermediate steps (see below). It can produce a degree of freedom which is not an integer, and so not available in the tables. In this case one should round to the nearest integer. Many statistical packages now carry out this test as the default, and to get the equal variances t statistic one has to specifically ask for it. The unequal variance t test tends to be less powerful than the usual t test if the variances are in fact the same, since it uses fewer assumptions. However, it should not be used indiscriminately because, if the standard deviations are different, how can we interpret a nonsignificant difference in means, for example? Often a better strategy is to try a data transformation, such as taking logarithms as described in Chapter 2. Transformations that render distributions closer to Normality often also make the standard deviations similar. If a log transformation is successful use the usual t test on the logged data.

Applying this method to the data of Table 7.1, the calculator method (using a Casio fx-350) for calculating the standard error is:

Difference between means of paired samples (paired t test)

When the effects of two alternative treatments or experiments are compared, for example in cross over trials, randomised trials in which randomisation is between matched pairs, or matched case control studies (see Chapter 13), it is sometimes possible to make comparisons in pairs. Matching controls for the matched variables, and so can lead to a more powerful study.

The test is derived from the single sample t test, using the following assumptions.

The data are quantitative.
The distribution of the differences (not the original data) is plausibly Normal.
The differences are independent of each other.

The first case to consider is when each member of the sample acts as his own control. Whether treatment A or treatment B is given first or second to each member of the sample should be determined by the use of the table of random numbers, Table F (Appendix). In this way any effect of one treatment on the other, even indirectly through the patient's attitude to treatment, for instance, can be minimised. Occasionally it is possible to give both treatments simultaneously, as in the treatment of a skin disease by applying a remedy to the skin on opposite sides of the body.

Let us use as an example the studies of bran in the treatment of diverticulosis discussed earlier. The clinician wonders whether transit time would be shorter if bran is given in the same dosage in three meals during the day (treatment A) or in one meal (treatment B). A random sample of patients with disease of comparable severity and aged 20-44 is chosen and the two treatments administered on two successive occasions, the order of the treatments also being determined from the table of random numbers. The alimentary transit times and the differences for each pair of treatments are set out in Table 7.2.

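Table 7.2

In calculating t on the paired observations we work with the difference, d, between the members of each pair. Our first task is to find the mean of the differences between the observations and then the standard error of the mean, proceeding as follows:

Number of pairs: n = 12
Mean of differences: d̄ = –6.5 h
Standard error of the mean difference: SE(d̄) = SD/√n = 4.37 h
t = d̄/SE(d̄) = –6.5/4.37 = –1.487

Entering Appendix Table B at 11 degrees of freedom (n – 1) and ignoring the minus sign, we find that this value lies between 0.697 and 1.796. Reading off the probability value, we see that 0.1 < P < 0.5, so the null hypothesis of no difference between the mean transit times on the two treatments is not disproved.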

A 95% confidence interval for the mean difference is given by

$$\bar{d} \pm t_{n-1} \times SE(\bar{d})$$

In this case t with 11 degrees of freedom at P = 0.05 is 2.201 (table B) and so the 95% confidence interval is:

–6.5 – 2.201 x 4.37 to –6.5 + 2.201 x 4.37 h, or –16.1 to 3.1 h.

This is quite wide, so we cannot really conclude that the two preparations are equivalent, and should look to a larger study.
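The paired calculation can be sketched as follows. The first part reproduces t and the 95% confidence interval from the summary figures quoted in the text (mean difference –6.5 h, standard error 4.37 h, table value 2.201 at 11 degrees of freedom). The function `paired_summary` is a hypothetical helper showing how the same quantities would be obtained from raw pairs, and the five pairs passed to it are invented for illustration, not taken from Table 7.2.

```python
import math
from statistics import mean, stdev

def paired_summary(first, second):
    """Mean difference, its standard error, t and degrees of freedom for paired data."""
    diffs = [a - b for a, b in zip(first, second)]   # work with the differences only
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return d_bar, se, d_bar / se, len(diffs) - 1

# Figures quoted in the text for the 12 pairs of transit times.
d_bar, se, t_table = -6.5, 4.37, 2.201   # t_table: table B, 11 d.f., P = 0.05
t = d_bar / se
lower, upper = d_bar - t_table * se, d_bar + t_table * se
print("t = %.3f, 95%% CI %.1f to %.1f h" % (t, lower, upper))

# Hypothetical pairs, invented only to show the function on raw data.
demo = paired_summary([10, 20, 30, 40, 50], [12, 23, 31, 43, 52])
print("demo: mean diff %.1f, SE %.3f, t %.2f on %d d.f." % demo)
```

Only the differences enter the calculation, which is why the Normality assumption applies to them and not to the original measurements.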

The second case of a paired comparison to consider is when two samples are chosen and each member of sample 1 is paired with one member of sample 2, as in a matched case control study. As the aim is to test the difference, if any, between two types of treatment, the choice of members for each pair is designed to make them as alike as possible. The more alike they are, the more apparent will be any differences due to treatment, because they will not be confused with differences in the results caused by disparities between members of the pair. The likeness within the pairs applies to attributes relating to the study in question. For instance, in a test for a drug reducing blood pressure the colour of the patients' eyes would probably be irrelevant, but their resting diastolic blood pressure could well provide a basis for selecting the pairs. Another (perhaps related) basis is the prognosis for the disease in patients: in general, patients with a similar prognosis are best paired. Whatever criteria are chosen, it is essential that the pairs are constructed before the treatment is given, for the pairing must be uninfluenced by knowledge of the effects of treatment.

Further methods

Suppose we had a clinical trial with more than two treatments. It is not valid to compare each treatment with each other treatment using t tests because the overall type I error rate will be bigger than the conventional level set for each individual test. A method of controlling for this is to use a one way analysis of variance.(2)
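A rough sense of how quickly the overall type I error rate grows can be had from the sketch below. It makes the simplifying assumption that the pairwise tests are independent, which is not strictly true when they share treatment groups, so the figures are indicative only. The function name `familywise_error` is made up for this example.

```python
from math import comb

def familywise_error(treatments, alpha=0.05):
    """Number of pairwise comparisons and the chance of at least one false
    positive, assuming (as a simplification) that the tests are independent."""
    pairs = comb(treatments, 2)             # every treatment against every other
    return pairs, 1 - (1 - alpha) ** pairs

for k in (2, 3, 4, 5):
    pairs, rate = familywise_error(k)
    print("%d treatments: %d comparisons, overall error rate about %.3f"
          % (k, pairs, rate))
```

Even with three treatments the chance of at least one spurious "significant" difference is well above the nominal 5%, which is the motivation for a single overall test.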

Common questions

Should I test my data for Normality before using the t test?

It would seem logical that, because the t test assumes Normality, one should test for Normality first. The problem is that the test for Normality is dependent on the sample size. With a small sample a non-significant result does not mean that the data come from a Normal distribution. On the other hand, with a large sample, a significant result does not mean that we could not use the t test, because the t test is robust to moderate departures from Normality – that is, the P value obtained can be validly interpreted. There is something illogical about using one significance test conditional on the results of another significance test. In general it is a matter of knowing and looking at the data. One can "eyeball" the data and if the distributions are not extremely skewed, and particularly if (for the two sample t test) the numbers of observations are similar in the two groups, then the t test will be valid. The main problem is often that outliers will inflate the standard deviations and render the test less sensitive. Also, it is not generally appreciated that if the data originate from a randomised controlled trial, then the process of randomisation will ensure the validity of the t test, irrespective of the original distribution of the data.

Should I test for equality of the standard deviations before using the usual t test?

The same argument prevails here as for the previous question about Normality. The test for equality of variances is dependent on the sample size. A rule of thumb is that if the ratio of the larger to smaller standard deviation is greater than two, then the unequal variance test should be used. With a computer one can easily do both the equal and unequal variance t test and see if the answers differ.

Why should I use a paired test if my data are paired? What happens if I don't?

Pairing provides information about an experiment, and the more information that can be provided in the analysis the more sensitive the test. One of the major sources of variability is between subjects variability. By repeating measures within subjects, each subject acts as its own control, and the between subjects variability is removed. In general this means that if there is a true difference between the pairs the paired test is more likely to pick it up: it is more powerful. When the pairs are generated by matching, the matching criteria may not be important. In this case, the paired and unpaired tests should give similar results.

References

1. Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. Oxford: Blackwell Scientific Publications, 1994:112-13.
2. Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. Oxford: Blackwell Scientific Publications, 1994:207-14.

Exercises

7.1 In 22 patients with an unusual liver disease the plasma alkaline phosphatase was found by a certain laboratory to have a mean value of 39 King-Armstrong units, standard deviation 3.4 units. What is the 95% confidence interval within which the mean of the population of such cases whose specimens come to the same laboratory may be expected to lie?

7.2 In the 18 patients with Everley's syndrome the mean level of plasma phosphate was 1.7 mmol/l, standard deviation 0.8. If the mean level in the general population is taken as 1.2 mmol/l, what is the significance of the difference between that mean and the mean of these 18 patients?

7.3 In two wards for elderly women in a geriatric hospital the following levels of haemoglobin were found:

Ward A: 12.2, 11.1, 14.0, 11.3, 10.8, 12.5, 12.2, 11.9, 13.6, 12.7, 13.4, 13.7 g/dl;

Ward B: 11.9, 10.7, 12.3, 13.9, 11.1, 11.2, 13.3, 11.4, 12.0, 11.1 g/dl.

What is the difference between the mean levels in the two wards, and what is its significance? What is the 95% confidence interval for the difference in treatments?

7.4 A new treatment for varicose ulcer is compared with a standard treatment on 10 matched pairs of patients, where treatment between pairs is decided using random numbers. The outcome is the number of days from start of treatment to healing of ulcer. One doctor is responsible for treatment and a second doctor assesses healing without knowing which treatment each patient had. The following treatment times were recorded.

Standard treatment: 35, 104, 27, 53, 72, 64, 97, 121, 86, 41 days;

New treatment: 27, 52, 46, 33, 37, 82, 51, 92, 68, 62 days.

What are the mean difference in the healing time, the value of t, the number of degrees of freedom, and the probability? What is the 95% confidence interval for the difference?

Source: https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/7-t-tests
