Appendix A.3: Statistical methods used in farm worker monitoring data
Several different inferential statistical methods were applied in the analysis of the monitoring data from the Youth in Agriculture exposure monitoring studies.  
 If the sample size in each age group was not too small (i.e., at least 5 samples per group), and the samples were independent (i.e., not correlated), then a "two independent sample t test" was used to compare exposure between child and adult workers. One of the assumptions of the t test is that the samples follow a normal distribution. 

 Besides the independent sample t-test, the "Wilcoxon rank sum test" was also applied to compare exposure between child and adult workers. The Wilcoxon rank sum test is used when samples are independent, and it is a non-parametric alternative to the two independent sample t test. Non-parametric tests do not assume that the samples follow a particular probability distribution. In this analysis, if the samples were independent and the sample size was small (i.e., fewer than five in either age group), then only the non-parametric test was used.  

 If there were repeated measurements from the same worker in the same crop-pesticide scenario over multiple days, then a "mixed model" was used to account for the correlated observations. A comparison of the least squares means between the child and adult age groups was then performed within the framework of the mixed model. A description of each statistical method is provided below.
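The selection rules above can be summarized as a short decision function. This is only a sketch of one reading of the text: the function name, argument names, and returned labels are illustrative and are not part of the original analysis.

```python
def choose_methods(n_child, n_adult, repeated_measures, min_n=5):
    """Return the methods that the selection rules in the text point to.

    n_child, n_adult: number of samples in each age group.
    repeated_measures: True if the same worker was monitored on multiple
        days in the same crop-pesticide scenario (correlated observations).
    min_n: smallest group size treated as "not too small" (5 in the text).
    """
    if repeated_measures:
        # Correlated observations within a worker call for a mixed model.
        return ["mixed model"]
    if min(n_child, n_adult) < min_n:
        # Independent samples, but at least one group is small.
        return ["Wilcoxon rank sum test"]
    # Independent samples and both groups large enough: both tests are run.
    return ["two independent sample t test", "Wilcoxon rank sum test"]


print(choose_methods(12, 30, repeated_measures=False))
print(choose_methods(3, 30, repeated_measures=False))
print(choose_methods(8, 8, repeated_measures=True))
```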
Two independent sample t test: 
A two independent sample t-test is designed to compare the means of the same variable between two groups. The underlying assumption of the independent sample t-test is that the samples follow a normal distribution and are independent and identically distributed. Exposure usually follows a log-normal distribution. Therefore, exposure was log transformed before conducting the t-test. In the analysis of the Youth in Agriculture monitoring data, the means of the log transformed exposure rate for the group of child farm workers and the group of adult farm workers were compared. The t-test tests whether the difference in the means is 0. In the analysis, the TTEST procedure in SAS software was used to conduct the two independent sample t-test. A detailed discussion of the TTEST procedure in SAS can be found here: http://www.ats.ucla.edu/stat/sas/output/ttest.htm.
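As an illustration of the log transformation step, the sketch below computes the difference in means on the log scale. It assumes the natural logarithm (the text does not state the base), and the exposure values are invented for illustration only; they are not study data. On the natural-log scale, the exponentiated difference in means equals the ratio of the geometric means of the two groups, which is one way to read the quantity being tested.

```python
import math
from statistics import mean, geometric_mean


def log_mean_difference(group_a, group_b):
    """Difference in means of the natural-log-transformed values (a minus b)."""
    log_a = [math.log(v) for v in group_a]
    log_b = [math.log(v) for v in group_b]
    return mean(log_a) - mean(log_b)


# Hypothetical exposure rates, for illustration only (not study data).
child = [0.8, 1.5, 2.1, 0.6, 1.2]
adult = [1.0, 2.4, 1.9, 3.2, 1.4, 2.7]

diff = log_mean_difference(child, adult)
ratio = math.exp(diff)  # equals geometric_mean(child) / geometric_mean(adult)

print(f"difference of log means (child - adult): {diff:.4f}")
print(f"ratio of geometric means (child / adult): {ratio:.4f}")
```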
PROC TTEST implements two methods for computing the standard error of the difference of the means. The choice of method depends on the assumption made about the variances of the two groups. If the assumption that the two populations have equal variances is not unreasonable, then the pooled variance estimator is used. When the variances are not assumed to be equal, Satterthwaite's method is used. The pooled estimator of variance is a weighted average of the two sample variances, with more weight given to the larger sample, and is defined to be:
                      sp^2 = ((n1-1)s1^2 + (n2-1)s2^2)/(n1+n2-2)
where s1^2 and s2^2 are the sample variances and n1 and n2 are the sample sizes for the two groups; sp^2 is called the pooled variance. The standard error of the difference of the means is the square root of the product of the pooled variance and (1/n1+1/n2). Satterthwaite's method is an alternative to the pooled-variance t test and is used when the assumption that the two populations have equal variances seems unreasonable. It provides a t statistic that approximately follows a t distribution, with degrees of freedom estimated from the sample variances and sample sizes, allowing an approximate t test to be calculated when the population variances are not equal.
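The two standard error calculations described above can be sketched as follows. This is an illustrative re-implementation using standard textbook formulas, not the SAS code used in the analysis; the function names and the input values are hypothetical. The Satterthwaite degrees of freedom use the usual approximation based on the two per-group variance terms.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """t statistic, degrees of freedom, and standard error (equal variances)."""
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = variance(x), variance(y)  # sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2, se


def satterthwaite_t(x, y):
    """t statistic, approximate degrees of freedom, and standard error
    when the two variances are not assumed equal."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean(x) - mean(y)) / se, df, se


# Hypothetical log-scale values, for illustration only (not study data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

t_pooled, df_pooled, se_pooled = pooled_t(x, y)
t_satt, df_satt, se_satt = satterthwaite_t(x, y)
print(f"pooled:        t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Satterthwaite: t = {t_satt:.4f}, df = {df_satt:.4f}")
```

When the two groups have the same size, as in this example, the two standard errors coincide and only the degrees of freedom differ.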
The p-value is the two-tailed probability computed using the t distribution. It is the probability, under the null hypothesis, of observing a t-value whose absolute value is equal to or greater than the one observed. All statistical tests were conducted at a significance level of 0.05. If the p-value was less than the pre-specified alpha level of 0.05, the difference was concluded to be statistically significant.
For detailed documentation of the t-test in SAS, see http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#ttest_toc.htm
A two-factor analysis of variance was also performed to determine whether any factor other than age group, such as gender, was significant in the analysis of exposure rate. If gender was significant, the difference of least squares means between age groups (children vs. adults), adjusted for the effect of gender, was estimated from the model.  
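To make the definition of the two-tailed p-value concrete, the sketch below evaluates it by numerically integrating the t density with Simpson's rule. This is an illustration only, not how SAS computes the value; the function names and the number of integration steps are arbitrary choices. The degrees of freedom may be non-integer, as with Satterthwaite's method.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for T ~ t(df), using Simpson's rule on [0, |t|].

    steps must be even.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-|t| < T < |t|), by symmetry
    return max(0.0, 1.0 - central)


alpha = 0.05
p = two_sided_p(2.5, 10)  # hypothetical t statistic and degrees of freedom
print(f"p = {p:.4f}, significant at {alpha}: {p < alpha}")
```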
Wilcoxon rank sum test:
The Wilcoxon Rank Sum test is the non-parametric alternative to the two sample t test. Non-parametric tests do not assume a particular probability distribution and are generally based on ranks or order statistics. They make less stringent demands on the data and require fewer assumptions. The underlying assumption of the Wilcoxon Rank Sum test is that the samples are independent of each other. The test statistic is based on the sum of the ranks assigned to the observed data from each population when the combined sample is ranked from smallest to largest. 
JMP and SAS software were used to conduct the Wilcoxon Rank Sum test to compare exposure rates between child and adult farm workers. In the JMP output, the sum of the rank scores for each level is reported as "Score Sum". "Expected Score" is the expected score under the null hypothesis that there is no difference between class levels. "Score Mean" records the mean rank score for each level. (Mean-Mean0)/Sd0 is the standardized score, where Mean0 is the expected score under the null hypothesis and Sd0 is the standard deviation of the score sum expected under the null hypothesis; the null hypothesis is that the group means or medians are in the same location across groups. S is the sum of the rank scores. Z is the test statistic under the normal approximation. A two-sided p-value for the normal approximation test is also reported. When the sample size was small, a p-value from the exact Wilcoxon Rank Sum test was also calculated using the SAS NPAR1WAY procedure. All statistical tests were conducted at a significance level of 0.05. If the p-value was less than the pre-specified alpha level of 0.05, the difference was concluded to be statistically significant. For more details on the Wilcoxon test in JMP, see: http://www.jmp.com/support/help/Nonparametric.shtml
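The quantities described above (rank sum, expected score, standardized statistic, and exact p-value) can be sketched in a few lines. This is a simplified illustration and not a reproduction of the JMP or SAS output: it omits the continuity correction and the tie correction to the variance that statistical packages may apply, so its Z and normal-approximation p-value can differ from software results. The exact p-value is obtained by enumerating every possible assignment of ranks to the first group, which is practical only for small samples. All names and data are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def midranks(values):
    """Rank values from smallest to largest, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i+1..j+1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (S, expected S, Z, normal-approximation p, exact p) for group x."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(list(x) + list(y))
    s = sum(ranks[:n1])                          # rank sum for group x
    expected = n1 * (n + 1) / 2                  # expected rank sum under H0
    sd0 = math.sqrt(n1 * n2 * (n + 1) / 12)      # SD under H0, no tie correction
    z = (s - expected) / sd0
    p_normal = 2 * (1 - NormalDist().cdf(abs(z)))
    # Exact two-sided p-value: share of all rank assignments at least as
    # far from the expected rank sum as the observed one.
    observed = abs(s - expected)
    sums = [sum(c) for c in combinations(ranks, n1)]
    p_exact = sum(abs(v - expected) >= observed - 1e-12 for v in sums) / len(sums)
    return s, expected, z, p_normal, p_exact


# Hypothetical exposure rates, for illustration only (not study data).
child = [1.0, 2.0, 3.0]
adult = [4.0, 5.0, 6.0, 7.0]
s, expected, z, p_normal, p_exact = rank_sum_test(child, adult)
print(f"S = {s}, expected = {expected}, Z = {z:.4f}")
print(f"normal-approximation p = {p_normal:.4f}, exact p = {p_exact:.4f}")
```

With groups this small, the normal approximation and the exact calculation can lead to different conclusions at the 0.05 level, which is the reason an exact p-value is useful for small samples.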
Repeated Measures Mixed Model:
In several studies the same subjects (i.e., farm workers) were in the same crop-pesticide scenario and were monitored for multiple days. The observations from the same subject over multiple days are correlated and not independent. A mixed model was used to account for the correlated observations within a subject. Measurements made on different subjects are still assumed to be independent.
The MIXED procedure (PROC MIXED) in SAS provides a flexible environment for modeling repeated measures data. Several types of covariance and correlation structures are available within PROC MIXED to model the residual error. The Akaike information criterion (AIC) was used to select an appropriate correlation structure for the error term. To fit the model in PROC MIXED, the REPEATED statement is used to specify the repeated measures factor, the subject variable identifying the observations that are to be correlated, and a covariance or correlation structure. Age group (adult vs. child), day, and covariates were used as fixed effects in the model. Day is the repeated measures factor. The dependent variable is the log of the exposure rate.
After selection of an appropriate correlation structure, inferences about the fixed effects can be obtained by performing hypothesis tests and constructing the relevant confidence intervals. The LSMEANS statement in PROC MIXED is tailored to this purpose; it has the advantage that all standard error estimates account for the estimated correlation structure. In this analysis, the LSMEANS statement was used to test the hypothesis that there is a difference in exposure rate between the adult and child age groups. Least squares means are also referred to as marginal means. In an analysis of covariance model, they are the group means after controlling for a covariate. The selected output from SAS includes an estimate of the difference of least squares means between the adult and child age groups, the p-value, and the confidence interval. 
For details on repeated measures analysis with PROC MIXED, see: http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_mixed_sect034.htm
All statistical tests were conducted at a significance level of 0.05.
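The idea of choosing a within-subject correlation structure by AIC can be illustrated with a small sketch. The text does not say which structures were compared, so compound symmetry, first-order autoregressive, and unstructured are shown here only as commonly used examples. The fit statistics in the example are invented placeholders, not results from the analysis, and the AIC formula (minus twice the log likelihood plus twice the number of covariance parameters, smaller is better) is an assumed convention. Fitting the mixed model itself is outside the scope of this sketch.

```python
def compound_symmetry(n_days, sigma2, rho):
    """Equal variance on every day and equal correlation between any two days."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(n_days)]
            for i in range(n_days)]


def ar1(n_days, sigma2, rho):
    """Correlation decays as rho**lag with the number of days between measurements."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_days)]
            for i in range(n_days)]


def aic(neg2_loglik, n_cov_params):
    """AIC from -2 log likelihood and the number of covariance parameters."""
    return neg2_loglik + 2 * n_cov_params


def select_structure(fits):
    """Pick the structure with the smallest AIC.

    fits maps a structure name to (-2 log likelihood, number of covariance
    parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Within-subject covariance for 3 monitoring days (illustrative parameters).
for row in ar1(3, sigma2=2.0, rho=0.5):
    print(row)

# Placeholder fit statistics, invented for illustration only.
fits = {
    "compound symmetry": (120.4, 2),
    "AR(1)": (117.9, 2),
    "unstructured (3 days)": (115.2, 6),
}
print(select_structure(fits))
```

The unstructured form has the best likelihood in this made-up example but is penalized for its larger number of parameters, which is the trade-off AIC is meant to capture.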




