Why were normal distributions and Weibull distributions chosen as distributions of random effects?

 GLMMs are used to analyze data that are not normally distributed.  Because the data are not normally distributed, a link function appropriate to the data is used.  The GLMM currently available in SAS statistical software assumes that the random effects (not the raw data) are normally distributed.  Therefore, it is reasonable to generate the random effects (animal or day), expressed as logit values, from normal distributions.
 We also conducted a sensitivity analysis by generating the random effects from Weibull distributions.  However, as indicated above, the GLMM currently available in SAS statistical software assumes that the random effects are normally distributed.  If the random effects, expressed as logit values, are generated from a distribution whose shape differs greatly from that of a normal distribution, the simulation results are not reliable (i.e., the estimates are not close to the true parameters used to create the data) and non-convergence is more likely.
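
 As a minimal sketch of the data-generating step described above (normal random effects added on the logit scale, followed by binomial sampling), the Python code below may help.  It is an illustration only, not the SAS code used for the simulations; the function name, the number of arthropods per animal per day (n_arthropods), and the seed are hypothetical.

```python
import numpy as np

def simulate_blood_fed(p_mean, sd_sub, sd_day, n_animals, n_days, n_arthropods, seed=0):
    """Simulate blood-fed counts with animal and day random effects on the logit scale.

    All argument names are illustrative; n_arthropods is an assumed number of
    arthropods placed on each animal on each day.
    """
    rng = np.random.default_rng(seed)
    base_logit = np.log(p_mean / (1.0 - p_mean))
    # One normal random effect per animal (variation between animals).
    animal_effect = rng.normal(0.0, sd_sub, size=(n_animals, 1))
    # One normal random effect per animal-day (extra-binomial variation).
    day_effect = rng.normal(0.0, sd_day, size=(n_animals, n_days))
    logit = base_logit + animal_effect + day_effect
    # Back-transform to true proportions, then draw binomial counts.
    p_true = 1.0 / (1.0 + np.exp(-logit))
    counts = rng.binomial(n_arthropods, p_true)
    return p_true, counts

p_true, counts = simulate_blood_fed(0.45, 0.33, 0.166, n_animals=14, n_days=4, n_arthropods=50)
```

 Setting sd_sub = 0 and sd_day = 0 in this sketch gives plain binomial data with no additional variation.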

Purpose of Appendix II
 Because the GLMM currently available in SAS statistical software assumes normally distributed random effects, we wanted to see whether the estimates from the datasets are close to the true values when the random effects are generated from Weibull distributions rather than from normal distributions as the GLMM assumes.
 Based on the simulation results, the method we used to generate the data and the parameters of the Weibull distributions (SubVar = 1 and DayVar = 0.5) are reasonable: the GLMM estimates from the datasets are "best estimates" of the true parameters of the distributions, in that the medians of the estimates were close to the true values set in the simulations.
 Therefore, for the simulations of power/sample size of pet product studies, it is reasonable to generate the random effects (animal and day) from either normal distributions or Weibull distributions.

Parameters in the simulations: why SD_Sub = 0.33 and SD_Day = 0.166
 We reviewed some historical data, but the information we obtained was limited because there were few datasets (different animals, different pest species) and the sample sizes were small.  Note that we were not estimating the proportions of blood-fed arthropods in these datasets.  We wanted to estimate the variation between animals (SD_Sub) and the inflation factor (SD_Day) that makes the variation of the data greater than that of a typical binomial distribution; reliable estimates of these two quantities would require datasets with larger sample sizes.
 We do not want to assume that there is no variation between animals or days; however, we also do not want to overstate these variations.  We started with SD_Sub = 0.33 and SD_Day = 0.166 to see how the true proportions are distributed as a result of the variation between animals or days.  The table below shows the 95% coverage of the distributions of true proportions (of animals or days) associated with SD_Sub = 0.33 and SD_Day = 0.166.  For example, if the average proportion of blood-fed arthropods across animals is 0.45, the 95% coverage of the distribution of the animals' true proportions is from 0.297 to 0.613.
 From the 95% coverage of the distribution of the animals' true proportions of blood-fed arthropods in the table, we think SD_Sub = 0.33 is reasonable for the variation of true blood-fed arthropod proportions between animals.  Note that this is a source of variation in the data in addition to the variation of binomial distributions.
 Differences among measurements of the same animal on different days can be assumed to reflect only binomial variation (SD_Day = 0).  However, in some studies the variation in the data may be higher than the typical variation of a binomial distribution (overdispersion).  Using a value > 0 for the SD_Day parameter in the simulations makes the simulated datasets overdispersed.  When the true proportion of blood-fed arthropods of an animal is 0.45, the 95% coverage of the true proportions among days ranges from 0.370 to 0.533.  The 95% coverage of the true proportions among days of an animal associated with SD_Day = 0.166 is probably acceptable (a higher value of SD_Day, i.e., a more overdispersed dataset, would probably cause non-convergence).

 Source of True proportion   logit     SD      logit - 2SD   logit + 2SD   95% coverage of true proportions
 variation of population                                                   (lower)    (upper)
 Animals   0.45              -0.2007   0.33    -0.8607       0.4593        0.297      0.613
           0.6               0.4055    0.33    -0.2545       1.0655        0.437      0.744
           0.75              1.0986    0.33    0.4386        1.7586        0.608      0.853
           0.8               1.3863    0.33    0.7263        2.0463        0.674      0.886
 Days      0.45              -0.2007   0.166   -0.5327       0.1313        0.370      0.533
           0.6               0.4055    0.166   0.0735        0.7375        0.518      0.676
           0.75              1.0986    0.166   0.7666        1.4306        0.683      0.807
           0.8               1.3863    0.166   1.0543        1.7183        0.742      0.848
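
 The coverage limits in the table can be checked with a short calculation: take the logit of the true proportion, subtract or add 2 SD, and back-transform to a proportion.  The Python sketch below is an illustration only; the function name coverage_95 is hypothetical.

```python
import math

def coverage_95(p, sd):
    """Approximate 95% coverage of true proportions: inverse logit of logit(p) -/+ 2*sd."""
    logit = math.log(p / (1.0 - p))
    lower = 1.0 / (1.0 + math.exp(-(logit - 2.0 * sd)))
    upper = 1.0 / (1.0 + math.exp(-(logit + 2.0 * sd)))
    return lower, upper

# Reproduce the rows of the table for animals (SD = 0.33) and days (SD = 0.166).
for label, sd in (("Animals", 0.33), ("Days", 0.166)):
    for p in (0.45, 0.60, 0.75, 0.80):
        lo, hi = coverage_95(p, sd)
        print(f"{label:8s} p = {p:.2f}   {lo:.3f} to {hi:.3f}")
```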

 To obtain a similar range from the lowest values to the median value for random effects generated from normal distributions and from Weibull distributions, we set SubVar = 3*SD_Sub = 3*0.33 ~ 1 and DayVar = 3*SD_Day = 3*0.166 ~ 0.5.

 We also conducted simulations using SubVar = 0, 1, and 1.5 and DayVar = 0, 0.5, and 1.  The results show that the estimated proportions of blood-fed arthropods become biased as the values of SubVar and DayVar increase, and that the estimates from the simulations with SubVar = 1 and DayVar = 0.5 are reasonably close to the true values.


How the efficacy is calculated using GLMM.  Need to show examples.
 GLMM for binomial distributions with logit link function: 
 Please see the section "Mathematical derivation of the estimated efficacy using a binomial model with logit link function" (provided by Dr. Cohen) in the Supplemental of Sample Size Calculations of Pet Product Studies.
 GLMM for binomial distributions with log link function:
            Using a binomial distribution with the non-canonical log link function, the probability that an arthropod is blood fed is given by the equation below:
                        log(P(blood fed)) = μ + group_g + day_d + (group*day)_gd

            P denotes probability. In this model, μ is the intercept, group_g is the effect of group g (either the treatment or the control group), day_d is the effect of day d (1, 2, 3, or 4), and (group*day)_gd is the group*day interaction term. The logarithms are natural logarithms. The control group is group 0 and the treatment group is group 1.

            For day d, define the log probabilities for the control group and the treatment group by
                  α = μ + group_0 + day_d + (group*day)_0d,
                  β = μ + group_1 + day_d + (group*day)_1d.
            From the GLMM in SAS, we can estimate τ = β - α and its 95% CI.

            exp(τ) = exp(β - α)
                   = exp(β)/exp(α)
                   = P(blood fed for treatment)/P(blood fed for control)

            Efficacy = 1 - P(blood fed for treatment)/P(blood fed for control)
                     = 1 - exp(τ)

            Therefore, efficacy (and its 95% CI) can be estimated from the estimate of τ (and its 95% CI).
 GLMM for Poisson distribution.
            The log link function is used in the GLMM for Poisson distributions.  Instead of the probability of blood-fed arthropods used in the GLMM for binomial distributions, the rate of blood-fed arthropods is used.  Note that all the arthropods have the same exposure duration, so the "time follow-up" is set = 1 for all subjects.

            For day d, define the log rates for the control group and the treatment group by
                  α = μ + group_0 + day_d + (group*day)_0d,
                  β = μ + group_1 + day_d + (group*day)_1d.
            From the GLMM in SAS, we can estimate τ = β - α and its 95% CI.

            exp(τ) = exp(β - α)
                   = exp(β)/exp(α)
                   = rate(blood fed for treatment)/rate(blood fed for control)

            Efficacy = 1 - rate(blood fed for treatment)/rate(blood fed for control)
                     = 1 - exp(τ)

            Therefore, efficacy (and its 95% CI) can be estimated from the estimate of τ (and its 95% CI).
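
 Both log-link models above lead to the same back-transformation, efficacy = 1 - exp(τ).  The Python sketch below shows that step under the assumption that the estimate of τ and its 95% confidence limits are already available from the fitted model.  The function name and the numeric inputs are hypothetical.  Because 1 - exp(x) decreases as x increases, the upper limit of τ gives the lower limit of efficacy, and vice versa.

```python
import math

def efficacy_from_tau(tau, tau_lower, tau_upper):
    """Convert an estimate of tau = beta - alpha and its CI into efficacy = 1 - exp(tau)."""
    estimate = 1.0 - math.exp(tau)
    # 1 - exp(x) is decreasing in x, so the confidence limits swap.
    lower = 1.0 - math.exp(tau_upper)
    upper = 1.0 - math.exp(tau_lower)
    return estimate, lower, upper

# Hypothetical inputs for illustration: a treatment/control ratio of 0.10
# with confidence limits 0.05 and 0.20, expressed on the log scale.
est, lo, hi = efficacy_from_tau(math.log(0.10), math.log(0.05), math.log(0.20))
print(f"efficacy = {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```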
            

Higher true efficacy (95%) to increase power/reduce sample size
 In the simulations, we set the true efficacy = 0.95, 0.925, 0.90, and 0.85 on days 1, 2, 3, and 4, respectively.  A true efficacy = 0.95 is reasonably high, a true efficacy = 0.925 is slightly higher than the accepted value, a true efficacy = 0.90 is the accepted value, and a true efficacy = 0.85 is close to but below the accepted value.
 When the true efficacy = 0.90, fewer than 50% of the datasets are accepted (observed efficacy >= 0.90 and precision <= 0.05).  Using a higher true efficacy in the calculation would reduce the required sample size; however, a product whose true efficacy is only slightly higher than the acceptable level would then have a high probability of being rejected.
 For example, in the tick study on dogs or cats, if a true efficacy = 0.925 is used, the power = 0.877 for a study design with sample size = 14.  If a true efficacy = 0.95 is used, the power = 0.851 for a study design with sample size = 10.  However, that design with sample size = 10 has a power of only 0.553 if the true efficacy is 0.925.
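
 The power values quoted above are proportions of simulated datasets that meet the acceptance rule (observed efficacy >= 0.90 and precision <= 0.05).  The Python sketch below illustrates that counting step only.  It assumes that the observed efficacy and the precision of each simulated dataset have already been computed; how precision is calculated is not specified in this section, so it is taken as an input.  The function name and the example values are hypothetical.

```python
import numpy as np

def estimated_power(observed_efficacy, precision, min_efficacy=0.90, max_precision=0.05):
    """Proportion of simulated datasets that satisfy the acceptance rule."""
    observed_efficacy = np.asarray(observed_efficacy, dtype=float)
    precision = np.asarray(precision, dtype=float)
    # A dataset is accepted only if both conditions hold.
    accepted = (observed_efficacy >= min_efficacy) & (precision <= max_precision)
    return float(accepted.mean())

# Four hypothetical simulated datasets: the second fails on precision,
# the third fails on observed efficacy.
power = estimated_power([0.95, 0.91, 0.89, 0.93], [0.03, 0.06, 0.02, 0.05])
```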


