

                          Supporting Statement Part B
                    for Information Collection Request for
A survey to improve economic analysis of surface water quality changes: Instrument, Pre-test, and Implementation
               OMB Control Number: 2090-NEW, EPA ICR #: 2588.01
                                       
                                       
                                       
                                       
                                       
                                       
                                       
                                       
                                       
                                       

                                       
                               TABLE OF CONTENTS

List of Attachments
PART B OF THE SUPPORTING STATEMENT
1.	Survey Objectives, Key Variables, and Other Preliminaries
1(a)	Survey Objectives
1(b)	Key Variables
1(c)	Statistical Approach
1(d)	Feasibility
2.	Survey Design
2(a)	Target Population and Coverage
2(b)	Sampling Design
2(c)	Precision Requirements
2(d)	Questionnaire Design
3.	Pretest
4.	Collection Methods and Follow-up
4(a)	Collection Methods
4(b)	Survey Response and Follow-up
5.	Analyzing and Reporting Survey Results
5(a)	Data Preparation
5(b)	Analysis
5(c)	Reporting Results


	List of Attachments
      Attachment 1	 -  Screenshots of draft survey			
      Attachment 2	 -  Federal Register Notices
      Attachment 3 	 -  Description of statistical survey design
      Attachment 4 	 -  Responses to public comments 
      

PART B OF THE SUPPORTING STATEMENT

1.	Survey Objectives, Key Variables, and Other Preliminaries		
1(a)	Survey Objectives
	
The overall goal of the survey is to collect and analyze data that will improve the accuracy of EPA benefit-cost analyses of surface water quality changes, particularly those that rely on benefit transfer and meta-analytic approaches.  To that end, EPA has developed several specific research questions that the survey has been designed to answer.
   Study objective #1: Estimate a relationship between households' values for water quality improvements and their distance from the improved resource (i.e., distance decay, extent of market).
   Study objective #2: Estimate a relationship between households' values for water quality and the amount of surface waters that are being improved.
   Study objective #3: Estimate a willingness-to-pay function that can separately represent values for human uses and values for ecosystem functions.  

To perform benefit transfers for surface water quality regulations, analysts must make either explicit or implicit assumptions about most of the issues addressed in our research objectives.  Currently, there is no comprehensive or nationally representative data available to inform those assumptions.  This survey will collect data to provide a strong empirical foundation for future analyses regarding these issues. 


1(b)	Key Variables

The key questions in the survey show respondents a region of the contiguous U.S. and ask whether they would vote for a policy that would result in the specified improvement in water quality in that region in exchange for a temporary increase in their federal, state, and local taxes.  The choice scenarios are presented as dichotomous choice questions.  Specifically, for each choice question a respondent can either choose the status quo option where the Recreation and Aquatic Biodiversity Scores remain unchanged and no additional costs are incurred by the household, or they can choose the "policy" option, where one or both of the scores improve, and some cost is incurred by the household. According to conventional economic theory, if the respondents view the survey as consequential -- that is, respondents believe that policymakers might use the survey results to help decide whether to implement the policy described in the survey -- then the respondents will choose the option that they prefer based on their preferences and budget constraints (Carson and Groves 2007, Poe and Vossler 2011). The status quo option is always available, a feature that is necessary for appropriate welfare estimation (Adamowicz et al. 1998). 
Two of the key attributes defining the options in the choice scenario are the Water Recreation and Aquatic Biodiversity Scores. Inclusion of both scores addresses study objective #3 and informs study objectives #1 and #2.
The Water Recreation Score is based on the RFF Water Quality Ladder (Vaughan 1986) and corresponds to the suitability of surface waters for different types of water recreation.  Respondents are told at what values on the 100-point index experts consider waterbodies suitable for boating, fishing, and swimming.  Because the policy regions shown in each question contain many lakes, rivers, and streams, respondents are given the average score for the entire region, weighted by surface area.  The baseline Water Recreation Score for each policy region is calculated from baseline data used for the 2020 Steam Electric Power Generating Effluent Guidelines analysis (U.S. EPA 2020).
The Aquatic Biodiversity Score is based on the ratio of the number of observed species in a waterbody to the expected number of species if the waterbody were in the same condition as the least disturbed waterbodies in that region.  This metric is reported by ecologists as the "O/E ratio," and is calculated by dividing the number of benthic macroinvertebrate and/or plankton species observed at a site by the number expected.  The latter is based on region-specific model predictions, calibrated by ecologists based on the least disturbed waters in the corresponding eco-region. EPA conducted extensive focus group research on the appropriate indicator for aquatic biological condition. The O/E ratio was chosen because it was consistently interpreted across focus group participants and made it easy for participants to cognitively separate these improvements from those reflected in the Water Recreation Score.
By construction, the Aquatic Biodiversity Score can range from 0 to 100.  The baseline Aquatic Biodiversity Scores for each policy region were calculated using a weighted average of the benthic macroinvertebrate O/E ratios from the 2013-2014 National Rivers and Streams Assessment and the plankton O/E ratios from the 2007 National Lakes Assessment. These data were chosen because they represent the most recent assessments for which O/E ratios were available. 
For each policy region defined in the experimental design, a separate average O/E ratio is calculated for rivers/streams and then for lakes. These calculations were performed by EPA's Office of Research and Development based on the original sampling weights in the National Aquatic Resource Surveys and the respective water surface areas corresponding to each sampling site.   The O/E ratios were then combined as a weighted average based on the corresponding surface area for each waterbody type according to the National Hydrography Dataset.  
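The area-weighted combination described above can be sketched in a few lines of code. The O/E ratios and surface areas below are hypothetical placeholders, not actual National Aquatic Resource Surveys or NHDPlus values.

```python
# Illustrative sketch of the area-weighted O/E combination; all inputs
# are hypothetical, not actual NARS or NHDPlus data.

def combined_biodiversity_score(oe_rivers, oe_lakes, area_rivers, area_lakes):
    """Combine river/stream and lake O/E ratios into one regional
    Aquatic Biodiversity Score, weighting by water surface area (sq mi)."""
    total_area = area_rivers + area_lakes
    oe = (oe_rivers * area_rivers + oe_lakes * area_lakes) / total_area
    return 100.0 * oe  # express the 0-1 ratio on the 0-100 score scale

# Hypothetical region: rivers/streams O/E = 0.72 over 350 sq mi of water,
# lakes O/E = 0.60 over 150 sq mi.
print(round(combined_biodiversity_score(0.72, 0.60, 350.0, 150.0), 1))  # → 68.4
```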
The rest of the key variables presented in the choice questions are defined by the policy regions.  One of those variables is total surface area of lakes, rivers, and streams in a policy region.  Total surface area is given in square miles and was taken from the National Hydrography Dataset Plus (NHDPlus). The quantity of surface waters in each policy region will be used to estimate the impact of the quantity of waters improved on WTP by estimating interaction effects between surface area and improvements in the two water quality scores, addressing study objective #2 and informing study objectives #1, and #3.  
Another key variable defined by the policy region is the location of river and stream reaches and lakes within the region. This variable captures both the distance between the respondent and the improved surface waters and any place-specific characteristics of the region. These data will be used to analyze the effect of distance from the improved resource on WTP, addressing study objective #1 and informing study objectives #2 and #3. Together, all variables discussed thus far will be used to estimate a valuation function capable of estimating benefits for surface water improvements at a national or regional level, addressing study objectives #1, #2, and #3.

1(c)	Statistical Approach
A statistical survey approach in which a randomly drawn sample of households is asked to complete a survey is appropriate for estimating values associated with water quality improvements. A census approach is impractical because of the extraordinary cost of contacting all households. An alternative approach, where individuals self-select into the sample, could be subject to significant selection bias that would compromise generalizable inferences. Therefore, a statistical survey is the most appropriate approach to inform the research objectives discussed above in part B, section 1(a).
EPA retained Abt Associates Inc. (55 Wheeler Street, Cambridge, MA 02138) under EPA contract EP-C-13-039 and EP-W-17-009 to assist in the questionnaire and sampling design. 

1(d)	Feasibility
Following standard practice in the stated preference literature (Johnston et al. 1995; Adamowicz et al. 1998; Louviere et al. 2000; Bennett and Blamey 2001; Bateman et al. 2002; Johnston et al. 2017), EPA conducted a series of ten focus groups and 24 cognitive interviews (OMB control # 2090-0028). Based on findings from these activities, EPA made various modifications to the survey instrument intended to reduce the potential for respondent bias, reduce respondent cognitive burden, and increase respondent comprehension of the survey materials. In addition, EPA solicited peer review of the survey instrument by three nonmarket valuation specialists in academia, as well as input from other experts (see Section 3c in Part A). Recommendations and comments received as a part of that process have been incorporated into the design of the survey instrument, and the revised survey was subsequently tested in 24 cognitive interviews.
Because of the steps taken during the survey development process, EPA anticipates that most respondents will have little difficulty interpreting and responding to the survey questions. Furthermore, since the survey will be administered using an established national-level panel, it will be accessible to all respondents and representative of the national population. EPA therefore believes that respondents will not face any obstacles in completing the survey, and that the survey will produce useful results. EPA has dedicated sufficient staff time and resources to the design and implementation of this survey. 
Funding is available to cover the expenses necessary to administer the survey as described in this supporting statement. If, however, actual costs exceed our estimates and it becomes necessary to adjust the study plan to stay within budget, EPA will reduce the sample size to complete the study. As section 2(c) on precision requirements shows, EPA will be able to achieve the primary goals of this study with a smaller sample size than planned. Some secondary objectives that require estimation of interaction effects, for example, would be estimated with less precision but still provide meaningful results. 
An electronic survey administered via the Internet shortens the data collection schedule considerably compared to a mail survey. The collection schedule provided in section 5(d) of Part A is sufficient to provide the data required for EPA's analysis for timely use in the development of our water quality benefits estimation approach. 

2.	Survey Design
2(a)	Target Population and Coverage
The target population for the survey is the U.S. adult population, represented by a probability-based general-population sample.
		
2(b)	Sampling Design
(i)	Sampling Frame
The sampling frame from which probability-based Internet panel members are recruited is the universe of all U.S. residential addresses, secured from the latest Delivery Sequence File (DSF) of the U.S. Postal Service. This database provides a complete listing of all residential points of delivery in the United States, regardless of telephone or internet status. Adults from sampled households are typically invited to join the panel through a series of mailings, including an initial invitation letter, a reminder postcard, and a subsequent follow-up letter. Given that a subset of physical addresses can be matched to a corresponding landline telephone number, telephone refusal-conversion calls can be made to non-responding households for which a telephone number is matched. Invited households can join the panel by:
   - completing and mailing back a paper form in a postage-paid envelope,
   - calling a toll-free phone number, or
   - accessing a secure website to complete the recruitment form online.
Households that do not have access to the internet are provided with a web-enabled device (e.g., a tablet) and Internet service provider (ISP) access for survey participation.

 (ii)	Sample Sizes
A representative sample of U.S. civilian, non-institutionalized individuals, age 18 years and older, who reside in the 48 contiguous United States and the District of Columbia will be surveyed for this study. The target sample size for the pretest is 120 completed surveys. The target sample size for the main survey is 6,000 completed surveys. These sample sizes are large enough to meet the precision requirements of the study; details on survey estimates are discussed in section 2(c)(i).

(iii)	Stratification Variables
The sample design does not include geographic stratification; a simple random draw from the national panel is expected to yield a sample that is representative of the US population.

(iv)	Sampling Method
Anticipating a 60% participation rate from panel members recruited to complete the survey, invitations will be sent to 10,000 randomly selected members of the panel. If the participation rate is less than 60% after non-respondents have received telephone reminders (according to Table A1), a second wave of recruitment will be initiated to achieve a final sample size of no less than 6,000 completed surveys. If the first wave of recruitment generates a sample that underrepresents some demographic groups (for example, minorities, women, or individuals without a high school diploma) or geographic regions, the second wave of recruiting will target those groups so that the final sample is representative of the population of the contiguous U.S.

 (v)	Multi-Stage Sampling
Multi-stage sampling will not be necessary for this survey.

(vi)       Experimental Design
	The experimental design of our study follows established practices and is largely determined by several guiding principles that are described below.  

Model identification: Identification refers to the ability to obtain unbiased parameter estimates from the data for every parameter in the model.  To ensure model identification, effects in the design must not be confounded with one another, i.e., collinear. To identify effects of interest, the experimental design must sufficiently vary the relevant attribute levels within and across choice questions and, in the case of higher-order effects, include sufficient numbers of attribute-level combinations.

Orthogonality and balance: Orthogonality is a desirable property of experimental designs that requires strictly independent variation of levels across attributes, in which each attribute level appears an equal number of times in combination with all other attribute levels. Balance is a related property that requires each level within an attribute to appear an equal number of times (Johnson et al. 2013).  Lack of strict orthogonality does not preclude model identification and, in practice, nearly orthogonal designs are usually well identified.

Confounding: When a main effect is perfectly correlated with an interaction effect between two other variables it is not possible to identify the main effect.  This is primarily a problem with fractional factorial designs.  A full factorial design will allow us to identify all effects.  

Implausible combinations and dominated choices: Constraints that exclude implausible combinations or dominated alternatives introduce some degree of correlation and level imbalance in the experimental design.  Focus group and cognitive interview testing has shown that respondents generally accept all attribute level combinations (e.g., those that improve one aspect of water quality but not the other).  The only dominated choices that will be removed from the full factorial design are those for which there are no improvements in either water quality attribute and a positive cost.   
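As a concrete illustration of this constraint, a full factorial design can be generated and the dominated profiles removed in a few lines. The attribute levels below are hypothetical placeholders, not the levels in the final experimental design.

```python
# Construct a full factorial design over hypothetical attribute levels and
# remove the only dominated profiles: positive cost with no improvement in
# either water quality score. Levels below are illustrative placeholders.
from itertools import product

recreation_changes = [0, 5, 10, 15]    # change in Water Recreation Score
biodiversity_changes = [0, 5, 10, 15]  # change in Aquatic Biodiversity Score
costs = [25, 50, 100, 200]             # annual household cost in dollars

full_factorial = list(product(recreation_changes, biodiversity_changes, costs))

# Drop profiles with no improvement in either score but a positive cost.
design = [(r, b, c) for (r, b, c) in full_factorial
          if not (r == 0 and b == 0 and c > 0)]

print(len(full_factorial), len(design))  # → 64 60
```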

Several designs that adhere to these principles were developed and examined in a series of numerical simulation experiments for power analysis described in section 2(c)(iii) below. 

The levels of and correlations among some option attributes will be constrained by their associations with the pre-determined hypothetical policy regions, as described in Attachment 3. Other attribute levels will be assigned orthogonally with the aid of a full factorial design, and questions will be randomly divided among distinct survey versions (Louviere et al. 2000). Based on standard choice experiment design procedures (Louviere et al. 2000), the number of questions and survey versions were determined by, among other factors: a) the number of attributes in the final experimental design and complexity of questions, b) focus group and cognitive interview testing revealing the number of choice experiment questions that respondents are willing/able to answer in a single survey session, and c) the number of attributes that may be varied within each question while keeping the cognitive burden low enough to allow respondents to confidently identify their preferred alternative among the presented options.

Following guidance in the literature, choice sets, including variable level selection, were designed by EPA based on the goal of illustrating realistic policy scenarios that "span the range over which we expect respondents to have preferences, and/or are practically achievable" (Bateman et al. 2002, p. 259). This includes guidance regarding the statistical implications of choice set design (Hanemann and Kanninen 2001) and the role of focus groups in developing appropriate choice sets (Bennett and Blamey 2001).  Since each policy option includes an improvement in at least one water quality dimension and will involve some cost to the household relative to the status quo scenario, there will be no dominated choices in the design.  

Based on these guiding principles, the following experimental design framework is proposed by EPA. A description of the statistical design is presented in Attachment 3. The experimental design will allow for estimation of main effects based on a choice experiment framework. Each survey question includes a status quo option (no improvements from baseline and no cost) and a policy option.  Each treatment refers to a policy region for which the respondent is given information on the amount of surface water in the region (square miles) and baseline recreational and ecological water quality scores. The policy option describes the changes in the recreation and ecological scores (both the post-policy levels and the changes from the status quo levels) that would occur and the cost that would accrue to the respondent's household if the policy were implemented.  The water surface area and baseline water quality scores in the status quo option are based on actual data, so the variation in these factors is determined by the natural variation among the policy regions.

2(c)	Precision Requirements
(i)	Precision Targets
Precision of survey estimates is a direct function of sample size.  Since our primary questions of interest are dichotomous choice, we are interested in how precisely we can estimate the probability of a binary outcome.  The needed sample size n to secure a specific margin of error e can be calculated using the following formula, in which N represents the size of the population, p is the proportion, and z is the percentile of the standard normal distribution:

                 n = N z^2 p(1-p) / [(N-1) e^2 + z^2 p(1-p)]
To obtain the most conservative estimate of the minimum sample size, p is chosen to be 50% and the size of the population is assumed to be infinite. As such, Figure B1 shows the expected margin of error, with 95% confidence, as a function of sample size. The sampling plan includes collecting 6,000 completed surveys, each of which will ask multiple dichotomous choice questions. Some hypothesis tests, including those for external scope and ordering effects (Bateman et al. 2004), require estimating the model with just one response to the dichotomous choice questions.  To ensure the sample size is sufficient to perform such tests, we will assume one observation from each completed survey. Accordingly, survey estimates produced using one response from each of the 6,000 completed surveys will have error margins that are well below ±2% at the 95% level of confidence.
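Under these conservative assumptions (p = 0.5, infinite population), the margin of error for a given sample size can be computed as follows:

```python
# Margin of error at 95% confidence for an estimated proportion, under
# the conservative assumptions in the text (p = 0.5, infinite population).
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the 95% confidence interval for a proportion."""
    return z * sqrt(p * (1.0 - p) / n)

# With one response from each of 6,000 completed surveys:
print(round(100 * margin_of_error(6000), 2))  # → 1.27 (percentage points)
```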



Figure B1 Margin of Error as a Function of Sample Size for Main Effects 

The calculations above are correct if preference heterogeneity is not considered. One way to account for preference heterogeneity is to include interaction effects that capture differences in characteristics across respondents, which are often coded as binary categorical variables. To generate conservative minimum sample size requirements for interaction effects we again assume the demographic indicator variable has a mean value of 0.5 (e.g., gender) and thus the largest variance possible. Indicators for demographic characteristics with more than two categories (e.g., income, geographic region) will have smaller variance. Interaction effects between variables with equal variance require approximately four times the sample size to achieve the same level of precision (because the variance of the interaction estimate is four times that of the main effect). If the magnitude of the interaction is half of the main effect, the sample size requirement increases by another factor of four, meaning estimating such an interaction effect with the same relative precision requires a sample size 16 times that of a main effect.  Figure B2 shows that a sample size of 6,000 is capable of estimating the interaction effect with an error margin of approximately ±5% with 95% confidence.  Greater precision can be achieved by using more than one dichotomous choice response per survey, which we intend to do for our main analysis. Given the conservative assumptions in these calculations, our sampling plan is sufficient to estimate main effects and interaction effects that capture preference heterogeneity using demographic, behavioral, and attitudinal data.
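The scaling argument above, that matching a main effect's precision for an interaction with a balanced binary indicator requires roughly four times the sample size, can be checked directly under the simplifying assumption that the quantities estimated are proportions:

```python
# Quick check of the interaction-effect scaling: the standard error of a
# difference between two subgroup proportions (balanced binary indicator)
# versus that of a single proportion estimated from the full sample.
from math import sqrt

def se_main(n, p=0.5):
    """SE of a proportion estimated from all n observations."""
    return sqrt(p * (1 - p) / n)

def se_interaction(n, p=0.5):
    """SE of the difference between two subgroup proportions when a
    balanced binary indicator splits the n observations in half."""
    return sqrt(2 * p * (1 - p) / (n / 2))

# The interaction SE is twice the main-effect SE, so matching the
# main effect's precision requires four times the sample size:
print(se_interaction(6000) / se_main(6000))  # → 2.0
```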

Figure B2 Margin of Error as a Function of Sample Size for Interaction Effects

Power analysis
Power analysis for sample size calculations is an important aspect of scientific survey sampling, since without such calculations the employed sample sizes may be inadequate for achieving sufficiently precise estimates of key quantities of interest.  Equally important, without power analysis a study can be based on costly sample sizes far larger than what precision requirements dictate.  As such, power analyses guard a study against both failure to detect important differences and cost overruns due to unnecessarily large samples.
In broad terms, most statistical inferences involve tests of hypotheses about population parameters based on observed data from samples.  Analogous to any other decision-making process, challenging the null hypothesis in favor of the alternative will be subject to two types of errors.  As illustrated in the following table, these two types of error -- the frequencies of which are conventionally denoted by α and β -- correspond to rejecting a true null hypothesis and failing to reject a false null hypothesis, respectively.
Table B1 Depiction of Type I and Type II Errors

                                                       Truth
      Inference Based on Survey Data     H0                            H1
      H0                                 Null true and not rejected    Type II Error (β)
      H1                                 Type I Error (α)              Null false and rejected

Considering the above inferential infrastructure, for any empirically informed decision-making process it is desirable to keep both types of error as small as possible.  For instance, for most hypothesis testing applications it is desirable for the chance of Type I error (α) to be at or below 5%, corresponding to 95% confidence.  In contrast, sample size requirements related to Type II error (β) are often stated in terms of the power of the test, which is 1-β.  This quantity represents the probability of rejecting the null hypothesis when it is false.
Because there are many forms of statistical tests, there are numerous scenarios for power analysis. However, a common approach for sample size calculation is to require a minimum power level for detection of a desired effect size (δ) between the two population parameters. For instance, it is often the case that the probability of correctly rejecting the null hypothesis (the power of the test) is set at 0.8 for detecting a significant difference between two population parameters. Naturally, the smaller the minimum detectable difference (δ), the larger the required sample sizes.
For illustration purposes, the following figure shows the required sample size per group so that a difference of δ= 5% between two population proportions can be detected as significant with a power of 0.8, using a one-sided Fisher's Exact Conditional Test for two proportions.  Moreover, the required sample sizes per group are calculated for various possible values for the first population proportion, ranging from 5% to 90%.  As expected, the required sample size increases as the two proportions under consideration get close to 50%.
                                       
Figure B3 Required Sample Size Per Group to Detect δ = 5% with 0.8 Power
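A normal-approximation counterpart of this calculation can be sketched as follows; because the figure above uses Fisher's exact test, its values will differ modestly from these. The z-values correspond to a one-sided α = 0.05 and 0.8 power.

```python
# Required sample size per group to detect a 5-percentage-point difference
# between two proportions with 0.8 power, one-sided alpha = 0.05, using
# the standard normal approximation for two independent proportions.
from math import sqrt, ceil

def n_per_group(p1, delta=0.05, alpha_z=1.6449, power_z=0.8416):
    """Required n per group to detect p2 - p1 = delta."""
    p2 = p1 + delta
    p_bar = (p1 + p2) / 2
    num = (alpha_z * sqrt(2 * p_bar * (1 - p_bar))
           + power_z * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((num / delta) ** 2)

# Required n per group grows as the proportions approach 50%:
for p1 in (0.05, 0.25, 0.45):
    print(p1, n_per_group(p1))
```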

Accordingly, with an anticipated total sample size of 6,000 for this survey, there will be sufficient power for detection of significant differences. This would be the case even for survey estimates that exhibit high levels of sampling variability, such as estimates of proportions that are close to 50%. Moreover, individual survey estimates are expected to carry small error margins, as discussed in section 2(c)(i) above. In summary, the chosen sample size for this survey is large enough to provide parameter estimates with suitably narrow confidence intervals, as well as to detect significant differences when various subgroups are compared to each other.

Supplemental Power Analysis for Choice of Experimental Design
We conducted a supplemental power analysis based on the specific form of the estimating equation we intend to use (described in section 5(b) below) and a representative target of estimation, average household willingness to pay for a benchmark water quality improvement. This power analysis required simulating many possible datasets given our sampling design, survey format, and prior assumed values for all parameters to be estimated (de Bekker-Grob et al. 2015).
For purposes of simulating the test data, we used 49 geographic zones based on the 48 contiguous states plus the District of Columbia. Each zone was normalized to have the same total surface area, 63,682.2 sq mi, such that the sum of the zones equals the total land area of the continental U.S. The centroid of each zone was set equal to the centroid of the corresponding state, and the area of surface water was set equal to the areal density of surface water of the corresponding state times the surface area of the zone. In this way we constructed 49 geographic zones of the same size with a surface water quantity distribution and spatial pattern similar to that of the continental U.S. 
We cross tabulated the zones with the policy regions used in our survey design by assigning zone z to policy region j if the centroid of zone z was contained in policy region j. We also used the zones as the areas from which we randomly drew simulated survey respondents. The population of each zone was set equal to the corresponding state population. The simulated set of respondents was constructed by drawing randomly from the zones using probability weights proportional to the population in each zone, so each household had the same probability of being selected into the sample. Respondent incomes were drawn from a lognormal distribution calibrated to match average incomes of the bottom 99% and the top 1% of incomes in each respondent's home state.  
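A minimal sketch of this respondent simulation follows; the zones, populations, and lognormal income parameters below are hypothetical placeholders, whereas the actual simulation used the 49 state-based zones and state income data.

```python
# Minimal sketch of the simulated respondent draw: zones are sampled with
# probability proportional to population, and incomes are drawn from a
# lognormal distribution. All numbers here are hypothetical placeholders.
import random

random.seed(42)

zones = ["A", "B", "C"]
populations = [8_000_000, 3_000_000, 1_000_000]

def draw_respondents(n):
    picks = random.choices(zones, weights=populations, k=n)
    # lognormal income with median ~$60k (mu = 11.0, sigma = 0.7, assumed)
    return [(z, random.lognormvariate(11.0, 0.7)) for z in picks]

sample = draw_respondents(10_000)
share_a = sum(1 for z, _ in sample if z == "A") / len(sample)
print(round(share_a, 2))  # close to 8/12, zone A's population share
```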
We simulated the choice data using the parameter values shown in the first column of numbers in Table B2 below. These parameter values were selected to produce an average WTP for a 1 percentage point improvement in both scores for all surface water bodies in all zones close to $20 per year, which we view as a plausible central estimate considering previous water quality valuation studies and focus group findings. We set the scale parameter such that the pseudo-R2 value for the estimated model is between 0.25 and 0.35.
We examined four experimental designs that vary the number of attribute levels, policy regions, and number of questions on each survey such that each design is balanced and orthogonal.  We also tested different distributions of attribute levels within the maximum range and found that even distributions produced more efficient designs than cases where some levels were clustered together.  Table B2 below shows the simulated standard errors for each estimated parameter of the model.
	
      Table B2. Power Analysis Results for Candidate Survey Designs

                                  Estimated Standard Errors
      Parameter   True value   3 levels      4 levels      5 levels      6 levels
                               6 questions   4 questions   5 questions   6 questions
                               6 regions     8 regions     5 regions     6 regions
      α           7.0          3.1252        4.7737        3.9421        4.1291
      β           0.5          0.0198        0.0278        0.0245        0.0267
      θ           0.5          0.0466        0.0660        0.0632        0.0654
      η           0.5          0.1822        0.2934        0.2346        0.1986
      φ           0.1          0.0066        0.0096        0.0102        0.0100
      γ           0.0069       0.0006        0.0007        0.0008        0.0008
      δ           0.5          0.0141        0.0218        0.0205        0.0221
      τ           1.0          2.1292        2.9658        2.3274        2.4534
      σ           0.05         0.0008        0.0010        0.0010        0.0011

                                  Estimated Values of WTP
      WTP+1       ~$20         20.77         20.79         20.68         20.77
      SE(WTP)                  0.3924        0.6040        0.4434        0.4515

To estimate the standard errors of the maximum likelihood parameter estimates for each candidate study design, we simulated the survey response data 100 times and for each iteration calculated the gradient of the log likelihood function at the true parameter values, g0. We then averaged the outer product of the gradients over the 100 simulated datasets to estimate its expected value, E[g0g0′], which corresponds to the Fisher information matrix by the "information matrix equality" (e.g., Greene 2012, p. 557). The inverse of this matrix is an estimate of the variance-covariance matrix of the maximum likelihood parameter estimates, V, and the square roots of its diagonal elements are estimates of the standard errors of the maximum likelihood parameter estimates.
Our estimates of the standard errors for each parameter under each survey design configuration are shown in the corresponding columns of Table B1. We used the delta method to estimate the standard error of the average sample willingness to pay for a one percentage point increase in both water quality scores in all geographic zones, i.e., se(WTP+1) = √(h′Vh), where h is the vector of first derivatives of WTP+1 with respect to each parameter. These estimates are shown in the final row of Table B1.
The results in Table B1 provide several indications about the relative efficiency of the candidate survey design configurations that we examined.  All designs result in relatively accurate and efficient estimates of WTP for a 1% improvement in all attributes, with the 3-level, 6-question design producing the smallest standard error.  Likewise, the standard errors on all parameter estimates are the lowest using that design.  Intuitively, the design with 6 questions and 3 levels for each attribute is the most efficient because it collects the most information from each respondent (6 valuation questions) while maximizing the distance between attribute levels.  Greater separation between attribute levels reduces the likelihood that the model predicts the wrong response given the idiosyncratic error associated with each estimated effect.
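The simulation procedure described above can be sketched as follows. This is a minimal, self-contained illustration using a hypothetical binary logit with invented parameter values, not the actual survey design or estimation code:

```python
import numpy as np

# Illustrative sketch (not EPA's actual code): estimate parameter standard
# errors by averaging outer products of the score vector over simulated
# datasets, then apply the delta method to a derived quantity (here a
# simple WTP ratio). All data and parameter values are hypothetical.

rng = np.random.default_rng(0)
beta_true = np.array([0.5, -0.1])        # hypothetical (attribute, cost) coefficients
n_resp, n_sims = 500, 100

X = rng.uniform(0, 5, size=(n_resp, 2))  # design matrix (attribute, cost)

def score(beta, X, y):
    """Gradient of the binary-logit log likelihood at beta."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (y - p)

# Average the outer product of gradients over simulated datasets; by the
# information matrix equality this estimates the Fisher information.
info = np.zeros((2, 2))
for _ in range(n_sims):
    p = 1.0 / (1.0 + np.exp(-X @ beta_true))
    y = rng.uniform(size=n_resp) < p     # simulated choices at the true parameters
    g0 = score(beta_true, X, y)
    info += np.outer(g0, g0) / n_sims

V = np.linalg.inv(info)                  # variance-covariance estimate
se = np.sqrt(np.diag(V))                 # parameter standard errors

# Delta method for WTP = -beta[0]/beta[1]: se(WTP) = sqrt(h' V h),
# where h holds the derivatives of WTP with respect to each parameter.
b1, b2 = beta_true
h = np.array([-1.0 / b2, b1 / b2**2])
se_wtp = np.sqrt(h @ V @ h)
```

The same logic carries over to the full choice model; only the form of the log likelihood and the dimension of the parameter vector change.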

(ii)	Non-Sampling Errors
A variety of non-sampling errors may be encountered in stated preference surveys. Coverage error occurs when some individuals in the population have no chance of being selected. For the current survey, the generalizable population is U.S. civilian, non-institutionalized individuals, age 18 years and older who reside in the 48 contiguous United States and the District of Columbia.  Address-Based Sampling (ABS) methodology is used to recruit members into probability-based Internet panels.  The sampling frame from which panel participants are recruited is the universe of all contiguous U.S. residential addresses, secured from the latest Delivery Sequence File (DSF) of the U.S. Postal Service. This database provides a complete listing of all residential points of delivery in the United States, regardless of telephone or internet status, thus the risk of coverage error is minimal.
Non-response bias is another type of non-sampling error that can potentially occur in stated preference surveys. Non-response bias can occur when households do not participate in a survey or do not answer all relevant questions on the survey instrument (item non-response). EPA has designed the survey instrument to maximize the response rate and will follow Dillman et al.'s (2014) web-based survey approach (see Subsection 4(b) for details). To determine whether there is any evidence of significant non-response bias in the completed sample, EPA will conduct a non-response bias analysis using available data. This will enable EPA to identify potential differences between respondents to the web survey and those who received a URL but did not complete it.

Nonresponse Bias Study
When a sizable percentage of sampled individuals choose not to respond to a survey, it is advisable to conduct analyses that can assess the potential magnitude of nonresponse bias on survey estimates (Fahimi 2004).  We intend to conduct a non-response analysis to quantify and ultimately address this issue in our estimation of WTP. We understand that any single method might be insufficient to address all possible forms of non-response bias (Groves 2006 and Groves et al. 2006). Therefore, we intend to follow the relevant recommendations in OMB's guidelines (Groves 2006, Montaquila and Olson 2012, Halbesleben and Whitman 2013). We will conduct two types of non-response bias analysis: benchmarking and response propensity analysis.
	Benchmarking involves comparing data collected from our sample with other data collection efforts.  We will compare the geodemographic distributions of our starting sample and the resulting respondents against those of the target population using the latest information from the Current Population Survey (CPS).  While socio-demographic representativeness is informative, non-response bias analysis can be improved by benchmarking our data collection to other variables that may be more closely related to WTP for water quality improvements.  Our survey instrument includes questions on respondents' water-based recreation activities and frequency that were taken directly from the U.S. Forest Service's National Survey on Recreation and the Environment and the U.S. Fish and Wildlife Service's National Survey of Fishing, Hunting, & Wildlife-Associated Recreation. This will allow for direct comparison of our survey sample to nationally representative samples from these broader benchmark surveys. This will provide further insight into any potential nonresponse biases since user frequency and recreational activities are likely correlated with WTP. 
EPA will also conduct response propensity analysis, for which one first estimates the propensity to respond to a survey as a function of variables that are available for both respondents and non-respondents, as described by Johnston and Abdulrahman (2017) and Cameron and DeShazo (2013).  In a second stage one then interacts the predicted propensity to respond with each of the environmental attributes. The coefficients corresponding to these interaction terms will capture any systematic differences in preferences across households that have higher or lower propensity to respond to the survey.  Response propensity analysis requires data on households that were contacted to complete a survey but did not participate.  Probability-based internet panels maintain extensive data on the panel and will provide the means for a robust response propensity analysis.  Most internet panel proprietors also collect some data from households that chose not to join the panel, which will allow EPA to examine response propensity at the two critical points of non-response. This approach provides insight into any unobserved self-selection bias and, if present, a means to correct for it.
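The two-stage response propensity approach might be sketched as follows, with entirely hypothetical covariates and coefficients standing in for the panel data:

```python
import numpy as np

# Hypothetical sketch of a response propensity analysis: stage one fits a
# logistic regression of survey response on covariates observed for both
# respondents and non-respondents; stage two would interact the predicted
# propensity with each environmental attribute in the choice model. The
# covariates and coefficients below are invented, not actual panel data.

rng = np.random.default_rng(1)
n = 2000
Z = np.column_stack([np.ones(n),              # intercept
                     rng.normal(size=n),      # e.g., standardized age
                     rng.integers(0, 2, n)])  # e.g., urban indicator
true_g = np.array([0.3, 0.4, -0.5])           # hypothetical coefficients
responded = rng.uniform(size=n) < 1 / (1 + np.exp(-Z @ true_g))

def fit_logit(Z, y, iters=25):
    """Newton-Raphson maximum likelihood for a logistic regression."""
    g = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Z @ g))
        grad = Z.T @ (y - p)
        hess = (Z * (p * (1 - p))[:, None]).T @ Z
        g = g + np.linalg.solve(hess, grad)
    return g

g_hat = fit_logit(Z, responded.astype(float))
propensity = 1 / (1 + np.exp(-Z @ g_hat))     # predicted response propensity

# Stage two (respondents only): interact the propensity with an attribute,
# e.g., a water quality change; a significant interaction coefficient in the
# choice model would signal preferences that vary with response propensity.
attribute = rng.uniform(0, 10, n)             # hypothetical attribute levels
interaction = propensity * attribute
```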

Sample Weighting 
Since we are not stratifying our sample, every household in the sample frame has the same probability of being sampled.  As such, post-stratification weighting is not necessary.  While we expect our sample to be representative of the generalizable population, if the non-response bias analysis reveals that households with certain characteristics are underrepresented in our sample and those characteristics are correlated with responses to the valuation questions, EPA will perform statistical corrections to weight the observations. EPA will present the unweighted and the corrected valuation estimates for comparison.  
Specifically, these statistical corrections will likely entail iterative proportional fitting, also known as calibration or raking. Raking is the most prevalent form of statistical correction in public opinion surveys, and prior research has shown it performs similarly to other related methods in reducing possible non-response bias (Dutwin and Buskirk, 2017; Pew Research Center, 2018). To implement an iterative proportional fitting procedure in our context, EPA will select the set of characteristics for which our surveyed sample population is not representative of the general population and the characteristic is correlated with valuation responses. For these selected characteristics, EPA will adjust the weight of responses across all cases of the variable until the sample population corresponds to the general population. For example, if males respond to the survey at a higher rate than would be expected from the general population, and gender is also correlated with valuation, then EPA would down-weight male responses until the sample distribution aligns with the proportion of males and females in the general population. If multiple characteristics require this proportional fitting, then EPA will iteratively re-weight the first characteristic after each subsequent characteristic until all characteristics are similarly distributed in the survey and the general population.
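A minimal sketch of the raking procedure, assuming two hypothetical characteristics (gender and age group) with invented sample and population shares:

```python
import numpy as np

# Minimal sketch of iterative proportional fitting (raking): weights are
# adjusted one margin at a time until every weighted margin matches its
# population benchmark. The shares below are hypothetical placeholders.

rng = np.random.default_rng(2)
n = 1000
male = rng.uniform(size=n) < 0.60        # sample over-represents males
young = rng.uniform(size=n) < 0.30       # sample under-represents young adults

targets = {"male": 0.49, "young": 0.40}  # hypothetical population shares
margins = {"male": male, "young": young}

w = np.ones(n)
for _ in range(50):                      # iterate until margins converge
    for name, indicator in margins.items():
        share = np.sum(w * indicator) / np.sum(w)
        # Up-weight or down-weight cases so the weighted share hits the target.
        w = np.where(indicator,
                     w * targets[name] / share,
                     w * (1 - targets[name]) / (1 - share))

w = w * n / w.sum()                      # normalize to a mean weight of one
```

After convergence, the weighted distribution of each selected characteristic matches its population target, while households the procedure never touches keep their relative weights.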
 
2(d)	Questionnaire Design
An overview of each section of the survey and details of the information requested by the survey are discussed in Section 4(b)(i) of Part A of this supporting statement. The full text of the draft questionnaire is provided in Attachment 1. Several categories of questions are included in the survey. The reasons for including each of these categories are discussed below:
Questions about visits to waterbodies (screens 3 and 4). A series of questions regarding visits and recreational use of waterbodies are presented. These questions are meant to prime respondents for thinking about surface water quality and to provide data that can be used to compare to other national surveys, which serve as a benchmark to assess the representativeness of our sample.  Question 1 asks if the respondent has taken a trip to a lake, river, or stream in the last 12 months and will be used to identify users of lakes, rivers, and streams in the data analysis.  Question 2 asks if they have gone fishing in freshwater. These questions were borrowed directly from the National Survey of Fishing, Hunting, and Wildlife-Associated Recreation which is a national survey conducted by the U.S. Fish and Wildlife Service and the U.S. Census Bureau.  Question 3 asks how many single-day trips to a river, lake, or stream the respondent has taken in the past 12 months.  Question 4 asks for the main purpose of the last trip taken.  Question 5 asks how many miles the respondent traveled for the last single-day trip they took to a lake, river, or stream.  Questions 3, 4, and 5 also appear on the National Survey on Recreation and the Environment, which is a national telephone survey.  Including these questions on the survey described in this ICR will provide a comparison to other national surveys administered via different modes and using different sampling strategies in order to assess the representativeness of our sample regarding characteristics that are key to our main study objectives.  

Choice questions (screens 19 through 28). The questions in this section are the key component of the survey.  Respondents' choices among alternatives with specific water quality improvements in a given region and household cost increases are the main data that allow estimation of willingness to pay, distance decay, and the impact of quantity on WTP. Respondents will be presented with a series of six choice question scenarios.  Responses to these choice questions are of primary interest to this study. Each choice scenario entails a status quo option where the Recreation and Aquatic Biodiversity Scores remain unchanged and no additional costs are incurred by the household, and a "policy" option, where one or both scores improve, and some positive cost is incurred by the household. Following standard consumer theory, respondents will choose the option that they prefer based on their preferences. The status quo option is always available, something that is necessary for appropriate welfare estimation (Adamowicz et al. 1998). Following standard approaches (Opaluch et al. 1993, 1999; Johnston et al. 2002a, 2002b, 2003), each choice question is separated by a reminder to consider each scenario independently, disregard previous questions, and to not add up costs or benefits across scenarios.  This is included to avoid biases associated with sequence aggregation effects (Mitchell and Carson 1989). 

Debriefing questions (screens 29 through 35). These questions ask respondents about their motivations for choosing certain choice options over others, and whether they accepted the hypothetical scenario when making their choices. These questions will help to identify respondents who incorrectly interpreted the choice questions or did not believe the choice scenarios to be credible. In other words, the responses to these questions will be used to identify potentially invalid responses, such as: protest responses (e.g., protest any government program), scenario rejection, omitted variable considerations (e.g., economic and employment impacts), and symbolic (warm glow) responses (e.g., want a better environment in general). The responses to some of the questions will also be used to identify motivations behind respondents' choices, including altruism, option value, bequest, etc.  Five-point Likert scale response formats are used for all debriefing questions, which allows for a number of analysis approaches.  If the responses are used to screen some observations from the sample (e.g., because they indicate scenario rejection), sensitivity analysis can be performed by using different values on the scale as the critical value that causes an observation to be flagged and dropped from the sample. Responses can also be treated as categorical variables and modeled parametrically in the data analysis. Finally, questions on screens 35 through 41 inquire about employment, housing tenure, and language spoken.  Responses to this final set of questions are directly comparable to questions in the U.S. Census Bureau's American Community Survey, and thus provide additional variables for assessing the representativeness of the sample.
    
Demographic Questions (screens 39 through 42). These questions collect data that will be used to conduct our non-response analysis.

3.	Pretest
EPA intends to implement this study in two stages: a pretest and a main survey. First, EPA will administer the pretest to obtain a total of 120 completed surveys. The purpose of the pretest is to confirm the survey length and to perform quality control testing of the survey instrument and survey data.

4.	Collection Methods and Follow-up
4(a)	Collection Methods
The survey will be administered to members of the Internet panel who are selected to participate in the study. Several design features necessary to achieve our research objectives can only be included in an electronic survey.  Part A Section 4(a) describes several advantages a probability-based Internet panel has over other survey administration modes. 

Panelists will take the surveys on a variety of devices such as desktops, laptops, tablets, and smartphones. Once panelists are assigned to the survey, they will receive a notification email advising them there is a new survey available for them to take. This email notification will contain a link that takes them to the survey questionnaire. No login name or password will be required. The field period for the survey questionnaire will be about six weeks.

4(b)	Survey Response and Follow-up
The estimated participation rate for the main survey is 60%, which is typical for probability-based internet panels.  To obtain the highest cooperation rate possible, an advance email will be sent to potential respondents one week before the survey is available.  Potential respondents will receive another notification email when the survey is available, including a unique link to their survey.  No login information or password is required with the unique link, to minimize attrition.  Those who have not completed the survey will receive a reminder email two weeks after the survey has gone live.  Three days after that, respondents who have not completed the survey will receive a phone call reminder.



5.	Analyzing and Reporting Survey Results
5(a)	Data Preparation

Since the survey will be administered on the Internet, survey responses will be automatically entered into an electronic database as surveys are completed by each respondent. After all responses have been recorded and double-checked for accuracy, the database contents will be converted into a format suitable for statistical analysis and delivered to the EPA by the contractor. 

5(b)	Analysis
Once the survey data have been checked for errors, cleaned, and assembled into a data file, they will be analyzed using statistical analysis techniques. The following section discusses the models that the EPA intends to use to analyze the survey responses.

Analysis of Stated Preference Data
The basic strategy for analyzing stated preference data is grounded in the standard random utility model of Hanemann (1984) and McConnell (1990). The random utility model assumes that respondents choose the option that would provide the highest utility in each choice scenario. This model is applied extensively within stated preference research, and allows well-defined welfare measures (e.g., willingness to pay) to be derived from choice experiment models (Bennett and Blamey 2001, Louviere et al. 2000). In the standard random utility model applied to choice experiments, hypothetical choice scenario options are described in terms of attributes that focus groups reveal as relevant to respondents' utility, or well-being (Johnston et al. 1995; Adamowicz et al. 1998; Opaluch et al. 1993). One of these attributes is a monetary cost to the respondent's household. 

Applying this standard model to choices among hypothetical policies that would improve water quality throughout the specified region in the continental United States (following the format described in Section B.2(d) of this supporting statement), a standard utility function Ui(·) includes the respondent's household income and the quantity and quality of the environmental resources that may be affected by the hypothetical policies described in the survey. Following standard random utility theory, utility is assumed known to the respondent, but stochastic from the perspective of the researcher, such that:

 (1)			Ui(·) = U(Xi, D, Y − Fi) = v(Xi, D, Y − Fi) + εi,

where Xi is a vector of variables describing attributes of option i; D is a vector characterizing demographic and other attributes of the respondent; Y is the disposable household income of the respondent; Fi is the mandatory additional cost faced by the household under option i; v(·) is a function representing the empirically estimable component of utility; and εi is the unobservable component of utility, which is modeled as a stochastic error term.
A model of such a preference function is estimated by econometric methods designed for limited dependent variables, because researchers only observe each respondent's choice between or among two or more options rather than observing values of Ui(·) for each option directly (Maddala 1983; Hanemann 1984). Standard random utility models are based on the probability that a respondent's utility from program i, Ui(·), exceeds the utility from alternative programs j, Uj(·), for all potential programs j ≠ i considered by the respondent. In this case, the respondent's choice set of potential programs also includes maintaining the status quo.
When faced with K distinct options, the respondent will choose the option with the highest expected utility. Drawing from (1), the respondent will choose program i if:

 (2)			v(Xi, D, Y − Fi) + εi ≥ v(Xj, D, Y − Fj) + εj   ∀ j ≠ i.

If the ε's are assumed independently and identically drawn from a type I extreme value (Gumbel) distribution, the model may be estimated as a conditional logit model, as detailed by Maddala (1983) and Greene (2012). This model results in an empirical estimate of the systematic component of utility v(·), based on observed choices among different options. Based on this estimated function, welfare measures (e.g., willingness-to-pay) can be calculated following the well-known methods developed by Hanemann (1984) and summarized by Freeman (2003). Following standard choice experiment methods (Adamowicz et al. 1998; Bennett and Blamey 2001), each respondent will be presented with questions including two options (i.e., Option A [status quo], Option B) and asked to choose their most preferred option. Following clear guidance from the literature, a "no further action" or status quo option is always included in the choice set, to ensure that WTP measures are well-defined (Louviere et al. 2000).
Six choice questions will be included in each survey to increase the amount of information obtained from each respondent. Presenting each respondent with multiple choice scenarios is standard practice in choice experiment and dichotomous choice contingent valuation surveys (Poe et al. 1997; Layton 2000) but requires allowing for potential correlations among responses by a single respondent. That is, while responses across different respondents are independent, the set of responses provided by any individual respondent may be correlated (Poe et al. 1997, Layton 2000, Train 1998). A common approach to accommodate such potential within-respondent correlations is to account for preference heterogeneity using random parameters, which leads to a mixed logit modeling framework (Poe et al. 1997, McFadden and Train 2000, Layton 2000, Greene 2003). Such models can be estimated using simulated maximum likelihood methods, and the performance of alternative specifications can be assessed using standard statistical measures of model fit, as described by Train (1998), Greene (2002), and others.
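The simulated-likelihood logic behind the mixed logit can be illustrated as follows; the data, coefficient distribution, and parameter values are invented for illustration only:

```python
import numpy as np

# Illustrative sketch (hypothetical data) of the simulated likelihood for a
# mixed logit: one respondent's six responses share a single draw of a random
# quality coefficient, so the likelihood of the response sequence is the
# average, over simulated draws, of the product of conditional logit
# probabilities. Maximizing this over the parameters is simulated maximum
# likelihood; here we only evaluate it at one hypothetical parameter point.

rng = np.random.default_rng(3)
n_draws, n_questions = 200, 6
quality = rng.uniform(1, 10, n_questions)  # quality improvement in policy option
cost = rng.uniform(50, 400, n_questions)   # household cost of the policy option
choices = rng.integers(0, 2, n_questions)  # 1 = chose the policy option

def sim_loglik(mean_b, sd_b, cost_coef):
    """Simulated log likelihood of one respondent's choice sequence."""
    draws = mean_b + sd_b * rng.standard_normal(n_draws)  # b_r ~ N(mean, sd^2)
    v = draws[:, None] * quality - cost_coef * cost       # utility of policy option
    p_policy = 1 / (1 + np.exp(-v))                       # vs. status quo (v = 0)
    p_seq = np.prod(np.where(choices == 1, p_policy, 1 - p_policy), axis=1)
    return np.log(p_seq.mean())                           # average over draws

ll = sim_loglik(mean_b=0.5, sd_b=0.2, cost_coef=0.01)
```

Averaging the product of the six within-respondent probabilities before taking the log is what captures the correlation among a single respondent's answers.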

Econometric Specification
One objective of our study is to use the data collected from the survey to estimate a household willingness to pay function suitable for a wide range of regional or national policies that might affect water quality in lakes, rivers, and streams across the US. We specified the WTP function to accommodate several features that characterize people's preferences for water quality improvements according to published literature and our consultations with potential respondents and subject-matter experts.  Those features include differential values for outdoor recreation activities and aquatic biodiversity support, distance decay, diminishing marginal willingness-to-pay, and imperfect substitution between water quality and quantity. We developed the survey design with these objectives in mind, so that the data we collect will allow us to identify and have sufficient power to estimate model parameters that represent these features with an acceptable level of precision.
Towards these ends, the most general model we intend to estimate will be based on the following indirect utility function:

(3)
Vi = (Yi^β + α·Gi^β)^(1/β)


(4)
Gi = Σj=1..J qj^η · Qj^θ · [φ + max(0, 1 − γ·xij)]


(5)
qj = δ·qjR + (1 − δ)·qjB


where Yi is income and Gi is a function indicating overall water quality for respondent i; j indexes uniform grid cells that cover the continental U.S.; qj is a weighted average of recreational and ecological water quality and Qj is the quantity of surface water in cell j; xij is the distance between respondent i and cell j; and qjR and qjB are the average recreation and biodiversity scores in cell j.
In this framework, there are at least 7 parameters to be estimated (plus additional parameters for any household demographic attributes included in the model): α, β, θ, η, ϕ, γ, and δ. The role of each parameter is as follows: α determines the general magnitude of WTP for improvements in overall water quality; β controls the rate of substitution between income and overall water quality and so will control the rate at which marginal WTP for water quality improvements declines; η and θ control the marginal rate of substitution between, and marginal rates of return to, water quantity and quality; ϕ and γ (both constrained to be positive) determine the shape of the distance decay function (if ϕ>0 then some portion of WTP is independent of distance, possibly due to nonuse value; and γ is the slope of the linear decay function); and δ (constrained to lie between 0 and 1) is the weight placed on the recreation score relative to the biodiversity score.
The indirect utility function set out above is highly nonlinear, and so the parameters of the model cannot be estimated using a standard main-effects-only estimating equation where the indirect utility of each option is a simple linear function of the unknown parameters. Non-linear specifications can be more computationally challenging to estimate in practice; therefore, we will begin with a simpler model that can be viewed as a special case of the more general model outlined above. Imposing the restrictions β = η = θ = 1 produces the following simplified functional form:

(6)
Vi = Yi + α·Σj=1..J Qj·[δ·qjR + (1 − δ)·qjB]·[φ + max(0, 1 − γ·xij)]


This simplification leads to a standard multinomial logit specification where each non-status quo option includes as arguments the cost of the option and the quantity- and distance-weighted changes in the two water quality scores in the policy region, Σj Qj·[δ·ΔqjR + (1 − δ)·ΔqjB]·[φ + max(0, 1 − γ·xij)]·1ij, where 1ij is an indicator variable equal to 1 if cell j is in the policy region presented to respondent i. In this case the willingness to pay function would be:
(7)
WTPi = α·ΔGi = α·Σj=1..J Qj·[δ·ΔqjR + (1 − δ)·ΔqjB]·[φ + max(0, 1 − γ·xij)]·1ij


so, assuming errors are independent and Gumbel distributed, the probability that respondent i would choose option k is:

(8)
pik = exp(σ·(α·ΔGik − ck)) / Σm=1..K exp(σ·(α·ΔGim − cm)) .


We note that by using WTP as the latent variable, we can directly estimate the scale parameter, σ, as the coefficient on the cost variable, similar to Cameron and James (1987).
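As a numerical illustration of equations (7) and (8), the following sketch evaluates the simplified WTP function and the resulting choice probabilities on a small invented grid; all parameter values and grid data are hypothetical placeholders, not estimates:

```python
import numpy as np

# Sketch of the simplified WTP function in (7) and the logit choice
# probabilities in (8) on a toy grid. Every value below is invented.

rng = np.random.default_rng(4)
J = 25                                   # grid cells in this toy example
alpha, phi, gamma, delta, sigma = 2.0, 0.1, 0.02, 0.6, 0.05

Q = rng.uniform(0, 1, J)                 # surface water quantity per cell
dqR = rng.uniform(0, 5, J)               # change in recreation score
dqB = rng.uniform(0, 5, J)               # change in biodiversity score
x = rng.uniform(0, 100, J)               # distance from respondent to cell
in_region = rng.integers(0, 2, J)        # 1 if cell j is in the policy region

# Equation (7): quantity- and distance-weighted WTP for the policy option.
decay = phi + np.maximum(0.0, 1.0 - gamma * x)
dG = np.sum(Q * (delta * dqR + (1 - delta) * dqB) * decay * in_region)
wtp = alpha * dG

# Equation (8): logit probabilities over the policy option (cost c) and the
# status quo (no quality change and no cost, so its argument is zero).
c = 100.0
u = np.array([sigma * (wtp - c), 0.0])
p = np.exp(u) / np.exp(u).sum()          # p[0] = policy, p[1] = status quo
```

Because σ multiplies the entire WTP-minus-cost term, the coefficient on the cost variable in estimation recovers the scale parameter directly, as noted above.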
A straightforward way to accommodate individual or household-level heterogeneity would be to replace α with αi = Xi·Ω, where Xi is a vector of individual or household attributes. It also may be possible to include a random component in αi in a mixed logit framework.
We intend to estimate the simpler WTP function in (7) and the more general WTP function implied by equations (3)-(5) using standard multinomial logit, mixed logit, and latent class modeling approaches (Train 2009). We will begin with the simplest functional form, based on equation (7), in the simplest estimation framework, the standard multinomial logit model. We will then attempt to estimate more general models that can accommodate unobserved preference heterogeneity, the mixed logit and latent class models, and more general WTP functions by relaxing one or more of the restrictions that separate (3)-(5) from (7). We also will consider other functional forms that may not fit neatly into the general framework set out above. For example, it may be possible to estimate exponential or step distance decay functions rather than the linear distance decay functional form indicated in equation (4). We also may examine estimating equations with imperfect substitutability between the recreation and biodiversity scores. If multiple functional forms fit the data nearly equally well, then we will use formal model selection or model averaging techniques to synthesize the results.


5(c)	Reporting Results

The results of the survey will be made public via open-access publications in peer reviewed journals and working papers posted on EPA's website.  Provided information will include summary statistics for the survey data, extensive documentation for the statistical analysis including all data and code used for estimation, and a detailed description of the final results. Given the breadth of our research objectives the statistical approach and relevant data will vary among outlets.  The survey data will be released upon request only after it has been thoroughly vetted to ensure that all potentially identifying information has been removed.


REFERENCES 
Adamowicz, W., Boxall, P., Williams, M. and Louviere, J., 1998. Stated preference approaches for measuring passive use values: choice experiments and contingent valuation. American Journal of Agricultural Economics, 80(1), pp.64-75.
Anderson, G.D. and Edwards, S.F., 1986. Protecting Rhode Island's coastal salt ponds: an economic assessment of downzoning. Coastal Management, 14(1-2), pp.67-91.
Bateman, I.J., Mace, G.M., Fezzi, C., Atkinson, G. and Turner, R.K., 2014. Economic analysis for ecosystem service assessments. In Valuing Ecosystem Services. Edward Elgar Publishing.
Bateman, I.J., Cole, M., Cooper, P., Georgiou, S., Hadley, D. and Poe, G.L., 2004. On visible choice sets and scope sensitivity. Journal of Environmental Economics and Management, 47(1), pp.71-93.
Bateman, I.J., Carson, R.T., Day, B., Hanemann, M., Hanley, N., Hett, T., Jones-Lee, M., Loomes, G., Mourato, S., Pearce, D.W. and Sugden, R., 2002. Economic valuation with stated preference techniques: a manual. Economic Valuation with Stated Preference Techniques: a Manual.
Bennett, J. and Blamey, R. eds., 2001. The Choice Modelling Approach to Environmental Valuation. Edward Elgar Publishing.
Bishop, R.C., Boyle, K.J., Carson, R.T., Chapman, D., Hanemann, W.M., Kanninen, B., Kopp, R.J., Krosnick, J.A., List, J., Meade, N. and Paterson, R., 2017. Putting a value on injuries to natural assets: The BP oil spill. Science, 356(6335), pp.253-254. 
Cameron, T.A. and DeShazo, J.R., 2013. Demand for health risk reductions. Journal of Environmental Economics and Management, 65(1), pp.87-109. 
Carson, R.T. and Groves, T., 2007. Incentive and informational properties of preference questions. Environmental and Resource Economics, 37(1), pp.181-210.
Carson, R.T. and Mitchell, R.C., 1995. Sequencing and nesting in contingent valuation surveys. Journal of Environmental Economics and Management, 28(2), pp.155-173.
de Bekker-Grob, E.W., Donkers, B., Jonker, M.F. and Stolk, E.A., 2015. Sample size requirements for discrete-choice experiments in healthcare: a practical guide. The Patient-Patient-Centered Outcomes Research, 8(5), pp.373-384. 
Desvousges, W.H. and Smith, V.K., 1986. The Conceptual Basis of Benefits Estimation in Measuring Water Quality Benefits. Ray Perryman, ed.
Dillman, D.A., Smyth, J.D. and Christian, L.M., 2014. Internet, Phone, Mail, and Mixed-mode Surveys: the Tailored Design Method. John Wiley & Sons. 
Dillman, D.A., Phelps, G., Tortora, R., Swift, K., Kohrell, J., Berck, J. and Messer, B.L., 2009. Response rate and measurement differences in mixed-mode surveys using mail, telephone, interactive voice response (IVR) and the Internet. Social Science Research, 38(1), pp.1-18.
Dutwin, D. and Buskirk, T.D., 2017. Apples to Oranges or Gala versus Golden Delicious? Comparing data quality of nonprobability internet samples to low response rate probability samples. Public Opinion Quarterly, 81(S1), pp.213-239.
Greene, W.H., 2012. Econometric Analysis. Seventh Edition. Prentice Hall.
Greene, W.H. and Hensher, D.A., 2010. Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. Transportation, 37(3), pp.413-428.
Groves, R.M., 2006. Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70(5), pp.646-675.
Halbesleben, J.R. and Whitman, M.V., 2013. Evaluating survey quality in health services research: a decision framework for assessing nonresponse bias. Health Services Research, 48(3), pp.913-930. 
Hanemann, W.M., 1984. Welfare evaluations in contingent valuation experiments with discrete responses. American Journal of Agricultural Economics, 66(3), pp.332-341.
Hanemann, M. and Kanninen, B., 2001. The statistical analysis of discrete-response CV data. In Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method in the U.S., EU, and Developing Countries. Oxford University Press.
Hanemann, W.M., 1981. Some Further Results on Exact Consumer's Surplus. Working Paper No. 1557-2016-132808.
Hensher, D.A. and Ho, C., 2016. Identifying a behaviourally relevant choice set from stated choice data. Transportation, 43(2), pp.197-217. 
Hensher, D.A., Rose, J.M. and Greene, W.H., 2012. Inferring attribute non-attendance from stated choice data: implications for willingness to pay estimates and a warning for stated choice experiment design. Transportation, 39(2), pp.235-245. 
Johnson, F.R., Lancsar, E., Marshall, D., Kilambi, V., Mühlbacher, A., Regier, D.A., Bresnahan, B.W., Kanninen, B. and Bridges, J.F., 2013. Constructing experimental designs for discrete-choice experiments: report of the ISPOR conjoint analysis experimental design good research practices task force. Value in Health, 16(1), pp.3-13.
Johnston, R.J. and Abdulrahman, A.S., 2017. Systematic non-response in discrete choice experiments: implications for the valuation of climate risk reductions. Journal of Environmental Economics and Policy, 6(3), pp.246-267. 
Johnston, R.J., Swallow, S.K. and Bauer, D.M., 2002. Spatial factors and stated preference values for public goods: considerations for rural land use. Land Economics, 78(4), pp.481-500.
Johnston, R.J., Boyle, K.J., Adamowicz, W., Bennett, J., Brouwer, R., Cameron, T.A., Hanemann, W.M., Hanley, N., Ryan, M., Scarpa, R. and Tourangeau, R., 2017. Contemporary guidance for stated preference studies. Journal of the Association of Environmental and Resource Economists, 4(2), pp.319-405.
Johnston, R.J., Besedin, E.Y. and Wardwell, R.F., 2003. Modeling relationships between use and nonuse values for surface water quality: A meta‐analysis. Water Resources Research, 39(12).
Johnston, R.J., Swallow, S.K., Allen, C.W. and Smith, L.A., 2002. Designing multidimensional environmental programs: Assessing tradeoffs and substitution in watershed management plans. Water Resources Research, 38(7), pp.4-1.
Johnston, R.J., Swallow, S.K. and Weaver, T.F., 1999. Estimating willingness to pay and resource tradeoffs with different payment mechanisms: an evaluation of a funding guarantee for watershed management. Journal of Environmental Economics and Management, 38(1), pp.97-120.
Johnston, R.J., Weaver, T.F., Smith, L.A. and Swallow, S.K., 1995. Contingent valuation focus groups: insights from ethnographic interview techniques. Agricultural and Resource Economics Review, 24(1), pp.56-69.
Johnston, R.J., Besedin, E.Y. and Stapler, R., 2017. Enhanced geospatial validity for meta-analysis and environmental benefit transfer: an application to water quality improvements. Environmental and Resource Economics, 68(2), pp.343-375.
Layton, D.F., 2000. Random coefficient models for stated preference surveys. Journal of Environmental Economics and Management, 40(1), pp.21-36. 
Louviere, J.J., Hensher, D.A. and Swait, J.D., 2000. Stated Choice Methods: Analysis and Applications. Cambridge University Press.
Maddala, G.S., 1983. Methods of estimation for models of markets with bounded price variation. International Economic Review, pp.361-378. 
McConnell, K.E., 1990. Models for referendum data: the structure of discrete choice models for contingent valuation. Journal of Environmental Economics and Management, 18(1), pp.19-34.
McFadden, D. and Train, K., 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5), pp.447-470.
Mitchell, R.C. and Carson, R.T., 1989. Using Surveys to Value Public Goods: The Contingent Valuation Method. Resources for the Future, Washington, DC.
Montaquila, J.M. and Olson, K.M., 2012. Practical tools for nonresponse bias studies. SRMS/AAPOR Webinar, 24. 
Newbold, S., Walsh, P.J., Massey, D.M. and Hewitt, J., 2018. Using structural restrictions to achieve theoretical consistency in benefit transfers. Environmental and Resource Economics, 69(3), pp.529-553.
Pew Research Center, 2018. "For Weighting Online Opt-In Samples, What Matters Most?"
Poe, G.L., Welsh, M.P. and Champ, P.A., 1997. Measuring the difference in mean willingness to pay when dichotomous choice contingent valuation responses are not independent. Land Economics, 73(2), pp.255-267.
Schkade, D.A. and Payne, J.W., 1994. How people respond to contingent valuation questions: a verbal protocol analysis of willingness to pay for an environmental regulation. Journal of Environmental Economics and Management, 26(1), pp.88-109.
Smith, V.K. and Desvousges, W.H., 1988. Contingent Valuation Methods and the Valuation of Environmental Risk. Draft, Resource and Environmental Economics Program, North Carolina State University at Raleigh.
Train, K.E., 1998. Recreation demand models with taste differences over people. Land Economics, 74(2), pp.230-239.
Train, K.E., 2009. Discrete Choice Methods with Simulation. Second Edition. Cambridge University Press.
U.S. EPA, 2010. Guidelines for Preparing Economic Analyses. EPA 240-R-10-001. Washington, DC: U.S. Environmental Protection Agency.
U.S. EPA, 2015. Benefit and Cost Analysis for the Effluent Guidelines and Standards for the Steam Electric Power Generating Point Source Category. September 2015. https://www.epa.gov/sites/default/files/2015-10/documents/steam-electric_benefit-cost-analysis_09-29-2015.pdf
U.S. EPA, 2020. Benefit and Cost Analysis for Revisions to the Effluent Limitations Guidelines and Standards for the Steam Electric Power Generating Point Source Category. August 28, 2020. https://www.epa.gov/sites/default/files/2020-08/documents/steam_electric_elg_2020_final_reconsideration_rule_benefit_and_cost_analysis.pdf
Van Houtven, G., Powers, J. and Pattanayak, S.K., 2007. Valuing water quality improvements in the United States using meta-analysis: Is the glass half-full or half-empty for national policy analysis?. Resource and Energy Economics, 29(3), pp.206-228.
Vaughan, W.J., 1986. The RFF water quality ladder. Appendix B in Mitchell, Robert Cameron and Richard Carson, The Use of Contingent Valuation Data for Benefit/Cost Analyses in Water Pollution Control, Washington, DC: Resources for the Future.
Viscusi, W.K., Huber, J. and Bell, J., 2008. The economic value of water quality. Environmental and Resource Economics, 41(2), pp.169-187.

