

          Supporting Statement for Information Collection Request for
Survey to Improve Benefit-Cost Analysis of Water Quality Regulations: Instrument, Pre-test, and Implementation
                               TABLE OF CONTENTS

List of Attachments
PART A OF THE SUPPORTING STATEMENT
1.	Identification of the Information Collection
1(a)	Title of the Information Collection
1(b)	Short Characterization (Abstract)
2.	Need for and Use of the Collection
2(a)	Need/Authority for the Collection
2(b)	Practical Utility/Users of the Data
3.	Non-duplication, Consultations, and Other Collection Criteria
3(a)	Non-duplication
3(b)	Public Notice Required Prior to ICR Submission to OMB
3(c)	Consultations
3(d)	Effects of Less Frequent Collection
3(e)	General Guidelines
3(f)	Confidentiality
3(g)	Sensitive Questions
4.	The Respondents and the Information Requested
4(a)	Respondents
4(b)	Information Requested
5.	The Information Collected - Agency Activities, Collection Methodology, and Information Management
5(a)	Agency Activities
5(b)	Collection Methodology and Information Management
5(c)	Small Entity Flexibility
5(d)	Collection Schedule
6.	Estimating Respondent Burden and Cost of Collection
6(a)	Estimating Respondent Burden
6(b)	Estimating Respondent Costs
6(c)	Estimating Agency Burden and Costs
6(d)	Respondent Universe and Total Burden Costs
6(e)	Bottom Line Burden Hours and Costs
6(f)	Reasons for Change in Burden
6(g)	Burden Statement
PART B OF THE SUPPORTING STATEMENT
1.	Survey Objectives, Key Variables, and Other Preliminaries
1(a)	Survey Objectives
1(b)	Key Variables
1(c)	Statistical Approach
1(d)	Feasibility
2.	Survey Design
2(a)	Target Population and Coverage
2(b)	Sampling Design
2(c)	Precision Requirements
2(d)	Questionnaire Design
3.	Pretest
4.	Collection Methods and Follow-up
4(a)	Collection Methods
4(b)	Survey Response and Follow-up
5.	Analyzing and Reporting Survey Results
5(a)	Data Preparation
5(b)	Analysis
5(c)	Reporting Results


	List of Attachments
      Attachment 1 - Screenshots of Draft Survey
      Attachment 2 - Federal Register Notice

PART A OF THE SUPPORTING STATEMENT

1.	Identification of the Information Collection
1(a)	Title of the Information Collection 
A survey to improve economic analysis of surface water quality changes 

1(b)	Short Characterization (Abstract)

Researchers and analysts in EPA's Office of Research and Development (ORD), Office of Water (OW), and National Center for Environmental Economics (NCEE) are collaborating to improve EPA's ability to perform benefit-cost analysis on changes in surface water quality (lakes, rivers, and streams).  We are requesting approval to conduct a survey that will provide data critical to that effort.
Several non-market valuation methods can be used to estimate the economic benefits of improving environmental quality, but they often require more time and resources than are available to federal agency analysts in a regulatory context.  Benefit transfer can provide reasonably accurate estimates of economic benefits under certain conditions with fewer resources and far less time.  Federal agencies often rely on benefit transfer when analyzing the economic impacts of environmental regulation.  In conducting benefit-cost analyses of surface water quality regulations, however, it has become apparent that some important aspects of people's preferences about water quality are highly uncertain, which has made it necessary for analysts to make untested assumptions to fill the data gaps.  This information collection is necessary to provide insight into those relationships and improve the EPA's and other federal agencies' ability to perform benefit transfer in regulatory analysis.
Analysts in the Office of Policy, the Office of Water, and the Office of Research and Development are developing an integrated assessment model of water quality and economics designed to be flexible and modular, such that it could eventually be capable of estimating benefits for a wide range of surface water changes.  The data collected with this survey will inform that effort.  Analysts elsewhere in the EPA and other federal agencies, such as USDA, may also be able to use the results of this study to improve benefit transfer in other surface water quality applications as well.
The survey will be administered electronically via the Internet.  An Internet-based survey mode provides several advantages in efficiency and accuracy over other collection modes.  It is also necessary to meet several of our research objectives described below. EPA is requesting comment on two alternative approaches to sample recruitment: a probability-based Internet panel, and a mail invitation to the Internet survey. Some sections of this draft Supporting Statement cannot be completed until a recruitment mode has been chosen.  Where possible, this draft Supporting Statement provides details on both approaches.  EPA will consider public comments, expert opinion, and peer reviewed literature before choosing a recruitment mode.  The final Supporting Statement will reflect the chosen mode, present all details of that approach, and be submitted for public comment.  
The total national burden estimate for all components of the survey is 2,040 hours, based on 120 responses to the pretest survey and 6,000 responses to the main survey. Assuming 20 minutes are needed to complete the survey and an average wage rate of $30.00 per hour (U.S. Department of Labor, https://www.bls.gov/news.release/empsit.t19.htm), the total respondent cost comes to $61,200 for the pre-test and main survey combined.

2.	Need for and Use of the Collection
2(a)	Need/Authority for the Collection
When time and other resources are not sufficient for an original study of non-market benefits, benefit transfer can provide reasonably accurate estimates under certain conditions.  Benefit transfer involves adapting the results of previously conducted studies to approximate the conditions of the current application.  This can be done using just a few studies if there are existing studies that closely match the current application, or it can be done in a meta-analytic framework in which many studies are combined to generate a conditional distribution of estimates.  In either case, transferring benefits from other studies may require analysts to parameterize certain relationships that were not included in the original studies.  EPA has identified several such relationships for which there are little or no data to inform the parameterization and so analysts must make assumptions to complete the benefit transfer.
One of the key relationships this information collection will address is that between a household's willingness to pay (WTP) and the distance to the improved resource.  Known in the literature as "distance decay," this relationship is critical in estimating benefits for surface water improvements.  The most recent EPA analyses of water regulations have assumed that households are willing to pay for water quality improvements within 100 miles of their home, with no distance decay within that range, but are not willing to pay anything for improvements outside of that range (US EPA, 2015).  It is more likely that WTP declines gradually as distance increases and could be non-zero at greater distances than we have assumed in the past.  This study will collect data that will allow EPA to better analyze the nature of the distance-decay relationship.
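To make the contrast concrete, the threshold assumption used in past analyses and a gradual-decay alternative can be written as follows (an illustrative sketch; the exponential form and the decay parameter \gamma are expository assumptions, not a specification EPA has committed to estimating):

    WTP_i(d) = \begin{cases} \overline{WTP} & \text{if } d \le 100 \text{ miles} \\ 0 & \text{if } d > 100 \text{ miles} \end{cases}    (current threshold assumption)

    WTP_i(d) = \overline{WTP}\, e^{-\gamma d}, \quad \gamma > 0    (illustrative gradual decay)

where d is the distance from household i to the improved waters.  Survey responses with sufficient variation in d will allow \gamma (or a more flexible decay function) to be estimated rather than assumed.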
Another assumption EPA analysts have been forced to make involves the relationship between the amount of water that is improved and the magnitude of the improvements.  This relationship is called the marginal rate of substitution between quantity and quality, and little or no data exist regarding the nature of that tradeoff.  Benefit-cost analysis on recent surface water regulation implicitly assumes a one-to-one tradeoff between quantity and quality (US EPA, 2015).  While some assumption is required to perform the benefit transfer, this particular value is primarily a simplifying assumption.  The information collected under this study will allow EPA analysts to better estimate this relationship and improve the accuracy of benefit transfers.  
The third key relationship this information collection will address is between people's values for the human use aspects of water quality, such as recreation, and their values for water quality's impacts on ecosystem function.  In past analyses it was implicitly assumed that the two dimensions of water quality are captured by a single water quality index (WQI), so one metric could reflect all water quality improvements (US EPA, 2011; US EPA 2015).  However, different aspects of water quality affect the human use and ecosystem function dimensions of value differently.  This study will collect data on how people value those two dimensions of water quality and how those dimensions interact with each other.
Finally, data collected under this information collection request will provide methodological contributions to help improve stated preference methods for estimating benefits. Recent literature suggests that allowing respondents to preview the potential attribute improvements ahead of time can minimize framing and ordering effects sometimes found in stated preference studies (Bateman et al., 2004; Johnston et al., 2017). The study outlined in this ICR implements a split-sample design that includes a unique feature allowing respondents to change their responses after completing the choice questions.  Both rounds of responses will be recorded, which will allow EPA to test whether "pre-notification" of attribute improvement ranges yields more robust responses.  Since all respondents will have the same information when given the chance to revise their answers, data from the split sample can also be pooled for a more robust analysis.  Additionally, some studies have found evidence of attribute non-attendance, where respondents do not consider the full suite of attributes describing the environmental commodity of interest (Hensher and Greene 2010).  The current study will test and, if necessary, control for such tendencies.
The project is being undertaken pursuant to section 104 of the Clean Water Act dealing with research. Section 104 authorizes and directs the EPA Administrator to conduct research into a number of subject areas related to water quality, water pollution, and water pollution prevention and abatement. This section also authorizes the EPA Administrator to conduct research into methods of analyzing the costs and benefits of programs carried out under the Clean Water Act. The data collected under this request will help EPA and other practitioners better inform assumptions regarding the above relationships when conducting benefit-cost analyses. 

2(b)	Practical Utility/Users of the Data
 
This research effort could be used to inform future benefit-cost analysis of surface water improvements.  There are specific parameters in the current benefit transfer approach that determine distance decay and the marginal rate of substitution between quantity and quality.  The current approach, while representing the current state of the science, makes either explicit or implicit assumptions regarding these parameters to fill data gaps.  The data collected with this survey and the subsequent analysis will provide empirical insights into those relationships.  In all previous EPA analyses of water regulations that used the meta-analytic approach, a single metric has been used to represent water quality.  The results of this study will test the validity of this assumption by treating recreational and ecological benefits separately in the analysis.

3.	Non-duplication, Consultations, and Other Collection Criteria
3(a)	Non-duplication

There are many studies in the environmental economics literature that quantify the benefits or willingness to pay (WTP) associated with various types of surface water quality and aquatic ecosystem changes. Newbold et al. (2018) identified 51 stated preference valuation studies of water quality. The majority (92%) of the studies identified examined water quality changes at a local or state level.   Transferring values beyond the study areas is only appropriate when the characteristics of the resource, the baseline and post-policy conditions, and the affected populations are sufficiently similar in the policy case (U.S. EPA 2016). Aggregating values from numerous independent studies may overestimate benefits unless one properly accounts for income constraints and the availability of substitutes.  
Although meta-analyses have synthesized the results from the stated preference literature on surface water quality (Van Houtven et al. 2007, Newbold et al. 2018, Johnston et al. 2017) and have been used for benefit-transfer of EPA policies (U.S. EPA 2015, 2009), such approaches require simplifying (and often untested) assumptions. As discussed in Part A, section (2) of this ICR, such assumptions include the appropriate rate for distance decay of household WTP, the marginal rate of substitution between water quantity and quality, and the use of a single metric to proxy overall water quality.  The data collected with the current survey and the subsequent analysis will provide empirical evidence pertaining to those relationships.  
There are two existing nationwide stated preference studies of water quality improvements, but features of their study designs limit their usefulness in EPA benefit transfer and meta-analysis.  The first, by Carson and Mitchell (1993), asked respondents to value nationwide improvements in water quality using a stated preference survey.  Water quality was expressed using the "water quality ladder," which communicates water quality in terms of whether the water is considered suitable for boating, fishing, and swimming. They elicited respondents' WTP for three different changes, leading to all lakes and rivers in the U.S. reaching a minimum rating of boatable, fishable, or swimmable.
Although the water quality (WQ) ladder is a common metric that is salient to respondents and easily communicated, it has two key limitations.  The first is that the WQ ladder focuses by construction on human uses, and more specifically water-based recreation. In some cases, aspects of water quality that contribute to an improved recreational experience are not correlated with aspects of water quality that contribute to the health of an aquatic ecosystem.  For example, fecal coliform contamination can make water contact recreation dangerous but has no impact on the aquatic ecosystem.  Eliciting values for improvements expressed solely in terms of the WQ ladder may confound sources of value that the public holds for aquatic ecosystem health. A key feature of the survey described in this ICR is that the collected data will allow EPA to test whether the public holds values for aquatic ecosystem health that are independent of those for recreational uses.
The second limitation of using the WQ ladder metric is that its discrete nature makes it impossible to estimate any value households may hold for intra-category improvements in water quality.  This is important for regional and national water quality policies that, although broad in spatial scope, may on average yield relatively small incremental improvements in water quality in terms of magnitude.  The data from this information collection will allow EPA to assess what, if any, value the public places on improvements that do not cross a human-use threshold.  
Carson and Mitchell also asked respondents to value improvements that would lead waters to uniformly meet a minimum rating across the country.  They did not specify baseline water quality or improvement levels at a finer spatial scale. This simplification, while making the valuation exercise more manageable for respondents, ignores sub-national variation and prevents inferences about how WTP may decay with distance.  Both are important factors when estimating WTP for national and regional improvements in surface water quality and will be explicitly addressed by the survey instrument described in this ICR, thus better informing assumptions underlying EPA's benefit transfer methodologies. 
More recently, Viscusi et al. (2008) conducted a national stated preference study of lakes and rivers in the U.S. They asked respondents a series of iterative choice questions utilizing a measure of the percent of rivers and lakes within the U.S. that are rated "Good" versus "Not Good."  Although this "percent of waters" measure is continuous with respect to the quantity of waters, it is restricted to the discrete "good" versus "not good" rating describing water quality and is thus subject to similar limitations as studies utilizing the water quality ladder metric -- the discrete nature of the metric makes it impossible to estimate household values for intra-category improvements.  Finally, Viscusi et al. asked respondents to value water quality improvements to water bodies within 100 miles of their place of residence, which may capture the preponderance of use values, but, like the Carson and Mitchell study, does not allow inferences about how WTP decays with distance.  As described above, the data collected from the survey instrument described in this ICR will provide information to better inform EPA's underlying assumptions regarding distance decay, public values for relatively small (intra-category) water quality improvements, and whether the public values water quality changes independent of recreational uses. 
In summary, although two nationwide stated preference studies have been conducted, and both were novel at the time, the study proposed in this ICR addresses several gaps that will provide a deeper and more comprehensive understanding of public values for improvements in surface water quality at a regional and national level. In fact, Carson and Mitchell (1993) state that "... more research is needed to determine how benefits change with small changes in water quality and, in particular, the spatial location of those changes."  These are precisely the gaps that the study described in this ICR intends to fill.  The proposed study includes continuous measures of water quality, but at the same time delineates thresholds communicating ordinal categories of quality, similar to those presented by Carson and Mitchell (1993), Viscusi et al. (2008), and numerous other stated preference studies of water quality (e.g., Anderson and Edwards, 1986; Desvousges et al., 1987; Johnston et al., 1999). In doing so, the proposed study will be able to estimate any public values for intra-category improvements in water quality. The proposed study design will also elicit respondents' WTP for water quality improvements in different regions, including the region where a respondent lives and regions at various distances from the respondent's residence.  This feature of the study design will allow for explicit modeling of how WTP decays with distance from the waterbodies being improved and allow EPA to determine the extent of the market for those improvements.  This will provide a more detailed and accurate examination of public values for water quality improvements, and better inform assumptions underlying EPA's current benefit transfer methodologies. Lastly, the proposed study includes two separate water quality measures, one focusing on human and recreational uses and the other on ecosystem health. Accounting for both dimensions of water quality independently allows for a more complete understanding of public values for changes in surface water quality.

3(b)	Public Notice Required Prior to ICR Submission to OMB

      To be completed after EPA receives public comments 


3(c)	Consultations

Consultations with Scholars: On November 2-3, 2017, EPA's Office of Research and Development hosted a workshop of STAR grantees conducting stated preference studies of surface water quality benefits in Narragansett, RI.  At this meeting, EPA presented plans for our own study and received feedback from several academic researchers with expertise in this area.  The feedback was predominantly supportive of our research goals, and workshop participants provided a number of useful suggestions regarding our survey design and estimation approach.

A second meeting with the same STAR grantees was held in Ithaca, NY on April 2-3, 2019 to report progress.  EPA received additional feedback on this effort and discussed survey design features that would complement the STAR grantees' projects and provide opportunities for cross-validation of our results.

Consultations with Respondents: As part of the planning and design process for this collection, EPA conducted a series of seven focus groups and 24 one-on-one cognitive interviews. Four of the focus groups took place in Arlington, VA; two in Phoenix, AZ; and one in Chicago, IL.  Focus group locations were chosen to collect a diverse set of perceptions and experiences with freshwater lakes, rivers, and streams. Cognitive interviews were conducted in Alexandria and Arlington, Virginia.  Early focus group sessions were used to explore how respondents think about water quality and quantity and how to communicate measures of those attributes, and to identify a list of widely recognized waterbodies and the main features that make those waters widely recognized.  Later sessions and cognitive interviews were employed to test draft versions of the survey. These consultations with potential respondents were critical in identifying sections of the questionnaire that were unnecessary or lacked clarity, and in producing a survey instrument meaningful and comprehensible to most respondents. The later focus group sessions and the cognitive interviews were also helpful in estimating the amount of time respondents would need to complete the survey instrument. While completion times varied, most participants completed the survey in 20 minutes or less. The focus group sessions and cognitive interviews were conducted under OMB Control # 2090-0028.

Consultations with Experts: The survey instrument benefited from consultation with three leading scholars specializing in stated preference surveys for estimating benefits associated with water quality improvements, and environmental quality more broadly: Dr. Catherine Kling, Professor and Faculty Director, Cornell Atkinson Center for Sustainability; Dr. Daniel Phaneuf, Professor, Department of Agricultural and Applied Economics, University of Wisconsin-Madison; and Dr. Robert Johnston, Director, George Perkins Marsh Institute, Professor, Department of Economics, Clark University.

EPA also held a meeting in Corvallis, OR on March 26-27, 2019 with limnologists, ecologists, and economists in the Office of Research and Development.  The purpose of the meeting was to refine the metric for aquatic ecological health that would be used in the survey and to discuss data needs for the study design and data analysis phases of the project.

Survey Design Team: Dr. Christopher Moore at the U.S. Environmental Protection Agency serves as the project manager for this study. Dr. Moore is assisted by Dr. Matthew Massey, Dr. David Smith, Dr. Bryan Parthum, and Dr. Wes Austin, all of whom are with the U.S. EPA's National Center for Environmental Economics and have extensive experience in stated preference methods and other non-market valuation approaches.  Also assisting on the project are Dr. Paul Ringold, Dr. Steve Paulsen, Dr. Matthew Heberling, Dr. Nathanial Merrill, and Dr. Steve Newbold.  Drs. Ringold and Paulsen are aquatic ecologists in the Office of Research and Development's Western Ecology Division.  Drs. Heberling and Merrill are economists with extensive non-market valuation experience in the Office of Research and Development's Center for Environmental Measurement and Modeling.  Dr. Newbold is an associate professor of economics in the School of Business at the University of Wyoming with a background in ecology and extensive experience in non-market valuation.
	
3(d)	Effects of Less Frequent Collection

The survey is a one-time activity. Therefore, this section does not apply.

3(e)	General Guidelines

The survey will not violate any of the general guidelines described in 5 CFR 1320.5 or in EPA's ICR Handbook.

3(f)	Confidentiality
All responses to the survey will be kept confidential to the extent provided by law. EPA's detailed survey questionnaire will not ask respondents for personal identifying information, such as names or phone numbers. Instead, each survey response will receive a unique identification number. Respondents' addresses will not be provided to EPA by the contractor. The latitude and longitude of each respondent's place of residence will be perturbed using an algorithm that randomly shifts the geocoded coordinates to maintain confidentiality. The extent of the perturbation ranges from 100 to 2,000 feet to the north or south, and east or west, depending on the population density in a respondent's Census Block Group. The only geographic identifier then provided to EPA is the Census Block FIPS code corresponding to the perturbed coordinates.
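A minimal sketch of this type of perturbation, in Python, is shown below. The function name, the uniform draws, and the density-based radius rule are illustrative assumptions; the contractor's actual algorithm may differ.

    import math
    import random

    # Rough conversion: one degree of latitude spans about 364,000 feet.
    FEET_PER_DEGREE_LAT = 364_000

    def perturb_coordinates(lat, lon, persons_per_sq_mile):
        """Shift a geocoded point 100-2,000 feet in each axis at random.

        Denser Census Block Groups need smaller shifts to mask a location,
        so the maximum radius shrinks as density rises (assumed rule).
        """
        max_shift = 2_000 if persons_per_sq_mile < 1_000 else 100
        shift_ns = random.uniform(100, max_shift) * random.choice([-1, 1])
        shift_ew = random.uniform(100, max_shift) * random.choice([-1, 1])
        new_lat = lat + shift_ns / FEET_PER_DEGREE_LAT
        # A degree of longitude spans fewer feet away from the equator.
        new_lon = lon + shift_ew / (FEET_PER_DEGREE_LAT * math.cos(math.radians(lat)))
        return new_lat, new_lon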
Prior to taking the survey, respondents will be informed that their responses will be kept confidential to the extent provided by law. The name and address of the respondent will not appear in the resulting database, preserving the respondents' identity. The survey data will be made public only after it has been thoroughly vetted to ensure that all other potentially identifying information has been removed.

3(g)	Sensitive Questions

The survey questionnaire will not include any sensitive questions pertaining to private or personal information, such as sexual behavior or religious beliefs.

4.	The Respondents and the Information Requested
4(a)	Respondents 
Eligible respondents for this stated preference survey will be U.S. civilian, non-institutionalized individuals, age 18 years and older, who reside in the 48 contiguous United States and the District of Columbia. Respondents will be selected randomly from an existing Internet panel or recruited directly via mail. Both recruitment approaches employ an address-based sampling (ABS) methodology from the Delivery Sequence File of the United States Postal Service, a database with full coverage of all delivery points in the United States. ABS is considered a promising alternative to random digit dialing (Dillman et al. 2009) because of the number of cell phone-only households in the United States: as of December 2016, more than half (50.8%) of US households were cell-phone only. The sample will thus represent all households regardless of their phone status.
More detail on the sample frame and sampling methodology will be presented in the final Supporting Statement and submitted for public comment.
4(b)	Information Requested
(i)	Data items, including recordkeeping requirements
EPA developed the survey based on findings from a series of ten focus groups and 24 cognitive interviews conducted as a part of the survey instrument development process (OMB control # 2090-0028). Focus groups provided valuable feedback that allowed EPA to iteratively edit and refine the questionnaire and to eliminate or improve imprecise, confusing, or unnecessary questions. In addition, later focus groups and cognitive interviews provided useful information on the approximate amount of time needed to complete the survey instrument. This information informed our burden estimates. Cognitive interviews were also used to assess and improve the appearance of the survey on computers, tablets, and smart phones.  Focus groups and cognitive interviews were conducted following standard approaches in the literature, as outlined by Desvousges et al. (1984), Desvousges and Smith (1988), Opaluch et al. (1993), Schkade and Payne (1994), and Johnston et al. (1995). 
EPA has determined that all questions in the survey are necessary to achieve the goal of this information collection (see Part A, section 2 for the list of objectives).  The current draft of the survey is included in Attachment 1 and is described in more detail in Part B of this ICR. EPA will conduct extensive nonresponse analysis, including benchmarking, which uses demographic information to compare the respondents and non-respondents to the general U.S. population and evaluate potential non-response bias. Other data on respondents and nonrespondents available from the sample frame and believed to be correlated with the key variables of interest in the analysis will be used to estimate response propensities, which will be compared among subgroups.
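As a sketch of the propensity step, a response indicator could be regressed on frame variables with a logistic model. The file and variable names below are hypothetical placeholders; the actual frame variables will depend on the recruitment mode chosen.

    import pandas as pd
    import statsmodels.api as sm

    # One row per sampled address; 'responded' = 1 if a completed survey
    # was returned.  The covariates are assumed frame variables believed
    # to correlate with the key outcomes of interest.
    frame = pd.read_csv("sample_frame.csv")
    X = sm.add_constant(frame[["median_income", "pct_urban"]])
    fit = sm.Logit(frame["responded"], X).fit()
    frame["p_hat"] = fit.predict(X)

    # Compare average estimated response propensities across subgroups.
    print(frame.groupby("census_region")["p_hat"].mean())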
The following is an outline of the major sections of the survey.  Screen numbers refer to the numbered screenshots in Attachment 1.
Survey purpose and description. (Screen 1) This screen describes the purpose of the survey and what the respondent can expect as they complete it.  This screen also informs the respondent that the EPA is conducting the survey to collect data that may be used to guide future policy decisions.  Knowledge that the data collected could be used to influence an agency's actions is required to satisfy the conditions of consequentiality and incentive compatibility (Carson and Groves 2007).  When met, these conditions imply that respondents answer stated preference questions truthfully.
Describing outdoor water quality. (Screen 2) The next screen introduces the respondent to two different categories of water quality: water recreation and aquatic biodiversity.  Text is also provided to distinguish between surface water quality and drinking water quality, the latter of which is not addressed in this survey.
Questions about visits to waterbodies. (Screens 3-7) Next, a series of questions regarding visits to and recreational use of waterbodies is presented. These questions are meant to prime respondents for thinking about surface water quality and provide data that can be compared to other national surveys, which serve as a benchmark to assess the representativeness of our sample.  Question 1 asks if the respondent has taken a trip to a lake, river, or stream in the last 12 months and will be used to identify users of lakes, rivers, and streams in the data analysis.  Question 2 asks if they have gone fishing in freshwater. These questions were borrowed directly from the National Survey of Fishing, Hunting, and Wildlife-Associated Recreation, a national survey conducted by the U.S. Fish and Wildlife Service and the U.S. Census Bureau.  Question 3 asks how many single-day trips to a river, lake, or stream the respondent has taken in the past 12 months.  Question 4 asks for the main purpose of the last trip taken.  Question 5 asks how many miles the respondent traveled for the last single-day trip they took to a lake, river, or stream.  Questions 3, 4, and 5 also appear on the National Survey on Recreation and the Environment, a national telephone survey.  Including these questions will provide a comparison to other national surveys administered via different modes and using different sampling strategies, in order to assess the representativeness of our sample regarding characteristics that are key to our main study objectives.
Features describing hypothetical policy options.  (Screens 8-21) The next screen introduces the four features or attributes describing the hypothetical policy options presented to respondents. This is followed by a series of screens describing each feature in more detail. The four features are: 
    How much water would be affected. The following screen explains that surface area, measured in square miles, is the metric that the survey will use to convey the quantity of water affected by the policy.  Respondents are shown two illustrative figures to explain how surface area of lakes, rivers, and streams is calculated.  
    What improvements in water quality you could expect. The next five screens describe the two water quality metrics. The first three screens focus on the Recreation Score, which is intended to capture human use aspects of value. The following two screens describe the Aquatic Biodiversity Score, which is intended to capture the value a respondent may hold that is independent from their direct use of waterbodies, such as existence value, option value, or bequest value.  For both water quality scores, the survey first presents some basic background information, followed by additional details about the various factors that impact each metric.  Then text is provided that describes how to interpret the scores, supplemented with a graphic showing the scale and linking it to a graphical interpretation. These visual aids help reinforce interpretation of the scores and allow respondents to more easily make the cognitive link to specific water quality attributes. 
    Where policies would be implemented. The Policy Regions screen shows a map of the major watersheds of the contiguous US and explains that the policy or policies could be implemented in one or more of these regions.  The following two screens use a map of the major watersheds to show respondents the water surface area, current Recreation Score, and current Aquatic Biodiversity Score in each watershed.  
    Household cost of implementing each policy. The last policy attribute presented to respondents is the cost to their household if the policy were put in place.  On the first of two screens, respondents are given several examples of how pollution would be reduced.  The following screen describes the payment vehicle as an increase in annual federal, state, and local taxes to fund those policies.  Focus group testing and comments from experts led to the conclusion that framing the payment vehicle as a somewhat abstract bundle of taxes minimizes respondent concerns regarding the feasibility of the payment vehicle and effectiveness of government spending (i.e., whether the tax revenue would really be spent to improve water quality as described). The increase in annual taxes is stated to be in effect for five years.  Five years was chosen because a longer time period was judged by focus group participants to be too constraining to future generations and fraught with uncertainties regarding the future political climate and government priorities. At the same time, a one-time payment was judged to be not enough to sustain the specified environmental improvements over time.  Based on focus group testing and cognitive interviews, a five-year tax increase seemed to be a broadly acceptable middle ground between these two conflicting concerns.  The final screen in this section shows a simple line graph to help respondents visualize how improvements would occur gradually over time and eventually level off to sustainable, permanent levels.  
      
Range of improvements and costs. (Screen 21a) The next screen will only be shown to half of the respondents in the split-sample experimental design.  This screen presents a table showing the range of attribute improvements and costs a respondent will see in the subsequent choice questions.  Recent literature has suggested that presenting respondents with the full range of attribute changes, a feature referred to as a visible choice set, helps provide a better frame of reference and can reduce anchoring and ordering effects often found in stated preference studies (Bateman et al., 2004; Johnston et al., 2017). The environmental attribute levels chosen in the experimental design for this study are meant to ensure coverage of the range that one may reasonably expect from actual regional or national water quality policies. 
Choice questions. (Screens 22-37) Directly before the choice questions is a screen that reinforces consequentiality and reminds respondents to consider their budget constraint when answering the questions.  Emphasizing these points has been shown to effectively reduce hypothetical bias in stated preference surveys (Carson and Groves 2007).  Respondents are then presented with six choice question scenarios.  Each choice scenario uses a map to indicate a policy region where the changes would take place.  Below the map, respondents are shown a comparison of the status quo option, for which the Recreation and Aquatic Biodiversity Scores remain unchanged and no additional costs are incurred by the household, and a "policy" option, where one or both Scores improve and some positive cost is incurred by the household. According to conventional economic theory, respondents will choose the option they prefer, given their preferences and budget constraints. The status quo option is always available, a feature that is necessary for appropriate welfare estimation (Adamowicz et al. 1998). Following standard approaches (Opaluch et al. 1993, 1999; Johnston et al. 2002a, 2002b, 2003), each question is separated by a reminder to consider each choice scenario independently, disregard previous questions, and not add up costs or benefits across scenarios.  This reminder is included to avoid biases associated with sequence aggregation effects (Mitchell and Carson 1989). A unique feature of the experimental design is that after completing all six choice question scenarios, respondents are given the opportunity to review and, if desired, change their initial choices. Both the initial and revised choices are recorded.  This feature, along with the split-sample experimental treatment, will allow for a formal assessment of how the "visible choice set" feature (i.e., presenting respondents upfront with the full range of environmental and cost attribute levels they can expect) may impact responses to stated preference surveys. 
Debriefing questions.  (Screens 38-46) The last section of the survey presents a series of debriefing questions. The first set of questions entails Likert scale responses (1 through 5) indicating how much a respondent agrees or disagrees with a statement, or to what degree various factors affected their vote in the choice scenarios.  The responses to these questions will be used to econometrically assess various potential biases and alternative econometric models.  More specifically, responses to these questions will be used to assess: hypothetical bias, consequentiality, warm-glow, protest responses, stated attribute non-attendance, existence and bequest motivations for non-use values, and motivations for use values (e.g., option value). This is followed by a series of questions regarding employment, housing tenure, and language spoken.  Responses to this final set of questions are directly comparable to questions in the US Census Bureau's American Community Survey, and thus provide additional variables for assessing the representativeness of the sample. 
	

(ii)	Respondent activities
EPA expects individuals to engage in the following activities during their participation in the survey: 
- Go online to answer a web survey
- Review the brief background information provided in the beginning of the survey document
- Complete the survey questionnaire.
A typical subject is expected to take 20 minutes to complete the survey. These estimates are derived from focus groups in which respondents were asked to complete the current draft of the survey.
5.	The Information Collected  -  Agency Activities, Collection Methodology, and Information Management
5(a)	Agency Activities

The survey is being developed, conducted, and analyzed by EPA's National Center for Environmental Economics with contract support provided by ________________________.
Agency activities associated with the survey consist of the following: 
- Developing the survey questionnaire and sampling design
- Programming the web survey
- Pretesting the on-line survey instrument
- Emailing or mailing pre-notification and survey invitations
- Sending a reminder email or postcard
- Placing a reminder phone call or mailing a final reminder letter
- Data cleaning
- Analyzing survey results
- Conducting the non-response bias analysis to test and (if needed) correct for such bias
EPA will primarily use the survey results to improve the Agency's existing benefit transfer methodologies by better informing assumptions regarding distance decay of household WTP, the marginal rate of substitution between water quality and quantity, and whether households value water quality improvements independent of improvements in designated recreational activities. (See Part A, sections (2) and (3) for details). EPA will also examine how responses change as the respondents advance through the valuation questions and how certain design features can reduce associated biases (framing and ordering effects).  EPA is testing new methods for identifying when respondents do not fully attend to all attributes and econometric approaches for improving estimates when that occurs.

5(b)	Collection Methodology and Information Management
EPA plans to implement the proposed survey using a voluntary, web-based survey. A web-based electronic survey reduces burden because respondents would only see questions relevant to them, based on their responses to prior questions (i.e., skip patterns). An internet survey will also use checks and prompts to minimize missing and/or incorrectly entered information. Most importantly, an electronic survey allows for the complex experimental design that is needed to inform assumptions on distance decay of WTP and the marginal rate of substitution between water quality improvements and the quantity of waters impacted. More specifically, an electronic survey allows EPA to automate the various survey versions respondents receive, thus facilitating a design that entails sufficient variation in the location of a policy region, the quantity of waters impacted, baseline water quality, and the stated improvement in quality. Additionally, the split-sample design used to examine whether pre-notification minimizes framing and ordering effects requires that some respondents be prevented from changing their answers to previous policy questions. This cannot be done with a paper survey but can easily be implemented within an electronic survey.  
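In practice, the automation described above amounts to randomizing each respondent over the split-sample treatment and one block of choice-question versions, along the lines of the following sketch (the block count and field names are placeholders, not the final experimental design):

    import random

    N_CHOICE_BLOCKS = 24  # placeholder count of choice-set versions

    def assign_survey_version(respondent_id: int) -> dict:
        """Randomly assign the split-sample treatment and a choice block."""
        return {
            "respondent_id": respondent_id,
            # Half of respondents see the visible choice set preview screen.
            "pre_notification": random.random() < 0.5,
            # Each respondent draws one block of six choice questions.
            "choice_block": random.randrange(N_CHOICE_BLOCKS),
        }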
More detail on the collection methodology and information management specific to recruitment mode will be provided in the final Supporting Statement and submitted for public comment.

5(c)	Small Entity Flexibility
This survey will be administered to individuals, not businesses. Thus, no small entities will be affected by this information collection.

5(d)	Collection Schedule
The schedule for implementation of the survey will depend on the recruitment mode chosen and will be presented in the final Supporting Statement. 


6.	Estimating Respondent Burden and Cost of Collection
6(a)	Estimating Respondent Burden

Subjects who participate in the survey during the pre-test and main surveys will expend time on several activities. EPA will use similar materials in both the pre-test and main stages of the survey.  It is reasonable to assume the average burden per respondent activity will therefore be the same for subjects participating during either pre-test or main survey stages. 
Based on focus groups and cognitive interviews, EPA estimates that on average each respondent taking the survey will spend 20 minutes (0.33 hours) reviewing the introductory materials and completing the survey questionnaire. Assuming that 120 respondents complete the survey, the national burden estimate for respondents to the pre-test survey is 40 hours. During the main survey stage, the national burden estimate for these survey respondents is 2,000 hours assuming that 6,000 respondents complete the survey. 

These burden estimates reflect a one-time expenditure in a single year.

6(b)	Estimating Respondent Costs
(i)	Estimating Labor Costs
   	According to the Bureau of Labor Statistics, the average hourly wage for private sector workers in the United States in February 2021 was $30.00 (U.S. Department of Labor, https://www.bls.gov/news.release/empsit.t19.htm). Assuming an average per-respondent burden of 0.33 hours (20 minutes) for individuals completing the survey and an average hourly wage of $30.00, the average cost per respondent is $10.00. The total time cost for the 6,120 individuals expected to complete the survey in both the pre-test and main implementation would be $61,200.  
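The figures above follow directly from these inputs, as this quick check shows (using the rounded per-stage hours reported in section 6(a)):

    MINUTES_PER_RESPONSE = 20
    WAGE = 30.00                                     # average hourly wage (BLS, Feb. 2021)

    hours_per_response = MINUTES_PER_RESPONSE / 60   # ~0.33 hours
    pretest_hours = 120 * hours_per_response         # = 40 hours
    main_hours = 6_000 * hours_per_response          # = 2,000 hours
    total_hours = pretest_hours + main_hours         # = 2,040 hours

    cost_per_respondent = WAGE * hours_per_response  # = $10.00
    total_cost = total_hours * WAGE                  # = $61,200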
EPA does not anticipate any capital or operation and maintenance costs for respondents.

6(c)	Estimating Agency Burden and Costs

Agency costs arise from staff costs and contractor costs. EPA staff have expended 1,200 hours developing and testing the survey instrument to date and are expected to spend an additional 2,000 hours finalizing the survey instrument, analyzing data, writing reports, reviewing intermediate products, and managing the project more generally. The total EPA staff costs, shown in Table A1 below, come to $295,794. 

Table A1: Agency Burden Hours and Costs

GS Level    Hours    Hourly Rate    Hourly Rate with Benefits    Total
12            400       $42.02              $67.23           $26,892.00
13            800       $49.96              $79.94           $63,952.00
14          1,000       $59.04              $94.46           $94,460.00
15          1,000       $69.06             $110.49          $110,490.00
Total       3,200         --                  --            $295,794.00

 ______________ will be providing contractor support for this project with funding of $______ from EPA contract No. _____________. In total, ____________ staff and its consultants are expected to spend ______ hours supporting the development, pre-testing, and review of the study. 
Agency and contractor burden is _______ hours, with a total cost of $_______. 

6(d)	Respondent Universe and Total Burden Costs
EPA expects the total cost for survey respondents to be $61,200 based on a total burden estimate of 2,040 hours (across both pre-test and main stages) at an hourly wage of $30.00.

6(e)	Bottom Line Burden Hours and Costs

The following tables present EPA's estimate of the total burden and costs of this information collection for the respondents and for the Agency. The bottom line burden for these two together is $_______.


Table A2: Total Estimated Bottom Line Burden and Cost Summary for Respondents

Affected Individuals                  Burden (hours)    Cost (2020$)
Pre-test Survey Respondents                 40              $1,200
Main Survey Respondents                  2,000             $60,000
Total for All Survey Respondents         2,040             $61,200




Table A3: Total Estimated Burden and Cost Summary for Agency

Affected Individuals                  Burden (hours)    Cost (2012$)
EPA Staff                                3,200            $295,794
EPA's Contractors for the Survey
Total Agency Burden and Cost


6(f)	Reasons for Change in Burden

This is a new collection. The survey is a one-time data collection activity.

6(g)	Burden Statement

EPA estimates that the public reporting and recordkeeping burden associated with the survey will average 0.33 hours per respondent (i.e., a total of 2,040 hours of burden divided among 120 pre-test respondents and 6,000 main survey respondents).  Burden means the total time, effort, or financial resources expended by persons to generate, maintain, retain, or disclose or provide information to or for a Federal agency. This includes the time needed to review instructions; develop, acquire, install, and utilize technology and systems for the purposes of collecting, validating, and verifying information, processing and maintaining information, and disclosing and providing information; adjust the existing ways to comply with any previously applicable instructions and requirements; train personnel to be able to respond to a collection of information; search data sources; complete and review the collection of information; and transmit or otherwise disclose the information. An agency may not conduct or sponsor, and a person is not required to respond to, a collection of information unless it displays a currently valid OMB control number. The OMB control numbers for EPA's regulations are listed in 40 CFR part 9 and 48 CFR chapter 15. 
To comment on the Agency's need for this information, the accuracy of the provided burden estimates, and any suggested methods for minimizing respondent burden, including the use of automated collection techniques, EPA has established a public docket for this ICR under Docket ID No. _____________, which is available for online viewing at www.regulations.gov, or in person viewing at the Office of Water Docket in the EPA Docket Center (EPA/DC), EPA West, Room 3334, 1301 Constitution Ave., NW, Washington, DC. The EPA/DC Public Reading Room is open from 8:30 a.m. to 4:30 p.m., Monday through Friday, excluding legal holidays. The telephone number for the Reading Room is 202-566-1744, and the telephone number for the Office of the Administrator Docket is 202-566-1752. 
Use www.regulations.gov to obtain a copy of the draft collection of information, submit or view public comments, access the index listing of the contents of the docket, and to access those documents in the public docket that are available electronically. Once in the system, select "search," then key in the docket ID number, ____________________

PART B OF THE SUPPORTING STATEMENT

1.	Survey Objectives, Key Variables, and Other Preliminaries		
1(a)	Survey Objectives
	
The overall goal of the survey is to collect and analyze data that will improve the accuracy of EPA benefit-cost analyses of surface water quality changes, particularly those that rely on benefit transfer and meta-analytic approaches.  To that end, EPA has developed several specific research questions that the survey has been designed to answer.
   Study objective #1: Estimate a relationship between households' values for water quality improvements and their distance from the improved resource (i.e., distance decay).
   Study objective #2: Estimate a relationship between households' values for water quality and the amount of surface waters that are being improved (i.e., the marginal rate of substitution between water quality and the scope of the improvements).
   Study objective #3: Estimate a valuation function that can separately represent values for human uses and values for ecosystem functions.  
   Study objective #4: Test for framing and ordering effects in a multiple-question design and the effectiveness of previewing the potential attribute improvements (i.e., pre-notification) in reducing or eliminating those effects.  
   Study objective #5: Identify which attributes respondents fully consider when making choices (i.e., attribute attendance) and control for those tendencies in estimation using survey design features and econometric methods.
   Study objective #6: Estimate a valuation function capable of estimating benefits for surface water regulations that can be used as a preliminary scoping value for benefits and/or as a point of comparison for estimates found using other methods.

To perform benefit transfer, analysts must make either explicit or implicit assumptions about most of the issues addressed in our research objectives.  Currently, little or no data exist to inform those assumptions.  This survey will collect data that will provide a stronger empirical foundation for future analyses regarding these issues. 


1(b)	Key Variables

The key questions in the survey show respondents a region of the contiguous U.S. and ask whether they would vote for a policy that would result in the specified improvement in water quality in that region in exchange for a temporary increase in their federal, state, and local taxes.  The choice scenarios are presented as dichotomous choice questions.  For each choice question a respondent can either choose the status quo option, where the Recreation and Aquatic Biodiversity Scores remain unchanged and no additional costs are incurred by the household, or the "policy" option, where one or both of the Scores improve and some cost is incurred by the household. According to conventional economic theory, if the respondents view the survey as consequential -- that is, respondents believe that policymakers might use the survey results to help decide whether or not to implement the policy described in the survey -- then respondents will truthfully choose the option they prefer, given their preferences and budget constraints (Carson and Groves 2007, Poe and Vossler 2011). The status quo option is always available, a feature that is necessary for appropriate welfare estimation (Adamowicz et al. 1998). 
Two of the key attributes defining the options in the choice scenarios are the Water Recreation and Aquatic Biodiversity Scores. Inclusion of both scores addresses study objective #3. 
The Water Recreation Score is based on the RFF Water Quality Ladder (Vaughn 1986) and corresponds to the suitability of surface waters for different types of water recreation.  Respondents are told the values on the 100-point index at which experts consider waterbodies suitable for boating, fishing, and swimming.  Because the policy regions shown in each question contain many lakes, rivers, and streams, respondents are given the average score for the entire region, weighted by surface area.  The baseline Water Recreation Score for each policy region is calculated from baseline data used for the 2015 Steam Electric Power Generating Effluent Guidelines analysis (EPA, 2015).   
The Aquatic Biodiversity Score is based on the ratio of the number of observed species in a waterbody to the number of species expected if the waterbody were in the same condition as the least disturbed waterbodies in that region.  This metric is referred to as the "O/E ratio," and is calculated by dividing the number of benthic macroinvertebrate and/or plankton species observed at a site by the number expected.  The latter is based on region-specific model predictions, calibrated by ecologists using the least disturbed waters in the corresponding eco-region.  
By construction, the Aquatic Biodiversity Score can range from 0 to 100.  The baseline Aquatic Biodiversity Scores for each policy region were calculated using a weighted average of the benthic macroinvertebrate O/E ratios from the 2013-2014 National Rivers and Streams Assessment and the plankton O/E ratios from the 2007 National Lakes Assessment. These data were chosen because they represent the most recent assessments for which O/E ratios were available. 
For each policy region defined in the experimental design, a separate average O/E ratio is calculated for rivers/streams and then for lakes. These calculations were performed by EPA's Office of Research and Development based on the original sampling weights in the National Aquatic Resource Surveys and the respective water surface area corresponding to each sampling site.   The O/E ratios were then combined as a weighted average based on the corresponding surface area for each waterbody type according to the National Hydrography Dataset.  
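The calculation reduces to two area-weighted averages, as the stylized sketch below illustrates (the species counts and surface areas are made-up numbers; the actual computation also incorporates the NARS sampling weights described above):

    def area_weighted_mean(values, areas):
        """Surface-area-weighted average of site-level scores."""
        return sum(v * a for v, a in zip(values, areas)) / sum(areas)

    # Step 1: O/E ratio at each site = observed species / expected species.
    observed = [18, 22, 30]
    expected = [24, 25, 40]
    oe = [o / e for o, e in zip(observed, expected)]    # 0.75, 0.88, 0.75

    # Step 2: average within each waterbody type, then combine the two
    # types using their total surface areas from the NHD.
    river_score = area_weighted_mean(oe[:2], areas=[120.0, 45.0])  # river/stream sites
    lake_score = area_weighted_mean(oe[2:], areas=[300.0])         # lake sites
    region_ratio = area_weighted_mean([river_score, lake_score],
                                      areas=[165.0, 300.0])
    biodiversity_score = 100 * region_ratio   # rescale to the 0-100 Score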
The rest of the key variables presented in the choice questions are defined by the policy regions.  One of those variables is the total surface area of lakes, rivers, and streams in a policy region.  Total surface area is given in square miles and was taken from the National Hydrography Dataset Plus (NHDPlus).  The amount of water included in each policy region will be used to estimate a marginal rate of substitution between quantity and quality by estimating interaction effects between surface area and improvements in the two water quality scores (thus fulfilling study objective #3).  Another key variable defined by the policy region is the location of river and stream reaches and lakes within the region, and thus the distance of a respondent to each of the improved waters.  These data will be used to analyze the effect of distance from the improved resource on WTP (and thus address study objective #2).  Together, all variables discussed thus far will be used to estimate a valuation function capable of estimating benefits for surface water improvements at a national or regional level, and thus will be used to address study objective #1.
Additionally, the split-sample experimental design, in which half the respondents are given a preview of the full range of attribute levels in the choice scenarios (i.e., the visible choice set or pre-notification feature), in combination with the above variables, choice responses, and the survey instrument's unique ability to allow respondents to go back and change their original choices, will provide insight toward study objective #5 by allowing for econometric tests of whether such "pre-notification" reduces ordering and framing effects. Finally, responses to debriefing questions on screen 34, in conjunction with the choice scenario responses and variables discussed above, will facilitate estimation of alternative models that address attribute non-attendance (fulfilling study objective #6). 

1(c)	Statistical Approach
A statistical survey approach in which a randomly drawn sample of households is asked to complete a survey is appropriate for estimating values associated with water quality improvements. A census approach is impractical because of the extraordinary cost of contacting all households. An alternative approach, in which individuals self-select into the sample, would be subject to severe selection bias and so would not support generalizable inferences. Therefore, the statistical survey is the most appropriate approach to inform the research objectives discussed in Part B, section 1(a). 
EPA has retained Abt Associates Inc. (55 Wheeler Street, Cambridge, MA 02138) under EPA contracts EP-C-13-039 and EP-W-17-009 to assist in the questionnaire and sampling design. 

1(d)	Feasibility
Following standard practice in the stated preference literature (Johnston et al. 1995; Adamowicz et al. 1998; Louviere et al. 2000; Bennett and Blamey 2001; Bateman et al. 2002; Johnston et al. 2017), EPA conducted a series of ten focus groups and 24 cognitive interviews (OMB control # 2090-0028). Based on findings from these activities, EPA made various modifications to the survey instrument intended to reduce the potential for respondent bias, reduce respondent cognitive burden, and increase respondent comprehension of the survey materials. In addition, EPA solicited peer review of the survey instrument by three specialists in academia, as well as input from other experts (see Section 3c in Part A). Recommendations and comments received as a part of that process have been incorporated into the design of the survey instrument, and the revised survey was subsequently tested in 24 cognitive interviews.
Because of the steps taken during the survey development process, EPA anticipates that most respondents will have little difficulty interpreting or responding to the survey questions. Furthermore, since the survey will be administered using an established national-level panel, it will be accessible to all respondents and representative of the national population. EPA therefore believes that respondents will not face any obstacles in completing the survey, and that the survey will produce useful results. EPA has dedicated sufficient staff time and resources to the design and implementation of this survey, including funding for contractor assistance under EPA contract ______________________. 

2.	Survey Design
2(a)	Target Population and Coverage
The target population for the survey is the U.S. adult population, represented by a probability-based general-population sample. 
		
2(b)	Sampling Design
(i)	Sampling Frame
Once a recruitment mode is chosen, details on the sampling frame will be provided in the final Supporting Statement and submitted for public comment.

(ii)	Sample Sizes
A representative sample of the U.S. adult population 18 years of age and older will be surveyed for this study. Both pretest and main study samples will be randomly split into two groups of equal size for assignment to each of the two versions of the survey. The target sample size for the pretest is 60 completed surveys per subsample group, for a total of 120 completed surveys. The target sample size for the main survey is 3,000 completed surveys for each subsample group, for a total of 6,000 completed surveys. These sample sizes are large enough to meet the precision requirements of the survey; details are discussed in section 2(c)(i).  

(iii)	Stratification Variables
The sample design does not include geographic stratification.

(iv)	Sampling Method
Once a recruitment mode is chosen, details on the sampling method will be provided in the final Supporting Statement and submitted for public comment.

(v)	Multi-Stage Sampling
Multi-stage sampling will not be necessary for this survey.

(vi)       Experimental Design
	The experimental design of our study follows established practices and is largely determined by several guiding principles that are described below.  

Model Identification: the ability to obtain unbiased parameter estimates from the data for every parameter in the model.  To ensure model identification, effects in the design must not confound one another, i.e., be collinear.  To identify effects of interest, the experimental design must sufficiently vary the relevant attribute levels within and across choice questions and, in the case of higher-order effects, include sufficient numbers of attribute-level combinations.  

Orthogonality: requires strictly independent variation of levels across attributes, in which each attribute level appears an equal number of times in combination with all other attribute levels.  Because the study takes advantage of actual water quality conditions in the policy regions, regional fixed effects will be collinear with the observed baseline values.  The design addresses this by ensuring similar baseline values appear in different regions, thereby breaking the perfect correlation between baseline and region. 

Balance: requires each level within an attribute to appear an equal number of times.  The total number of alternatives (the number of questions multiplied by the number of alternatives in each set) should be evenly divisible by the number of levels for each attribute. For example, if the design includes three-level and four-level attributes, the total number of alternatives must be divisible by both 3 and 4 to ensure balance (i.e., 12, 24, 36, etc.).  

Confounding: When a main effect is perfectly correlated with an interaction effect between two other variables, it is not possible to identify the main effect.  This is primarily a problem with fractional factorial designs.  A full factorial design will allow us to identify all effects.  

Implausible combinations: Constraints that exclude implausible combinations or dominated alternatives introduce some degree of correlation and level imbalance in the experimental design.  Focus group and cognitive interview testing has shown that respondents generally accept all attribute level combinations in the full factorial design.  

Several designs that adhere to these principles were developed and examined in a series of numerical simulation experiments for power analysis described in section 2(c)(iii) below. 

The levels of and correlations among some option attributes will be constrained by their associations with the pre-determined hypothetical policy regions, as described in Attachment 6. Other attribute levels will be assigned orthogonally with the aid of a full factorial design, and questions will be randomly divided among distinct survey versions (Louviere et al. 2000). Based on standard choice experiment design procedures (Louviere et al. 2000), the number of questions and survey versions were determined by, among other factors: a) the number of attributes in the final experimental design and complexity of questions, b) focus group and cognitive interview testing revealing the number of choice experiment questions that respondents are willing/able to answer in a single survey session, and c) the number of attributes that may be varied within each question while keeping the cognitive burden low enough to allow respondents to confidently identify their preferred alternative among the presented options.
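As a concrete illustration of these procedures, the following Python sketch generates a small full factorial design from hypothetical attribute levels and randomly divides it among survey versions; the actual attribute levels, question counts, and version counts are determined by the experimental design in Attachment 6. Note that the 36 combinations are divisible by both 3 and 4, satisfying the balance condition discussed above.

    from itertools import product
    import random

    # Hypothetical attribute levels for one candidate design.
    recreation_change   = [5, 10, 15]          # points on the Recreation Score
    biodiversity_change = [5, 10, 15]          # points on the Biodiversity Score
    annual_cost         = [25, 75, 150, 300]   # household cost, dollars per year

    # The full factorial contains every combination exactly once, which
    # guarantees attribute-level balance and orthogonality.
    design = list(product(recreation_change, biodiversity_change, annual_cost))

    # Randomly divide the 36 combinations among 6 survey versions of 6 questions.
    random.seed(0)
    random.shuffle(design)
    versions = [design[i:i + 6] for i in range(0, len(design), 6)]
    print(len(design), "combinations across", len(versions), "survey versions")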

Choice sets (Bennett and Blamey 2001), including variable level selection, were designed by EPA based on the goal of illustrating realistic policy scenarios that "span the range over which we expect respondents to have preferences, and/or are practically achievable" (Bateman et al. 2002, p. 259), following guidance in the literature. This includes guidance regarding the statistical implications of choice set design (Hanemann and Kanninen 2001) and the role of focus groups in developing appropriate choice sets (Bennett and Blamey 2001).  Since each policy option includes an improvement in at least one water quality dimension and will involve some cost to the household relative to the status quo scenario, there will be no dominated choices in the design.  
Based on these guiding principles, the following experimental design framework is proposed by EPA. A description of the statistical design is presented in Attachment 6. The experimental design will allow for estimation of main effects based on a choice experiment framework. Each treatment (survey question) includes a status quo option (no improvements from baseline and no cost) and a policy option.  Each treatment refers to a policy region for which the respondent is given information on the amount of surface water in the region (square miles) and baseline recreational and ecological water quality scores.  The policy option describes the changes in the recreation and ecological scores (both the post-policy levels and the changes from the status quo levels) that would occur and the cost that would accrue to the respondent's household if the policy were implemented.  The water surface area and baseline water quality scores in the status quo option are based on actual data, so the variation in these factors is determined by the natural variation among the policy regions.

2(c)	Precision Requirements
(i)	Precision Targets
Precision of survey estimates is a direct function of sample size.  Since our primary questions of interest are dichotomous choice, we are interested in how precisely we can estimate the probability of a binary outcome.  The sample size $n$ needed to secure a margin of error $e$ can be calculated using the following formula, in which $N$ represents the size of the population, $p$ is the proportion being estimated, and $z$ is the percentile of the standard normal distribution:

$$ n = \frac{N\, z^2\, p(1-p)}{(N-1)\, e^2 + z^2\, p(1-p)} $$
In order to obtain the most conservative estimate of the minimum sample size, $p$ is set to 50% and the size of the population is assumed to be infinite (in which case the formula reduces to $n = z^2 p(1-p)/e^2$). Figure B1 shows the expected margin of error, with 95% confidence, as a function of sample size.  Accordingly, survey estimates produced using each subsample of 3,000 will have margins of error no larger than ±2% at the 95% level of confidence.  


Figure B1 Margin of Error as a Function of Sample Size
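For reference, the relationship plotted in Figure B1 can be reproduced with a few lines of Python (a minimal sketch assuming the conservative case of p = 50% and an infinite population):

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        # Half-width of a 95% confidence interval for a proportion,
        # assuming an infinite population (the conservative case above).
        return z * math.sqrt(p * (1 - p) / n)

    print(round(margin_of_error(3000), 4))   # ~0.0179, i.e., under ±2%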

(ii) 	Power analysis
Power analysis for sample size calculations is an important aspect of scientific survey sampling: without such calculations the employed sample sizes may be inadequate for detecting significant differences.  Equally important, without power analysis a study can be based on costly sample sizes far larger than precision requirements dictate.  As such, power analyses guard a study against failing to detect differences that should be declared significant, as well as against cost overruns due to unnecessarily large samples.
In broad terms, most statistical inferences involve tests of hypotheses about population parameters based on observed data from samples.  As in any other decision-making process, challenging the null hypothesis in favor of the alternative is subject to two types of errors.  As illustrated in the following table, these two types of error -- the frequencies of which are conventionally denoted by α and β -- correspond to rejecting a true null hypothesis and failing to reject a false null hypothesis, respectively.
Table B1. Depiction of Type I and Type II Errors

                                                            Truth
  Inference Based on Survey Data      H0                              H1
  H0                                  Null true and not rejected      Type II Error (β)
  H1                                  Type I Error (α)                Null false and rejected
Considering the above inferential infrastructure, for any empirically informed decision-making process it is desirable to keep both types of error as small as possible.  For instance, for most hypothesis testing applications it is desirable for the chance of Type I error (α) to be at or below 5%.  In contrast, sample size requirements related to Type II error (β) are often stated in terms of the power of the test, which is 1-β.  This quantity represents the probability of rejecting the null hypothesis when it is false.
Because there are many forms of statistical tests, there are numerous scenarios for power analysis. However, a common approach for sample size calculation is to require a minimum power level for detection of a desired effect size (δ) between two population parameters. For instance, the probability of correctly rejecting the null hypothesis (the power of the test) is often set at 0.8 for detecting a significant difference between two population parameters. Naturally, the smaller the minimum detectable difference (δ), the larger the required sample sizes will be.  
For illustration purposes, the following figure shows the required sample size per group so that a difference of δ = 5% between two population proportions can be detected as significant with a power of 0.8, using a one-sided Fisher's Exact Conditional Test for two proportions.  The required sample sizes per group are calculated for various possible values of the first population proportion, ranging from 5% to 90%.  As expected, the required sample size increases as the two proportions under consideration approach 50%.
                                       
Figure B2. Required Sample Size per Group to Detect δ = 5% with 0.8 Power
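The figure's calculations can be approximated with standard software. The following Python sketch uses the normal-approximation power routines in statsmodels as a stand-in for the exact conditional test referenced above, so the resulting sample sizes are approximate:

    import math
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    power = NormalIndPower()
    for p1 in [0.05, 0.10, 0.25, 0.50, 0.75, 0.90]:
        es = proportion_effectsize(p1 + 0.05, p1)   # detect a 5-point gap
        n = power.solve_power(effect_size=es, power=0.8, alpha=0.05,
                              alternative='larger')
        print(f"p1 = {p1:.2f}: about {math.ceil(n)} per group")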

Accordingly, with an anticipated total sample size of 6,000 for this survey, there will be sufficient power for detecting significant differences. This would be the case even for survey estimates that exhibit high levels of sampling variability, such as estimates of proportions that are close to 50%. Moreover, individual survey estimates are expected to carry small margins of error, as discussed in section 2(c)(i). In summary, the chosen sample size for this survey is large enough to provide parameter estimates with reasonably narrow confidence intervals, as well as to detect significant differences when various subgroups are compared to each other.

(iii) Supplemental Power Analysis for Choice of Experimental Design
We conducted a supplemental power analysis based on the specific form of the estimating equation we intend to use (described in section 5(b) below) and a representative target of estimation: average household willingness to pay for a benchmark water quality improvement. This power analysis required simulating many possible datasets given our sampling design, survey format, and prior assumed values for all parameters to be estimated (de Bekker-Grob et al. 2015). 
For purposes of simulating the test data, we used 49 geographic zones based on the 48 contiguous states plus the District of Columbia. Each zone was normalized to have the same total surface area, 63,682.2 sq mi, such that the sum of the zones equals the total land area of the continental U.S. The centroid of each zone was set equal to the centroid of the corresponding state, and the area of surface water was set equal to the areal density of surface water of the corresponding state times the surface area of the zone. In this way we constructed 49 geographic zones of the same size with a surface water quantity distribution and spatial pattern similar to that of the continental U.S. 
We cross tabulated the zones with the policy regions used in our survey design by assigning zone z to policy region j if the centroid of zone z was contained in policy region j. We also used the zones as the areas from which we randomly drew simulated survey respondents. The population of each zone was set equal to the corresponding state population. The simulated set of respondents was constructed by drawing randomly from the zones using probability weights proportional to the population in each zone, so each household had the same probability of being selected into the sample. Respondent incomes were drawn from a lognormal distribution calibrated to match average incomes of the bottom 99% and the top 1% of incomes in each respondent's home state.  
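The following Python sketch illustrates the respondent-simulation step just described, using hypothetical zone populations and an illustrative income calibration in place of the actual state-level data:

    import numpy as np

    rng = np.random.default_rng(42)
    n_zones, n_respondents = 49, 6000        # 48 contiguous states + DC

    # Hypothetical zone populations standing in for actual state populations.
    pop = rng.integers(600_000, 40_000_000, size=n_zones).astype(float)

    # Sample zones with probability proportional to population, so every
    # household has the same probability of selection into the sample.
    zone = rng.choice(n_zones, size=n_respondents, p=pop / pop.sum())

    # Illustrative lognormal income draw; the actual calibration matches the
    # average incomes of the bottom 99% and top 1% in each home state.
    mean_income, sigma = 75_000.0, 0.8
    mu = np.log(mean_income) - 0.5 * sigma**2    # so E[income] = mean_income
    income = rng.lognormal(mean=mu, sigma=sigma, size=n_respondents)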
We simulated the choice data using the parameter values shown in the first column of numbers in Table B2 below. These parameter values were selected to produce an average WTP for a 1 percentage point improvement in both scores for all surface water bodies in all zones close to $20 per year, which we view as a plausible central estimate considering previous water quality valuation studies and focus group findings. We set the scale parameter such that the pseudo-R2 value for the estimated model is between 0.25 and 0.35.
We examined four experimental designs that vary the number of attribute levels, policy regions, and questions on each survey such that each design is balanced and orthogonal.  We also tested different distributions of attribute levels within the maximum range and found that even distributions produced more efficient designs than cases where some levels were clustered together.  Table B2 below shows the simulated standard errors for each estimated parameter of the model.  
	
Table B2. Power Analysis Results for Candidate Survey Designs

                                        Estimated Standard Errors
                               3 levels,      4 levels,      5 levels,      6 levels,
  Parameter     True value     6 questions,   4 questions,   5 questions,   6 questions,
                               6 regions      8 regions      5 regions      6 regions
  α             7.0            3.1252         4.7737         3.9421         4.1291
  β             0.5            0.0198         0.0278         0.0245         0.0267
  θ             0.5            0.0466         0.0660         0.0632         0.0654
  η             0.5            0.1822         0.2934         0.2346         0.1986
  φ             0.1            0.0066         0.0096         0.0102         0.0100
  γ             0.0069         0.0006         0.0007         0.0008         0.0008
  δ             0.5            0.0141         0.0218         0.0205         0.0221
  τ             1.0            2.1292         2.9658         2.3274         2.4534
  σ             0.05           0.0008         0.0010         0.0010         0.0011

                                        Estimated Values of WTP
  WTP+1         ~$20           20.77          20.79          20.68          20.77
  SE(WTP)                      0.3924         0.6040         0.4434         0.4515

To estimate the standard errors of the maximum likelihood parameter estimates for each candidate study design, we simulated the survey response data 100 times and, for each iteration, calculated the gradient of the log-likelihood function at the true parameter values, $g_0$. We then took the average of the outer product of the gradients over the 100 simulated datasets to estimate its expected value, $E[g_0 g_0']$, which corresponds to the Fisher information matrix by the "information matrix equality" (e.g., Greene 2012, p. 557). The inverse of this matrix is an estimate of the variance-covariance matrix of the maximum likelihood parameter estimates, $V$, and the square roots of the diagonal elements of this matrix are estimates of the standard errors of the maximum likelihood parameter estimates. 
Our estimates of the standard errors for each parameter under each survey design configuration are shown in the corresponding columns of Table B2. We used the delta method to estimate the standard error of the average sample willingness to pay for a one percentage point increase in both water quality scores in all geographic zones, i.e., $se(WTP_{+1}) = \sqrt{h' V h}$, where $h$ is the vector of first derivatives of $WTP_{+1}$ with respect to each parameter. These estimates are shown in the final row of Table B2.
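A minimal numpy sketch of the calculation just described (the simulation of the gradients themselves is omitted) is:

    import numpy as np

    def simulated_ses(gradients, h):
        # gradients: (n_sims, n_params) log-likelihood gradients, each
        # evaluated at the true parameter values for one simulated dataset.
        # h: (n_params,) gradient of WTP with respect to the parameters.
        info = np.mean([np.outer(g, g) for g in gradients], axis=0)
        V = np.linalg.inv(info)             # variance-covariance estimate
        param_se = np.sqrt(np.diag(V))      # parameter standard errors
        wtp_se = np.sqrt(h @ V @ h)         # delta-method SE for mean WTP
        return param_se, wtp_se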
The results in Table B2 provide several indications about the relative efficiency of the candidate survey design configurations that we examined.  All designs result in relatively accurate and efficient estimates of WTP for a one percentage point improvement in both scores, with the 3-level, 6-question design producing the smallest standard error.  Likewise, the standard errors on all parameter estimates are lowest under that design.  Intuitively, the design with 6 questions and 3 levels for each attribute is the most efficient because it collects the most information from each respondent (6 valuation questions) while maximizing the distance between attribute levels, which minimizes the likelihood of the model predicting the wrong response given the idiosyncratic error associated with each estimated effect.  

(iv)	Non-Sampling Errors
A variety of non-sampling errors may be encountered in stated preference surveys. Coverage error occurs when some eligible units have zero probability of being selected. For the current survey, the generalizable population is all U.S. households.  Address-Based Sampling (ABS) methodology is used both to recruit members into probability-based Internet panels and to recruit respondents directly through mail.  The sampling frame from which respondents will be recruited is the universe of all U.S. residential addresses, secured from the latest Delivery Sequence File (DSF) of the U.S. Postal Service. This database provides a complete listing of all residential points of delivery in the United States, regardless of telephone or internet status, so the potential for coverage error is minimal.
Non-response bias is another type of non-sampling error that can potentially occur in stated preference surveys. Non-response bias can occur when households do not participate in a survey or do not answer all relevant questions on the survey instrument (item non-response). EPA has designed the survey instrument to maximize the response rate. EPA will also follow Dillman et al.'s (2014) web-based survey approach (see Subsection 4(b) for details). If necessary, EPA will use appropriate weighting or other statistical adjustments to help mitigate potential bias due to non-response. To determine whether there is any evidence of significant non-response bias in the completed sample, EPA will conduct a non-response bias analysis using available data. This will enable EPA to identify potential differences between respondents to the web survey and those who received a URL but did not complete the survey.

(v)	Nonresponse Bias Study
When a sizable percentage of sampled individuals choose not to respond to a survey, it is advisable to conduct analyses that can assess the potential magnitude of nonresponse bias on survey estimates (Fahimi 2004).  We intend to conduct a non-response analysis to quantify and ultimately address this issue in our estimation of WTP. We understand that any single method might be insufficient to address all possible forms of non-response bias (Groves 2006; Groves et al. 2006). Therefore, we intend to follow the relevant recommendations in OMB's guidelines (Graham 2006), Montaquila and Olson (2012), and Halbesleben and Whitman (2013). We will conduct two types of non-response bias analysis: benchmarking and response propensity analysis.
	Benchmarking involves comparing data collected from our sample with other data collection efforts.  We will compare the geodemographic distributions of our starting sample and the resulting respondents against those of the target population using the latest information from the Current Population Survey (CPS).  While socio-demographic representativeness is informative, non-response bias analysis can be improved by benchmarking our data collection to other variables that may be more closely related to WTP for water quality improvements.  Our survey instrument includes questions on respondents' water-based recreation activities and frequency that were taken directly from the U.S. Forest Service's National Survey on Recreation and the Environment and the U.S. Fish and Wildlife Service's National Survey of Fishing, Hunting, & Wildlife-Associated Recreation. This will allow for direct comparison of our survey sample to nationally representative samples from these broader benchmark surveys. This will provide further insight into any potential nonresponse biases, since user frequency and recreational activities are likely correlated with WTP. 
EPA will also pursue a novel approach for addressing non-response bias, in which one first estimates the propensity to respond to a survey as a function of variables that are available for both respondents and non-respondents, as described by Johnston and Abdulrahman (2017) and Cameron and DeShazo (2013).  In a second stage, one interacts the predicted propensity to respond with each of the environmental attributes. The coefficients corresponding to these interaction terms will capture any systematic differences in preferences across households that have higher or lower propensities to respond to the survey.  Response propensity analysis requires data on households that were contacted to complete a survey but did not participate.  If we choose a mail-push-to-web recruitment mode, data on households that were selected into the sample but did not participate will be very limited.  Probability-based internet panels maintain extensive data on the panel and will provide the means for a robust response propensity analysis.  Most internet panel proprietors also collect some data from households that chose not to join the panel, which will allow EPA to examine response propensity at the two critical points of non-response. This approach provides insight into any unobserved self-selection bias and, if present, a means to correct for it. 
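A minimal sketch of the first stage of this response propensity analysis, using synthetic data in place of actual panel records, is shown below (the statsmodels Logit model stands in for whatever estimator is ultimately used):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 5000

    # Synthetic stand-ins for covariates known for respondents and
    # non-respondents alike (e.g., from panel records).
    age = rng.uniform(18, 85, n)
    urban = rng.integers(0, 2, n)
    X = sm.add_constant(np.column_stack([age, urban]))

    # Synthetic response indicator; in practice this is observed.
    index = -2.0 + 0.02 * age + 0.3 * urban
    responded = (rng.random(n) < 1 / (1 + np.exp(-index))).astype(int)

    # Stage 1: estimate each household's propensity to respond.
    propensity = sm.Logit(responded, X).fit(disp=0).predict(X)

    # Stage 2 (in the choice model, for respondents only): interact the
    # predicted propensity with each environmental attribute; significant
    # interactions indicate preferences that vary with response propensity.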

2(d)	Questionnaire Design
An overview of each section of the survey and details of the information requested by the survey are discussed in Section 4(b)(i) of Part A of this supporting statement. The full texts of the two draft questionnaires are provided in Attachment 1. The difference between survey versions 1 and 2 is the inclusion of screen 21(a).  Several categories of questions are included in the survey. The reasons for including each of these categories are discussed below:
Questions about visits to waterbodies (screens 3 through 7). A series of questions regarding visits and recreational use of waterbodies are presented. These questions are meant to prime respondents for thinking about surface water quality and to provide data that can be used to compare to other national surveys, which serve as a benchmark to assess the representativeness of our sample.  Question 1 asks if the respondent has taken a trip to a lake, river, or stream in the last 12 months and will be used to identify users of lakes, rivers, and streams in the data analysis.  Question 2 asks if they have gone fishing in freshwater. These questions were borrowed directly from the National Survey of Fishing, Hunting, and Wildlife-Associated Recreation, which is a national survey conducted by the U.S. Fish and Wildlife Service and the U.S. Census Bureau.  Question 3 asks how many single-day trips to a river, lake, or stream the respondent has taken in the past 12 months.  Question 4 asks the main purpose of the last trip taken.  Question 5 asks how many miles the respondent traveled for the last single-day trip they took to a lake, river, or stream.  Questions 3, 4, and 5 also appear on the National Survey on Recreation and the Environment, which is a national telephone survey.  Including these questions on the survey described in this ICR will provide a comparison to other national surveys administered via different modes and using different sampling strategies, in order to assess the representativeness of our sample regarding characteristics that are key to our main study objectives.  

Choice questions (screens 23 through 31). The questions in this section are the key component of the survey.  Respondents' choices among alternatives with specific water quality improvements in a given region and household cost increases are the main data that allow estimation of willingness to pay, distance decay, and the marginal rate of substitution between quantity and quality. Respondents are presented with a series of six choice question scenarios.  Responses to these choice questions are of primary interest to this study. Each choice scenario entails a status quo option, where the Recreation and Aquatic Biodiversity Scores remain unchanged and no additional costs are incurred by the household, and a "policy" option, where one or both of the Scores improve and some positive cost is incurred by the household. Following standard consumer theory, respondents will choose the options that they prefer given their preferences and budget constraints. The status quo option is always available, a feature that is necessary for appropriate welfare estimation (Adamowicz et al. 1998). Following standard approaches (Opaluch et al. 1993, 1999; Johnston et al. 2002a, 2002b, 2003), each choice question is separated by a reminder to consider each scenario independently, to disregard previous questions, and not to add up costs or benefits across scenarios.  This is included to avoid biases associated with sequence aggregation effects (Mitchell and Carson 1989). A unique feature of the experimental design is that after completing all six choice question scenarios, respondents are given the opportunity to review, and if desired, change their initial choices. Both the initial and second set of choices are recorded.  This feature, along with the split-sample experimental treatment, will allow for a formal assessment of how the "visible choice set" feature (i.e., presenting respondents upfront with the full range of environmental and cost attribute levels they can expect) may impact responses to stated preference surveys. 

Opportunity to revise answers (screens 32 through 37). When respondents complete the last choice question, they are asked if they would like to review and possibly change their answers.  Some evidence from valuation surveys has shown a tendency for respondents' framing of the questions to change as they answer more of them.  These questions are included to test for framing effects by comparing the first set of responses to the revised responses.  The split-sample design will allow us to test if the pre-notification screens (only included in Version 1 of the survey) have an impact on whether people change their answers - an indication of framing effects.
Debriefing questions (screens 38 through 40). These questions ask respondents about their motivations as to why they chose certain choice options over others, and whether they accepted the hypothetical scenario when making their choices. These questions will help to identify respondents who incorrectly interpreted the choice questions or did not believe the choice scenarios to be credible. In other words, the responses to these questions will be used to identify potentially invalid responses, such as: protest responses (e.g., protest any government program), scenario rejection, omitted variable considerations (e.g., economic and employment impacts), and symbolic (warm glow) responses (e.g., want a better environment in general). The responses to some of the questions will also be used to identify motivations behind respondents' choices, including altruism, option value, and bequest value.  Five-point Likert scale response formats are used for all debriefing questions, which allows a number of analysis approaches.  If the responses are used to screen some observations from the sample (e.g., because they indicate scenario rejection), sensitivity analysis can be performed by using different values on the scale as the critical value that causes an observation to be flagged and dropped from the sample. Responses can also be treated as categorical variables and modeled parametrically in the data analysis. Responses to questions on screen 34 will be used to examine stated attribute non-attendance, and thus examine the robustness of the results to econometric models that account for such behavior (see Part B, section 5(b) for further details). Finally, questions on screens 35 through 41 inquire about employment, housing tenure, and language spoken.  Responses to this final set of questions are directly comparable to questions in the US Census Bureau's American Community Survey, and thus provide additional variables for assessing the representativeness of the sample. 
    
Demographic Questions (screens 41 through 45). These questions collect data that will be used to conduct our non-response analysis.  

Open-ended comments (screen 46). This question gives respondents an opportunity to provide comments on the topics covered in the survey.  Responses to this question are rarely used in quantitative analysis but are a common feature of stated preference surveys.   

3.	Pretest
EPA intends to implement this study in two stages: a pretest and a main survey. First, EPA will administer the pretest to obtain a total of 120 completed surveys, split into two samples of 60 each. The purpose of the pretest is to confirm the survey length and to quality-check the survey instrument and the resulting survey data. 

4.	Collection Methods and Follow-up
4(a)	Collection Methods
EPA is considering two methods for administering the surveys: i) probability-based internet panel (e.g., KnowledgePanel/Ipsos or similar), and ii) mail with a push-to-web (online) administration. Each method has its relative advantages over the other.  Both methods begin with the same sample frame by recruiting from the Delivery Sequence File of the United States Postal Service - a database with full coverage of all delivery points in the United States - and respondents under both approaches have a known probability of being selected into the sample.  Internet-based panels offer several advantages over the mail push-to-web approach, however.  Because the sample frame only contains valid participants, there is less wasted effort and materials compared with a mail recruitment approach, in which there will be some invalid addresses.  Internet panels maintain data on all of their participants, which would allow for a shorter questionnaire and facilitate a more robust non-response analysis.  Panel participants are compensated for their participation, whereas a mail recruitment approach would have to offer a separate cash payment to incentivize participation.  The sample drawn from the panel for our survey can be tailored to match sociodemographic benchmarks to improve its representativeness.

Regardless of the collection method, there are several design features necessary to achieve our research objectives that can only be included in an electronic survey. For example, our strategy to identify framing and ordering effects, and the effectiveness of prenotification in addressing those potentially bias-inducing behaviors, requires us to record respondents' initial answers to the policy questions while allowing them to review and provide different answers to the same questions if they choose.  It is critical to this strategy that respondents not be able to alter their initial answers before those answers are recorded, something that cannot be prevented with a physical mail survey. Another example is our novel approach to identifying attribute non-attendance and controlling for it econometrically. Latent class modeling is a way to place respondents into groups indicating which attributes they fully considered when answering choice experiment questions. We plan to use the amount of time spent on each screen describing the separate attributes as independent variables in estimating the class probabilities for each respondent.  This is only feasible with an electronic survey mode.  

Survey development costs will be comparable between the two collection methods, as the survey will be administered online (electronically) in both scenarios. However, collection and administration costs do vary between the two methods. The address-based probability sample will require collection and administration to take place through the firm's platform using their respondent panel. Costs are determined by the number of surveys completed, geographic range, and survey length.   


Table B3. Cost Estimates for Survey Collection and Administration

  Item                            Address-based Sample    Mail with Push-to-Web
  Design and Consultation         $10,000                 $0
  Materials                       $0                      $7
  Incentives                      $0                      $2
  Postage                         $0                      $1.50
  Fee per Completed Survey        $35                     $0
  Total Cost of Survey Packet     $35                     $10.50
  Number of Surveys Solicited     10,000                  30,000
  Number of Completed Surveys     6,000                   6,000
  Total                           $220,000                $315,000
  Cost per Completed Survey       $35                     $52.50

4(b)	Survey Response and Follow-up
Both recruitment approaches use multiple contacts to increase the overall response rate.  The mail push-to-web approach is limited to physical mailings delivered via post whereas the probability-based internet panel would utilize email and phone contacts to remind panelists to complete the survey.  

The sample chosen from the internet panel would receive an advance email one week before the survey is available.  Potential respondents will receive another notification email when the survey is available, including a unique link to their survey.  No login information or password is required with the unique link, to minimize attrition.  Those who have not completed the survey will receive a reminder email two weeks after the survey has gone live.  Three days after that, respondents who have not completed the survey will receive a phone call reminder.  The overall cooperation rate from participants in the panel is expected to be about 60%.  This rate does not include people who opted out of the panel during the initial recruitment.

The mail-push-to-web approach would follow the modified Dillman method for mixed mode recruitment (Dillman et al. 2009).  An initial invitation letter containing a $2 cash incentive and a unique URL would be sent to every address selected into the sample.  This would be followed by a reminder postcard one week later and a final reminder letter two weeks after the initial mailing.  If response rates are low, an additional completion incentive could be offered that is conditional on respondents completing the survey.  The completion incentive could take the form of a second cash payment or entry into a lottery for a larger prize. Response rates with initial incentives and conditional completion incentives are around 30% based on conversations with consultants who have recently used this recruitment method.  Mail-to-web surveys that offer no incentives have lower response rates in the 10% to 15% range.



5.	Analyzing and Reporting Survey Results
5(a)	Data Preparation

Since the survey will be administered on the internet, survey responses will be automatically entered into an electronic database as surveys are completed by each respondent. After all responses have been recorded and double-checked for accuracy, the database contents will be converted into a format suitable for statistical analysis and delivered to the EPA. 

5(b)	Analysis
Once the survey data have been checked for errors, cleaned, and assembled into a data file, they will be analyzed using statistical analysis techniques. The following section discusses the models that the EPA intends to use to analyze the survey responses.

 Analysis of Stated Preference Data
The basic strategy for analyzing stated preference data is grounded in the standard random utility model of Hanemann (1984) and McConnell (1990). The random utility model assumes that respondents anticipate their utility under each hypothetical choice option presented to them in the survey (typically including a "no new policy," or status quo, option), and then choose the option that would provide the highest utility in each choice scenario. This model is applied extensively within stated preference research, and allows well-defined welfare measures (e.g., willingness to pay) to be derived from choice experiment models (Bennett and Blamey 2001, Louviere et al. 2000). In the standard random utility model applied to choice experiments, hypothetical choice scenario options are described in terms of attributes that focus groups reveal as relevant to respondents' utility, or well-being (Johnston et al. 1995; Adamowicz et al. 1998; Opaluch et al. 1993). One of these attributes is a monetary cost to the respondent's household. 

Applying this standard model to choices among hypothetical policies that would improve water quality throughout the specified region in the continental United States (following the format described in Section B.2(d) of this supporting statement), a standard utility function $U_i(\cdot)$ includes the respondent's household income and the quantity and quality of the environmental resources that may be affected by the hypothetical policies described in the survey. Following standard random utility theory, utility is assumed known to the respondent, but stochastic from the perspective of the researcher, such that:

(1)	$U_i(\cdot) = U(X_i, D, Y - F_i) = v(X_i, D, Y - F_i) + \varepsilon_i$,

where $X_i$ is a vector of variables describing attributes of option $i$; $D$ is a vector characterizing demographic and other attributes of the respondent; $Y$ is the disposable household income of the respondent; $F_i$ is the mandatory additional cost faced by the household under option $i$; $v(\cdot)$ is a function representing the empirically estimable component of utility; and $\varepsilon_i$ is the unobservable component of utility, which is modeled as a stochastic error term.
A model of such a preference function is estimated by econometric methods designed for limited dependent variables, because researchers only observe each respondent's choice between or among two or more options rather than observing values of $U_i(\cdot)$ for each option directly (Maddala 1983; Hanemann 1984). Standard random utility models are based on the probability that a respondent's utility from program $i$, $U_i(\cdot)$, exceeds the utility from alternative programs $j$, $U_j(\cdot)$, for all potential programs $j \neq i$ considered by the respondent. In this case, the respondent's choice set of potential programs also includes maintaining the status quo. 
When faced with K distinct options, the respondent will choose the option with the highest expected utility. Drawing from (1), the respondent will choose program i if:

(2)	$v(X_i, D, Y - F_i) + \varepsilon_i \geq v(X_j, D, Y - F_j) + \varepsilon_j \quad \forall\; j \neq i$.

If the $\varepsilon$'s are assumed independently and identically drawn from a type I extreme value (Gumbel) distribution, the model may be estimated as a conditional logit model, as detailed by Maddala (1983) and Greene (2012). This model results in an empirical estimate of the systematic component of utility $v(\cdot)$, based on observed choices among different options. Based on this estimated function, welfare measures (e.g., willingness-to-pay) can be calculated following the well-known methods developed by Hanemann (1984) and summarized by Freeman (2003). Following standard choice experiment methods (Adamowicz et al. 1998; Bennett and Blamey 2001), each respondent will be presented with questions including two options (i.e., Option A [status quo], Option B) and asked to choose their most preferred option. Following clear guidance from the literature, a "no further action" or status quo option is always included in the choice set, to ensure that WTP measures are well-defined (Louviere et al. 2000).
Up to six choice questions will be included in each survey to increase the amount of information obtained from each respondent. Presenting each respondent with multiple choice scenarios is standard practice in choice experiment and dichotomous choice contingent valuation surveys (Poe et al. 1997; Layton 2000), but requires allowing for potential correlations among responses by a single respondent. That is, while responses across different respondents are independent, the set of responses provided by any individual respondent may be correlated (Poe et al. 1997, Layton 2000, Train 1998). A common approach to accommodate such potential within-respondent correlations is to account for preference heterogeneity using random parameters, which leads to a mixed logit modeling framework (Poe et al. 1997, McFadden and Train 2000, Layton 2000, Greene 2003). Such models can be estimated using maximum likelihood methods, and the performance of alternative specifications can be assessed using standard statistical measures of model fit, as described by Train (1998), Greene (2002), and others.
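To make the estimation framework concrete, the following Python sketch estimates a basic binary (status quo vs. policy) logit by maximum likelihood, using synthetic data and hypothetical coefficients; the production analysis would use the conditional and mixed logit specifications described above:

    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik(beta, X, y):
        # X holds attribute differences (policy option minus status quo);
        # y = 1 if the policy option was chosen. Binary logit likelihood.
        v = X @ beta
        return -np.sum(y * v - np.logaddexp(0.0, v))

    # Synthetic data: quality improvement (points) and cost (negated, $/yr),
    # with hypothetical "true" coefficients.
    rng = np.random.default_rng(0)
    n = 2000
    X = np.column_stack([rng.uniform(0, 20, n), -rng.uniform(5, 200, n)])
    beta_true = np.array([0.15, 0.02])
    y = (rng.logistic(size=n) < X @ beta_true).astype(float)

    fit = minimize(neg_loglik, x0=np.zeros(2), args=(X, y), method="BFGS")
    print(fit.x)   # estimates should be close to beta_true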

 Econometric Specification
One objective of our study is to use the data collected from the survey to estimate a household willingness to pay function suitable for a wide range of regional or national policies that might affect water quality in lakes, rivers, and streams across the U.S. We specify the WTP function to accommodate several features that characterize people's preferences for water quality improvements according to published literature and our consultations with the public and subject matter experts.  Those features include differential values for outdoor recreation activities and aquatic biodiversity support, distance decay, diminishing marginal willingness-to-pay, and imperfect substitution between water quality and quantity. We developed the survey design with these objectives in mind, so that the data we collect will allow us to identify and have sufficient power to estimate the parameters that represent these features with an acceptable level of precision.
Towards these ends, the most general model we intend to estimate will be based on the following indirect utility function:

(3)	$V_i = \left( Y_i^{\beta} + \alpha G_i^{\beta} \right)^{1/\beta}$

(4)	$G_i = \sum_{j=1}^{J} q_j^{\eta} Q_j^{\theta} \left[ \phi + \max\left(0,\, 1 - \gamma x_{ij}\right) \right]$

(5)	$q_j = \delta q_j^{R} + (1-\delta)\, q_j^{B}$

where $Y_i$ is income and $G_i$ is a function indicating overall water quality for respondent $i$; $j$ indexes uniform grid cells that cover the continental U.S.; $q_j$ is a weighted average of recreational and ecological water quality and $Q_j$ is the quantity of surface water in cell $j$; $x_{ij}$ is the distance between respondent $i$ and cell $j$; and $q_j^{R}$ and $q_j^{B}$ are the average recreation and biodiversity scores in cell $j$.
In this framework, there are at least 7 parameters to be estimated (plus additional parameters for household demographic attributes we might include in the model): α, β, θ, η, ϕ, γ, and δ. The role of each parameter is as follows: α will determine the general magnitude of WTP for improvements in overall water quality; β controls the rate of substitution between income and overall water quality and so will control the rate at which marginal WTP for water quality improvements declines; η and θ control the marginal rate of substitution between, and marginal rates of return to, water quantity and quality; ϕ and γ (both constrained to be positive) determine the shape of the distance decay function (if ϕ>0 then some portion of WTP is independent of distance, possibly due to nonuse value; and γ is the slope of the linear decay function); and δ (constrained to lie between 0 and 1) is the weight placed on the recreation score relative to the biodiversity score.
The indirect utility function set out above is highly nonlinear, and so the parameters of the model cannot be estimated using a standard main-effects-only estimating equation where the indirect utility of each option is a simple linear function of the unknown parameters. Non-linear specifications can be more computationally challenging to estimate in practice; therefore, we will begin with a simpler model that can be viewed as a special case of the more general model outlined above. Imposing the restrictions $\beta = \eta = \theta = 1$ produces the following simplified functional form:

(6)	$V_i = Y_i + \alpha \sum_{j=1}^{J} Q_j \left[ \delta q_j^{R} + (1-\delta)\, q_j^{B} \right] \left[ \phi + \max\left(0,\, 1 - \gamma x_{ij}\right) \right]$

This simplification leads to a standard multinomial logit specification where each non-status quo option includes as arguments the cost of the option and the quantity- and distance-weighted changes in the two water quality scores in the policy region, $\sum_{j} Q_j \left[ \delta \Delta q_j^{R} + (1-\delta)\, \Delta q_j^{B} \right] \left[ \phi + \max\left(0,\, 1 - \gamma x_{ij}\right) \right] 1_{ij}$, where $1_{ij}$ is an indicator variable equal to 1 if cell $j$ is in the policy region presented to respondent $i$. In this case the willingness to pay function would be:

(7)	$WTP_i = \alpha \Delta G_i = \alpha \sum_{j=1}^{J} Q_j \left[ \delta \Delta q_j^{R} + (1-\delta)\, \Delta q_j^{B} \right] \left[ \phi + \max\left(0,\, 1 - \gamma x_{ij}\right) \right] 1_{ij}$


so, assuming errors are independent and Gumbel distributed, the probability that respondent i would choose option k is:

(8)	$p_{ik} = \dfrac{\exp\left[ \sigma \left( \alpha \Delta G_{ik} - c_k \right) \right]}{\sum_{m=1}^{K} \exp\left[ \sigma \left( \alpha \Delta G_{im} - c_m \right) \right]}$ .


We note that by using WTP as the latent variable, we can directly estimate the scale parameter, σ, as the coefficient on the cost variable, similar to Cameron and James (1987).
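For illustration, the choice probability in equation (8) can be computed as follows (a minimal sketch with hypothetical parameter values):

    import numpy as np

    def choice_probs(dG, cost, alpha, sigma):
        # Multinomial logit probabilities in WTP space, as in equation (8):
        # dG and cost are length-K arrays over the options in a choice set.
        v = sigma * (alpha * dG - cost)
        expv = np.exp(v - v.max())      # subtract max for numerical stability
        return expv / expv.sum()

    # Hypothetical two-option scenario: status quo vs. a policy that raises
    # the quality index by 10 points at a cost of $150 per year.
    print(choice_probs(np.array([0.0, 10.0]), np.array([0.0, 150.0]),
                       alpha=20.0, sigma=0.05))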
A straightforward way to accommodate individual or household-level heterogeneity would be to replace $\alpha$ with $\alpha_i = X_i \Omega$, where $X_i$ is a vector of individual or household attributes. It also may be possible to include a random component in $\alpha_i$ in a mixed logit framework. 
We intend to estimate the simpler WTP function in (7) and the more general WTP function implied by equations (3)-(5) using standard multinomial logit, mixed logit, and latent class modeling approaches (Train 2009). We will begin with the simplest functional form, based on equation (7), in the simplest estimation framework, the standard multinomial logit model. We will then graduate to more general models that can accommodate unobserved preference heterogeneity, the mixed logit and latent class models, and will estimate more general WTP functions by relaxing one or more of the restrictions that separate (3)-(5) from (7). We also will consider other functional forms that may not fit neatly into the general framework set out above. For example, it may be possible to estimate exponential or step distance decay functions rather than the linear distance decay functional form indicated in equation (4). We also may examine estimating equations with imperfect substitutability between the recreation and biodiversity scores. If multiple functional forms fit the data nearly equally well, then we will use formal model selection or model averaging techniques to synthesize the results.
Another objective of the study is to examine whether respondents attend to all policy attributes when answering the discrete choice questions and, if not, which attributes are not attended to and what survey design features can contribute to full attendance.  Attribute non-attendance occurs when respondents rely on only a subset of the attributes in a choice task to make their decision. Ignoring this tendency and estimating a model that assumes full attendance can bias parameter estimates and lead to inaccurate WTP estimates (Hensher and Greene 2010).  Econometric models that address attribute nonattendance restrict the coefficients on the ignored attributes, but only for respondents who exhibit nonattendance to those attributes.  There are two general approaches for estimating these models: stated nonattendance and inferred nonattendance.  We will estimate both types of models and compare results.

Stated nonattendance models rely on responses to subsequent survey questions to place respondents in attendance classes.  Our survey includes debriefing questions probing how important the policy attributes were when answering the discrete choice questions.  Each respondent's contribution to the simulated likelihood function will reflect their answers to these debriefing questions by restricting the values of the parameters on the attributes that they indicate had little or no impact on their decision to zero (Hensher et al. 2012; Kragt 2013; Koetse 2017).  

Concerns about the accuracy of this self-reported information have led to the use of latent class models to separate individuals into classes, and this motivates our inferred nonattendance model (Scarpa et al. 2009, 2012; Hensher and Greene 2010; Hensher et al. 2012; Kragt 2013; Glenk et al. 2015; Thiene et al. 2015). In this framework class membership is unknown to the analyst and is instead treated probabilistically. Estimation requires specifying unknown parameters or a function to describe the nonattendance class probabilities, which are the probabilities that individual i belongs to class c (Hensher et al. 2016). These probabilities can be specified by the logit formula and estimated as a function of choice-invariant characteristics, such as demographic variables and whether the respondent's survey included the pre-notification screens before the choice questions.  The stated and inferred approaches to modeling attribute nonattendance can be applied to the general model in equations (3)-(5) and the simplified model in equation (7).  
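As an illustration of the class membership component of the inferred nonattendance model, the following sketch computes logit-form class probabilities from choice-invariant covariates (the names and dimensions are hypothetical):

    import numpy as np

    def class_probabilities(Z, Gamma):
        # Z:     (n_respondents, k) choice-invariant covariates, e.g.,
        #        demographics and a pre-notification treatment indicator.
        # Gamma: (k, n_classes) class membership parameters (one column
        #        normalized to zero for identification).
        u = Z @ Gamma                                     # membership utilities
        expu = np.exp(u - u.max(axis=1, keepdims=True))   # numerically stable
        return expu / expu.sum(axis=1, keepdims=True)     # rows sum to one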


5(c)	Reporting Results

The results of the survey will be made public via open-access publications in peer-reviewed journals and working papers posted on EPA's website.  These publications will include summary statistics for the survey data, extensive documentation of the statistical analysis (including all data and code used for estimation), and a detailed description of the final results.  Given the breadth of our research objectives, the statistical approach and the relevant data will vary among outlets.  The survey data will be released upon request only after they have been thoroughly vetted to ensure that all potentially identifying information has been removed.


REFERENCES 
Adamowicz, W., Boxall, P., Williams, M. and Louviere, J., 1998. Stated preference approaches for measuring passive use values: choice experiments and contingent valuation. American Journal of Agricultural Economics, 80(1), pp.64-75.
Anderson, G.D. and Edwards, S.F., 1986. Protecting Rhode Island's coastal salt ponds: an economic assessment of downzoning. Coastal Management, 14(1-2), pp.67-91.
Bateman, I.J., Mace, G.M., Fezzi, C., Atkinson, G. and Turner, R.K., 2014. Economic analysis for ecosystem service assessments. In Valuing Ecosystem Services. Edward Elgar Publishing.
Bateman, I.J., Cole, M., Cooper, P., Georgiou, S., Hadley, D. and Poe, G.L., 2004. On visible choice sets and scope sensitivity. Journal of Environmental Economics and Management, 47(1), pp.71-93.
Bateman, I.J., Carson, R.T., Day, B., Hanemann, M., Hanley, N., Hett, T., Jones-Lee, M., Loomes, G., Mourato, S., Pearce, D.W. and Sugden, R., 2002. Economic valuation with stated preference techniques: a manual. Edward Elgar Publishing.
Bennett, J. and Blamey, R. eds., 2001. The choice modelling approach to environmental valuation. Edward Elgar Publishing.
Cameron, T.A. and DeShazo, J.R., 2013. Demand for health risk reductions. Journal of Environmental Economics and Management, 65(1), pp.87-109.
Carson, R.T. and Groves, T., 2007. Incentive and informational properties of preference questions. Environmental and Resource Economics, 37(1), pp.181-210.
Carson, R.T. and Mitchell, R.C., 1995. Sequencing and nesting in contingent valuation surveys. Journal of Environmental Economics and Management, 28(2), pp.155-173.
de Bekker-Grob, E.W., Donkers, B., Jonker, M.F. and Stolk, E.A., 2015. Sample size requirements for discrete-choice experiments in healthcare: a practical guide. The Patient - Patient-Centered Outcomes Research, 8(5), pp.373-384.
Desvousges, W.H. and Smith, V.K., 1986. The conceptual basis of benefits estimation. In Measuring Water Quality Benefits, Ray Perryman, ed.
Dillman, D.A., Smyth, J.D. and Christian, L.M., 2014. Internet, phone, mail, and mixed-mode surveys: the tailored design method. John Wiley & Sons.
Dillman, D.A., Phelps, G., Tortora, R., Swift, K., Kohrell, J., Berck, J. and Messer, B.L., 2009. Response rate and measurement differences in mixed-mode surveys using mail, telephone, interactive voice response (IVR) and the Internet. Social Science Research, 38(1), pp.1-18.
Glenk, K., Martin-Ortega, J., Pulido-Velazquez, M. and Potts, J., 2015. Inferring attribute non-attendance from discrete choice experiments: implications for benefit transfer. Environmental and Resource Economics, 60(4), pp.497-520.
Greene, W.H., 2012. Econometric Analysis. Seventh Edition. Prentice Hall.
Greene, W.H. and Hensher, D.A., 2010. Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. Transportation, 37(3), pp.413-428.
Groves, R.M., 2006. Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70(5), pp.646-675.
Halbesleben, J.R. and Whitman, M.V., 2013. Evaluating survey quality in health services research: a decision framework for assessing nonresponse bias. Health Services Research, 48(3), pp.913-930.
Hanemann, W.M., 1984. Welfare evaluations in contingent valuation experiments with discrete responses. American Journal of Agricultural Economics, 66(3), pp.332-341.
Hanemann, M. and Kanninen, B., 2001. The statistical analysis of discrete-response CV data. In Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method in the US, EU, and Developing Countries. Oxford University Press.
Hanemann, W.M., 1981. Some further results on exact consumer's surplus. Working Paper No. 1557-2016-132808.
Hensher, D.A. and Ho, C., 2016. Identifying a behaviourally relevant choice set from stated choice data. Transportation, 43(2), pp.197-217.
Hensher, D.A., Rose, J.M. and Greene, W.H., 2012. Inferring attribute non-attendance from stated choice data: implications for willingness to pay estimates and a warning for stated choice experiment design. Transportation, 39(2), pp.235-245.
Johnston, R.J. and Abdulrahman, A.S., 2017. Systematic non-response in discrete choice experiments: implications for the valuation of climate risk reductions. Journal of Environmental Economics and Policy, 6(3), pp.246-267.
Johnston, R.J., Swallow, S.K. and Bauer, D.M., 2002. Spatial factors and stated preference values for public goods: considerations for rural land use. Land Economics, 78(4), pp.481-500.
Johnston, R.J., Boyle, K.J., Adamowicz, W., Bennett, J., Brouwer, R., Cameron, T.A., Hanemann, W.M., Hanley, N., Ryan, M., Scarpa, R. and Tourangeau, R., 2017. Contemporary guidance for stated preference studies. Journal of the Association of Environmental and Resource Economists, 4(2), pp.319-405.
Johnston, R.J., Besedin, E.Y. and Wardwell, R.F., 2003. Modeling relationships between use and nonuse values for surface water quality: a meta-analysis. Water Resources Research, 39(12).
Johnston, R.J., Swallow, S.K., Allen, C.W. and Smith, L.A., 2002. Designing multidimensional environmental programs: assessing tradeoffs and substitution in watershed management plans. Water Resources Research, 38(7).
Johnston, R.J., Swallow, S.K. and Weaver, T.F., 1999. Estimating willingness to pay and resource tradeoffs with different payment mechanisms: an evaluation of a funding guarantee for watershed management. Journal of Environmental Economics and Management, 38(1), pp.97-120.
Johnston, R.J., Weaver, T.F., Smith, L.A. and Swallow, S.K., 1995. Contingent valuation focus groups: insights from ethnographic interview techniques. Agricultural and Resource Economics Review, 24(1), pp.56-69.
Johnston, R.J., Besedin, E.Y. and Stapler, R., 2017. Enhanced geospatial validity for meta-analysis and environmental benefit transfer: an application to water quality improvements. Environmental and Resource Economics, 68(2), pp.343-375.
Koetse, M.J., 2017. Effects of payment vehicle non-attendance in choice experiments on value estimates and the WTA-WTP disparity. Journal of Environmental Economics and Policy, 6(3), pp.225-245.
Kragt, M.E., 2013. Stated and inferred attribute attendance models: a comparison with environmental choice experiments. Journal of Agricultural Economics, 64(3), pp.719-736.
Layton, D.F., 2000. Random coefficient models for stated preference surveys. Journal of Environmental Economics and Management, 40(1), pp.21-36.
Louviere, J.J., Hensher, D.A. and Swait, J.D., 2000. Stated choice methods: analysis and applications. Cambridge University Press.
Maddala, G.S., 1983. Methods of estimation for models of markets with bounded price variation. International Economic Review, 24(2), pp.361-378.
McConnell, K.E., 1990. Models for referendum data: the structure of discrete choice models for contingent valuation. Journal of Environmental Economics and Management, 18(1), pp.19-34.
McFadden, D. and Train, K., 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5), pp.447-470.
Mitchell, R.C. and Carson, R.T., 1989. Using surveys to value public goods: the contingent valuation method. Resources for the Future, Washington, DC.
Montaquila, J.M. and Olson, K.M., 2012. Practical tools for nonresponse bias studies. SRMS/AAPOR Webinar, 24.
Newbold, S., Walsh, P.J., Massey, D.M. and Hewitt, J., 2018. Using structural restrictions to achieve theoretical consistency in benefit transfers. Environmental and Resource Economics, 69, pp.529-553.
Poe, G.L., Welsh, M.P. and Champ, P.A., 1997. Measuring the difference in mean willingness to pay when dichotomous choice contingent valuation responses are not independent. Land Economics, 73(2), pp.255-267.
Scarpa, R., Gilbride, T.J., Campbell, D. and Hensher, D.A., 2009. Modelling attribute non-attendance in choice experiments for rural landscape valuation. European Review of Agricultural Economics, 36(2), pp.151-174.
Schkade, D.A. and Payne, J.W., 1994. How people respond to contingent valuation questions: a verbal protocol analysis of willingness to pay for an environmental regulation. Journal of Environmental Economics and Management, 26(1), pp.88-109.
Smith, V.K. and Desvousges, W.H., 1988. Contingent valuation methods and the valuation of environmental risk. Draft, Resource and Environmental Economics Program, North Carolina State University at Raleigh.
Thiene, M., Scarpa, R. and Louviere, J.J., 2015. Addressing preference heterogeneity, multiple scales and attribute attendance with a correlated finite mixing model of tap water choice. Environmental and Resource Economics, 62(3), pp.637-656.
Train, K.E., 1998. Recreation demand models with taste differences over people. Land Economics, 74(2), pp.230-239.
Train, K.E., 2009. Discrete choice methods with simulation. Cambridge University Press.
Van Houtven, G., Powers, J. and Pattanayak, S.K., 2007. Valuing water quality improvements in the United States using meta-analysis: is the glass half-full or half-empty for national policy analysis? Resource and Energy Economics, 29(3), pp.206-228.
Vaughan, W.J., 1986. The RFF water quality ladder. Appendix B in Mitchell, R.C. and Carson, R.T., The Use of Contingent Valuation Data for Benefit/Cost Analyses in Water Pollution Control. Resources for the Future, Washington, DC.
Viscusi, W.K., Huber, J. and Bell, J., 2008. The economic value of water quality. Environmental and Resource Economics, 41(2), pp.169-187.

