PART B: COLLECTION OF INFORMATION EMPLOYING STATISTICAL METHODS

Part B applies to four WaterSense data collection efforts: a census survey of all organizational partners, a census survey of all licensed certification providers, and two consumer surveys, one of which uses sampling techniques. Section I describes the required elements for the annual reporting process for all organizational partners and quarterly reporting for all licensed certification providers, Section II discusses these elements for the consumer awareness phone survey, and Section III discusses these elements for the Internet-based consumer survey.

SECTION I: ANNUAL REPORTING CENSUS AND QUARTERLY LICENSED CERTIFICATION PROVIDER CENSUS

B1. Describe (including a numerical estimate) the potential respondent universe and any sampling or other respondent selection method to be used. Data on the number of entities (e.g., establishments, state and local government units, households, persons) in the universe covered by the collection and in the corresponding sample are to be provided in tabular form for the universe as a whole and for each of the strata in the proposed sample. Indicate expected response rate for the collection as a whole. If the collection has been conducted previously, include the actual response rate achieved during the last collection.

      WaterSense will collect data from promotional partners, manufacturers, retailers/distributors, and builders via annual reporting each year from 2013 to 2016. WaterSense plans to collect data from professional certifying organizations from 2014 to 2016. WaterSense will collect data from all providers on a quarterly basis. 
      As of February 2013, WaterSense has 1,372 reporting partners. This is a census survey, and no sampling methods will be used for either the annual or quarterly surveys. Table B-1 summarizes the number of partners by type.
      
Table B-1: WaterSense Reporting Partners by Partner Type, February 2013

Partner Type                            Number
Promotional                                829
Manufacturer                               239
Retailer/Distributor                       206
Builder                                     66
Professional Certifying Organization        14
Licensed Certification Providers            18
Total                                    1,372
      
      Table B-2 summarizes the actual response rate received for the annual census during previous data collections. 

Table B-2: WaterSense Annual Reporting by Partner Type, 2009-2011*

Partner Type                            Reporting Year    Response Rate
Promotional                             2011              30%
                                        2010              33%
                                        2009              52%
Manufacturer                            2011              36%
                                        2010              44%
                                        2009              41%
Retailer/Distributor                    2011              10%
                                        2010              12%
                                        2009              20%
Builder                                 2011              7%
                                        2010              28%
                                        2009              N/A (reporting not required)
Professional Certifying Organization    2009-2011         N/A (reporting not required)

*Response rates for previous years available upon request.
 
Table B-3: WaterSense Quarterly Reporting for Licensed Certification Providers, 2010-2012

Partner Type                        Reporting Quarter    Response Rate
Licensed Certification Providers    Q3 2012              53%
                                    Q2 2012              47%
                                    Q1 2012              40%
                                    Q4 2011              53%
                                    Q3 2011              40%
                                    Q2 2011              31%
                                    Q1 2011              55%
                                    Q4 2010              64%
      
B2. Describe the procedures for the collection of information including: (a) statistical methodology for stratification and sample selection; (b) estimation procedures; (c) degree of accuracy needed for the purpose described in the justification; (d) unusual problems requiring specialized sampling procedures; and (e) any use of periodic (e.g., less frequent than annual) data collection cycles to reduce burden.
      
Annual Reporting 

All WaterSense partners except irrigation partners and licensed certification providers (who submit quarterly reports, as discussed below) submit an annual report electronically via an online form. In addition, manufacturer and retailer/distributor partners are required to submit shipment and sales data to a WaterSense contractor. These data are recorded on annual reporting forms and received via U.S. mail or private delivery service. EPA anticipates that most, if not all, partners will request that shipment and sales data be handled as confidential business information (CBI); WaterSense procedures therefore assume that all shipment and sales data will be handled as CBI.

Upon receipt of an annual reporting form with shipment data, the WaterSense data manager provides a listing of reporting partners to the WaterSense Helpline staff. Helpline staff verify the partner's eligibility to report by checking the listing of partners in the manufacturer category. If the partner is not eligible to report, Helpline staff either contact the partner to discuss its partnership category and revise the partner's designation in Salesforce, if necessary, or inform the data manager and the Helpline manager that the partner incorrectly submitted data. If the partner is not eligible to report, the data manager will exclude the partner's data from subsequent analyses and will note the decision on the partner's reporting form. After confirming eligibility to submit data, the data manager or her designee will review each submission to determine:
    * Does the manufacturer own the original certification file? This step reduces the chances of double counting shipment data submitted by the original manufacturer and others who rebrand and resell the products.
    * Are there internal inconsistencies in the report (e.g., number of WaterSense products shipped exceeds the total number of products shipped)?
    * Are the data unrealistic for the given partner (e.g., shipments are excessive for what might be expected for a company of the size reporting)?
    * Is there evidence that the partner has provided estimated data rather than actual data?
    * Is there evidence that the partner has included international shipments or sales?

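If any of these checks raises a concern (for example, the manufacturer does not own the original certification file, or the report contains internal inconsistencies), Eastern Research Group, Inc. (ERG) will contact the partner to clarify. If the partner cannot sufficiently explain the data, the data will not be included in subsequent data analyses.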
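The review questions above can be sketched as a simple screening function. This is only an illustration: the `ShipmentReport` record, its field names, and the `expected_max_units` ceiling are hypothetical and are not part of the WaterSense reporting form or database; in practice several of these checks rely on reviewer judgment rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class ShipmentReport:
    """Hypothetical record of one partner's annual shipment data."""
    partner: str
    owns_certification_file: bool
    total_units_shipped: int
    watersense_units_shipped: int
    data_are_estimated: bool
    includes_international: bool
    expected_max_units: int  # reviewer's judgment of a plausible ceiling


def review_flags(report):
    """Return the concerns that would trigger follow-up with the partner."""
    flags = []
    if not report.owns_certification_file:
        # Guards against double counting by companies that rebrand and resell.
        flags.append("does not own original certification file")
    if report.watersense_units_shipped > report.total_units_shipped:
        flags.append("WaterSense units exceed total units shipped")
    if report.total_units_shipped > report.expected_max_units:
        flags.append("shipments unrealistically high for partner size")
    if report.data_are_estimated:
        flags.append("estimated rather than actual data")
    if report.includes_international:
        flags.append("international shipments or sales included")
    return flags
```

A report with no flags would proceed to analysis; any flag would prompt a call to the partner for clarification.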

	Provider Reporting
	
	Licensed certification provider partners submit quarterly reporting forms electronically via an online form. WaterSense receives these online forms from partners four times a year, in early February, May, August, and November. The reporting form asks for contact information as well as information about the WaterSense labeled new homes that the provider organization inspected during the previous calendar quarter.
Upon receipt of a quarterly reporting form, Helpline staff verify that the partner is a bona fide "provider" partner and that contact information for the primary contact at the organization has not changed; if it has, staff update the WaterSense database accordingly. After confirming eligibility, Helpline staff mark the year and quarter (e.g., Q1 2012) with which the information should be associated in the database. The Helpline manager reviews each submission for reasonableness (e.g., has there been a large increase or decrease relative to figures in previous reports; does the reported figure seem reasonable given the size of the organization and timeframe of the report) and follows up with the partner to clarify any questions about the data submitted. The Helpline manager then sends an email to any builder partners who are referenced in the quarterly reports to verify that the new homes information submitted was accurate.

B3. Describe methods to maximize response rates and to deal with issues of non-response. The accuracy and reliability of information must be shown to be adequate for intended use. For collections based on sampling, a special justification must be provided for any collection that will not yield "reliable" data that can be generalized to the universe studied.

	When partners join WaterSense, EPA requires organizational partners to commit to submitting annual reports and requires provider partners to submit quarterly reports. The importance of both reporting processes is stressed to partners upon joining. To maximize response, EPA conducts the following activities:
    * Require the submission of an annual reporting form for those partners wishing to be considered for a WaterSense award.
    * Host a partner conference call to answer partner questions about the form.
    * Send emails to remind partners of upcoming due dates.
    * Remind partners about reporting in the quarterly newsletter.
    * Make follow-up phone calls to non-reporting partners.
    * Send 1-2 reporting reminder emails to provider partners each quarter.

B4. Describe any tests of procedures or methods to be undertaken. Testing is encouraged as an effective means of refining collections of information to minimize burden and improve utility. Tests must be approved if they call for answers to identical questions from 10 or more respondents. A proposed test or set of tests may be submitted for approval separately or in combination with the main collection of data.
      
      EPA provided draft versions of the annual reporting form and quarterly reporting form to several partners (fewer than 10) for comment and revised the forms accordingly. Upon the completion of each data collection cycle, EPA examines the responses, forms, and procedures and identifies potential changes for the following year based on common questions or issues that arise.
      
B5. Provide the name and telephone number of individuals consulted on statistical aspects of the design and the name of the agency unit, contractor(s), grantee(s), or other person(s) who will collect and/or analyze the information for the agency.

	Laura Harwood of Eastern Research Group, Inc. (ERG) is the task manager for both census surveys and can be reached at (703) 841-0589. Roy Sieber serves as ERG's program manager and can be reached at (703) 633-1614. Dr. Lou Nadeau is an ERG senior economist and can be reached at (781) 674-7316.

SECTION II: CONSUMER AWARENESS PHONE SURVEY  

B.1.	Describe, including a numerical estimate, the potential respondent universe and any sampling or other respondent selection methods to be used. Data on the number of entities (e.g., establishments, state and local government units, households, persons) in the universe covered by the collection and in the corresponding sample are to be provided in tabular form for the universe as a whole and for each of the strata in the proposed sample. Indicate expected response rates for the collection as a whole. If the collection had been conducted previously, include the actual response rate achieved during the last collection.

	The target population for these two surveys will be households in the United States. This corresponds to the target population of the WaterSense program. The sampling population will be adults living in households that are surveyed. There are 115 million households in the United States according to the most recently available Census Bureau numbers. From this, EPA will select a sample of 400 respondents for each survey. The justification for this sample size appears in response to Question B.2 below. The sampling frame will be a list provided by a telephone survey contractor. Lists of U.S. households are readily available for this type of survey. 
	EPA has not previously conducted these surveys, thus it cannot estimate the expected response based on past experience. Evaluation practitioners have documented that survey response rates have been declining worldwide in recent years. In a recent study of response rates for 205 telephone surveys conducted in the same survey lab over a three-year period, study authors found that response rates varied from fewer than 10 percent to greater than 90 percent response, with the largest number of surveys falling in the 25 to 45 percent range. Further, the study authors found that response rates are affected by contextual variables including:
* Salience of the survey to the population
* Survey length
* Type of sample (e.g., listed versus random-digit dialing)
* Minutes per piece of sample (i.e., effort)
* Amount of time the survey was in the field 

The study authors also found that a 10-minute increase in survey length results in a 7 percent decrease in the response rate. Because the proposed WaterSense consumer awareness survey is short, EPA anticipates a relatively high response rate. In addition, EPA will take actions to maximize the response rate (see discussion in Section B.3 below). Given these factors, EPA estimates a response rate of approximately 40 percent for the consumer awareness survey.
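As a rough planning illustration, the estimated response rate can be turned into the number of households that would need to be contacted to obtain the target of 400 completed interviews. The sketch below assumes, purely for illustration, that the response rate applies uniformly to every household contacted; the function name is hypothetical.

```python
def households_to_contact(target_completes, response_rate_percent):
    """Households to contact so that the expected number of completed
    interviews reaches the target (integer ceiling division)."""
    return -((-target_completes * 100) // response_rate_percent)


# 400 completed interviews at an assumed 40 percent response rate
contacts = households_to_contact(400, 40)
```

Under these assumptions, about 1,000 households would be contacted per survey; at the lower end of the 25 to 45 percent range reported in the cited study, the figure rises to 1,600.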

B.2.	Describe the procedures for the collection of information including:
      ::	Statistical methodology for stratification and sample selection
      ::	Estimation procedure
      ::	Degree of accuracy needed for the purpose described in the justification
      ::	Unusual problems requiring specialized sampling procedures
      ::	Any use of periodic (e.g., less frequent than annual) data collection cycles to reduce burden.

	Statistical Method for Stratification and Sample Selection

	This section discusses the statistical methods for stratification and sample selection used in the survey. Before discussing those methods, the section begins by explaining EPA's approach to estimating the sample size for this collection.

	Sample Size Estimation

      This section provides estimates of sample size needed to meet a set of specific statistical criteria for the survey data. Choosing a sample size is a three-step process. First, statistical criteria are set; second, an initial sample size is chosen based on the criteria; and third, if necessary, the initial sample size is adjusted based on the population size. The third step, however, is not needed for this survey due to the large universe size. The section begins with a discussion of the statistical criteria and then discusses the estimated initial and adjusted sample sizes.

      The statistical criteria used in choosing a sample size are:
      * Precision: The maximum difference in the parameter of interest between an estimate for that parameter obtained from the sample and the value of that parameter in the population.
      * Confidence: The probability of correctly accepting a true hypothesis.
      * Power: The probability of correctly rejecting a false hypothesis.

      Simultaneously setting values for each criterion will generate a sample size estimate. In general, however, there are limited acceptable choices in terms of power and confidence and, therefore, most of the emphasis is in choosing a reasonable precision. To select sample size using these criteria, EPA has used statistical power analysis techniques (Cohen, 1988). The use of each in choosing a sample size is discussed in what follows.
      
      Precision

      The precision of a sample is the maximum difference, in units of the key question, that one is willing to be away from the population parameter of interest. The parameter of interest for this survey is awareness of the WaterSense program. There are two key aspects of this parameter:
      * The level of awareness (i.e., what percentage of households are aware of the program?).
      * Whether or not awareness is increasing over time (i.e., is there a statistically significant increase in awareness between surveys?).

      The first reflects estimation of a proportion (e.g., percentage of households aware of the program) and the second reflects a change in a proportion over time. Thus, precision will need to be in terms of percentage points. EPA has determined that the sample size should provide valid data for the second aspect (i.e., change in awareness over time), but should also provide reasonably reliable data to determine the level of awareness. EPA has determined that the sample should be able to detect a three to five percentage point change in awareness over time. For example, if awareness is found to be 5 percent in this survey and a subsequent survey finds awareness to be 10 percent, the sample sizes for each survey should be large enough to find that change to be statistically significant.

      Power

The power of a statistical test is one minus the probability of a Type II error (i.e., not rejecting a false hypothesis), or the probability of correctly rejecting a false hypothesis. In this sense, power corresponds to finding the specified effect size (i.e., precision) when that effect in fact exists. Traditional hypothesis tests set, by default, power to 50 percent. Following Cohen's (1988) suggestion, EPA will use 80 percent power. Thus, in terms of the second aspect identified above under precision (change in awareness over time), the sample size will have an 80 percent chance of detecting the specified change in awareness if that change actually occurred.

	Confidence

Confidence is the probability of accepting a true hypothesis. For purposes of sampling, confidence defines the likelihood that the population mean will be contained in the interval around the sample mean defined by the precision for the sample. EPA will use 90 percent confidence in setting the sample size. Furthermore, since EPA is interested in increases in awareness, EPA will use a one-sided confidence interval.
      
	Sample Size Estimates

	The initial sample size estimates are based on the methods of Cohen (1988), Chapter 6, which provides power analytic methods for detecting differences between proportions over time. EPA combined these methods with the precision, power, and confidence settings discussed above and an assumption about the current level of awareness in the universe. Setting a sample size for proportions requires either having some knowledge of the proportion for the population beforehand or making an assumption about it. EPA expects that awareness of WaterSense is fairly low among the universe and has assumed it to be between two and ten percent. Thus, the sample size should be able to detect a three to five percentage point increase in awareness (precision) between surveys from the assumed baseline levels of awareness at 90 percent confidence and 80 percent power. Table B-3 provides sample sizes needed to detect the increases in awareness for the assumed range of baseline awareness (for the set power and confidence). Based on this table, EPA has selected a sample size of 400 households. Sample sizes in the table that are less than 400 households are depicted in bold italics. As can be seen, a sample of 400 households will allow for detection of a five point increase in awareness from all baseline levels, detection of a four point increase in awareness for baseline awareness in the two to five percent range, and detection of a three point increase in awareness for very low baseline awareness (2 percent).

            Table B-3. Sample Sizes Needed to Detect the Specified Precision Values for Assumed Values of Baseline Awareness Between Two and Ten Percent at 90 Percent Confidence and 80 Percent Power

            Assumed Baseline         Precision: Increase in Awareness to be Detected
            Level of Awareness       Three Percentage    Four Percentage    Five Percentage
            in the Universe          Points              Points             Points
            2%                       323                 203                143
            3%                       419                 257                178
            4%                       512                 310                212
            5%                       602                 360                244
            6%                       689                 409                275
            7%                       774                 457                306
            8%                       857                 503                335
            9%                       938                 548                364
            10%                      1016                592                391
            Note: The values in Table B-3 are calculated using the following formula:

            $$ n = \frac{n_{0.1}}{100 \left[ 2\arcsin\sqrt{p + r} - 2\arcsin\sqrt{p} \right]^{2}} + 1 $$

            where $n_{0.1}$ is equal to 902 (in this case) and is derived from Cohen, Chapter 6, table 6.4.1 (third panel) and reflects a base sample size for one-sided 90 percent confidence and 80 percent power, $p$ are the baseline awareness values, and $r$ are the precision values.
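The calculation behind the sample size table can be sketched in a few lines. The sketch assumes Cohen's arcsine effect size for proportions, h = 2 arcsin(sqrt(p + r)) - 2 arcsin(sqrt(p)), and his general form n = n0.1 / (100 h^2) + 1, truncated to a whole number; the function names are illustrative. The `base_n` helper is an additional assumption: it shows that a normal-approximation calculation for a one-sided test at 90 percent confidence and 80 percent power with h = 0.10 gives the same base value of 902 cited from Cohen's table.

```python
import math
from statistics import NormalDist


def base_n(confidence=0.90, power=0.80, h=0.10):
    """Per-survey sample size for effect size h (normal approximation,
    one-sided test). Assumed derivation of the tabled base value."""
    z = NormalDist()
    z_sum = z.inv_cdf(confidence) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum / h) ** 2)


def effect_size_h(p, r):
    """Cohen's h for a change from baseline proportion p to p + r."""
    return 2 * math.asin(math.sqrt(p + r)) - 2 * math.asin(math.sqrt(p))


def sample_size(p, r, n_base=902):
    """Sample size per survey to detect an increase of r from baseline p."""
    h = effect_size_h(p, r)
    return math.floor(n_base / (100 * h ** 2) + 1)


if __name__ == "__main__":
    for baseline in range(2, 11):
        row = [sample_size(baseline / 100, points / 100) for points in (3, 4, 5)]
        print(f"{baseline}%", row)
```

For the combinations checked by hand (for example, a 2 percent baseline with a three point increase, and a 10 percent baseline with a five point increase), this reproduces the tabled values of 323 and 391.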

	Sample Allocation

	EPA will allocate the sample across the United States in proportion to the distribution of households in the United States.
	
	Estimation Procedure

	Estimation of population parameters (means, totals, and variances) will be done by appropriately weighting the sample values. The weight for any responding sample unit can be calculated as:

weight = (weight associated with sampling) × (weight that adjusts for nonresponse)

Estimation of these weights is discussed below, followed by calculation of weighted estimates for the population parameters. 

      The weights associated with sampling can be estimated as the reciprocal of the selection probabilities. These can be estimated as:

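            $$ w^{s}_{hj} = \frac{1}{\pi_{hj}} = \frac{N_h}{n_h} $$					(6)

where $w^{s}_{hj}$ is the sampling weight and $\pi_{hj}$ is the selection probability for the jth sampled unit in stratum h, $N_h$ is the population for stratum h, and $n_h$ is the sample chosen from stratum h. The second equality indicates that the weight is the same for all sampled units from a given stratum.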

	To account for nonresponse, EPA will use a weighting class adjustment procedure. Specifically, EPA assumes that the probability that any population unit responds to the survey is the product of the selection probability (above) and a response probability. To estimate response probabilities, EPA will divide the respondents in each survey into a small number of classes (strata). The exact classes for each survey will depend on the nature of the data available for each group. The response probabilities are then:

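            $$ \hat{\phi}_k = \frac{r_k}{n_k} $$			(7)

where k indexes the nonresponse weighting classes, $n_k$ is the number of sampled units in class k, and $r_k$ is the number of those units that respond. Thus, the weight for a respondent in class k and stratum h becomes:

            $$ w_{hk} = \frac{N_h}{n_h} \times \frac{1}{\hat{\phi}_k} $$						(8)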


      The weighted total for any question in these surveys can be found using the following formula:

            $$ \hat{t}_y = \sum_{h=1}^{H} \sum_{j=1}^{J_h} w_{hj} \, y_{hj} $$					(9)

where $\hat{t}_y$ is the weighted estimate of the population total for variable/question y, h indexes strata, H is the total number of strata used in the survey, j indexes sample respondents within each stratum, $J_h$ is the total number of respondents in stratum h, $w_{hj}$ is the nonresponse-adjusted sampling weight for the jth unit in stratum h, and $y_{hj}$ is the value for the variable/question y for the jth unit in stratum h. Estimates of the population means can then be found as:

            $$ \hat{\bar{y}} = \frac{\hat{t}_y}{\sum_{h=1}^{H} \sum_{j=1}^{J_h} w_{hj}} $$					(10)
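The weighting steps described above can be illustrated with a short sketch. It assumes the sampling weight is the stratum population divided by the stratum sample size, that the response probability for a weighting class is estimated by the observed response rate in that class, and that the mean is computed as the weighted total divided by the sum of the weights. All function names and the toy numbers are hypothetical.

```python
def sampling_weight(stratum_population, stratum_sample):
    """Reciprocal of the selection probability for a unit in the stratum."""
    return stratum_population / stratum_sample


def response_rate(sampled_in_class, responded_in_class):
    """Estimated response probability for one nonresponse weighting class."""
    return responded_in_class / sampled_in_class


def adjusted_weight(stratum_population, stratum_sample, class_response_rate):
    """Sampling weight inflated to account for nonresponse in the class."""
    return sampling_weight(stratum_population, stratum_sample) / class_response_rate


def weighted_total(records):
    """records: iterable of (weight, y) pairs for responding units."""
    return sum(w * y for w, y in records)


def weighted_mean(records):
    """Weighted total divided by the sum of the weights."""
    records = list(records)
    return weighted_total(records) / sum(w for w, _ in records)


# Toy example: y = 1 if the respondent is aware of the program, else 0.
w = adjusted_weight(1000, 10, response_rate(20, 10))  # 100 / 0.5
respondents = [(w, 1), (w, 0), (100.0, 1)]
estimated_aware_households = weighted_total(respondents)
estimated_awareness_share = weighted_mean(respondents)
```

For an awareness question coded 0/1, the weighted mean is the estimated share of households aware of the program.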

      Estimation of the variance for the expressions in equations (9) and (10) will require more complex methods. The variance of both expressions will need to account for (a) the fact that each represents a sample estimate of a population parameter and (b) the fact that the response weights defined in (7) are sample quantities (i.e., they are not fixed). To calculate these variances, EPA will use a bootstrap method. Bootstrap methods involve repeated sub-sampling (with replacement) of the selected sample and calculating the variance from the repeated samples. In theory, if the sample is reflective of the population distribution, then repeated sub-sampling of the sample will result in an unbiased and reliable estimate of the variance. EPA will use the Rao and Wu (1988) method for bootstrap estimation of a variance. This method is summarized in Lohr (1999, p. 307).
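A minimal sketch of a rescaled bootstrap in the spirit of Rao and Wu (1988) follows. Within each stratum of n_h responding units, it draws n_h - 1 units with replacement and multiplies each unit's weight by n_h / (n_h - 1) times the number of times the unit was drawn. The function names and data layout are assumptions for illustration. For simplicity the sketch treats the nonresponse-adjusted weights as fixed; a fuller implementation would recompute the nonresponse adjustment within each replicate to reflect point (b) above.

```python
import random


def weighted_total(records):
    """records: list of (weight, y) pairs."""
    return sum(w * y for w, y in records)


def weighted_mean(records):
    return weighted_total(records) / sum(w for w, _ in records)


def rao_wu_bootstrap_variance(strata, estimator, replicates=500, seed=0):
    """Bootstrap variance of estimator(records).

    strata: list of strata, each a list of (weight, y) pairs with at
            least two units.
    """
    rng = random.Random(seed)
    estimates = []
    for _rep in range(replicates):
        resample = []
        for units in strata:
            n_h = len(units)
            counts = [0] * n_h
            # Draw n_h - 1 units with replacement from the stratum.
            for _draw in range(n_h - 1):
                counts[rng.randrange(n_h)] += 1
            scale = n_h / (n_h - 1)
            # Rescale each weight by the number of times the unit was drawn.
            resample.extend(
                (w * scale * m, y) for (w, y), m in zip(units, counts)
            )
        estimates.append(estimator(resample))
    mean_estimate = sum(estimates) / replicates
    return sum((e - mean_estimate) ** 2 for e in estimates) / (replicates - 1)
```

The same function can be applied to either the weighted total or the weighted mean by passing the corresponding estimator.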

	Degree of Accuracy

      See above.

	Unusual Problems

      None.

	Use of Periodic Rate Less Than Annually

	Data collected under these surveys are one-time and are thus less frequent than annual.

B.3.	Describe methods to maximize response rates and to deal with issues of non-response. The accuracy and reliability of information collected must be shown to be adequate for intended uses. For collections based on sampling, a special justification must be provided for any collection that will not yield "reliable" data that can be generalized to the universe studied.

	EPA will employ a number of procedures to maximize response rates and to mitigate issues associated with non-response. A number of nonresponse issues may arise during the data collection process. Table B-4 summarizes issues related to nonresponse and the approach to handling those issues. EPA will analyze the known data on non-respondents (e.g., region, size) for patterns. This analysis will indicate whether the data collected through the survey will have any potential biases.

Table B-4. Non-response Issues and Techniques Used to Minimize the Impact of Those Issues
Non-response Issue 
Techniques to Be Used to Minimize Impact of Non-response
Refusals -- Observational unit refuses to take survey
   *  EPA will be using a professional survey firm that is skilled in converting refusals.
   *  EPA will replace refusals with similar households (e.g., within the same geographic region).
   *  EPA will develop a questionnaire that limits the burden imposed on observational units.
Not available -- Observational unit not available at the time the phone survey firm calls
   *  The survey firm will call the household back up to three times before considering them non-respondents and excluding them from the sample.
   *  EPA will replace refusals with similar households (e.g., within the same geographic region).
Refusal to answer specific questions -- Observational unit refuses to answer specific questions
   *  EPA will be using a professional survey firm that is skilled in converting refusals.
   *  EPA will develop a questionnaire that limits the burden imposed on observational units.


B.4.	Describe any tests of procedures or methods to be undertaken. 

	EPA will pre-test the survey instruments using a small set of households. The purpose of the pre-tests will be to assess the questionnaire validity in collecting the necessary data.
	No other tests of procedures or methods will be used for these surveys. EPA expects that the sampling scheme and the implementation process are relatively straightforward and should be accomplished without need for testing the methods.

B5. 	Provide the name and telephone number of individuals consulted on statistical 	aspects of the design and the name of the agency unit, contractor(s), grantee(s), or 	other person(s) who will collect and/or analyze the information for the agency.

   	Dr. Lou Nadeau, ERG, (781) 674-7316.

SECTION III: CONSUMER SURVEY (INTERNET-BASED)

B1. 	Describe, including a numerical estimate, the potential respondent universe and any 	sampling or other respondent selection method to be used. Data on the number of 	entities (e.g., establishments, state and local government units, households, 	persons) in the universe covered by the collection and in the corresponding sample 	are to be provided in tabular form for the universe as a whole and for each of the 	strata in the proposed sample. Indicate expected response rate for the collection as 	a whole. If the collection has been conducted previously, include the actual response 	rate achieved during the last collection.

	The purpose of this survey is to increase EPA's understanding of market barriers that exist with water-efficient products and services, develop messages that will address these barriers, and test the results of the findings in key markets. EPA will invite participants to provide feedback on promotions and messaging. EPA will use the data gathered to develop messaging for public service announcements. 
	Approximately four hundred U.S. adults will be invited to participate in the research effort. Only homeowners will be sampled, since renters are unlikely to be responsible for purchasing low-flow plumbing equipment. Data will be stratified by whether the individuals live in water-rich or water-poor areas of the nation.
	The goal of this research is to determine which methods of communicating about water-efficient products are most effective with U.S. consumers (i.e., translate into the most positive attitudes toward, and intentions to buy, these products). We are not trying to estimate a population parameter (e.g., general awareness of the WaterSense logo); rather, we are comparing the efficacy of communication techniques. The rule of thumb in this case is that we will need 20-30 respondents in each communication condition.
	We will be drawing our sample using an electronic panel of homeowners and will not be using a random sample; thus, we will not have information regarding nonresponse. Nonresponse bias is of concern when attempting to generalize from the sample estimate to a population parameter, but becomes less important when the goal is to determine differences between treatment groups.

B2.	Describe the procedures for the collection of information including: (a) statistical 	methodology for stratification and sample selection; (b) estimation procedures; (c) 	degree of accuracy needed for the purpose described in the justification; (d) unusual 	problems requiring specialized sampling procedures; and (e) any use of periodic 	(e.g., less frequent than annual) data collection cycles to reduce burden.
      
      a)    We will not be using a statistical methodology for stratification and sample selection since the goal of the research is to determine differences between groups.
      b)    We will use Multivariate Analysis of Variance (MANOVA) and Multivariate Analysis of Covariance (MANCOVA) to test for differences between groups and to control for covariates.
      c)    A "by-invitation-only" panel recruitment method is used by the on-line panel provider that will be collecting the data for this study. This means that only pre-validated individuals participate. Thus, a cross-section of "real" consumers, as opposed to regular survey takers, is included in the sample. The panel to be used does not employ an "open" online panel recruitment method, which simply allows individuals to self-select into the study.
      d)    There are no unusual problems that will require specialized sampling procedures.
      e)    Data will not be collected periodically. This is a one-time study.

B3. 	Describe methods to maximize response rates and to deal with issues of non-	response. The accuracy and reliability of information must be shown to be adequate 	for intended use. For collections based on sampling, a special justification must be 	provided for any collection that will not yield "reliable" data that can be generalized 	to the universe studied.

	Since our purpose is to test for differences between groups, generalizability to the population at large (i.e., external validity) is less important than assurance that the experimental condition created the effect (i.e., internal validity). In such situations as ours, nonresponse bias is not a major concern in the study design.


B4. 	Describe any tests of procedures or methods to be undertaken. Testing is 	encouraged as an effective means of refining collections of information to minimize 	burden and improve utility. Tests must be approved if they call for answers to 	identical questions from 10 or more respondents. A proposed test or set of tests may 	be submitted for approval separately or in combination with the main collection of 	data.

	Pretesting of the questionnaire design and procedures will be conducted in two different ways. First, using the methods described in Dillman (2000), we will use a verbal protocol with fewer than ten homeowners to gain a better understanding of how the target population is interpreting our questions and whether we have included the appropriate response categories. Second, we will pretest the on-line survey to ensure that it is working properly, is user-friendly, and that the data can be downloaded adequately.
      
*Don A. Dillman (2000) Mail and Internet Surveys: The Tailored Design Method. John Wiley & Sons, Inc., NY, NY. 

B5.	Provide the name and telephone number of individuals consulted on statistical 	aspects of the design and the name of the agency unit, contractor(s), grantee(s), or 	other person(s) who will collect and/or analyze the information for the agency

   	Dr. Lou Nadeau, ERG, (781) 674-7316.

REFERENCES

Cohen, Jacob, 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd Edition, Lawrence Erlbaum Associates Publishers, Hillsdale, N.J.

Lohr, Sharon, 1999. Sampling: Design and Analysis, Duxbury Press, Pacific Grove, CA.

Rao, J.N.K., and C.F.J. Wu, 1988. "Resampling Inference with Complex Survey Data," Journal of the American Statistical Association, 83: 231-241. 

