MEMORANDUM

TO:		Jameel Alsalam, EPA
		Elizabeth Miller, EPA
		Melissa Weitz, EPA

FROM:		Casey Pickering, ERG
		Robyn Reid, ERG

DATE:		February 6, 2018

SUBJECT:		DI Desktop Data Processing Overview for OAP/OAQPS

ERG processes DrillingInfo data for EPA's Office of Compliance as described in Section 1 below. ERG has provided data packages to EPA OAQPS in recent years, including key data fields for processed well-level records meeting certain criteria; Section 2 documents the export methodology updated per EPA's January 2018 request.
Attachment A contains a full DrillingInfo data dictionary.
Attachment B contains a code breaker table for the DrillingInfo "status" data field.
Attachment C contains a DrillingInfo table summarizing data coverage.

    DI Desktop Data Processing Overview

ERG processes DrillingInfo data for EPA's Office of Compliance as described in the simplified summary steps below. The step descriptions include documentation of revisions to processing made between December 2015 (previous delivery of processed data to OAQPS) and current.

 Download the DI Desktop January 2017 database update from DrillingInfo's FTP site that included the following tables:
 PDEN_DESC: contains well identification
 PDEN_PROD: contains well production data
 PDEN_WELL_TEST: contains well test data
 NPH_WELLSPOTS: contains well identification information for wells that have commingled production

 Import tables into Oracle for processing.
      Note: The Texas Railroad Commission requires hydrocarbon production to be reported at the lease level. As leases often include more than one well at any given time, it is difficult to understand the production behavior of individual wells in Texas. DrillingInfo developed a "Texas Allocated" data set included within PDEN_DESC that assigns production values to individual wells. This data set is used in the OC processing. DrillingInfo's methodology addresses various complications associated with the Texas data set, for example: the assigned API number for a well changing over time; non-production wells such as injection or monitoring wells present on a lease; and well-specific shut in dates that indicate when allocation of lease-level production should end. 
      Improvements have been made to the OC processing using the Texas Allocated data set since the previously delivered December 2015 export. Corrections to the data processing resulted in production volumes for many Texas wells decreasing by half. DrillingInfo has also made several improvements to their Texas Allocated data processing since December 2015, which have generally resulted in a decrease in the number of unique active well records in Texas.
      
 Combine descriptive information for duplicate records.
      There are some API numbers (API_NO) in the PDEN_DESC table with duplicate information because data are stored by completion zone rather than at the well or lease level. ERG combined records for duplicate API numbers into a single record to avoid over counting wells using the following methodology:
 Combined all values separated by commas (e.g., "Gas,Oil") for the PRODUCTION_TYPE, GAS_GATH_NAME_1, and LIQ_GATH_NAME_1. ERG was unable to combine all the values for other text fields (e.g., well name, current operator name, common operator name) because of a limit on the number of characters for the field.

 Take the most recent LAST_PROD_DATE.
         
 Take the minimum non-zero value, if available, for numeric fields or first alphabetically for text fields for the following fields:
BASIN
LONGITUDE_EX1
COMMON_OPER_NAME
MERIDIAN
COMPLETION_DATE
OFFSHORE
CURR_OPER_NAME
PROPERTY_TYPE
DISTRICT
QTR_QTR
DRILL_TYPE
RESERVOIR
ENTITY_ID
RESERVOIR_TYPE
FIELD
SECTION
FIRST_PROD_DATE
SPUD_DATE
FORMATION
STATE
LEASE_NO
STATUS
LOCATION
WELL_ID
LONGITUDE_NAD27
WELL_NAME
LONGITUDE_NAD83
WELL_NO
LONGITUDE_BOTM
YIELD

 Take the maximum value for numeric fields for the following fields:
COUNTY
LATITUDE_NAD27
GAS_GRAV
LATITUDE_NAD83
GOR
LATITUDE_BOTM
GOR_CUM
LATITUDE_EX1
GOR_LATEST_MO
LIQ_GRAV
LATEST_FLOW_PRESSURE
MONTHS_PRODUCED
LATEST_WHSIP
PLUG_DATE
       
TOTAL_DEPTH
       
 
      
4.	Append records for surface well spots.
      ERG appended records for surface well spots (SPOT_ID) that were not already included in the Production Wells Table from the NPH_WELLSPOTS table. The majority of these records are for API numbers where the production is reported at the lease-level rather than well-level.
      In December 2016, ERG adjusted the methodology used to determine if surface well spot records should be appended to the Production Wells Table. Because a large number of well spot records (~800,000) have API numbers that were assigned a suffix (e.g., -00, -01) ERG now matches on the first 12 numerical digits of the API numbers to determine if a matching record exists in the Production Wells Table.  Well spot records without an API match are appended, as well as records that do have an API match but have different latitude and longitude values. This resulted in a decrease in the number of unique active well records since the previously delivered December 2015 export.
      
5.	Convert monthly to annual production. ERG combined the monthly production from the PDEN_PROD table into annual production for 1990 through 2016. 

6.	Distribute lease-level well production to all wells on lease. For each entity ID that has more than one record (typically because of well spots import), ERG distributes the annual production data across all wells with that entity ID.

7.	Add FIPS Codes.

      ERG added the Federal Information Processing Standard (FIPS) code that uniquely identifies counties and county equivalents in the FIPS code field (FIPS_CODE) using the following steps.
      ::	Assign the FIPS code based on the county and state using an EPA crosswalk for wells with counties and states that match with the EPA crosswalk.
      ::	Assign the FIPS code based on the latitude and longitude using GIS for the wells in the Production Wells Table without a populated county field, or with a county and state combination that does not exist.
      
8.	Add various data fields based on calculations, external data sources, and crosswalks. 
      For example: 
      ERG added a yes/no field (SSA_FLAG) to indicate whether a well was located above a sole source aquifer using EPA's Sole Source Aquifer Protection Program GIS data, the well's latitude and longitude, and GIS.
      ERG added a yes/no field (FEDERAL_FLAG) to indicate whether a well was located on federal lands using the U.S. Department of Interior's National Atlas of the United States federal land boundaries, the well's latitude and longitude, and GIS.
      ERG added a field (NNA_FLAG) to indicate whether a well was located in a 2008 8-hour ozone nonattainment area and the associated classification (i.e., extreme, severe, serious, moderate, marginal) using EPA's nonattainment data download file, the well's latitude and longitude, and GIS.
      ERG added the GORXX values calculated as SUMOFGASXX/SUMOFLIQXX. From this value, the GOR_QUALXX field is populated as:
 'No Prod' where there are no production data for SUMOFGASXX or SUMOFLIQXX
 'Liq+Gas' where there are production data for both SUMOFGASXX and SUMOFLIQXX
 'Gas Only' where there are production data for SUMOFGASXX only
 'Liq Only' where there are production data for SUMOFLIQXX only
      
    Data Export Methodology for OAQPS

Data Source
 2013 Analysis: Data based on OC processed wells table from February 2014, with additional fields from the February 2014 underlying tables.
 2015 Analysis: Data based on OC processed wells table from October 2014.
 Oct. 2017 & Feb. 2018 Analysis: Data based on OC processed wells table from January 2017.
 Note, a query form and additional documentation are available to EPA users at: https://echo.epa.gov/targeting/knowledge-bases/land-based-oil-and-gas-search

Export Criteria
Export for the February 2018 analysis was an expanded version of the October 2017 analysis, to records that meet the following criteria:
 Wells with gas or liquid production in 2011 to current; and
 Wells with completion year of 2011 to current (the more recent wells may not have production yet).
 Additional columns related to practical initial production and peak production: LIQ_PRAC_IP_DAILY, GAS_PRAC_IP_DAILY, BOE_PRAC_IP, PEAK_DAILY_GAS_MONTH, PEAK_DAILY_GAS_MONTH_NO, PEAK_DAILY_LIQ_MONTH, PEAK_DAILY_LIQ_MONTH_NO, PEAK_BOE_MONTH, PEAK_BOE_MONTH_NO
 Because these fields are not included in the standard raw data processing, there is not a set methodology for handling wells with more than one unique record across these fields (as described in Section 1, Step 3. Combine descriptive information for duplicate records). In the case where a given well has more than one unique record across these fields, the values shown in these fields are essentially randomly selected from the multiple records. Therefore, a separate export table showing all unique records for a given API number is provided in conjunction with the main February 2018 export.
             
Notes on Table Below
 Green shaded cells indicate columns added to the output for the 2013 analysis based on additional analyses. Note that all columns that were added for the 2013 analysis have been incorporated into the standard OC processing.
 Orange shaded rows indicate columns added to the output in the 2015 analysis.
 Purple shaded rows indicate columns added to the output in the 2017 analysis.
 Pink shaded rows indicate columns added to the output in the 2018 analysis.
 "DI" indicates that the data element is developed by DrillingInfo, and only edited by ERG in cases where multiple API numbers are combined (as described in "Description" column).
 "ERG" indicates this field is created by ERG.

                                  Column Name
                              Include in Export?
                                  DI or ERG?
                                  Description
                                       
                                     2013
                                     2015
                                     2017
                                     2018
                                       
                                       
ENTITY_ID
Yes
Yes
Yes
Yes
DI
DrillingInfo assigned unique property ID. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
SPOT_ID
Yes
Yes
Yes
Yes
DI
DrillingInfo assigned unique ID for multiple API numbers on one lease or directional side tracks for the same location.
API_NO
Yes
Yes
Yes
Yes
DI
API assigned number of a well on the property.
PROPERTY_TYPE
Yes
Yes
Yes
Yes
DI
Property type (e.g., lease, unit, well, completion, other, unknown). Note: for instances where duplicate API numbers were combined, the minimum value was selected.
PRODUCTION_TYPE
Yes
Yes
Yes
Yes
DI
Production type (e.g., oil, gas, coalbed methane, injection). Note: for instances where duplicate API numbers were combined, the production type fields are comma separated.
PROD_TYPE_CLASS
Yes
Yes
Yes
Yes
ERG
Classification of production type into D&A (drilled and abandoned), gas, injection, O&G (oil and gas), oil, and other.
PROD_FLAG
Yes
Yes
Yes
Yes
ERG
Production flag to indicate whether the well should be producing liquids. This is "Yes" for "Gas," "Oil," and "O&G" production type classification.
LIQUID_PROD_TYPE
Yes
Yes
Yes
Yes
ERG
Liquid production type (i.e., unknown, condensate, or oil) based on the production type classification (gas = condensate, oil = oil, all others = unknown) and well test data (revise unknown assignments based on the test liquid gravity; oil if liquid gravity <40 and condensate if liquid gravity >=40).
WELL_NAME
Yes
Yes
Yes
Yes
DI
Operator assigned well/lease name of the property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
LEASE_NO
Yes
Yes
Yes
Yes
DI
State number assigned to the property or lease or unit the property is part of. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
WELL_NO
Yes
Yes
Yes
Yes
DI
Operator assigned well number of the property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
CURR_OPER_NAME
Yes
Yes
Yes
Yes
DI
Current operator name. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
LIQ_GATH_NAME_1
No
No
Yes
Yes
DI
The current primary liquid gather for a property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
GAS_GATH_NAME_1
No
No
Yes
Yes
DI
The current primary gas gather for a property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
COMMON_OPER_NAME
Yes
Yes
Yes
Yes
DI
Corporate entity that is determined by DI Desktop to own the current operator. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
LATITUDE_NAD83
Yes
Yes
Yes
Yes
DI
Surface latitude the property is located in; for multi-well properties DI Desktop picked a well to designate the location of the property, NAD83. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
LONGITUDE_NAD83
Yes
Yes
Yes
Yes
DI
Surface longitude the property is located in; for multi-well properties DI Desktop picked a well to designate the location of the property, NAD83. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
COUNTY
Yes
Yes
Yes
Yes
DI
County the property is located in. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
FIPS_CODE
Yes
Yes
Yes
Yes
ERG
Federal Information Processing Standard county code based on county or GIS analysis using latitude and longitude.
STATE
Yes
Yes
Yes
Yes
DI
State the property is located in. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
EPA_REGION
Yes
Yes
Yes
Yes
ERG
EPA region the property is located in.
OFFSHORE
Yes
Yes
Yes
Yes
DI
Indicates if the property is located offshore. Note: for instances were duplicate API numbers were combined, the maximum value was selected.
BASIN
Yes
Yes
Yes
Yes
DI
Basin the property is located in. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
RESERVOIR
Yes
Yes
Yes
Yes
DI
Reservoir the property is located in. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
FORMATION
Yes
Yes
Yes
Yes
DI
Formation the property is reporting from. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
FIELD
Yes
Yes
Yes
Yes
DI
Field name the property is reporting from. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
RESERVOIR_TYPE
Yes
Yes
Yes
Yes
ERG
Determined based on RESERVOIR, FIELD, and STATE in conjunction with crosswalk developed for EPA GHG Inventory. 
CV  -  Conventional; CB  -  Coal Bed Methane; 
LP  -  Low Permeability; SH  -  Shale; Blank  -  Unknown
UNCONV_RESERVOIR
Yes
Yes
Yes
Yes
ERG
Determined based on reservoir type; Yes = CB, LP, and SH; No = CV or Blank.
STATUS
Yes
Yes
Yes
Yes
DI
Current status of the well (e.g., active, inactive, shut in). Note: for instances where duplicate API numbers were combined, the minimum value was selected.
TOTAL_DEPTH
Yes
Yes
Yes
Yes
DI
Total depth the property was drilled to. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
DRILL_TYPE
Yes
Yes
Yes
Yes
DI
Drill type (e.g., horizontal, vertical, directional). Note: for instances where duplicate API numbers were combined, the minimum value was selected. 
SPUD_DATE
Yes
Yes
Yes
Yes
DI
Latest date drilling commenced on property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
COMPLETION_DATE
Yes
Yes
Yes
Yes
DI
Latest completion date of the property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
PLUG_DATE
Yes
Yes
Yes
Yes
DI
Date the well was plugged. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
COMPLETION_YEAR
Yes
Yes
Yes
Yes
ERG
Determined based on completion date year, if populated, or minimum production year.
HF
Yes
Yes
Yes
Yes
ERG
Yes if drill type is horizontal (H) or unconventional reservoir is yes (i.e., coalbed methane (CB), low permeability (LP), or shale (SH)). 
FIRST_LIQ
Yes
Yes
Yes
Yes
DI
Liquid production reported in Bbls during the first month of production.
FIRST_GAS
Yes
Yes
Yes
Yes
DI
Gas production reported in MCF during the first month of production.
FIRST6_LIQ
Yes
Yes
Yes
Yes
DI
The sum of 1[st] 6 calendar months of liquid production reported in Bbls. The first month is the first month of the production of any type of product.
FIRST6_GAS
Yes
Yes
Yes
Yes
DI
The sum of 1[st] 6 calendar months of gas production reported in MCF. The first month is the first month of the production of any type of product.
FIRST12_LIQ
Yes
Yes
Yes
Yes
DI
The sum of the 1[st] 12 calendar months of liquid production reported in Bbls. The first month is the first month of production of any type of production.
FIRST12_GAS
Yes
Yes
Yes
Yes
DI
The sum of the 1[st] 12 calendar months of gas production reported in MCF. The first month is the first month of production of any type of production.
SUMOFLIQ_DAILY
Yes
Yes
Yes
Yes
DI
Average daily liquid production in last 12 months, summed over duplicate API numbers.
SUMOFGAS_DAILY
Yes
Yes
Yes
Yes
DI
Average daily gas production in last 12 months, summed over duplicate API numbers.
SUMOFLIQ_CUM
Yes
Yes
Yes
Yes
DI
Cumulative liquid production of property, reported in bbl, summed over duplicate API numbers.
SUMOFGAS_CUM
Yes
Yes
Yes
Yes
DI
Cumulative gas production of property, reported in MCF, summed over duplicate API numbers.
SUMOFWTR_CUM
Yes
Yes
Yes
Yes
DI
Cumulative water production of property, summed over duplicate API numbers.
LIQ_GRAV
Yes
Yes
Yes
Yes
DI
Gravity of the liquid produced from the property. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
GAS_GRAV
Yes
Yes
Yes
Yes
DI
Gravity of the gas produced from the property. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
GOR
Yes
Yes
Yes
Yes
DI
Gas to oil ratio of the property. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
GOR_LATEST_MO
Yes
Yes
Yes
Yes
DI
Gas to oil ratio of the property for the latest month of production. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
GOR_CUM
Yes
Yes
Yes
Yes
DI
Gas to oil ratio of the property for all months of production. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
YIELD
Yes
Yes
Yes
Yes
DI
Oil to gas ratio of the property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
FIRST_PROD_DATE
Yes
Yes
Yes
Yes
DI
First date DI Desktop has reported production for the property. Note: for instances where duplicate API numbers were combined, the minimum value was selected.
LAST_PROD_DATE
Yes
Yes
Yes
Yes
DI
Last date DI Desktop has reported production for the property. Note: Even though the date may be represented as mm/dd/yyyy, it is for the month listed, not just the individual day. For example, production through December 2009 is listed as 12/1/2009. Note: for instances where duplicate API numbers were combined, the maximum value is selected.
MONTHS_PRODUCED
Yes
Yes
Yes
Yes
DI
Number of months a property has reported liquid, gas, or water production. Note: for instances where duplicate API numbers were combined, the maximum value was selected.
LIQ_PRAC_IP_DAILY
No
No
No
Yes
DI
Oil practical initial production (PRAC IP) per day. (Prac IP is the based on the second month of production to represent the first "full" month producing) . So if Jan was the first production record and the second was Sept, Sept would be considered the second month for Prac IP calculations.
GAS_PRAC_IP_DAILY
No
No
No
Yes
DI
Gas practical initial production (PRAC IP) per day. (Prac IP is the based on the second month of production to represent the first "full" month producing). So if Jan was the first production record and the second was Sept, Sept would be considered the second month for Prac IP calculations.
BOE_PRAC_IP
No
No
No
Yes
DI
Barrel of oil equivalent (BOE) practical initial production (PRAC IP). (Prac IP is the based on the second month of production to represent the first "full" month producing). BOE = oil + (gas/6). If "Days On" is available it is used for the calculations, otherwise the number of days in the month is used. So if Jan was the first production record and the second was Sept, Sept would be considered the second month for Prac IP calculations.
PEAK_DAILY_GAS_MONTH
No
No
No
Yes
DI
Month in which the peak daily gas production occurred. If "Days On" is available it is used for the calculations, otherwise the number of days in the month is used.
PEAK_DAILY_GAS_MONTH_NO
No
No
No
Yes
DI
The month number of the peak daily volume. If it occurred in the first month of production then it is 1, in the second it is 2 and so on. It should be noted that the first month for this calculation is the month that it first began producing gas, not just any product.
PEAK_DAILY_LIQ_MONTH
No
No
No
Yes
DI
Month in which the peak daily oil production occurred. If "Days On" is available it is used for the calculations, otherwise the number of days in the month is used.
PEAK_DAILY_LIQ_MONTH_NO
No
No
No
Yes
DI
The month number of the peak daily volume. If it occurred in the first month of production then it is 1, in the second it is 2 and so on. It should be noted that the first month for this calculation is the month that it first began producing oil not just any product.
PEAK_BOE_MONTH
No
No
No
Yes
DI
Month in which the peak barrels of oil equivalent (BOE) production occurred. BOE= oil + (gas / 6 ).
PEAK_BOE_MONTH_NO
No
No
No
Yes
DI
The month number of the peak BOE volume. If it occurred in the first month of production then it is 1, in the second it is 2 and so on. It should be noted that the first month for this calculation is the month that it first began producing oil or gas.
SUMOFLIQ_YEAR
Yes
Yes
Yes
Yes
DI
Liquid production, reported in bbls, for the 12 months ending with LAST_PROD_DATE, summed over duplicate API numbers.
SUMOFGAS_YEAR
Yes
Yes
Yes
Yes
DI
Gas production, reported in MCF, for the 12 months ending with LAST_PROD_DATE, summed over duplicate API numbers.
SUMOFWTR_YEAR
Yes
Yes
Yes
Yes
DI
Water production, reported in bbls, for the 12 months ending with LAST_PROD_DATE, summed over duplicate API numbers.
SUMOFLIQXX
2011-Present
2013-Present
Yes
Yes
ERG
Liquid production, reported in bbl, summed over duplicate API numbers. 
Data are processed for 1990 through 2014.
SUMOFGASXX
2011-Present
2013-Present
Yes
Yes
ERG
Gas production, reported in MCF, summed over duplicate API numbers. 
Data are processed for 1990 through 2014.
SUMOFWTRXX
2011-Present
2013-Present
Yes
Yes
ERG
Water production, reported in bbl, summed over duplicate API numbers. 
Data are processed for 1990 through 2014.
GORXX
NA
Yes
Yes
Yes
ERG
Gas-to-Oil ratio as calculated using the annual production (mcf/bbl). Note, wells producing gas only show a value of zero.
Data are processed for 1990 through 2014.
GORXX_QUAL
NA
Yes
Yes
Yes
ERG
Flag to identify if wells have production and if the production is only gas, only liquids, or both gas and liquids.
Data are processed for 1990 through 2014.
NNA_FLAG
Yes
Yes
Yes
Yes
ERG
Flag for whether the well is located in an 8-hour ozone nonattainment area (2008 standard), as defined with the GIS layer available here:http://www.epa.gov/airquality/greenbook/gis_download.html. The flag identifies the specific type of 8-hour ozone classification the well is located within (i.e., extreme, severe 15, serious, moderate, or marginal).
SSA_FLAG
Yes
Yes
Yes
Yes
ERG
Flag for if the well is located over a sole source aquifer, as defined with the GIS layer available here:http://water.epa.gov/infrastructure/drinkingwater/sourcewater/protection/solesourceaquifer.cfm. "Yes" means the well is over a sole source aquifer.
FEDERAL_FLAG
Yes
Yes
Yes
Yes
ERG
Flag for if the well is located on federal lands, as defined with the GIS layer available here:http://nationalatlas.gov/atlasftp.html?openChapters=chpbound#chpbound. "Yes" means the well is on federal land.
ACTIVE_FLAG
Yes
Yes
Yes
Yes
ERG
Yes/No flag indicating active status based on latest production date.
ACTIVE_PROD_FLAG
Yes
Yes
Yes
Yes
ERG
Yes/No flag indicating whether entity produced liquid and/or gas in 2010-2016, using the production flags.
SHALE_FLAG
Yes
Yes
Yes
Yes
ERG
Yes/No flag for Pennsylvania only indicating if the type of production is shale gas.

