Retrieve EPA ECHO Data in R

EPA’s ECHO website is a useful tool for obtaining wastewater discharge data, in the form of Discharge Monitoring Reports (DMR). These DMRs provide the reported monitoring data from permitted dischargers in the form of effluent quantity and quality. These are useful data for water quality modeling, load allocations, and watershed characterization. However, downloading from the website is tedious at best, requiring clicking through multiple screens one permit at a time.

EPA provides API access to streamline accessing this data. I am extremely inexperienced with APIs and utilizing JSON data formats. Perfect time to learn. https://media.giphy.com/media/elFj7gYCIxVxC/giphy.gif I will say that utilizing httr was painless. Figuring out the heavily nested JSON data used by EPA was (is?) somewhat frustrating since I couldn’t find any documentation on the file structure.

There are three service end points for effluent data I am interested in. For detailed DMR data (monthly average discharge for example), use get_effluent_chart. The code block below shows an example:

library(httr)
library(jsonlite)
library(tibble)

# Make a request
request <- GET("https://ofmpub.epa.gov/echo/eff_rest_services.get_effluent_chart?p_id=tx0119407&parameter_code=50050&start_date=01%2F01%2F2014&end_date=12%2F30%2F2016&output=JSON")

# Get status codes
http_status(request)
## $category
## [1] "Success"
## 
## $reason
## [1] "OK"
## 
## $message
## [1] "Success: (200) OK"
# Access the body of the request
contentJSON <- content(request, as = "text")
## No encoding supplied: defaulting to UTF-8.
# Make it easier to understand
info <- fromJSON(contentJSON, simplifyDataFrame = FALSE, flatten = TRUE)
head(info$Results)
## $Message
## [1] "Success"
## 
## $SourceId
## [1] "TX0119407"
## 
## $EPASystem
## [1] "ICP"
## 
## $RegistryId
## [1] "110009771693"
## 
## $Statute
## [1] "CWA"
## 
## $CWPName
## [1] "SKIDMORE WSC WWTP"

Succesfully downloaded the data! Somewhere…

I had to poke around quite a bit to find the actual DMR in the download data. In short, nested within the Results is a list called PermFeatures of length i where i = number of discharge outfalls at the permitted facility. Within each outfall is another list of lenght n where n = nummber parameters. Finally, nested within this is a DMR associated with the parameter at the outfall for that facility.

For the above plant, I know there is only one outfall and I queried a single parameter. The DMR can be obtained:

DMR <- fromJSON(toJSON(info$Results$PermFeatures[[1]]$Parameters[[1]]$DischargeMonitoringReports),
                simplifyDataFrame = TRUE,
                flatten = TRUE)

This will return a data frame with 37 variables. I’ll knock it down to the info I am interested in:

output <- data_frame("name" = info$Results$CWPName,
                     "stat" = as.character(DMR$StatisticalBaseDesc),
                     "endDate" = lubridate::dmy(DMR$MonitoringPeriodEndDate),
                     "DMRValue" = as.numeric(DMR$DMRValueNmbr),
                     "units" = as.character(DMR$DMRUnitDesc))
head(output)
## # A tibble: 6 x 5
##                name     stat    endDate DMRValue units
##               <chr>    <chr>     <date>    <dbl> <chr>
## 1 SKIDMORE WSC WWTP DAILY AV 2015-08-31   0.0492   MGD
## 2 SKIDMORE WSC WWTP DAILY MX 2015-08-31   0.0534   MGD
## 3 SKIDMORE WSC WWTP DAILY AV 2015-09-30   0.0512   MGD
## 4 SKIDMORE WSC WWTP DAILY MX 2015-09-30   0.0710   MGD
## 5 SKIDMORE WSC WWTP DAILY AV 2015-10-31   0.0480   MGD
## 6 SKIDMORE WSC WWTP DAILY MX 2015-10-31   0.0613   MGD

Note that the DMR provides both daily average for the reporting period and daily max. Be sure to adjust to your needs.

Let wrap all of this into a useable function:

library(httr)
library(tibble)
library(jsonlite)
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.4.3
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
# Functions

# quick and dirty function to add "0" in front of month and days eg. "5" -> "05"
# will be used in the function below
makeDates <- function(x, n) {
  if(nchar(as.character(x)) <n) {
    paste0(c("0"),as.character(x))
  } else {
    paste0(as.character(x))
  }
}



# queries the EPA ECHO API and returns a tibble with DMR info for the wastewater plant
echoGetEffluent <- function(permit, parameter, start, end) {
  
  ## format the dates to paste as characters into the GET string
  start <- lubridate::ymd(start)
  end <- lubridate::ymd(end)
  
  ## use the make date function to format dates into appropriate strings
  startMonth <- makeDates(lubridate::month(start),2)
  
  startDay <- makeDates(lubridate::day(start), 2)
  
  startYear <- as.character(lubridate::year(start))
  
  endMonth <- makeDates(lubridate::month(end), 2)
  
  endDay <- makeDates(lubridate::day(end), 2)
  
  endYear <- as.character(lubridate::year(end))
  
  ## request URL statement
  request <- GET(paste0('https://ofmpub.epa.gov/echo/eff_rest_services.get_effluent_chart?p_id=',
                        permit,'&parameter_code=', parameter,'&start_date=',startMonth,
                        '%2F',startDay,'%2F',startYear,'&end_date=',
                        endMonth,'%2F',endDay,'%2F',endYear,
                        '&output=json'), accept_json())
  print(paste("# Status message:", http_status(request)))
  
  contentJSON <- content(request, as = "text")
  
  info <- fromJSON(contentJSON,simplifyDataFrame = FALSE)
  
  CWPName <- info$Results$CWPName #Grabs the permitted name
  nOutfalls <- seq_along(info$Results$PermFeatures) #grab number of outfall features
  
  flat <- fromJSON(toJSON(info),
                       simplifyDataFrame = FALSE,
                       flatten = TRUE)  
  
  output <- data_frame()
  
  for (i in nOutfalls){
    
    DMR <- fromJSON(toJSON(flat$Results$PermFeatures[[i]]$Parameters[[i]]$DischargeMonitoringReports),
                    simplifyDataFrame = TRUE,
                    flatten = TRUE)
    outfallNumber <- info$Results$PermFeatures[[i]]$PermFeatureNmbr
    
    buildOutput <- data_frame("name" = CWPName,
                              "outfall" = outfallNumber,
                              "limit" = as.character(DMR$LimitValueNmbr),
                              "stat" = as.character(DMR$StatisticalBaseDesc),
                              "endDate" = lubridate::dmy(DMR$MonitoringPeriodEndDate),
                              "DMRValue" = as.numeric(DMR$DMRValueNmbr),
                              "units" = as.character(DMR$DMRUnitDesc),
                              "id" = permit,
                              "param" = parameter)
    output <- add_row(buildOutput)
  }

  output
}

# Give is a try

#set arguments
wwtp <- "tx0119407"
parameter <- "50050" #discharge
start <- "2010-12-01"
end <- "2016-06-30"

df <- echoGetEffluent(wwtp, parameter, start, end)
## [1] "# Status message: Success"          
## [2] "# Status message: OK"               
## [3] "# Status message: Success: (200) OK"
## No encoding supplied: defaulting to UTF-8.
head(df)
## # A tibble: 6 x 9
##                name outfall  limit     stat    endDate DMRValue units
##               <chr>   <chr>  <chr>    <chr>     <date>    <dbl> <chr>
## 1 SKIDMORE WSC WWTP     001   .131 DAILY AV 2015-08-31   0.0492   MGD
## 2 SKIDMORE WSC WWTP     001 list() DAILY MX 2015-08-31   0.0534   MGD
## 3 SKIDMORE WSC WWTP     001   .131 DAILY AV 2015-09-30   0.0512   MGD
## 4 SKIDMORE WSC WWTP     001 list() DAILY MX 2015-09-30   0.0710   MGD
## 5 SKIDMORE WSC WWTP     001   .131 DAILY AV 2015-10-31   0.0480   MGD
## 6 SKIDMORE WSC WWTP     001 list() DAILY MX 2015-10-31   0.0613   MGD
## # ... with 2 more variables: id <chr>, param <chr>