Tutorial 11: Building an Analytics Pipeline with the FRED API

Learning Goals

By the end of this tutorial, you should be able to:

  • Make a simple API request from R using query parameters.
  • Parse a JSON response into a nested R list and inspect its structure.
  • Use rectangling and tidying to convert nested observation records into an analysis-ready tibble.
  • Clean dates, numeric values, and missing-value codes from external data.
  • Store cleaned API data in DuckDB and query it from R.
  • Explain how an analytics pipeline moves data from an external system to reproducible analysis.

The Business Challenge

You work on a small analytics team that prepares economic indicators for regular business reporting. Managers want charts and summary statistics that can be refreshed when new macroeconomic data becomes available.

The data is available from the Federal Reserve Economic Data (FRED) API. The challenge is that the API does not return an analysis-ready table. It returns structured JSON designed for transport between systems.

In this tutorial, you will build a reproducible pipeline that moves data through the main stages of modern analytics work:

API request -> JSON response -> R list -> tidy tibble -> DuckDB table -> analysis -> visualisation

You will start with one FRED series, then adapt the same workflow to a group-assigned series. The important idea is that once a pipeline works for one structured data source, you can reuse the same pattern for many similar sources.

library(httr)
library(jsonlite)
library(tidyverse)
library(DBI)
library(duckdb)
library(scales)

Prepare this Exercise before Class

Complete this short setup task before coming to class.

Exercise 1: Create Your FRED API Key

Before class, create a free FRED API key.

Use the FRED API key page to create your key:

https://fred.stlouisfed.org/docs/api/api_key.html

Bring the key to class. You will use it during the in-class exercises.

Do not paste your real API key into a public document, shared repository, or online forum.

In-Class Exercises

You will discuss these exercises in class with your peers in small groups and with your tutor. These exercises build on the preparation task above, so you will get the most value from class if you have completed it before coming to class.

Exercise 2: Request Data from the FRED API

In this exercise, you will make your first request to the FRED API during class.

(a). Store your API key in an R object. Replace the placeholder text with your own key.

fred_key <- "YOUR_API_KEY"

(b). Your tutor will assign your group one monthly FRED series:

  • unemployment rate;
  • Consumer Price Index;
  • federal funds rate.

Use the FRED website or API documentation to find the exact series ID for your assigned series, then store it in R. The series ID is the short code FRED uses in API requests, not the full series title.

series_id <- "YOUR_SERIES_ID_HERE"

(c). Complete the API request below. You need to fill in the endpoint URL, the series ID object, your API key object, and the requested file type.

response <-
  GET(
    url = "YOUR_ENDPOINT_URL_HERE",
    query = list(
      series_id = YOUR_CODE_HERE,
      api_key = YOUR_CODE_HERE,
      file_type = "YOUR_FILE_TYPE_HERE"
    )
  )

status_code(YOUR_CODE_HERE)
http_type(YOUR_CODE_HERE)

Run the two output checks before moving on. What does each output tell you about whether the request worked?

(d). What does each item in the query list do? If a different group wanted to request its assigned series instead, what would need to change?

Exercise 3: Parse, Inspect, Rectangle, and Tidy the JSON Response

The API response is not an analysis-ready table. It is JSON text that needs to be parsed, reshaped, and tidied.

(a). Convert the response body into text with content(), then parse it into a nested R list with fromJSON(). Keep the nested list structure so you can inspect it before reshaping.

json_text <- YOUR_CODE_HERE

fred_list <- YOUR_CODE_HERE

(b). Inspect the top-level names and the first observation record. Write the code needed to answer the question before you try to make a table.

names(YOUR_CODE_HERE)
names(YOUR_CODE_HERE)
YOUR_CODE_HERE

Where are the repeated observation records stored?

(c). Turn the nested observation records into columns using tibble() and unnest_wider(). Start from the list element that contains the repeated observation records.

fred_raw <-
  YOUR_CODE_HERE

head(YOUR_CODE_HERE)
glimpse(YOUR_CODE_HERE)

(d). The rectangular table still has a generic column called value. Create a tidier table that keeps date and renames value to the variable your group is studying:

  • unrate for the unemployment rate group;
  • cpi for the Consumer Price Index group;
  • fedfunds for the federal funds rate group.
fred_tidy <-
  fred_raw |>
  transmute(
    date = date,
    YOUR_VARIABLE_NAME_HERE = value
  )

head(YOUR_CODE_HERE)
glimpse(YOUR_CODE_HERE)

Why is unrate, cpi, or fedfunds a better column name than value for the measurement you are analysing?

(e). Which columns are now available? Which of those columns still need cleaning before analysis?

Exercise 4: Clean and Store the Data

The tidier table still needs type cleaning before it can be used reliably for analysis.

(a). Create a clean table with a date column and a numeric measurement column for your group series. Use the column name you created in Exercise 3: unrate, cpi, or fedfunds.

fred_clean <-
  fred_tidy |>
  transmute(
    date = YOUR_FUNCTION_HERE(date),
    YOUR_VARIABLE_NAME_HERE =
      na_if(YOUR_VARIABLE_NAME_HERE, "YOUR_MISSING_VALUE_CODE_HERE") |>
      YOUR_FUNCTION_HERE()
  )

head(YOUR_CODE_HERE)
glimpse(YOUR_CODE_HERE)

You will need to use as.Date(), na_if(), and as.numeric(). Why?

(b). Write your cleaned group data to a local DuckDB database. Connect to data/fred.duckdb, write the table as "fred_observations", and list the tables in the database.

dir.create("data", showWarnings = FALSE)

con <-
  YOUR_CODE_HERE

dbWriteTable(
  YOUR_CODE_HERE
)

dbListTables(YOUR_CODE_HERE)

Why is it useful to store cleaned API data in a database?

Exercise 5: Query, Visualise, and Explain the Pipeline

The database is now the source of truth for your analysis.

(a). Use tbl() to refer to your group’s DuckDB table, then calculate summary statistics for observations since 2000. Use your group’s measurement column: unrate, cpi, or fedfunds. Keep the data in DuckDB while filtering and summarising, then bring the smaller result back into R.

fred_db <-
  YOUR_CODE_HERE

fred_summary <-
  YOUR_CODE_HERE

fred_summary

What does collect() do? Why is it useful to delay collection until after filtering and summarising?

(b). Create a time-series plot from 1990 onward for your group series. Your plot should use date on the x-axis, your group’s measurement column on the y-axis, a line geometry, informative labels, and a readable theme. Choose labels that make sense for your assigned series.

fred_plot_data <-
  YOUR_CODE_HERE

YOUR_GGPLOT_CODE_HERE

Describe one pattern you see in the chart. What additional information would help you explain that pattern responsibly?

(c). Write a short paragraph for a manager explaining:

  • where the data came from;
  • why the JSON response was not immediately analysis-ready;
  • how you cleaned and stored the data;
  • what your summary or chart shows;
  • how the pipeline could be reused for another FRED series.