Recent Posts

Make grouping a first-class citizen in data quality checks

Which of these numbers doesn’t belong? -1, 0, 1, NA. You can’t judge data quality without data context, so our tools should enable as much context as possible.

Why machine learning hates vegetables

A personal encounter with ‘intelligent’ data products gone wrong

A lightweight data validation ecosystem with R, GitHub, and Slack

A right-sized solution to automated data monitoring, alerting, and reporting using R (pointblank, projmgr), GitHub (Actions, Pages, issues), and Slack

Workflows for querying databases via R

Simple, self-contained, reproducible examples are a common part of good software documentation. However, in the spirit of brevity, these examples often do not demonstrate the most sustainable or flexible workflows for integrating software tools into large projects.

Understanding the data (error) generating processes for data validation

A data consumer’s guide to validating data based on the failure modes data producers try to avoid

Talks

Column Names as Contracts

Exploring the benefits of using controlled vocabularies to encode metadata in column names, and demonstrations of implementing this approach with the convo R package or dbt extensions of SQL.

oRganization: Design patterns for internal packages

An overview of the unique design challenges and opportunities when building R packages for use inside of a single organization versus open-source. By using the jobs-to-be-done framework, this talk explores how internal packages can be better teammates by following specific design patterns for API design, testing, documentation, and more.

projmgr: Managing the human dependencies of your project

A lightning talk on key features of the projmgr package

RMarkdown Driven Development

How and why to refactor one-time analyses in RMarkdown into sustainable data products

tidycf: Turning analysis on its head by turning cashflows on their side

An overview of how the tidycf R package led to process and cultural change at Capital One

Projects

dbtplyr

dbt package bringing dplyr semantics to SQL

convo

R package for managing controlled vocabularies

satRday Chicago Conference Organizer

Speaker & Sponsor lead for 2019 and 2020

Rtistic

Hackathon-in-a-box templates for custom Rmd and ggplot2 themes

projmgr

R package providing project management interface to GitHub

Publications

97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts

Contributed six chapters on topics including data design, development, validation, and democratization

R Markdown Cookbook

This cookbook contains tips and tricks to help you get the most out of R Markdown. Topics include the automated generation of content (diagrams, text), customizing format (Pandoc, HTML, and LaTeX templates), workflow improvements (modularizing child documents, cross-referencing code chunks, chunk caching), modifying rendering behavior with hooks, and using alternative language engines.