
Data Mountaineering

Jul 26 2017

Imagine this mountain is a mountain of data. Photo by Jake Young
 

By Max Vilgalys
B.S. Electrical Engineering, 2017
Western Interstate Energy Board

Out West Student Blog


Fun fact: since 2001, coal units in the West have made over twelve million hourly reports about generation and emissions to the EPA. This summer, I have become intimately familiar with that data. At the Western Interstate Energy Board in Denver, my co-intern Ben Lim and I have the job of parsing through this mountain of information to find something meaningful: something that can inform policymakers about the state of the grid and help shape decisions about energy in the West.

Some of the challenges are more or less what I expected. Getting the information from the EPA's website into a format we could use took up my entire first few weeks on the job. And even once it's formatted, condensed, and filtered, it takes a long time to process that much information. Sometimes I wait hours for a program to run, only to find out that I made a mistake and need to start over. The files are so big that we had to try three different devices before we found one that could transfer them between computers. Figuring out how to extract the information we want is sometimes tricky, and fine-tuning the algorithms to make sense of it all has been no cakewalk. But the hardest parts of the project have been the parts I didn't anticipate: figuring out what questions to ask, and how to show our answers to the world.


On some level, I appreciate the open-ended nature of our project. We were given this database and asked to say something about the state of coal generation in the West, using the numbers to show how the coal fleet has been transitioning from constant, baseload operation to more flexible, load-following generation. From there, it was up to us to come up with more specific questions we could actually answer. Some questions are easier than others. Can we find any plants that are operating with a specific pattern? Yes, we can write a script to look through all the plants for a match. Can we classify how much of the time plants are in baseload operation? Yes, we can use machine learning to classify days based on their operating patterns. Can we show what caused these changes? Can we explain why some units have changed more than others? Maybe, but there isn't as clear an answer, and we'll probably spend the rest of the summer looking for one.
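For readers curious what "classifying days by operating pattern" can look like, here is a minimal Python sketch. It is not the method used in this project: the post describes a machine-learning classifier, while this sketch swaps in a simple hand-written rule. The 80%-of-peak threshold, the function names, and the input layout (a dictionary mapping each unit to a list of days, each day a list of 24 hourly output readings in MW) are all hypothetical. It labels each day as baseload or not, then counts how many units fall into each 10% band of "share of days in baseload", the same kind of tally behind the graph below.

```python
from typing import Dict, List

# Hypothetical rule (an assumption, not the project's ML approach): a day counts as
# "baseload" when the unit reports output in all 24 hours and its lowest hourly
# output is at least 80% of that day's peak.
MIN_FRACTION_OF_PEAK = 0.8


def is_baseload_day(hourly_mw: List[float]) -> bool:
    """Classify one day of hourly generation readings (MW) with a flat-profile rule."""
    if len(hourly_mw) != 24 or min(hourly_mw) <= 0:
        return False  # missing hours or an hour offline: not baseload
    return min(hourly_mw) >= MIN_FRACTION_OF_PEAK * max(hourly_mw)


def baseload_share(days: List[List[float]]) -> float:
    """Fraction of a unit's reported days that were classified as baseload."""
    if not days:
        return 0.0
    return sum(is_baseload_day(d) for d in days) / len(days)


def units_by_baseload_share(
    unit_days: Dict[str, List[List[float]]], n_bins: int = 10
) -> List[int]:
    """Count units per share bin: [0%, 10%), [10%, 20%), ..., [90%, 100%]."""
    counts = [0] * n_bins
    for days in unit_days.values():
        if not days:
            continue  # skip units with no reported days
        baseload_days = sum(is_baseload_day(d) for d in days)
        # Integer arithmetic avoids floating-point surprises at bin boundaries;
        # a 100% share is folded into the top bin.
        idx = min(baseload_days * n_bins // len(days), n_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    flat = [500.0] * 24                   # steady output all day
    cycling = [200.0] * 8 + [500.0] * 16  # ramps down overnight
    units = {
        "unit_a": [flat, flat, flat, flat],           # 4 of 4 days baseload
        "unit_b": [flat, cycling, cycling, cycling],  # 1 of 4 days baseload
    }
    print(units_by_baseload_share(units))  # prints [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```

A fixed threshold like this is easy to explain but brittle: a unit that runs flat at a reduced level for part of the day can be misclassified, which is one reason a learned classifier over whole daily profiles might be preferred.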

Number of Units In Baseload Operation a Given Percentage of the Time, 2001-2016
An interesting, but admittedly very confusing, graph. Clear communication is perhaps our largest obstacle. (Max Vilgalys)
 

Much like with a real mountain, it's one thing to climb up by yourself. It's a struggle and a beautiful experience, but it means so much more if you can lead someone else up to see what you've seen. Once we finish climbing through this analysis, we're going to need to share it with a group of policymakers and researchers. It's amazing how a simple graph like "Number of Units In Baseload Operation a Given Percentage of the Time, 2001-2016" can stupefy people who aren't intimately familiar with our data. When you're deep in the details of something, it can be hard to keep it accessible to anyone on the outside. But thanks to input from our mentor and the other employees, we're getting better at explaining our results. It's fun and satisfying to do the analysis, but by far the most important part of this project is presenting it clearly at the end. The view up here is great, and we can't wait to share it.
