TOM ADAMSON:
When we go to a lake in the summer, we like to see clear, clean water. Lakes, of course, have varying water quality and lots of factors affect that quality. Today on Eyes on Earth, we’re going to find out how researchers took an old data technique in freshwater science—physically sampling lake water—and reinvented it using satellite technology to study water quality in lakes across the U.S. 

Hello everyone and welcome to another episode of Eyes on Earth, a podcast produced at the USGS EROS Center. Our podcast focuses on our ever-changing planet and on the people here at EROS and across the globe who use remote sensing to monitor and study the health of Earth. My name is Tom Adamson. 

In this episode we’re talking with Dr. Michael Meyer, a Mendenhall Fellow and Research Geographer at the USGS. We’re talking to Michael about a recently released dataset about something called Lake Trophic State. We’ll get to what that is in a minute. First, Michael, could you just tell us a bit about your background?

MICHAEL MEYER:
Yeah, sure thing, Tom. So I define myself as a data intensive freshwater scientist and sometimes a data intensive interdisciplinary freshwater scientist. What I mean by that is I really come from an aquatic ecology background where I spent a lot of time working on food webs in lake environments, especially how food webs change in response to various human disturbances and also how food webs and lake ecosystems vary across large spatial and temporal scales. And I really bring that background into remote sensing science, trying to figure out not only how we are remotely sensing aquatic ecosystem integrity, but also how those ecosystems are functioning through an ecology lens.

ADAMSON:
Now it’s great that you brought up remote sensing and of course here at EROS, we’re really concerned with the Landsat series of satellites, and Landsat data we typically use to study land. You know, it’s called Landsat. So how exactly are satellite images from Landsat and Landsat pixels of water helpful in determining the health of lakes?

MEYER:
Totally, we get that question a lot, and really it goes back to just trying to understand the color of water. I think color is one of those sort of overly simplified, but also really intuitive metrics of how to understand ecosystem processes like those processes I was talking about. Where a green lake, you know, just based off the color, like green or blue or brown, a freshwater scientist can actually intuit a lot of information about that ecosystem. So thanks to Landsat, we’re able to at least get some of those pixels that you were talking about over lakes and rivers, and just from the color we can understand what the water quality of that ecosystem is like.

ADAMSON:
OK. So is it sort of like a visible color? Like, here, this pixel is green, something like that?

MEYER:
Yeah, it’s kind of like that. Looking at the relative amount of green or blue or brown or near-infrared within a pixel, we can understand like what sort of water quality that might be connected to.

ADAMSON:
So you mentioned blue, green, and brown for colors that you’re looking at in these pixels of water. So what would it mean if you see blue, green, and brown in the lake?

MEYER:
Yeah, so a blue lake, we generally call something like oligotrophic. What oligotrophic broadly means is just low productivity. There’s not a lot of algae. There’s not a lot of critters in that lake necessarily. You can think of this being a bright blue lake. But like I said, how freshwater scientists connect this to a large body of knowledge, we also know that oligotrophic lakes tend to be high in oxygen. They’re low in nutrients, tend to be low pollution, whereas eutrophic lakes are those green lakes. Those tend to be the ecosystems where we get high algal mass. We get high productivity, lots of fish production usually, but they can also have pretty big swings in water quality. Those are the lakes where you might expect an algal bloom to occur, maybe in the summer. If there’s a really big algal bloom, the oxygen will go very low and so you’ll see fish kills. So those are the systems that, you know, we tend to monitor pretty intensively. And then the brown, it’s a pretty uncommon type of lake. Sometimes they’re thought of as bogs, but they’re called dystrophic lakes. You can think of them maybe not as brown, but like the color of a strong tea. Recently there’s an uptick in study, but historically there hasn’t been a lot of study in these types of lakes because they are pretty uncommon. And what we know about them is that they tend to warm up really intensively in the top surface layers and tend to be pretty cool at lower layers. They tend to have very low algal production, but very high microbial production. So these are the types of lakes that might go from high oxygen to low oxygen really quickly. They also tend to be high in dissolved organic carbon, which can be hard to treat if you’re using that lake for drinking water.
So yeah, this is a long walk to get to the short drink of: when we know the color of a lake, we can connect that to what we call trophic states, and freshwater scientists know some generalized principles of how each of those trophic states function as an ecosystem.

ADAMSON:
OK, it’s a nice way to quickly describe what kind of lake you’re dealing with. 

MEYER:
Exactly. 

ADAMSON:
OK, so I’ve learned a few new words here today. Trophic—what does that refer to?

MEYER:
So trophic, if we’re speaking in a literal sense, broadly refers to nutrition of some sort, but you can think of it more like productivity: how productive is a lake in terms of its algae, and anything that’s not algae that’s living and growing in the lake.

ADAMSON:
And then you were using words like dystrophic, which means lack of nutrients. 

MEYER:
Yep.

ADAMSON:
OK. Eutrophic—that was the one that meant lots of nutrients?

MEYER:
Lots of nutrients, lots of algae.

ADAMSON:
OK. And then what does oligotrophic mean?

MEYER:
That would mean few nutrients, or a little productivity.

ADAMSON:
Just a little productivity. OK, so now I’m going to try to connect these to the colors. We’ve got, brown is dystrophic, green is eutrophic, and blue is oligotrophic.

MEYER:
That’s right.

ADAMSON:
OK, I think we got that figured out. What are you looking for with this data? How does this connect to you helping to monitor the health of lakes?

MEYER:
Yeah. So, the broad picture is really if we can identify which lakes are of what trophic state and, perhaps more importantly, how that trophic state is changing at large spatial and temporal scales, we can sort of synthesize how water quality across landscapes is changing. And so we can do that by using this really simple metric of color and these sort of generalized principles that come with that color and that trophic state connection to understand how ecosystems are functioning at continental scales.

ADAMSON:
There is a dataset that you worked on creating which you recently released. And as I understand it, that dataset considered 55,662 lakes across the entire lower 48 United States. So it’s not like you can scoop out a cup of water from each of those lakes, bring it into the lab, and see what the water quality is.

MEYER:
Well, believe it or not, Tom, that’s kind of how it starts. 

ADAMSON:
Oh, yeah? 

MEYER:
But not all 55,000. Yeah, it really starts with a couple thousand.

ADAMSON:
OK. Like a subset of the entire number of lakes, OK.

MEYER:
Exactly, exactly.

ADAMSON:
I’m hoping we get to how satellites help with this, of course.

MEYER:
We’ll get there. We’ll get there. But yeah, the way that it broadly goes is we use this dataset from the U.S. Environmental Protection Agency. It’s called the National Lakes Assessment. And effectively it is exactly what you said. Every five years, crews of people go out to about 1 1/2 thousand lakes across the contiguous U.S., and they sample for a wild array of constituents. And the beauty of this sampling campaign, it’s twofold. All samples have the exact same methods. They have the exact same sampling procedure, so samples are largely taken in the same relative position in every lake. And lakes that are included in the campaign are picked in a really smart statistical design. And without going too much into the details, those approximately 1,400 lakes are picked so that we can look at that subset of 1,400 and say these lakes are representative of the larger array of lakes within the nation. The wonderful thing about this dataset is that because all of the samples are comparable to one another, we can take it and we know that there are relatively consistent methods and we can apply these trophic state definitions that we were talking about. So we can use this dataset to ground truth what lakes we know to be eutrophic, oligotrophic, and dystrophic.
So this is where Landsat comes in. We can take those Landsat images for when field crews were sampling lakes in the National Lakes Assessment and merge those surface reflectances, or those lake color data, with the in situ samples and build classification models, these statistical models that ultimately give us the math to say, oh, this lake is oligotrophic or this lake is dystrophic or eutrophic. And then for other lakes, so the 55,000 in the dataset, we don’t have data from the field to tell us what that trophic state is, but luckily we can use those statistical models that we developed for lakes that we know and apply them for lakes that we don’t know.
And that’s how we get to the 55,000.
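
The train-on-sampled-lakes, predict-on-the-rest workflow described here can be illustrated with a minimal Python sketch. The episode does not say which kind of classification model was used, so a simple nearest-centroid classifier stands in for it, and every reflectance value below is invented for illustration rather than taken from Landsat or the National Lakes Assessment.

```python
from math import dist
from statistics import mean

# Toy training set: (red, green, blue, nir) reflectance -> field-assigned trophic state.
# All values are invented for illustration; they are not real measurements.
LABELED = [
    ((0.02, 0.03, 0.06, 0.01), "oligotrophic"),
    ((0.02, 0.04, 0.07, 0.01), "oligotrophic"),
    ((0.04, 0.08, 0.04, 0.03), "eutrophic"),
    ((0.05, 0.09, 0.04, 0.04), "eutrophic"),
    ((0.06, 0.05, 0.02, 0.02), "dystrophic"),
    ((0.07, 0.05, 0.02, 0.02), "dystrophic"),
]


def fit_centroids(samples):
    """Average the band values of each class to get one centroid per trophic state."""
    by_state = {}
    for bands, state in samples:
        by_state.setdefault(state, []).append(bands)
    return {
        state: tuple(mean(column) for column in zip(*rows))
        for state, rows in by_state.items()
    }


def predict(centroids, bands):
    """Assign the trophic state whose centroid is closest in reflectance space."""
    return min(centroids, key=lambda state: dist(centroids[state], bands))


centroids = fit_centroids(LABELED)
unsampled_lake = (0.045, 0.085, 0.04, 0.035)  # hypothetical lake with no field data
print(predict(centroids, unsampled_lake))
```

The point of the sketch is only the shape of the process: the model sees nothing but the four band values, which matches what Meyer says later about the models knowing only red, green, blue, and near-infrared reflectances.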

ADAMSON:
OK, that sounds good. And a little more efficient than going to that many lakes. 

MEYER:
Very true. 

ADAMSON:
Was there any other validation done?

MEYER:
Oh, there was a lot of validation done. A lot of it was very analytical. So we started from: can we identify times when the models get a lake classified incorrectly? So what we did, for example, is we took things that could cause a lake to be misclassified. So for example, oligotrophic lakes, those blue lakes, right. They tend to be characterized by high benthic algae, or algae that grows on the sediment. And you can imagine that a Landsat image, as it’s overpassing the lake, if it’s shallow, it could pick up some green signature from that benthic algae. And so what we would say is a shallow oligotrophic lake would tend to be classified as eutrophic, even though we know it’s oligotrophic. And so indeed, you know, shallower lakes tended to have a higher misclassification rate of being green or eutrophic. So we did a bunch of different analyses to see what are the characteristics of a lake that might cause it to be misclassified.
We also did a lot of manual investigation. We took the data that the models told us of which lake should be of which trophic state, and we manually went into management reports, just doing Google searches and Google Scholar searches, to see how often did the models get a lake incorrectly or correctly classified. And were there any spatial patterns, right? Were most of our misclassifications in one part of the country, or were they diffuse throughout the nation? And, you know, thankfully, they’re pretty diffuse throughout the whole nation, and that gave us some more confidence that it’s not a regional driver or something regionally that could be messing with the models in some way, shape, or form.
And then the last part of the validation, which personally I think is one of the most satisfying analyses I’ve ever done to date, was, so I should have said this, but the models only know a lake’s red, green, blue, and near-infrared reflectances.

ADAMSON:
Right, those are the wavelengths that Landsat can record. So that’s why you’re picking on those.

MEYER:
Yes. So the models only know that, for when we expand to the 55,000. They don’t know anything about where a lake is located. They don’t know anything about its morphometry. They don’t know anything about the year that those Landsat images came from. And we’re able to get the trophic state correct about 75% of the time, which for ecological data is pretty good.

And what we’re able to do with just that information, if you take all of those predicted trophic states that are based solely off Landsat reflectances, we are able to reproduce the exact same water quality trends that the U.S. EPA National Lakes Assessment observed when they had in situ water chemistry data. What that means is that we’re able to sort of validate the results of the National Lakes Assessment and ultimately take those insights from about 1,500 or 2,000 lakes all the way up to 55,000 lakes. And at the same time, they’re able to help us. Because we know how water quality is changing for a subset of those water quality parameters, we can ultimately tap into how the changes in oligotrophic, eutrophic, and dystrophic lakes also influence other water quality parameters that are included within the National Lakes Assessment. And so just by the fact that two independent methods are giving the same qualitative result, that was when I was totally convinced that we were really on to something with this dataset.

ADAMSON:
This is an accurate dataset and you’ve done the legwork to make sure of that.

MEYER:
We really have.

ADAMSON:
Now when a sample is taken from each lake, whether it’s the in place, physical sample or the remotely sensed pixel, what part of the lake does that come from?

MEYER:
Yeah, that’s actually a really good question. Or that is a really good question, not actually. So, for the Landsat reflectances, what we do is we perform this calculation called identifying the Chebyshev Center. OK, so if you imagine the shape of a lake, you could imagine how if that lake is really oddly shaped, like it’s a horseshoe shape, the center of that lake might actually be land. It might not be water. So if you took the centroid, that doesn’t really tell you much. What the Chebyshev Center is, is you can imagine if you were to find a point and then draw the largest circle you could from that point, the largest radius you could from that point until you hit land. That’s how we identify the quote “center” of the lake. So the point that’s farthest from shore, or sometimes it’s called the point of inaccessibility. The most inaccessible part of the lake.
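
To make the horseshoe example concrete, here is a small Python sketch under stated assumptions: the lake is a hand-drawn toy grid rather than a real lake polygon, and the search is a brute-force scan over grid cells, which is not necessarily how the dataset's authors computed the Chebyshev Center. It compares the plain centroid with the water cell that is farthest from any land.

```python
from math import dist
from statistics import mean

# Toy horseshoe-shaped lake: '#' is water, '.' is land (invented for illustration).
LAKE = [
    ".........",
    ".##...##.",
    ".##...##.",
    ".##...##.",
    ".##...##.",
    ".##...##.",
    ".#######.",
    ".#######.",
    ".........",
]

cells = [(r, c, ch) for r, row in enumerate(LAKE) for c, ch in enumerate(row)]
water = [(r, c) for r, c, ch in cells if ch == "#"]
land = [(r, c) for r, c, ch in cells if ch == "."]


def distance_to_shore(cell):
    """Distance from a water cell to the nearest land cell, in grid units."""
    return min(dist(cell, shore) for shore in land)


# Plain centroid: the average position of all water cells, rounded to a cell.
centroid = (round(mean(r for r, _ in water)), round(mean(c for _, c in water)))

# Most inaccessible point: the water cell farthest from any land.
# If several cells tie, max() keeps the first one in scan order.
center = max(water, key=distance_to_shore)

print("centroid cell:", centroid, "on land?", centroid in land)
print("farthest-from-shore cell:", center, "in water?", center in water)
```

On this toy grid the centroid lands in the gap between the two arms of the horseshoe, while the farthest-from-shore cell is always a water cell by construction, which is the property Meyer is describing.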

ADAMSON:
Oh, OK, well most of the time it’s not terribly inaccessible if you’re in a smallish lake, but that’s the way they put it anyway. It sounds like a good, consistent metric to use at least.

MEYER:
Exactly. And thankfully, the National Lakes Assessment actually uses a pretty similar thing for their sampling. You know, for theirs, it’s more constrained, right, because there’s people in a boat that have to go out and get the sample. But what they do is they make sure that it’s offshore so that there’s comparably less interference from the near shore, and it’s at a consistent depth, where if it’s a really deep lake, they’ll go all the way down to like 50 meters, if possible. But otherwise, it’s pretty representative of the surface layer. And that’s really what also makes this a really powerful merger. We’re having samples that were collected in a certain way, or prescribed way, to make all the lakes comparable, and we’re also merging that satellite data in a way that really reflects how the samples were collected in the field.

ADAMSON:
Now, of course, what Landsat is really useful with is change over time, and this dataset that you just released goes back to 1984. Can you tell us why that year was chosen?

MEYER:
Yeah, it’s really a product of how many overpasses you reliably have in a year, and it’s really starting in 1984 where we start to get at least three overpasses a summer within a year. And that’s the time where we would anticipate brown, green, and blue colors or associations to be most apparent.

ADAMSON:
OK, in the summertime, got it. How can someone find this dataset? Is it available somewhere?

MEYER:
Yes it is. It’s available entirely on the Environmental Data Initiative, or EDI, and it’s freely accessible. What we did is adhere to what are called FAIR data principles, and FAIR basically means findable, accessible, interoperable, and reusable, or reproducible. And what that generally means, in adhering to FAIR data principles: number 1, the data are findable, so you can get them on the Environmental Data Initiative at absolutely no cost. They’re accessible. You don’t even need to make a login account or a user account to download them. The data are interoperable. You can efficiently merge the data with similar datasets that use a similar coding structure. So what I mean by that is that each lake in the dataset has a unique identifier, just like 1, 2. It’s not 55,000, but you can imagine each lake having a consistent number that’s attached to it. Other lake datasets that use this same numbering scheme or identification scheme, you can merge them together. It takes one line of code to merge them together, and that ultimately allows you to use these data in ways that we as the creators of the dataset never even intended. You can look at how long residence time is with respect to trophic state or surface area expansion, and it’s really that unique identifier, or that interoperability, that really opens up the door to the potential use for this dataset. And lastly, that R in FAIR, the reproducibility or reusability, really stems back to how we made the dataset. There’s a lot of computer coding and scripting that went into making this, and we did our best so that this dataset can be updated as new data from the National Lakes Assessment get added, so as they have more field campaigns, or as new Landsat imagery becomes available.
And we even kind of future proofed the dataset by putting it inside of what’s called a container, and basically what that means is as computers change through time, as software changes through time, this code should be able to run even though, you know, software gets updated or maybe some software isn’t supported in the future. It’s a way of future proofing the data production. Hopefully we did it well enough to future proof it, but all that went into how we went about making the dataset, and it’s something personally I’m really proud of.
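
The merge on a shared lake identifier might look something like the following Python sketch. The identifiers, field names, and values are all hypothetical, since the episode does not name the identifier scheme or the other datasets' columns, and plain dictionaries stand in for whatever table format the real files use.

```python
# Hypothetical records keyed by a shared lake identifier.
# Identifiers and values are invented for illustration.
trophic_state = {101: "oligotrophic", 102: "eutrophic", 103: "dystrophic"}
residence_time_days = {101: 820.0, 103: 45.0, 104: 300.0}

# The merge itself: keep only lakes that appear in both datasets.
merged = {
    lake_id: (trophic_state[lake_id], residence_time_days[lake_id])
    for lake_id in trophic_state.keys() & residence_time_days.keys()
}

for lake_id, (state, days) in sorted(merged.items()):
    print(lake_id, state, days)
```

Because both tables use the same key, no fuzzy matching on lake names or coordinates is needed, which is the practical benefit of the interoperability Meyer describes.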

ADAMSON:
Is there anything else that you’d like to add about this new dataset?

MEYER:
The only thing that, I mean the big thing that I’d like to add is, you know, we’re really excited to see all of the potential uses that could stem from this dataset. You know, I believe it’s one of the first to take this really old idea in freshwater science, the trophic state idea, which stems back to the 1880s, and totally merge it and reinvent it a little bit with remote sensing technology. That’s really thanks to the exceptional time series and spatial extent that Landsat provides. And so we’re really excited to see how people use this dataset in ways that we never anticipated. For example, I know that a few people at Oak Ridge National Labs are using it for understanding methane emissions from lakes. I’m really interested in understanding lake productivity across the national scale. It’s a really exciting time to be engaging in this science.

ADAMSON:
I’d like to thank Dr. Michael Meyer for joining us on this episode of Eyes on Earth, where we talked about researchers using Landsat to measure lake trophic state across the U.S. And thank you listeners. Check out our social media accounts to watch for our newest episodes. You can also subscribe to us on Apple and YouTube podcasts. 

VARIOUS VOICES:
This podcast, this podcast, this podcast, this podcast, this podcast is a product of the U.S. Geological Survey, Department of Interior.