Welcome to the EROS User Experience webinar series, where we talk to staff at EROS to learn more about the data, tools, and services coming out of the USGS Earth Resources Observation and Science, or EROS, Center. Today's webinar is entitled An Introduction to Landsat Data Access and Processing in the Cloud. I'm your host, Danielle Golon, the remote sensing user services lead here at EROS in Sioux Falls, South Dakota. The time is currently 12 p.m. Central, so we'll go ahead and get started. First, a few logistics to ensure the best audio experience. All participants have been muted. If you have any questions or comments during the webinar, please add them in the chat and we will address them at the end of the webinar. If the chat does not work for you, please feel free to email your questions to custserv@usgs.gov and we will answer them there. Today's webinar is being recorded. The recording will be available later on the USGS Landsat website, as well as the USGS Trainings YouTube channel and the USGS Media Gallery. At the start and end of the webinar, we will have a few polling questions. These polls are optional, but your answers can help us create a better user experience in the future. The polling questions will be available via the polls feature in Teams, or in the Teams chat for our audience members who are not able to use the Teams polls feature due to their organization settings. If the polls feature does not work for you, please feel free to respond to the polling questions in the chat instead. The questions are the same, so please either use the polls feature or the chat, whichever option works best for you. I will go ahead and launch our polls now while we finish the introduction to the webinar. You should now be able to see the first set of polling questions, both in the polls feature and in the chat, so feel free to fill those out at your leisure.
Today's webinar will consist of a presentation, several live demonstrations, and then a question and answer session at the end. Today's speaker is Tonian Robinson, a geospatial cloud support scientist with the USGS EROS user services team and the Annual National Land Cover Database, or Annual NLCD, team here at USGS EROS. A graduate of the University of South Florida in Tampa, Florida, Tonian has a PhD in geology focusing on geophysics. Tonian has worked as a contractor at EROS since the summer of 2023. Tonian's presentation will provide an overview of Landsat data in the cloud, as well as demonstrations of several scripts Tonian and the team have written on accessing and working with Landsat data in the cloud. Once Tonian has finished her presentation, we will then transition over to an optional set of final polling questions, and then we'll move on to the Q&A portion of the webinar. We have several EROS staff members from user services, the internal Landsat science team, and our access and archive cloud developers here at EROS on the line to help answer any questions you may have after Tonian has finished her presentation. Again, please feel free to add your questions or feedback throughout the webinar using the webinar chat. We'll try to answer all of your questions within the time allotted, but if we're not able to address your question during the Q&A portion, we will follow up with you offline. If there's a future webinar topic you'd like us to cover, please feel free to suggest that in the chat as well. With that, it's my pleasure to introduce today's speaker, Tonian Robinson. Take it away, Tonian. Hello everyone, I am Tonian Robinson, and welcome to this introduction to the materials that we've created in the past year to help you get started with working with Landsat data in the cloud. To start, a brief outline: first, I'm going to discuss how Landsat data is stored in the cloud.
I'm going to fully demo one tutorial and walk through the HTML format of two other tutorials, and then I'm going to highlight some valuable resources to help you get started with working in the cloud. To start: yes, Landsat data is available in the cloud. It's available in an Amazon Web Services S3 bucket located in the us-west-2 (Oregon) region. Users will need to specify requester pays when they're accessing this data. So what is available in the cloud? Currently we have Level-1 radiance products, which are the geometrically and radiometrically corrected products, and Level-2, which are the atmospherically corrected products, the surface reflectance and the surface temperature products. Then we have the ARD, the Analysis Ready Data products, and for Level-3, the Burned Area, the Fractional Snow Covered Area, and the Dynamic Surface Water Extent products are available. So how is data stored in the cloud? Here's an example with a Level-2 product that was collected with the Thematic Mapper in 2011, and this is the product's full name. Where is it stored? First, it's stored in the S3 bucket, the usgs-landsat S3 bucket, and this is a surface reflectance data product. Within the usgs-landsat S3 bucket, it's stored first under its collection number, which is two, and that's highlighted in the name. It's then stored under its level number, which is Level-2, since it's a Level-2 product. It's then stored under its projection, which is the standard projection, and that's not highlighted in the name; the standard projection in this case is the WGS 84 UTM projection. It's then stored under its sensor, in this case the Thematic Mapper sensor, which is labeled as TM here but highlighted as T in its name. It's then stored under its year, which is the year it was collected, and that is highlighted in its name. And then it's stored under its path and row, which are also highlighted in its name.
And lastly, it will be under its entire product name. Under this product structure, there are various objects, ranging from the metadata to band products, that are downloadable for the specific product. Now that you know how this data is stored, let's walk through how to find these data. The first step in finding the data is understanding how the metadata is structured. Landsat uses STAC, which is the SpatioTemporal Asset Catalog. It's a family of specifications that standardizes geospatial metadata, so it makes it easier for you to access the metadata related to geospatial products. We have a Landsat STAC catalog, and the Landsat STAC catalog includes various collections. These collections are groups of Landsat products; for example, one Landsat STAC collection is the surface reflectance collection, which is separate from the surface temperature collection. There are about 14 Landsat STAC collections, and within each collection there are individual Landsat data products, and those are referred to as STAC items. And lastly, within each item there are various assets, and these range from the thumbnail, to the metadata, to the various links to download the data that is available for the item. We have various ways for you to interact with the STAC. The first one here is the Landsat STAC browser, which is available through the USGS. It's really a way to just click through, based on the scene, the tile, or a Level-3 product, to find a product and see what is available with it and what metadata is associated with it. This is the non-programmatic way of interacting with the STAC. And here's an example with a Burned Area product. Upon clicking through to find this single product in the STAC browser, you will see the outline of this product. You will also see the metadata to the right, and there are tabs on top that you can click through to see the assets, the various bands, and the thumbnail.
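As a sketch of the storage hierarchy just described (collection number, level, projection, sensor, year, path, row, product name), the snippet below builds a bucket prefix from a product ID. This is an illustrative assumption based only on the layout described above, not an official API; the bucket name and directory names should be verified against the actual product links.

```python
def s3_prefix(product_id):
    """Build an assumed usgs-landsat prefix for a Level-2 product ID, following
    the hierarchy described above: collection / level / projection / sensor /
    year / path / row / product name."""
    # Sensor directory names are assumptions for illustration only.
    sensor_dirs = {"LT04": "tm", "LT05": "tm", "LE07": "etm",
                   "LC08": "oli-tirs", "LC09": "oli-tirs"}
    parts = product_id.split("_")        # e.g. ['LT05', 'L2SP', '023036', '20110501', ...]
    path, row = parts[2][:3], parts[2][3:]
    year = parts[3][:4]                  # acquisition year
    return ("s3://usgs-landsat/collection02/level-2/standard/"
            f"{sensor_dirs[parts[0]]}/{year}/{path}/{row}/{product_id}/")

print(s3_prefix("LT05_L2SP_023036_20110501_20200820_02_T1"))
```

The product ID here is a made-up Thematic Mapper example in the 2011 style mentioned above; the band objects and metadata would then sit under this prefix.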
The second way to non-programmatically interact with the STAC is through the STAC index. The STAC index lists all of the Landsat STAC collections. First, it starts with the collection names listed with their titles; there are 14 of them. You can click through any of them, depending on the product you want. You can click through to find individual scenes, see where they are located on the map, and see the various data associated with them. Similarly, this is an example using an analysis ready data set. Here is the outline of the scene, and the metadata is located on the right; you can scroll through to see what is associated with this data. Now, programmatically: at USGS, user services provides scripts, Python notebooks, to show you how to work with Landsat data in the cloud, and they're housed in the USGS GitLab that is highlighted here. I'll be going to the GitLab after this brief introduction to it. What is available in the GitLab? We have four projects related to the cloud. The first one is about accessing data, which is how to search for and pull down data. The second is processing, which is anything from calculating indices to filtering, and also visualizing; there is a lot to visualize. And of course there are quick guides, which are short guides on how to do something like create a GeoJSON file, and there are case studies, which are newer, and they're real-world examples of how to process Landsat data in the cloud. From here, I'll be going to the demo. First, I briefly mentioned the three tutorials that I'll be walking through. The first tutorial is just an introduction to STAC; this one I'll actually be downloading and running live. The second one is related to decoding the pixel QA band and using it for masking; I will not be downloading this, but I'll be walking through its HTML file. And the third one is related to pulling down a single pixel through time, filtering it, and also calculating indices.
So with that, I'm just going to head over to the GitLab and introduce you to it. Here is the USGS GitLab. I am not signed in, so that you can see what is readily available to you as a user. As you can see, there are five projects: the four I discussed that are related to the cloud, plus a machine-to-machine project, which just focuses on pulling down data using the machine-to-machine API. Since it's not cloud, I did not discuss it, but we have three machine-to-machine tutorials available there. In Access, we currently have about six tutorials, all related to interacting with the STAC API to pull data down. In Case Studies, we currently have one, which is related to compositing in the Ucayali region in Peru. In Processing in the Cloud, there are about seven, with more pending upload: various ways to process and filter the data. And in Quick Guides, we currently have two guides related to creating GeoJSONs, which are a really great resource when searching or interacting with the STAC to find data. So before I actually download the first tutorial, which I'm going to demo, I'm just going to go through and show you how the tutorials are set up in the lab. Here's an example with the recent creating composites with Landsat data tutorial. In every tutorial there is the notebook, the HTML file of the notebook after it's been run, and a .yml file, which is the file that you use to create your Python environment. Sometimes there are data folders with images or outputs, and util folders, which include the GeoJSON that is used for the area. The readme file is presented below the list of the files, and it's really just an introduction to the tutorial itself: it introduces the tutorial, compositing, and where it's located, and for every processing tutorial there are prerequisites listed and a table of contents, so you can see what's in the tutorial.
And lastly, and importantly, we also show how to set up the Python environment using the .yml file that is given to you as a user. This is what is provided with the tutorials. So I'm going to head back out to the main page, and I'm going to go find the tutorial that I'll be demonstrating, and it's the introduction to the Landsat STAC, which is a great starting point if you're interested. I'm going to first copy the HTTPS link for this, to pull it down using git. I'm in an empty directory that I created for this webinar, and I'm just going to activate the environment; it's already created, but I'm just going to activate it to get started accessing Landsat data. And then I'm going to say git clone, and I'm pasting the link to the tutorial, and I'm pulling it down into this directory that I have open for this webinar. To open the tutorial, I'm going to open Jupyter Notebook; all the tutorials are Jupyter notebooks, also referred to as Python notebooks. I'm going to open the tutorial, and there is everything associated with this one. This one comes with an HTML file that shows the notebook before running and one after, and of course a .yml file, which is necessary in this case. So I'm going to open the notebook and we're going to just start running through this tutorial. So again, this is a basic introduction to STAC, the Landsat STAC specifically. STAC is a standard for grouping geospatial metadata, and we use STAC here at USGS for Landsat, especially if you're trying to get Landsat data from the cloud. So that is the introduction to STAC, and the table of contents covers: what is STAC, importing packages, interacting with the API, and also showing you how to search for data. Landsat has a STAC API, which is just a link that you interact with when you're programmatically accessing the Landsat STAC catalog. And I wanted to emphasize that I do use the term collections.
Landsat STAC collections are groupings of Landsat data, while Landsat Collection 2 is really a reprocessing of the entire Landsat archive, and we are currently at Collection 2 in terms of archival processing. So when I refer to collections moving forward, I'm referring to just groupings of different Landsat data sets. We have the Landsat STAC catalog, and within the catalog there are collections, which are groupings of data sets, and within each collection there are STAC items, which are the scenes themselves, multiple Landsat scenes. To start, we're going to import only a single tool here, which is just requests. To interact with the STAC API, you do not need an Amazon Web Services account set up; it's really just making sure you have whatever program or module you're using to pull the data. Here I'm just using the requests function to interact with this Landsat STAC server and pulling down the response. This shows the various links, the version, and the title of the API that are pulled down. So it's really just JSON, just the various links that are attached to the Landsat STAC. And here we're printing some of the more useful links. Some of these, if you click them, you will find the various ways that you can search the Landsat STAC. The tutorials highlight more clearly how you can search, but the options are all available within the links in the Landsat STAC catalog. Here I'm just printing the useful information related to it: the version, the ID, and the type. It's a catalog, and there are 22 links associated with it. Next I'm running this cell, and it's printing all the children and various information related to the links, the 14 Landsat collections that are available. The STAC collections are listed as children, child items, in the catalog. So let's count how many collections are available.
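In code, that first interaction looks roughly like the sketch below: fetch the catalog root with requests, then walk its links for `rel == "child"` entries, which are the collections. The endpoint URL and collection IDs here are assumptions to verify against the tutorial, and the sample dict mimics the JSON shape so the parsing can be shown without a live request.

```python
import requests

# Assumed Landsat STAC API root (verify against the tutorial's endpoint).
STAC_URL = "https://landsatlook.usgs.gov/stac-server"

def fetch_catalog(url):
    """GET the STAC catalog root and return its JSON (links, version, title, ...)."""
    response = requests.get(url)
    response.raise_for_status()
    return response.json()

def child_collections(catalog):
    """STAC collections appear as links with rel == 'child' in the catalog."""
    return [link["href"] for link in catalog.get("links", [])
            if link.get("rel") == "child"]

# A trimmed sample of the catalog JSON, so the parsing runs offline:
sample_catalog = {
    "id": "stac-server", "type": "Catalog",
    "links": [
        {"rel": "self", "href": STAC_URL},
        {"rel": "child", "href": STAC_URL + "/collections/landsat-c2l2-sr"},
        {"rel": "child", "href": STAC_URL + "/collections/landsat-c2l2-st"},
    ],
}
print(len(child_collections(sample_catalog)))  # 2 children in this trimmed sample
```

Against the live catalog, `fetch_catalog(STAC_URL)` would return the full link list, and counting the children is how the 14 collections show up.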
And there are 14 Landsat groupings of products. Again, that is the Burned Area versus the surface reflectance versus Level-1; they're all grouped separately. And here we're just listing all the collections and the descriptions for all the collections, all 14 of them. And here we're actually looking at one of the collections, the second collection; in Python nomenclature, index one is actually the second. So the second collection is the surface temperature. And within the metadata for this entire collection you have the bounding box, the license, and the keywords, which are actually the platforms that are used to collect data for this collection. And there are various links that are endpoints specific to this collection. So within each STAC collection there are multiple items, and these are called STAC items, and these are single data products: a single scene collected at a specific time and place. And here's a single product. It comes with a lot of metadata: its ID, of course, its bounding box, its geometry, its properties, and if I scroll further, there are assets. These are broken down further in the tutorial, so I'll just move on to those. Within this single scene, which is a surface temperature scene since we had that collection selected, you have its description, its ID, its acquisition date, the platform that was used to collect it, cloud cover, and a number of assets, which are the bands, the metadata, and even the thumbnails. And here in this cell, I'm printing out the various assets that are associated with this single scene and their URLs. So there are multiple URLs attached to the products. You can have the LandsatLook URL, and you can individually click the URLs to download products if you'd like it that way.
However, STAC is best used if you're interacting with the cloud or pulling data from the cloud, because every asset has an S3 link where you can use an Amazon tool, or rasterio, or another Python library to pull it down. But they do all have links that you can click if you're interested in doing that, and I won't click them, but they're there and they work. Next up, in section three, we're just setting up a search, and this is really just creating a parameter dictionary to interact with the Landsat STAC endpoint. The first step here is just pulling the search endpoint link from the catalog that was pulled down, and also creating an empty parameter dictionary to start. This section is really just highlighting the parameters that we're going to search on and that are available for you to search; there are more that are available, and I will show you those when we move deeper into this. First a limit is set, and you can set a limit of up to 10,000, but here it's 400. Every item comes with a bounding box, and here's an example: if you want to use the bounding box of an item, you could use it by specifying it. But here we're using a bounding box in a different area for our search, just to not pull in the same item that we were looking at. So we added a bounding box here to the parameter dictionary, along with the limit, and now we're doing a search. What is expected from this search is for 400 items to be returned, because there's a limit of 400, and we expect more than 400 data sets to match when you're only specifying a bounding box. The next step is adding a temporal query, and this is really just an ISO 8601 datetime string that you include in the parameter dictionary. I'm going to run that to add it, and you can see the dictionary is updated with the datetime, and then I'm going to run the search.
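As a sketch, the parameter dictionary built up in these cells looks like the following. The bounding box and dates are made-up placeholders, and the POST to the catalog's search endpoint is shown but not executed here, since it needs the live endpoint.

```python
params = {}
params["limit"] = 400  # cap on returned items; the API allows up to 10,000
# Bounding box as [west, south, east, north] in lon/lat (placeholder values):
params["bbox"] = [-122.6, 48.2, -122.0, 48.7]
# Temporal query as an ISO 8601 datetime range string:
params["datetime"] = "2021-01-01T00:00:00Z/2021-12-31T23:59:59Z"

def run_search(search_endpoint, params):
    """POST the parameter dictionary to the STAC search endpoint (not run here)."""
    import requests
    response = requests.post(search_endpoint, json=params)
    response.raise_for_status()
    return response.json()

print(sorted(params))  # ['bbox', 'datetime', 'limit']
```

With only a bounding box and a limit, the returned feature count hits the 400-item cap; adding the datetime range is what narrows it down.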
So now we're limiting the search to a specific date range, and now 144 products are returned. And as a reminder, we can specify collections, and I'm heading to that now. In section three you can specify a collection, and in this case we are specifying two collections, the surface reflectance and the surface temperature, to further limit or restrict the search. So we're adding two collections to the parameter dictionary, and here in cell 20 I'm going to run the search with the collections added. Now we're reduced from 144 to 28, since only some of the products were actually from the surface reflectance and surface temperature collections. Next, again, we're printing a single item, and the first item that is returned is a surface temperature product. Again, it has all the metadata associated with the product: the geometry, the ID, the properties, and the assets. In this next line of code, I'm going to run this to show you the various properties associated with this single item. And this is important, because these properties are what are searchable for this item. Everything from the datetime, the cloud cover, the scene ID, and even the projection shape are searchable using STAC for this item. You search properties by creating a query, or adding a query to the parameter dictionary, and in this cell that I just ran, we're adding a cloud cover range from 0 to 60% and restricting the platforms to only Landsat 8 and 9. And I'm going to run the search to see the results, and the results show only 12 returns for that specific date range, bounding box, collections, cloud range, and platforms. So here in this very last cell, I'm just going to run it to show you the results. After all those specifications, we only have data from Landsat 8 and Landsat 9, they only have cloud cover below 60%, and they're all from the surface temperature and surface reflectance data sets. Thank you to Holly for creating this introduction
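Those last two restrictions, the collections list plus a property query, can be sketched like this. The collection IDs, property names, and query operators follow common STAC conventions and are assumptions to check against the notebook.

```python
# Restrict the search to the surface reflectance and surface temperature
# STAC collections (IDs assumed; confirm against the catalog's collection list):
params = {
    "collections": ["landsat-c2l2-sr", "landsat-c2l2-st"],
    # Property query: cloud cover from 0-60% and only Landsat 8/9 platforms.
    # Property names and operators follow the STAC query extension conventions.
    "query": {
        "eo:cloud_cover": {"gte": 0, "lte": 60},
        "platform": {"in": ["LANDSAT_8", "LANDSAT_9"]},
    },
}
print(len(params["collections"]))  # 2 collections requested
```

Merged into the bounding-box and datetime parameters from before, this is the combination that drops the result count from 144 down to the final 12 matching scenes.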
to STAC. The next two tutorials I'm going to walk through are the HTML formats of them, and they're related to processing Landsat data in the cloud. To start, I'll be heading to the decoding the Landsat QA pixel and using it for masking tutorial. The pixel QA band is really a quality layer that highlights the potential issues with the pixels that you're working with, for any Landsat scene. It's a 16-bit unsigned Cloud Optimized GeoTIFF, and it's downloadable through the S3 bucket as an asset with all the products. The pixel QA band can mask out clouds, snow, shadow, water, and other potential issues with your pixel. So in this use case example, there's not much of a story; it's really just, here is Vancouver. I selected the scene because the main focus is how to process the pixel QA band and use it with Python. This is the before and after result, which will also be shown at the end. To start, we have the various prerequisites that you can use; if you don't understand something, the information will be found in the prerequisite tutorials. We then import the modules, more Python modules in this case, because we're actually pulling data into memory. The great thing about working with data in the cloud is that you're pulling data into memory; you are not downloading it directly to your device, your computer. So the first step in many of the tutorials is creating the functions to interact with the Landsat STAC server. There are other tutorials that use the popular PySTAC; the earlier tutorials, like this one, use this fetch STAC server function that we've created, and it really just uses requests and catches errors if they're present. The next step is creating a parameter payload function, which is really similar to what I did previously, where we created the parameters for searching.
However, this is a very elaborate one, in a sense; since it's one of the first tutorials, it's supposed to show you the various ways to set up the parameter dictionary and the things that you can include in it if you would like, depending on how specific you want to go with your search. The next step is reading in a GeoJSON file. Pretty much most of these tutorials use GeoJSON files; in some of them I create the bounding box in the tutorial, but in this one I'm pulling a GeoJSON file in for use, and this GeoJSON file is in the WGS 84 reference system. The next step is plotting. We often create maps; of course, we're working with imagery of the Earth. Here is an outline of the area, plotted using the GeoJSON file that was imported, and we use folium for this plotting. So in section three, I'm just setting up to interact with the STAC server to get the metadata associated with the scene that I want to pull down. I'm really just setting up a bounding box from the imported GeoJSON, the area of interest, and here I'm only really using a single query. Remember, a query is on the various properties in a scene that are searchable. In this case, I'm using the scene ID property to search for this scene, because I know it's just going to bring me back what I want: the single scene that I'm interested in. And here I'm just putting everything on a single line to create the parameter dictionary, and in this case, I'm searching for the surface reflectance data. Now in this section, we're just submitting the parameter dictionary and then retrieving the product, and it returns one item. I know it will be one item, because there's only one surface reflectance scene with that scene ID, and here I'm just extracting the only item that is returned. In this step I am listing the band names. Landsat band names are present within the assets for a single product, and here I'm pulling the shortwave infrared, the NIR, and the red band names.
With the band names, I'm then parsing the single item's assets to find the S3 links for download, and then I'm saving them into this band links variable. Here in this section, I'm showing you what assets are available in terms of bands and data for this product, and it ranges from the coastal band to the swir22 band. The next step is very important in every tutorial, especially those related to processing: I discuss AWS and what is necessary. You need to have an AWS account, and you need to specify requester pays when you are accessing Landsat data in the cloud. There are two ways you can set up your AWS credentials for interacting with or pulling down Landsat data. And here is a simple function that uses rasterio for pulling down the data and cropping it to the bounding box that was imported as GeoJSON. The next step is pulling the bands into a single spectral array, and here I'm using that function, retrieve cog, to pull them into a single spectral array, which will be a three-banded array, with its x and y lengths printed out here. Next up is plotting to visualize the results, and here is the before image as a composite of all three bands. The next step, of course, since this is about the pixel QA, is pulling in the QA pixel band: first pulling in its link, then using the xarray function to pull it in as an xarray product, which is just an array of the data set, and it brings in its coordinates and other spatial metadata associated with it. So it's a single-banded array. In this section, I'm just listing the various flags that are within the QA pixel, which depend on the satellite. What's available in Landsat 8 right now is also available in Landsat 9, and only bits 14 to 15 aren't available for Landsat 4 through 7. And in this section, I'm just printing out the descriptions and the bit locations for the bits that are available in the QA pixel band. The next step is masking.
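A minimal sketch of that asset parsing is below. The item dict is a trimmed, hypothetical stand-in for a real STAC item, the "alternate" S3 href layout is an assumption to verify against the notebook, and the rasterio requester-pays lines are left commented because they need an AWS account to actually run.

```python
# Trimmed, hypothetical STAC item: each band asset has an HTTPS href plus an
# assumed "alternate" S3 href for in-cloud, requester-pays access.
item = {
    "assets": {
        "red": {
            "href": "https://example.com/SR_B4.TIF",
            "alternate": {"s3": {"href": "s3://usgs-landsat/example/SR_B4.TIF"}},
        },
    }
}

def s3_band_link(item, band):
    """Pull the S3 download link out of a STAC item's asset entry."""
    return item["assets"][band]["alternate"]["s3"]["href"]

band_link = s3_band_link(item, "red")
print(band_link)

# Reading it would then look roughly like this (requires AWS credentials):
# import rasterio
# with rasterio.Env(AWS_REQUEST_PAYER="requester"):
#     with rasterio.open(band_link) as src:
#         data = src.read(1)
```

Looping `s3_band_link` over the shortwave infrared, NIR, and red assets is the kind of parsing that fills the band links variable mentioned above.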
This section here is very elaborate, very long. It shows you step by step how to decode the QA bits and explains bit encoding, so you can understand what is going on in the functions that are presented. So this entire section here is how to decode the QA bits: what the symbols in Python mean when you're decoding a bit, and how to do that for a single bit. Upon reading this, you should actually understand how to decode the bits. And these functions, these two functions, decoding the bit and the masking, work together to decode a bit and to mask per whatever obstruction you want to mask out, whether it's clouds or shadows or water. So the QA mask function uses this decode function to decode whatever bit is located at a pixel location, and it includes the masks for all obstructions that could be in your pixel, and also includes the confidence masks and how you could use them. It's very long; again, we're not using all of these, it's just there so it's easy to understand how to use it to mask. But in 4.2 here, I'm applying the masking, and I'm only masking for clouds and shadow, and I'm creating a QA mask. Here in this next cell, I'm broadcasting that mask to the same shape as the spectral array, which includes the three bands. So now the QA mask is three-banded, with, of course, the same x and y range as the spectral array, just for quick masking. In this section here, I am masking the spectral array, which includes all three bands that were pulled in, using the QA mask, and filling areas that are snow, clouds, and shadows with zero. The next step is plotting, just a single plot for after the masking is complete. You can see that the majority, though not all, of the cloud, shadow, and snow pixels have been masked out using the QA band. And at the end of every processing tutorial, there's some evaluation, some discussion, some comparison. So in 4.3 here, I'm just again plotting the same results.
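The decode-and-mask idea can be boiled down to the sketch below: shift-and-AND to read one bit, OR the obstruction bits together, broadcast to the band dimension, and zero out the flagged pixels. The bit positions used (3 = cloud, 4 = cloud shadow, 5 = snow) follow the Collection 2 QA_PIXEL convention the tutorial describes; treat them as values to confirm against its bit table.

```python
import numpy as np

def decode_bit(qa, bit):
    """Return 1 where the given bit of the 16-bit QA value is set, else 0."""
    return (qa >> bit) & 1

# Tiny hypothetical QA_PIXEL array: clear, cloud (bit 3), shadow (bit 4), snow (bit 5).
qa = np.array([[0, 1 << 3],
               [1 << 4, 1 << 5]], dtype=np.uint16)

# Combine the obstruction bits into one mask (1 = masked out).
qa_mask = decode_bit(qa, 3) | decode_bit(qa, 4) | decode_bit(qa, 5)

# Broadcast to a 3-band spectral array and fill masked pixels with zero.
spectral = np.full((3,) + qa.shape, 100, dtype=np.uint16)
masked = np.where(qa_mask[np.newaxis, :, :] == 1, 0, spectral)

print(qa_mask.tolist())  # [[0, 1], [1, 1]]
```

Only the clear pixel keeps its spectral values; the confidence bits (paired two-bit fields) would be decoded the same way, just reading two adjacent bits instead of one.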
However, this time side by side, for you to see the before and after of the results of decoding the QA band and masking for Vancouver to remove snow, clouds, and shadows. So with this one done, I'm going to move on to the last demo, which is retrieving a single pixel through time. It's really focused on how to use just one point to get data within whatever date range you want, for whatever type or grouping of Landsat data you are pulling. So for a single pixel, this tutorial is really a time series tutorial, but of course it introduces point feature analysis and time series analysis, and, you know, it could be your starter for a change detection analysis. The use case scenario for this tutorial is a sinkhole that happened in July 2017 in Florida. It is called the Land O' Lakes sinkhole; it swallowed two houses, and many surrounding homes were condemned because that's how large it was. So what we're doing with this single pixel is: I've selected a single pixel inside this sinkhole, and I now want to see how this pixel appears through time, before and after the sinkhole collapse. Can we see the sinkhole collapse in the Normalized Difference Vegetation Index, or NDVI, which is the vegetation index where a high NDVI indicates the presence of healthier vegetation, and a lower one indicates unhealthy vegetation or just the presence of water or no vegetation? So can we see this collapse in the NDVI values within the sinkhole through time? This image here is the Google image of the sinkhole in January, before it collapsed, and then in May 2023, five years after its collapse. So it's been through stages; you can even see more in this one a bit. So to start: prerequisite tutorials, where, if you are stuck somewhere, you can find the information; a table of contents, showing what is available in this tutorial; and next up, importing all the modules that are used.
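The NDVI computation at the heart of that question is simple; below is a sketch on made-up reflectance values for one pixel at three dates, where the drop in the last value is the kind of change the tutorial looks for around the collapse.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)

# Hypothetical surface reflectance for one pixel at three acquisition dates.
red = np.array([0.10, 0.12, 0.30])
nir = np.array([0.50, 0.48, 0.32])

values = ndvi(nir, red)
print(np.round(values, 3).tolist())  # healthy, healthy, then a sharp drop
```

NDVI ranges from -1 to 1; the fall from roughly 0.6-0.7 toward zero at the last date is what a bare or water-filled sinkhole pixel would look like in the time series.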