Because DIG CAMP focused on working with data in ways that were relevant to our program participants, the project had to build the capacity of undergraduate mentors to support high school students in their work with data, and to provide access to real-world data and data tools. Below we describe the process we used.
Data Science Specialist
The DIG CAMP team employed an Earth Science graduate student who was well-versed in data science, GIS, and data visualization as a Data Science Specialist to provide technical support for anything data-related in the DIG CAMP curriculum. The Data Science Specialist’s duties included:
1) curating and compiling datasets into a DIG CAMP database students could use for their research projects.
2) teaching a data skills workshop to the mentors.
3) introducing the datasets available for final projects to the students, and teaching lessons on how to analyze and visualize different types of data in CODAP and ArcGIS Online.
4) assisting students and mentors with any data analysis and visualization needs throughout DIG CAMP.
DIG CAMP Data Resources
During DIG CAMP, our students’ final project was a group research project investigating a geoscience or climate change-related topic of their choosing. With the guidance of the mentors, each group of 3-4 students designed a data-driven research project from start to finish: choosing a relevant topic, formulating a research question, selecting appropriate data, performing data analysis, creating data visualizations, and crafting their findings into a narrative for a final presentation. The final research project allowed the students to independently apply the data skills they learned during DIG CAMP.
Each group of students used data from a DIG CAMP-specific ‘database’ that the Data Science Specialist prepared for the final project. The DIG CAMP database included pre-processed, ready-to-use datasets covering various geoscience topics as well as economic, social, and demographic data. We wanted students to use processed datasets that were appropriate for the scale of their final project and in the correct format for the software we used during the program. However, rather than assigning each research group a dataset, we wanted the students to learn how to select data from a database and decide which datasets were most appropriate for their research questions. To facilitate the data selection process, we built the DIG CAMP database with over 250 environmental, health, demographic, and economic datasets, and we allowed the students to mix and match datasets to best answer their research questions. The datasets were chosen to cover a range of geoscience topics, and our choices were largely informed by the DIG CAMP curriculum and field trips to community partners. The datasets were tabular and geospatial, and were sourced from open-access government data portals (federal and state) and, in some cases, directly from community partners.
Building a DIG CAMP Database
Below, we outline general steps for selecting data resources for a DIG CAMP database.
Step 1: Identify Data Topics
Select data categories relevant to your curriculum, local area, current events, and community partners. For example, we selected wildfires and air quality as data topics because of the recent 2020 CZU wildfire complex in Santa Cruz, and marine life and health as topics because of field trips to the Año Nuevo elephant seal reserve and an aquaculture lab at the Monterey Bay Aquarium Research Institute. We also identified relevant non-geoscience data topics that we anticipated could be used with the environmental data, such as asthma rates by census tract and census demographic data.
Step 2: Dataset Research and Collection
Based on the data topics and themes you identify, search for relevant data sources online. Ensure that any datasets you select are open-access and quality-controlled. Federal government agency data portals, such as the NOAA NCEI Climate Data Online portal and USGS EarthExplorer, were particularly useful for sourcing data. State and local government data portals, such as the Santa Cruz County GIS portal and the California Water Board’s Groundwater Ambient Monitoring and Assessment portal, were also great places to find relevant data. If you use ArcGIS Online in your program, the ESRI Living Atlas has a wealth of excellent geospatial data assets. If possible, get datasets directly from your community partners or field trip destinations. Be sure to get a mix of spatial, tabular, and time series datasets.
Step 3: Dataset Cleaning and Processing
Once you have found and downloaded your datasets, you may need to process and clean them to 1) make sure the data is appropriate for high school students and 2) ensure the data is in the correct formats for the software the students plan to use for data analysis (in our case, Google Sheets, Excel, Google Earth, CODAP, and ArcGIS Online). We aimed to keep our datasets spatiotemporally limited to a scale appropriate for a research project conducted in less than a week. Examples of the processing the Data Science Specialist performed include aggregating high-resolution environmental measurements to daily, monthly, and annual averages, removing null data entries, and cropping geospatial files to relevant areas.
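The kinds of processing described in this step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the Data Science Specialist’s actual workflow. The function names, the sample readings, the site labels, and the bounding-box coordinates are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def clean_rows(rows):
    """Drop rows whose value is missing (None, blank, or 'NaN'/'null'/'NA' text)."""
    cleaned = []
    for timestamp, value in rows:
        if value is None:
            continue
        text = str(value).strip()
        if text == "" or text.lower() in {"nan", "null", "na"}:
            continue
        cleaned.append((datetime.fromisoformat(timestamp), float(text)))
    return cleaned

def aggregate_mean(rows, period):
    """Average cleaned (datetime, value) rows by 'day', 'month', or 'year'."""
    formats = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}
    groups = defaultdict(list)
    for when, value in rows:
        groups[when.strftime(formats[period])].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}

def crop_to_bbox(points, west, south, east, north):
    """Keep only (lon, lat, ...) records that fall inside a bounding box."""
    return [p for p in points if west <= p[0] <= east and south <= p[1] <= north]

# Hypothetical hourly air-quality readings; the values are made up for illustration.
hourly = [
    ("2020-08-19T00:00", "12.0"),
    ("2020-08-19T01:00", ""),      # missing reading, dropped
    ("2020-08-19T02:00", "18.0"),
    ("2020-08-20T00:00", "40.0"),
    ("2020-08-20T01:00", "NaN"),   # missing reading, dropped
    ("2020-09-01T00:00", "10.0"),
]
cleaned = clean_rows(hourly)
daily = aggregate_mean(cleaned, "day")
monthly = aggregate_mean(cleaned, "month")

# Hypothetical monitoring sites as (longitude, latitude, label).
sites = [(-122.03, 36.97, "site A"), (-118.24, 34.05, "site B")]
local_sites = crop_to_bbox(sites, west=-122.4, south=36.8, east=-121.5, north=37.3)
```

Averaging after removing missing entries keeps blank cells from being counted as zeros, and the smaller daily or monthly tables are easier to open in spreadsheet-style tools. Cropping real geospatial files such as shapefiles or rasters would need a GIS tool or library; the bounding-box filter here only covers simple point records.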
Step 4: Database Compilation, Documentation, and Presentation
After processing the data, you can compile the ready-to-use datasets in a DIG CAMP database. We stored all the datasets in a Google Drive folder organized by data topic. The students had access to this folder and could explore the datasets when planning their research projects. We also created a companion DIG CAMP database documentation file that included metadata, sources, and descriptions of each dataset. An example database documentation file can be found here: DIG Datasets Documentation. When introducing the final project, the Data Science Specialist presented the database documentation file to the students and walked the group through the available dataset options, how to interpret the metadata, and how to download the datasets from Google Drive and open them in the appropriate software.
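One way to keep a documentation file like this consistent is to generate it from a simple list of records. The sketch below is an assumption about how such a file could be built, not a description of the DIG CAMP documentation itself. The column names, file names, and example entries are hypothetical placeholders.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class DatasetRecord:
    # Hypothetical metadata columns; adapt them to your own documentation file.
    topic: str
    file_name: str
    source: str
    data_format: str
    time_span: str
    description: str

def build_documentation(records):
    """Return CSV text with one row per dataset, sorted by topic then file name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(DatasetRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in sorted(records, key=lambda r: (r.topic, r.file_name)):
        writer.writerow(asdict(record))
    return buffer.getvalue()

# Placeholder entries for illustration only; these are not real DIG CAMP datasets.
records = [
    DatasetRecord("Wildfires", "fire_perimeters_example.geojson",
                  "State data portal (placeholder)", "GeoJSON", "2020",
                  "Example fire perimeter polygons"),
    DatasetRecord("Air quality", "pm25_daily_example.csv",
                  "Federal data portal (placeholder)", "CSV", "2020",
                  "Example daily mean PM2.5 readings"),
]
documentation_csv = build_documentation(records)
print(documentation_csv)
```

The resulting CSV text can be saved to a file and opened in Google Sheets, so students can sort or filter by topic when choosing data.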
Data Analysis and Visualization Software
Students and mentors conducted data analysis and visualization during DIG CAMP using Google Sheets, CODAP, and ArcGIS Online. We used Google Sheets to explore and manipulate tabular data. CODAP was an excellent, user-friendly tool for visualizing tabular and spatial data in one place. However, the GIS capabilities of CODAP are somewhat limited, so when students wanted to make more complex maps or use satellite imagery in their projects, we used ArcGIS Online. Students compiled the data visualizations they made in CODAP, Google Sheets, or ArcGIS Online into presentations with Google Slides. During a workshop before DIG CAMP, mentors were trained to use this software and in data analysis and GIS skills.
Helpful Resources
Teaching with Data (SERC)
ArcGIS
CODAP
Video tutorials: A 10-part video series covering dataset access, processing, and analysis for the DIG Curriculum. It also includes tutorials on how to use CODAP and ArcGIS Online.
- Tutorial 1: Downloading Data
- Tutorial 2: Downloading Spatial Data
- Tutorial 3: Exploratory Data Analysis and Descriptive Stats
- Tutorial 4: Calculations with Temporal Data
- Tutorial 5: Resampling Time Series Data in Google Sheets
- Tutorial 6: Graphing in CODAP
- Tutorial 7: Mapping in CODAP
- Tutorial 8: Uploading and Mapping Data in ArcGIS Online
- Tutorial 9: Using ESRI’s Living Atlas in ArcGIS Online
- Tutorial 10: Publishing and Exporting Your Map in ArcGIS Online