class: center, middle, inverse # Overlay Spatial Objects in R ## Soils Map and CDL for South Dakota ### Xiaodan Lyu ### Joint work with Heike Hofmann, Emily Berg ### 26 October 2017 --- count: false # Navigation 1. Background 1. Web Soil Survey 1. Cropland Data Layer 1. Spatial Objects in R 1. Procedure 1. Visualization with Shiny --- class: inverse, center, middle # Background --- ## Conservation Effects Assessment Project (CEAP) .center[ ![](https://www.nrcs.usda.gov/Internet/FSE_MEDIA/nrcseprd1339817.png) ] --- background-image: url(https://www.nrcs.usda.gov/Internet/FSE_MEDIA/nrcs143_011955.jpg) background-size: 100px background-position: 90% 5% ## CEAP-Cropland - To predict the population mean of rainfall-erosion losses from cropland at county level. - One of the objectives is to classify the soils and identify cropland soil. + Public available data resources: Web Soil Survey and Cropland Data Layer (CDL). + Thinking about overlaying soils data with CDL. --- class: inverse, center, middle # Web Soil Survey --- background-image: url(https://www.nrcs.usda.gov/Internet/FSE_MEDIA/nrcseprd1334237.jpg) background-size: 250px background-position: 85% 5% ## SSURGO database - Detailed Soil Map. - The SSURGO database can be viewed in the [Web Soil Survey](https://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm) or downloaded in ESRI shapefile format. - The extent of a SSURGO dataset is a soil survey area, which may consists of a single county, multiple counties or parts of multiple counties. - There are 67 soil survey areas in South Dakota. + 57 survey areas (SD003-SD137) in South Dakota cover a single county; + 10 survey areas (SD600-SD603, SD606-SD613) do not. --- --- ## 10 Soil Survey Areas ```r df %>% group_by(AREASYM) %>% summarise(n = length(unique(COUNTY)), COUNTY = toString(unique(COUNTYNM))) %>% arrange(desc(AREASYM)) %>% head(10) ``` ``` ## # A tibble: 10 × 3 ## AREASYM n COUNTY ## <fctr> <int> <chr> ## 1 SD613 1 Shannon ## 2 SD612 3 Jackson, Pennington, Shannon ## 3 SD611 1 Jackson ## 4 SD610 1 Jackson ## 5 SD607 2 Custer, Pennington ## 6 SD606 2 Custer, Pennington ## 7 SD603 2 Brule, Buffalo ## 8 SD602 2 Hanson, Hutchinson ## 9 SD601 1 Meade ## 10 SD600 1 Meade ``` --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/Area_CTY.jpg?raw=true) background-size: 800px background-position: 50% 50% ## 10 survey areas Map --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/WSS_Slope_Map.jpg?raw=true) background-size: 650px background-position: 50% 50% ## SD612, Badlands National Park --- ## Map Units - The soil map outlines areas called map units. - A soil survey area is a group of map units. - Labels for a mapunit include AREASYM, MUKEY, MUSYM and MUNAME. ```r mu %>% dplyr::select(AREASYM, MUKEY, MUSYM, MUNAME) %>% glimpse ``` ``` ## Observations: 7,501 ## Variables: 4 ## $ AREASYM <fctr> SD011, SD011, SD011, SD011, SD011, SD011, SD011, SD01... ## $ MUKEY <int> 418821, 418822, 418823, 418824, 2765292, 2765288, 2765... ## $ MUSYM <fctr> Wa, Wb, WeA, Wo, Z150A, Z151A, Z152A, Z153A, Z155A, Z... ## $ MUNAME <fctr> Wakonda-Chancellor complex, 0 to 2 percent slopes, Wa... ``` --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/MU_Hanson_417991.jpg?raw=true) background-size: 600px background-position: 50% 80% ## Mapunit Example Plotted Using R MUKEY = 417991; MUSYM = "Pr"; MUNAME = "Prosper-Stickney complex, 0 to 2 percent slopes". --- class: inverse, center, middle # Cropland Data Layer --- background-image: url(https://upload.wikimedia.org/wikipedia/commons/thumb/1/14/US-NationalAgriculturalStatisticsService-Logo-Tagline.svg/220px-US-NationalAgriculturalStatisticsService-Logo-Tagline.svg.png) background-size: 100px background-position: 90% 5% ## CDL data - Resource: USDA-NASS, available free at the [CropScape portal](https://nassgeodata.gmu.edu/CropScape). - a raster, geo-referenced, crop-specific land cover data layer; - available in GeoTIFF(.TIF) file format + .tif file: each pixel has an integer value ranging from 0 to 254. + .tif.vat.dbf file: information about each value, including CLASS_NAME and COLOR (<span style="color:red">RED</span>, <span style="color:green">GREEN</span>, <span style="color:blue">BLUE</span>, OPACITY). ```r cdl.sd.dbf %>% glimpse ``` ``` ## Observations: 255 ## Variables: 6 ## $ VALUE <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1... ## $ CLASS_NAME <fctr> Background, Corn, Cotton, Rice, Sorghum, Soybeans,... ## $ RED <dbl> 0.0000000, 1.0000000, 1.0000000, 0.0000000, 1.00000... ## $ GREEN <dbl> 0.0000000, 0.8274510, 0.1490196, 0.6588235, 0.61960... ## $ BLUE <dbl> 0.00000000, 0.00000000, 0.14901961, 0.89803922, 0.0... ## $ OPACITY <dbl> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... ``` --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/39fe2f1a5c47de74ddf46624fe6634b69f5b58b8/2_Zeros/Scripts/Presentation/Overlay_Slides/images/US_2016_CDL_legend.jpg?raw=true) background-size: 500px background-position: 50% 50% --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/39fe2f1a5c47de74ddf46624fe6634b69f5b58b8/2_Zeros/Scripts/Presentation/Overlay_Slides/images/cdllegend_56m_r_sd_2006.jpg?raw=true) background-size: 400px background-position: 50% 50% ## 2006 South Dakota CDL color legend --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/2006_Hanson_417991_CDL.jpg?raw=true) background-size: 600px background-position: 50% 80% ## CDL Example Plotted Using R --- class: inverse, center, middle # Spatial Objects in R --- ## Coordinate Reference Systems (CRS) CRS for spatial objects are similar to units for volume or distance. ```r ## a data frame of all the CRSs rgdal::make_EPSG() %>% glimpse ``` ``` ## Observations: 5,078 ## Variables: 3 ## $ code <S3: AsIs> 3819, 3821, 3824, 3889, 3906, 4001, 4002, 4003, 4004... ## $ note <S3: AsIs> # HD1909, # TWD67, # TWD97, # IGRS, # MGI 1901, # Un... ## $ prj4 <S3: AsIs> +proj=longlat +ellps=bessel +towgs84=595.48,121.69,5... ``` ```r ## CRS used by Google Earth sp::CRS("+init=epsg:4326") ``` ``` ## CRS arguments: ## +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 ## +towgs84=0,0,0 ``` --- ## Coordinate Reference Systems (Cont'd) When `readOGR` is used to import ESRI shapefiles, the CRS is automatically linked to the R spatial object. ```r poly.name <- "SD6XX" poly <- rgdal::readOGR(dsn = path.to.poly.folder, layer = poly.name) ``` ``` ## OGR data source with driver: ESRI Shapefile ## Source: "G:/Grads/lyux/mu_by_areasym/SurveyAreaPoly", layer: "SD6XX" ## with 10 features ## It has 2 fields ``` ```r ## To retrieve the CRS for a spatial object sp::proj4string(poly) ``` ``` ## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" ``` ```r ## not run ## To assign a known CRS to a spatial data proj4string(x) <- CRS("+init=epsg:4326") ## To transform from one CRS to another newData <- spTransform(x, CRS("+init=epsg:4269")) ``` --- ## Raster Objects ```r cdl.sd <- raster::raster(x = path.to.cdl) cdl.sd ``` ``` ## class : RasterLayer ## dimensions : 7368, 11016, 81165888 (nrow, ncol, ncell) ## resolution : 56, 56 (x, y) ## extent : -652194, -35298, 2164650, 2577258 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## data source : C:\Users\lyux\Box Sync\SAEZeros\Data\CDL\CDL_2006_46.tif ## names : CDL_2006_46 ## values : 0, 255 (min, max) ``` ```r ## retrieve spatial bounding box sp::bbox(cdl.sd) ``` ``` ## min max ## s1 -652194 -35298 ## s2 2164650 2577258 ``` --- ## Raster Objects (Cont'd) ```r ## To crop by row and column number cdl.sub <- raster::crop(x = cdl.sd, y = extent(cdl.sd, 1001, 1100, 1001, 1100)) dim(cdl.sub) ``` ``` ## [1] 100 100 1 ``` ```r ## Extract Cell Values getValues(cdl.sub) %>% table ## extract(cdl.sub) ``` ``` ## . ## 29 61 111 121 122 152 176 190 195 ## 40 217 8 262 1 8011 1450 1 10 ``` ```r ## Transfrom Raster to SpatialPointDataFrame cdl.point <- raster::rasterToPoints(cdl.sub, spatial = TRUE) class(cdl.point) ``` ``` ## [1] "SpatialPointsDataFrame" ## attr(,"package") ## [1] "sp" ``` --- ## Spatial Polygons Data Frame `rgdal::write.OGR(obj, dsn, layer, driver = "ESRI Shapefile")` ```r mu.sd@data %>% glimpse ## slot(mu.sd, "data") ``` ``` ## Observations: 9,394 ## Variables: 12 ## $ OBJECTID <int> 856089, 856090, 856173, 856174, 856175, 856176, 856... ## $ AREASYMBOL <fctr> SD602, SD602, SD602, SD602, SD602, SD602, SD602, S... ## $ SPATIALVER <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ... ## $ MUSYM <fctr> CdA, CdA, Bo, Bo, Bo, Bo, Cb, Bo, Cb, Bo, Bo, Bo, ... ## $ MUKEY <fctr> 417943, 417943, 417939, 417939, 417939, 417939, 41... ## $ Shape_Leng <dbl> 748.7136, 567.2321, 252.3721, 2637.9464, 2387.1247,... ## $ Shape_Area <dbl> 38846.432, 16061.870, 4102.656, 119761.515, 87533.6... ## $ NAME <fctr> Hanson, Hanson, Hanson, Hanson, Hanson, Hanson, Ha... ## $ STATE_NAME <fctr> South Dakota, South Dakota, South Dakota, South Da... ## $ STATE_FIPS <fctr> 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46, 46... ## $ CNTY_FIPS <fctr> 061, 061, 061, 061, 061, 061, 061, 061, 061, 061, ... ## $ FIPS <fctr> 46061, 46061, 46061, 46061, 46061, 46061, 46061, 4... ``` ```r mu <- mu.sd %>% subset(MUKEY == "417938") ``` --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/Groundhog.jpg?raw=true) --- ## Spatial Polygons Data Frame (Cont'd) ```r class(mu@polygons) ``` ``` ## [1] "list" ``` ```r mu@polygons[[1]] %>% glimpse ``` ``` ## Formal class 'Polygons' [package "sp"] with 5 slots ## ..@ Polygons :List of 1 ## .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots ## ..@ plotOrder: int 1 ## ..@ labpt : num [1:2] -97.9 43.5 ## ..@ ID : chr "208" ## ..@ area : num 2.64e-06 ``` ```r geometry(mu) ``` ``` ## class : SpatialPolygons ## features : 77 ## extent : -97.9669, -97.61069, 43.49695, 43.84431 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 ``` --- ## Overlaying ```r x %over% y ## over(x, y) ``` at the spatial locations of object x retrieves the indexes or attributes from spatial object y. ```r ## require identitical (CRS) identicalCRS(cdl.point, mu) ``` ``` ## [1] FALSE ``` ```r cdl.point <- spTransform(cdl.point, CRS = proj4string(mu)) sp::over(mu, cdl.point) %>% glimpse ``` ``` ## Observations: 77 ## Variables: 1 ## $ CDL_2006_46 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA... ``` --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/Groundhog.jpg?raw=true) --- ## Overlaying (Cont'd) ```r index <- !is.na(over(mu, cdl.point)) class(index) ``` ``` ## [1] "matrix" ``` ```r cdl.point[index[1,], ] ``` ``` ## class : SpatialPointsDataFrame ## features : 0 ## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 ## variables : 1 ## names : CDL_2006_46 ``` ```r cdl.point[mu, ] ## cdl.point[geometry(mu), ] ``` ``` ## class : SpatialPointsDataFrame ## features : 0 ## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 ## variables : 1 ## names : CDL_2006_46 ``` --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/4256e6f774ae27650b8dea692997048ab1042267/2_Zeros/Scripts/Presentation/Overlay_Slides/images/raster_plot.jpg?raw=true) background-size: 600px background-position: 50% 90% ## Plotting ```r data.frame(cdl.point) %>% ggplot() + geom_point(aes(x = x, y = y, color = factor(CDL_2006_46)), shape = 15) + coord_equal() ``` --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/4256e6f774ae27650b8dea692997048ab1042267/2_Zeros/Scripts/Presentation/Overlay_Slides/images/mu_plot.jpg?raw=true) background-size: 550px background-position: 50% 90% ## Plotting (Cont'd) ```r fortify(mu) %>% ggplot() + geom_path(aes(x = long, y = lat, group = factor(group))) + coord_equal() ``` --- class: inverse, center, middle # Procedure --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/Groundhog.jpg?raw=true) --- ## Implementation 1. Export SSURGO soils map in South Dakota into a shapefile. 1. Download CDL raster file from year 2006 to year 2016 for South Dakota from USDA-NASS. 1. Iterate first by county then by mapunits + Crop the CDL raster to the extent of the bounding box of the target mapunit; + Transforms rater file into spatial point object; + Overlay the mapunit polygon with the cropped CDL points; + Table pixel count by CDL category grouped by each mapunit. --- ## Some tricks 1. Split the shapefile into separate shapefiles grouped by soil survey areas. 1. Omit background pixels and pixels with missing values in CDL data. 1. download the latest county boundary shapefiles from the website of US census Bureau, then overlay them with the shapefiles of the 10 survey areas. 1. Transform spatial objects to be with the same Coordinating Reference System. 1. Using parallelized computing to speed up the process. <sup>[1]</sup> .footnote[ [1] Package `multidplyr` is used. ] --- #### Example overlaid image ![overlay](https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/2006_SD602_417991.jpg?raw=true) .center[<small>Figure 1: SD602 MUKEY 417991 overlaid with 2006 CDL raster.</small>] --- background-image: url(https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/2006_Hanson_417991.jpg?raw=true) background-size: 600px background-position: 50% 50% #### Example overlaid image Hanson MUKEY 417991 overlaid with 2006 CDL raster. --- #### Example CDL pixel counts ```r hanson.417991 %>% arrange(desc(PIXEL_COUNT)) %>% filter(PIXEL_COUNT>=50) ``` ``` ## CODE CLASS_NAME PIXEL_COUNT ## 1 5 Soybeans 12874 ## 2 1 Corn 12214 ## 3 176 Grass/Pasture 5836 ## 4 121 Developed/Open Space 1749 ## 5 24 Winter Wheat 1472 ## 6 61 Fallow/Idle Cropland 813 ## 7 36 Alfalfa 699 ## 8 195 Herbaceous Wetlands 550 ## 9 23 Spring Wheat 543 ## 10 141 Deciduous Forest 196 ## 11 28 Oats 50 ``` --- class: inverse, center, middle # Visualization with Shiny --- ## Beta Version .center[ ![](https://github.com/XiaodanLyu/Lognormal-Extension/blob/26e3f90e937d99de249373d86a297a1d5dc95843/2_Zeros/Scripts/Presentation/Overlay_Slides/images/AnnieLyu.jpg?raw=true) available at [shinyapps.io](https://lyux.shinyapps.io/Shiny_Overlay/) ] --- ## Other Visualization Tools Existing tools besides CropScape and Web Soil Survey: - Interfaces to SoilWeb - California Soil Resource Lab :: [SoilWeb Apps](https://casoilresource.lawr.ucdavis.edu/soilweb-apps/) - R packages provided by project [**Algorithms for Quantitative Pedology**](https://r-forge.r-project.org/R/?group_id=590) + `aqp`, `sharpshootR` and `soilDB`: a collection of algorithms related to the modeling of soil resources, soil classification, soil profile aggregation, and visualization.