Monday, April 17, 2017

Post 5: Network Analysis

Background
According to a white paper on the transportation impacts of frac sand mining (Hart, Adams, & Schwartz, 2013), the use of frac sand has exploded with the development and widespread use of hydraulic fracturing for oil and natural gas extraction. With this demand, a ready supply of sand is necessary to maintain the process. Much of this demand has centered on the frac sand mines of west central Wisconsin. The increased demand has left many communities that contain frac sand mines concerned about the costs the process imposes on them. Over the course of a mine's life, trucks may make hundreds of trips to and from the mine for transport and delivery. This exercise was a continuation of the ongoing frac sand project completed in this course. In it, an estimate was made of the costs that counties and communities will be forced to pay as a result of frac sand mining. Critically, this is not a scientific or credible case study. The true cost of truck transport is not known, nor is the number of truck trips to and from each mine. Thus, the estimate is based on several arbitrary values.
Data
  • County, railroad, railroad terminals, and mine location datasets provided by the Wisconsin DNR.
  • Network Dataset provided by ESRI street map USA.
Methods
First, a Python script was written to select only the active mines within the state of Wisconsin that neither have a rail loading station on site nor lie within 1.5 kilometers of a railway (Post 2: Script 2). These mines will be forced to use public roadways to transport sand to an available rail terminal with a loading station. In addition, the script created a feature class, derived from the rail terminals feature class, containing only the terminals with viable loading stations for trucks. Then, the address field was removed from the mines feature class to prevent an error during later analysis. From here, a model was built to perform the network analysis, field creation and calculation, and any other processing required to properly analyze the network data (Figure 1). The steps in the model were as follows:

  • A Closest Facility layer was added, with the streets network dataset as the input. The travel direction was set to travel to the facility
  • An Add Locations tool was next, with the input locations being the mine selection feature class. These were set to incidents to make them the necessary start of any route
  • Add Locations was added again, with the rail terminal selection being the input set as the facilities
  • A Select data tool was added to select the newly generated route
  • A Copy Features tool was utilized to create a feature class of the route data
  • A Project tool was utilized to convert the routes into a projected coordinate system with a linear unit of feet. This was necessary so that the later Summary Statistics tool could properly total the distance trucks traveled on roads.
  • The Counties feature class was likewise projected into the same coordinate system.
  • An Intersect tool was utilized between the county and routes feature classes
  • A Summary Statistics tool was used to calculate the total length of the routes within each county.
  • The Add Field tool was utilized to create a field for estimated cost in the counties feature class
  • Calculate Field was utilized to estimate the cost each county would be required to pay annually for frac sand trucks. This was based on several estimates. It was assumed that trucks would make 50 trips from the mine to the terminal and 50 trips back to the mine annually, with each mile a truck travels costing a county $0.022. The distance was also multiplied by 5280 to generate a cost per mile, rather than a cost per foot, because the summarize tool generated the distance in feet. The equation used was as follows:
    • cost = (summarized distance traveled in each county) * 0.022 * 100 * 5280
  • This resulting data table was joined to the County Boundaries feature class, so the cost per county could be properly displayed.
  • The resulting data and feature classes were utilized to create a map and data table which displayed the estimated cost on each county.

Figure 1: The model utilized to generate the routing from each frac sand mine to the closest available loading terminal and calculate the cost of transport each county would be forced to pay.
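The summarize and cost steps in the model can be sketched in plain Python. This is a minimal illustration and not the ArcGIS model itself: the county names and segment lengths are placeholders, and the function names are made up for this example. One assumption deserves emphasis: the sketch divides by 5,280 to turn feet into miles, which is the usual conversion, while the equation listed above is written with a multiplication.

```python
from collections import defaultdict

FEET_PER_MILE = 5280
COST_PER_TRUCK_MILE = 0.022  # dollars per truck mile; arbitrary value from the exercise
TRIPS_PER_YEAR = 100         # 50 trips to the terminal plus 50 trips back


def summarize_route_feet(segments):
    """Total the route length (in feet) per county.

    Plays the role of the summarize step run on the intersected
    routes, using the county name as the grouping field.
    """
    totals = defaultdict(float)
    for county, length_ft in segments:
        totals[county] += length_ft
    return dict(totals)


def annual_cost(total_feet):
    """Estimated annual cost for one county, in dollars.

    Assumption: feet are converted to miles by dividing by 5,280.
    """
    miles = total_feet / FEET_PER_MILE
    return miles * COST_PER_TRUCK_MILE * TRIPS_PER_YEAR


# Placeholder route pieces after the intersect: (county, length in feet)
segments = [
    ("County A", 52800.0),
    ("County A", 26400.0),
    ("County B", 10560.0),
]

totals = summarize_route_feet(segments)
costs = {county: round(annual_cost(feet), 2) for county, feet in totals.items()}
print(costs)
```

Under these placeholder values, County A carries 15 miles of route, so its estimate is 15 × 0.022 × 100 = $33 per year.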

Results
Figure 2: A data table displaying the total length of routes and cost accrued by truck transport in each county.
By looking at the data (Figure 2), it can be seen that most counties accrue little to no cost as a result of frac mining. The minimum cost is $0 and the average is only roughly $30. However, the maximum is $636 and the standard deviation is roughly $100. The highest costs are accrued by counties in the northern portion of the cluster of mines in the western part of the state (Figure 3). These counties have the greatest cost because routes from multiple mines overlap within them. In contrast, counties outside this cluster accrue little to no cost, as they contain only one mine or none at all.
Figure 3: A map displaying the routing for each mine to the closest viable rail terminal and the cost of transport each county will be annually forced to pay. This data is estimated arbitrarily and should not be utilized for an actual case study.
Conclusion
While this process has allowed for the creation of an estimate of costs, it is incomplete. Because the cost per mile and the number of trips trucks make annually to and from each mine were estimated arbitrarily, the true cost is still unknown. However, this project allowed for the creation of a model that generates the routes and estimated costs. In the future, this or a similar process could be applied to a real project of this kind.
Sources
Hart, M., Adams, T., Schwartz, A. (2013). Transportation Impacts of Frac Sand Mining in the MAFC Region: Chippewa County Case Study. In White Paper Series: 2013. Retrieved 4/20/2017, from http://midamericafreight.org/wp-content/uploads/FracSandWhitePaperDRAFT.pdf

Hupy, C. (2017). Exercise 7: Network Analysis Part 1-Data Preparation. Eau Claire, WI.

Hupy, C. (2017). Exercise 7: Network Analysis. Eau Claire, WI.

Network Dataset provided by ESRI street map USA.

County, railroad, railroad terminals, and mine location shapefiles provided by the Wisconsin DNR.



Friday, April 7, 2017

Post 4: Data Normalization, Geocoding, and Error Assessment


Goals and Objectives
As a continuation of the sand mines project begun at the start of the semester, this exercise served as a first step toward constructing a suitability and risk model for frac sand mining in western Wisconsin. As part of this, data on sand mines needed to be normalized, the mine addresses needed to be geocoded, and the results needed to be compared to known locations for these mines in order to measure error. Each student was responsible for completing this for 19 of the mines. Geocoding is the process of converting addresses into geographic locations by matching them against a reference dataset of known addresses. By completing this process, an accurate map of mine locations could be constructed for later analysis.
Methods
The original mines data was first opened in Excel. From this file, the 19 mines that were personally assigned for normalization and geocoding were removed from the file and placed in their own Excel table. For each mine, a field was added to the data table for each portion of the complete address entry: PLSS, Street Address, Street (name), Street Type, City, State, and Zip Code. These fields were then populated using the corresponding data taken from the original address field (Figure 1).

Figure 1: A table showing both the unnormalized address entries (Address) and normalized address entries (PLSS, Street Address, Street, Street Type, City, State, Zip Code) for each of the nineteen mines, as organized by the mines' unique IDs. 
This is what is known as data normalization. It is completed for two primary reasons. First, in order for ArcGIS to properly read address data as separate fields, the address must be broken down into these components, because the program cannot split the whole address automatically. Second, not every address entry is organized in the same way. Some list their components in different orders, while others are missing portions of the address (e.g., PLSS, zip code, street name). This is due to how the data was initially recorded by the Wisconsin DNR. It is common for data to not be normalized when it is first received or retrieved from an organization.
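As a rough illustration of the normalization idea, the sketch below splits one well-formed address string into the fields listed above. It is not the process used in this exercise, where the fields were filled in by hand in Excel. The function name, the sample address, and the expected "street, city, state zip" layout are all assumptions made for the example, and "Street Address" is assumed to hold the full street line. Entries that do not fit the layout are left blank for manual entry, which mirrors how irregular records had to be handled.

```python
import re

FIELDS = ["Street Address", "Street", "Street Type", "City", "State", "Zip Code"]


def normalize_address(raw):
    """Split 'number street type, city, ST zip' into separate fields.

    Returns a dict with every field present; fields that cannot be
    parsed stay empty so they can be filled in manually.
    """
    fields = dict.fromkeys(FIELDS, "")
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != 3:
        return fields  # unexpected layout: leave everything blank

    street_line, city, state_zip = parts
    fields["Street Address"] = street_line
    fields["City"] = city

    tokens = street_line.split()
    # Drop a leading house number so only the name and type remain.
    if tokens and tokens[0][0].isdigit():
        tokens = tokens[1:]
    if len(tokens) >= 2:
        fields["Street"] = " ".join(tokens[:-1])
        fields["Street Type"] = tokens[-1]
    elif tokens:
        fields["Street"] = tokens[0]

    # Two-letter state, optionally followed by a five-digit zip code.
    match = re.fullmatch(r"([A-Z]{2})\s*(\d{5})?", state_zip)
    if match:
        fields["State"] = match.group(1)
        fields["Zip Code"] = match.group(2) or ""
    return fields


# Made-up address used only to exercise the function
print(normalize_address("1234 Example Rd, Sampletown, WI 54000"))
```

A record that only lists a PLSS description would come back with every field empty here, so it would still need to be located manually.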
Figure 2: The completion message for the geocoding process. From the message, it can be determined that fourteen mines were matched to a known address, one was matched to two equally likely addresses, and four could not be matched to any known address in the database.
The data was then added to an ArcMap document, along with an imagery basemap. After logging into the university enterprise account, the Geocoding toolbar was activated, and the "Geocode Addresses" option was selected. The World Geocode Service was chosen as the address locator, the address input fields were matched to the data table, and OK was selected in the window to start the geocoding. When the geocoding was completed, a message appeared displaying the matched, tied, and unmatched addresses from the list of mines (Figure 2). From this, it was determined that fourteen of the mine addresses matched a known location, one was matched to two equally likely candidates, and four could not be matched to any address in the database. These would likely need to be manually matched later.
The Interactive Rematch window was then opened. With it, each match was inspected to see how close it was to the mine's actual location. As it turned out, all but one of the matched addresses failed to fall on the mine itself. Instead, they were geocoded to the center of the town listed in each address. To compensate, these addresses were manually rematched in the Interactive Rematch window to what was believed to be the corresponding mine location. This was accomplished by looking up the known address in Google Maps, comparing it against the ArcMap imagery, and, if that failed, finding the location using the PLSS address in conjunction with the Wisconsin PLSS sections and PLSS townships shapefiles. The PLSS address determined the subdivision of land (both township and section) in which the mine was located. This was especially critical for the addresses that came up unmatched in the geocoding process, as these only had a listed PLSS address. These steps and tools were used until every address was matched with what was believed to be its corresponding mine. Afterwards, the data was exported as a point shapefile so it could be analyzed.

The completed geocoded mine locations shapefile was added to a new data frame, along with the true mine locations shapefile. Using the Select tool and a query on the unique mine ID field, only the nineteen personally assigned mines were selected out of the true mine locations shapefile. This would allow the personally geocoded locations to be compared to what were considered the actual locations. In addition, a merge was completed on all the other students' shapefiles of their assigned, geocoded mines. These were made available by each student when they completed their geocoding. The list of mines geocoded by each student had some overlap with others in the class, so the results could be compared against one another. Unfortunately, several of the students failed to properly name their unique mine ID field (Mine_Uniqu). To correct this, a field map was used during the merge to reconcile the field names. In addition, two fields, each in the attribute table of one of the shapefiles, needed to be altered, as they were populated with values that prevented the merge (e.g., words used to represent a null value in a long integer field). Once the merge was completed, the same query and Select tool used on the true mine locations shapefile were applied to the merged class data, using the unique mine IDs (Mine_Uniqu) of the nineteen personally completed mines, to find the geocoded results of students who had also completed these mines. This created a point shapefile of only these corresponding nineteen mines from the other students' geocoding results.
With shapefiles of the personal geocoded locations, the true mine locations, and the class geocoded locations for the assigned nineteen mines finally ready, they still required formatting before analysis. Each one was reprojected into the NAD 1983 State Plane Wisconsin Central FIPS 4802 projected coordinate system. This was required because they were originally in a geographic coordinate system, which uses degrees as its unit of measurement for distance and location. By reprojecting them into a projected coordinate system, and changing the data frame to match, distances between mine locations could instead be measured in meters.

The Near tool was used to measure the distance from each personally geocoded mine to the closest "actual" mine location. This was usually the actual location whose unique mine ID matched that of the geocoded mine. However, this was not the case for one mine. In that case, the Measure tool was used to measure the distance between the geocoded location and its corresponding actual location. This data was then added to an Excel table, and several statistical measures (minimum, maximum, mean, median, standard deviation) were calculated. This would serve as the distance error data between the geocoded locations and the actual locations. In addition, the Near tool was used in a similar way to gather data on the distance between the geocoded mine locations and the corresponding locations geocoded by peers. Only the closest corresponding location from another student was recorded, rather than all corresponding distances, as this would provide a sample likely to be indicative of the error between the personal locations and those of others. It would also serve to point out any locations in the true locations shapefile that may actually be incorrect. In addition, many students had not completed the geocoding process in the allotted time. As a result, the sample data that could be gathered from others was limited. Taking one corresponding location for each of the nineteen geocoded mines used roughly half of the points made available by other students. Several mines had only one corresponding geocoded location completed by another student. In the case of Mine 328, no other student had geocoded the mine's location. Once the distance between each geocoded mine and its closest corresponding geocoded location was collected, it was similarly recorded in a data table as distance error values, and the same statistical measures were calculated.
Then, the geocoded mine locations shapefile, the true mine locations shapefile, and the class geocoded locations shapefile of the corresponding nineteen mine locations were used to construct a map to more accurately and efficiently convey the distance between the points.
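The nearest-distance and summary steps can be approximated in plain Python as follows. This is only a sketch: the coordinates are invented, the function names are hypothetical, and the points are assumed to already be in a projected coordinate system measured in meters. The standard deviation is computed as a sample standard deviation, which is an assumption about how the original table was built.

```python
import math
import statistics


def nearest_location(point, candidates):
    """Return (distance, mine_id) of the closest candidate location.

    Uses straight-line planar distance, so both layers must share
    the same projected coordinate system.
    """
    mine_id, xy = min(candidates, key=lambda c: math.dist(point, c[1]))
    return math.dist(point, xy), mine_id


def summarize(errors):
    """Same measures recorded in the error table."""
    return {
        "min": min(errors),
        "max": max(errors),
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "stdev": statistics.stdev(errors),  # sample standard deviation
    }


# Invented coordinates in meters: {mine ID: (x, y)}
geocoded = {1: (0.0, 0.0), 2: (1000.0, 0.0), 3: (5000.0, 5000.0)}
actual = [(1, (30.0, 40.0)), (2, (1000.0, 120.0)), (3, (5300.0, 5400.0))]

errors = []
mismatches = []  # mines whose nearest actual point has a different ID
for mine_id, xy in geocoded.items():
    distance, nearest_id = nearest_location(xy, actual)
    errors.append(distance)
    if nearest_id != mine_id:
        mismatches.append(mine_id)

print(summarize(errors))
print("Check manually:", mismatches)
```

Returning the ID of the nearest point makes it easy to spot the situation described above, where the closest actual location does not belong to the same mine and the distance has to be measured by hand.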
Results
Figure 3: A data table showing the distance error between each geocoded mine location and both the nearest geocoded location determined by a classmate and the location determined by the true or given dataset.
As seen in the results, the greatest error between the actual location and the geocoded location is 27465 meters, the minimum distance error is 148 meters, the mean error is 3846 meters, the median error is 650 meters, and the standard deviation is 8049 meters (Figure 3). At first glance this appears to be a huge amount of error. But by comparing the median to the mean, it can be determined that most of the individual error values fall far below the average error, as the median is much less than the mean. Indeed, when looking at both the distance error values and their corresponding locations on a map (Figure 4), most of the error is relatively minor. The large error values can generally be attributed to mistakes made in identifying the specific mine, while the small error values exist because the geocoded location was placed on the mine's roadside entrance, while the actual locations mark the center of the mine. This minor error is illustrated by Mine 295 and its corresponding geocoded locations (Figure 4). In addition, the minimum error between the geocoded mines and the closest mine location provided by a peer was 1 meter, the maximum distance error was 56972 meters, the median distance error was 63 meters, the mean distance error was 5740 meters, and the standard deviation was 14662 meters. Once again, the few high error values are over-represented in the mean, while both the median error and the visual display of geocoded locations show relatively minor error in most of the locations. The few high errors are likely due to the fact that the sample was limited to fairly few points, and either the personally geocoded location or the other student's geocoded location for these mines was incorrect and/or the only sample point available.
Figure 4: A map depicting the personally geocoded mine locations, the actual mine locations, and corresponding geocoded mine locations provided by other students (left), an additional map depicting the geocoded mine locations of Mine 295 over an imagery basemap (center-right), and a reference map depicting all the geocoded mine locations in relation to the whole of Wisconsin (bottom right).
Discussion
Of the distance errors collected, the very large ones, which result from geocoded locations being placed at entirely separate mines, are a form of gross error. They are the result of a mistake or blunder made either while selecting the mine location in the geocoding process or, in one case, while recording the address of the mine. In the case of Mine 274, the geocoded point was likely placed at the wrong location, as it lies roughly 87,000 meters from the actual location and also has the highest distance error relative to the corresponding location placed by another student. In Mine 247's case, however, the fault likely lies with whoever originally created the data table. Its distance error from the reported actual location is 24,000 meters, while its distance error from the nearest other student's point is only 9 meters. After reviewing its PLSS address, it is clear that the location where Mine 247 is placed by the "actual" data is not the address recorded in the data table. Indeed, the location placed during the geocoding process, both personally and by other students, appears to fall within the correct PLSS address. Because of this, it can be assumed that addresses may have gotten mixed up when this data was first tabulated. Both of these errors can be classified as gross, operational errors, appearing as a result of a large mistake made either when creating the original data table or when analyzing it. Additionally, these errors can be referred to as operational attribute data input errors. In other words, they are errors in the input of attribute data: either in the Address field, as for Mine 247 when the data was first tabulated, or in the x and y location entered during the geocoding process, as for Mine 274 and those like it.

In the case of the small amounts of error, as with Mine 295, the error is not a result of mistakes or blunders made by the operator or data analyst. Instead, it is a combination of systematic and random error. For these locations, all points were correctly placed on the proper mine. However, their specific positions at each mine differ. This is a result of the computer generalizing the location to the exact center of the mine, while each student, under instruction, chose what they believed to be the entrance of the mine as its location. This consistent bias is what is known as systematic error. Random error, the most minor of all, results from the fact that a person cannot manually place a point at the exact entrance of each mine during the geocoding process, both because manual placement is only so precise and because the imagery used has a maximum resolution. The bias in placing points is a type of geographic error resulting from data attribute error, a source of data automation and compilation error. However, rather than being a source of operational error (a mistake), it is inherent error: minor error that is expected and unavoidable during the process. Finally, the limit on accuracy imposed by human and resolution limitations is inherent image analysis error, or the error that occurs based on the quality of the image and the precision of its analysis.
But what does this all mean for the geocoded mine location data? Points with relatively minor error resulting from inherent random or systematic error are not wrong. This data can be considered correct. However, it is important to remember the source of the error in order to minimize it later on or to possibly eliminate bias in future data collection. The data that needs to be thrown out consists of the points with large amounts of gross operational error. These points are usually drastically far from their real-world locations. It is critical to avoid using these points in further analysis, as they may lead to a false conclusion that is far from what the data should support.
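One way to act on this is to separate points by the size of their distance error before further analysis. The sketch below is a simple illustration only. The cutoff value is an arbitrary assumption, not something defined in this exercise, and the mine labels are placeholders. The three error values reuse the minimum, median, and maximum reported in the results purely as sample inputs.

```python
GROSS_ERROR_CUTOFF_M = 5000.0  # assumed cutoff in meters; not from the exercise


def split_by_error(errors_by_mine, cutoff=GROSS_ERROR_CUTOFF_M):
    """Separate points to keep from points suspected of gross error.

    Returns (keep, review), where review holds every mine whose
    distance error exceeds the cutoff.
    """
    keep = {m: e for m, e in errors_by_mine.items() if e <= cutoff}
    review = {m: e for m, e in errors_by_mine.items() if e > cutoff}
    return keep, review


# Placeholder labels; values borrowed from the reported summary statistics
errors_by_mine = {"Mine X": 148.0, "Mine Y": 650.0, "Mine Z": 27465.0}

keep, review = split_by_error(errors_by_mine)
print("Keep:", sorted(keep))
print("Review or remove:", sorted(review))
```

A flagged point is not automatically wrong. As the Mine 247 case shows, a large error can come from the reference data, so flagged points are better treated as candidates for manual review than deleted outright.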
Sources
Hupy, C. (2017). Exercise 6: Data Normalization, Geocoding, and Error Assessment: Sand Mining Suitability Project. Eau Claire, WI.

Mine location data, PLSS townships shapefile, and PLSS Sections shapefile provided by the Wisconsin Department of Natural Resources (2017)