As a continuation of the sand mines project begun at the start of the semester, this exercise served as a first step toward constructing a suitability and risk model for frac sand mining in western Wisconsin. To that end, data on sand mines needed to be normalized, the mine addresses needed to be geocoded, and the results needed to be compared to known locations for these mines in order to measure error. Each student was responsible for completing this process for 19 of the mines. Geocoding is the process of matching records that contain addresses to known geographic locations. By completing this process, an accurate map of mine locations could be constructed for later analysis.
Methods
The original mines data was first opened in Excel. From this file, the 19 mines that were assigned to be normalized and geocoded personally were extracted and placed in their own Excel table. For each mine, a field was added to the data table for each portion of the complete address entry: PLSS, Street Address, Street (name), Street Type, City, State, and Zip Code. These fields were then populated using the corresponding data taken from the original address field (Figure 1).
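The normalization itself was done by hand in Excel, but the same idea can be sketched in Python. The example below is only an illustration under stated assumptions: the address layout ("number street type, city, state zip"), the abbreviation table, and the sample address are all made up, and the function name normalize_address is hypothetical.

```python
import re

# Hypothetical abbreviation table; real data would need a longer list.
STREET_TYPES = {
    "RD": "Road", "ST": "Street", "AVE": "Avenue",
    "HWY": "Highway", "LN": "Lane", "DR": "Drive",
}

# Assumed layout: "<number> <street name> <type>, <city>, <state> <zip>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<number>\w+)\s+"
    r"(?P<street>.+?)\s+"
    r"(?P<type>[A-Za-z]+)\.?\s*,\s*"
    r"(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})\s+"
    r"(?P<zip>\d{5})\s*$"
)


def normalize_address(raw):
    """Split one raw address string into the separate fields used in the table.

    Returns None when the text does not follow the assumed layout,
    for example when an entry lists only a PLSS description.
    """
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    raw_type = match.group("type")
    # Expand known abbreviations; otherwise just capitalize what was given.
    street_type = STREET_TYPES.get(raw_type.upper(), raw_type.title())
    street = match.group("street")
    return {
        "PLSS": "",  # left blank here; filled in separately when available
        "Street Address": f"{match.group('number')} {street} {street_type}",
        "Street": street,
        "Street Type": street_type,
        "City": match.group("city"),
        "State": match.group("state").upper(),
        "Zip Code": match.group("zip"),
    }


# Made-up address, not taken from the mines data.
example = normalize_address("123 Example Rd, Sampleville, WI 54000")
```

Entries that return None would still need to be handled manually, which mirrors the PLSS-only mines described later in this post.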
The interactive rematch inspector window was then opened. With this, the match for each mine was inspected to see how close it was to the mine's actual location. As it turned out, all but one of the addresses failed to match the location of the mine. Instead, they were geocoded to the center of the town listed in each address. To compensate, these addresses were manually matched to what was believed to be the actual corresponding mine location in the interactive rematch window. This was accomplished by looking up the known address in a Google Maps window, by using the ArcMap imagery, and, if those failed, by finding the location using the PLSS address in conjunction with the Wisconsin PLSS sections and PLSS townships shapefiles. The PLSS address determined the subdivision of land (both township and section) in which the mine was located. This was especially critical for the addresses that came up as unmatched in the geocoding process, as these only had a listed PLSS address. These steps and tools were used until every address was matched with what was believed to be its corresponding mine. Afterwards, the data was exported as a point shapefile so it could be analyzed.
The completed geocoded mine locations shapefile was added to a new data frame. The true mine locations shapefile was also added to this data frame. Using the Select tool and a query on the unique mine ID field, only the nineteen personally assigned mines were selected out of the true mine locations shapefile. This allowed the personally geocoded locations to be compared to what were considered the actual locations. In addition, a merge was completed on all the other students' shapefiles of their assigned, geocoded mines. These were made available by each student when they completed their geocoding. The list of mines geocoded by each student overlapped with those of others in the class, so the results could be compared against one another. Unfortunately, several students had not named their mine unique ID field (Mine_Uniqu) properly. To correct these naming errors, a field map was used during the merge. In addition, two fields, each in the attribute table of one of the shapefiles, needed to be altered, as they were populated with values that prevented the merge (for example, words used to represent a null value in a long integer field). Once the merge was completed, the same Select tool and query on the mine unique ID (Mine_Uniqu) that had been applied to the true mine locations shapefile were used to find the geocoded results of other students who had also completed these nineteen mines. This created a point shapefile containing only these corresponding nineteen mines from the other students' geocoding results.
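The logic of the field map and the ID query can be illustrated outside of ArcMap. The sketch below uses plain Python dictionaries in place of attribute tables; it is not the ArcMap Merge or Select tool, and the misspelled field names, IDs, and coordinates are invented for the example. Only Mine_Uniqu comes from the actual data.

```python
ID_FIELD = "Mine_Uniqu"

# Hypothetical misnamed versions of the ID field (assumed, not from the class data).
ID_ALIASES = {"Mine_Uniqu", "Mine_Unique", "MINE_UNIQU", "MineUniqu"}


def merge_tables(tables):
    """Combine several attribute tables, mapping every ID alias onto ID_FIELD."""
    merged = []
    for rows in tables:
        for row in rows:
            fixed = {}
            for name, value in row.items():
                key = ID_FIELD if name in ID_ALIASES else name
                fixed[key] = value
            merged.append(fixed)
    return merged


def select_by_ids(rows, wanted_ids):
    """Keep only the rows whose mine ID is in wanted_ids (the role of the query)."""
    wanted = set(wanted_ids)
    return [row for row in rows if row.get(ID_FIELD) in wanted]


# Made-up tables from two students, one with a misnamed ID field.
student_a = [{"Mine_Uniqu": 101, "x": 1.0, "y": 2.0},
             {"Mine_Uniqu": 102, "x": 3.0, "y": 4.0}]
student_b = [{"Mine_Unique": 101, "x": 1.5, "y": 2.5},
             {"Mine_Unique": 250, "x": 9.0, "y": 9.0}]

class_rows = merge_tables([student_a, student_b])
matching = select_by_ids(class_rows, [101, 102])
```

After the merge, every row carries the same ID field name, so a single query can pull out the overlapping mines regardless of which student produced them.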
With shapefiles of the personal geocoded locations, the true mine locations, and the class geocoded locations for the assigned nineteen mines finally ready, they still required formatting before analysis. Each one was reprojected into the NAD 1983 State Plane Wisconsin Central FIPS 4802 projected coordinate system. This was required because the shapefiles were originally in a geographic coordinate system that used degrees as its unit of measurement for distance and location. By reprojecting them into a projected coordinate system and changing the data frame to match, distances between mine locations could instead be measured in linear meters.
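A short calculation shows why degrees are a poor unit for distance. The sketch below is not the State Plane projection itself; it assumes a spherical Earth with a mean radius of 6,371 km and uses 44.5 degrees north as a rough, assumed latitude for western Wisconsin. Under those assumptions, one degree of latitude spans roughly 111 km on the ground, while one degree of longitude spans only about 79 km.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def meters_per_degree(latitude_deg):
    """Return (meters per degree of latitude, meters per degree of longitude)."""
    per_degree_lat = math.pi * EARTH_RADIUS_M / 180
    # Meridians converge toward the poles, so a degree of longitude shrinks
    # by the cosine of the latitude.
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude_deg))
    return per_degree_lat, per_degree_lon


# 44.5 N is an assumed, approximate latitude for the study area.
lat_m, lon_m = meters_per_degree(44.5)
```

Because the two values differ, a distance computed directly from degree coordinates would be distorted depending on its direction, which is the problem the projected coordinate system avoids.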
The Near tool was used to measure the distance from each personally geocoded mine to the closest "actual" mine location. This was usually the corresponding actual location, whose Mine Unique ID field matched that of the geocoded mine. However, this was not the case for one mine. In that case, the Measure tool was used to measure the distance between the geocoded location and its corresponding actual location. This data was then added to an Excel table, and several statistical measures (minimum, maximum, mean, median, standard deviation) were calculated. This served as the distance error data between the geocoded locations and the actual locations. In addition, the Near tool was similarly used to gather data on the distance between the geocoded mine locations and the corresponding locations geocoded by peers. Only the closest corresponding location from other students was recorded, rather than all corresponding distances, as this would provide a sample that would likely be indicative of the error between the personal locations and those of others. This would also serve to point out any locations in the true locations shapefile that may actually be incorrect. In addition, many students had not completed the geocoding process in the allotted time. As a result, the sample data that could be gathered from others was limited. Taking one corresponding location for each of the nineteen geocoded mines amounted to half of the points made available by other students. Several mines had only one corresponding geocoded location completed by other students. In the case of Mine 328, no other student had geocoded the mine's location. Once the distance between each geocoded mine and its closest corresponding geocoded location was collected, it was similarly recorded in a data table as distance error values, with the same statistical measurements being calculated.
Then, the geocoded mine locations shapefile, the true mine locations shapefile, and the class geocoded locations shapefile of the corresponding nineteen mine locations were used to construct a map to convey the distances between the points more accurately and efficiently.
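The distance measurements and summary statistics can be sketched in a few lines of Python. This is an illustration of the idea rather than the ArcMap Near tool, and the coordinates are invented projected values in meters, not the mine data. The standard deviation here is the sample standard deviation, which is an assumption, since the Excel function used in the exercise is not recorded.

```python
import math
import statistics


def nearest_distance(point, candidates):
    """Distance from point to the closest candidate, in the coordinate units."""
    px, py = point
    return min(math.hypot(px - cx, py - cy) for cx, cy in candidates)


def summarize(distances):
    """Compute the same summary measures that were recorded in the Excel table."""
    return {
        "minimum": min(distances),
        "maximum": max(distances),
        "mean": statistics.mean(distances),
        "median": statistics.median(distances),
        "standard deviation": statistics.stdev(distances),  # sample std. dev.
    }


# Made-up projected coordinates in meters.
geocoded = [(0.0, 0.0), (100.0, 100.0), (500.0, 0.0)]
actual = [(3.0, 4.0), (100.0, 110.0), (500.0, 20.0)]

errors = [nearest_distance(point, actual) for point in geocoded]
summary = summarize(errors)
```

As in the exercise, the nearest point is not guaranteed to be the matching mine. Pairing points by their mine ID before measuring would avoid that issue, at the cost of an extra join step.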
Results
As seen by the results, the greatest error between the actual location and the geocoded location is
In the case of small amounts of error, as with Mine 295, the error is not the result of mistakes or blunders made by the operator or data analyst. Instead, it results from a combination of systematic and random error. For these locations, all points were correctly placed on the proper mine. However, their specific locations at each mine differ. This is a result of the computer generalizing the location to the exact center of the mine, while each student, under instruction, chose what they believed to be the entrance of the mine as its location. This personal bias is what is known as systematic error. Random error, the most minor of all, results from the fact that a person can only be so precise when manually placing points during geocoding, and the imagery used has a maximum resolution. The bias in placing points is a type of geographic error resulting from data attribute error, a source of data automation and compilation error. However, rather than being a source of operational error (a mistake), it is inherent error, which is minor, expected, and unavoidable during the process. Finally, the limit on accuracy imposed by human and resolution constraints is inherent image analysis error, or error that arises from the quality of the image and the precision of its analysis.
But what does this all mean for the geocoded mine location data? Points with relatively minor error resulting from inherent random or systematic error are not wrong, and this data can be considered correct. However, it is important to remember the source of the error in order to minimize it later on or to eliminate bias in future data collection. The data that needs to be thrown out consists of the points with large amounts of gross operational error. These points are usually drastically off from their real-world locations. It is critical to avoid using these points in further analysis, as they may lead to a false conclusion that is far from what the data should support.
Sources
Hupy, C. (2017) Exercise 6: Data Normalization, Geocoding, and Error Assessment: Sand Mining Suitability Project. Eau Claire, WI.
Mine location data, PLSS townships shapefile, and PLSS sections shapefile provided by the Wisconsin Department of Natural Resources (2017).

