📗Glossary
Term | Definition |
---|---|
Stable Isotopes
Stable isotopes are atoms of the same element that have the same number of protons but different numbers of neutrons, and that do not undergo radioactive decay. Stable isotopes are useful for timber origin verification because their ratios vary by geography and the isotopes are absorbed by trees. For example, carbon has two stable isotopes: 12C and 13C. 12C has six electrons, six protons, and six neutrons, and 13C has six electrons, six protons, and seven neutrons.
Isoscape
An isoscape is a map (GeoTIFF) of a geographic area where each location (pixel) represents the ratio of an isotope to its more common form. A δ18O isoscape is therefore similar to a topographic map, but instead of showing elevation, it shows at each point the ratio between the stable isotopes oxygen-18 (18O) and oxygen-16 (16O).
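By convention, δ values are usually reported not as the raw ratio but as the per-mil deviation of the sample's ratio from that of a reference standard. The symbols $R_{\text{sample}}$ and $R_{\text{standard}}$ are introduced here for illustration only. This glossary does not state which standard is used (VSMOW is a common choice for oxygen), so treat that as an assumption.

$$
\delta^{18}\mathrm{O} = \left( \frac{R_{\text{sample}}}{R_{\text{standard}}} - 1 \right) \times 1000\ \text{‰},
\qquad R = \frac{{}^{18}\mathrm{O}}{{}^{16}\mathrm{O}}
$$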
Training Set
The entity from which the input is obtained is a known trusted source (such as Dr. Martinelli). This means that the ‘Input Location’ is known a priori to be ‘Truthful’, and the isotope metrics are intended to be included in the ‘Trusted Isoscape’.
Validation Set
The entity represents a known trusted source. Samples in a validation set are used for parameter tuning and to detect and measure overfitting, which shows up as better performance on the training set than on the validation (and test) sets.
Test Set
The entity represents a known trusted source whose data is not included in training. The samples used for validation in this document will come from a consistent Test Set.
Field/DOF Set
An input submitted through our online system for an actual untrusted Supplier or test subject, whose given location we cannot trust.
Simulated
Simulated isotope values are generated from some other data source, for example by taking point samples from an already generated isoscape from Craig Gordon.
Field-Sampled
Isotope values were measured by a mass spectrometer on a sample collected directly in the field.
Fraudulent (Positive)
A Fraudulent Location is a location that is not the true location of the tree whose isotope values were measured or simulated. Note that the ‘Input Isotope Metric’ could still match (by happenstance) the value expected at the input location. Location is our Label.
non-Fraud (Negative)
Accurate locations are those that represent the actual original location of the tree from which the sample was taken and whose isotopes were measured (or simulated).
Predicted Location
A location predicted by one of the models.
Known Location
A location provided by a trusted source.
True
The model correctly predicted that the location was either Fraud or not Fraud. A True Positive happens when the Model correctly predicts a Fraudulent Location. A True Negative happens when the Model correctly confirms the stated Location is accurate.
False
The model did not correctly predict whether the location was Fraud or not Fraud. A False Positive happens when the Model incorrectly predicts a Fraudulent Location. A False Negative happens when the Model incorrectly confirms the stated Location is accurate.
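The four outcomes (True/False crossed with Positive/Negative) can be tallied as in the minimal sketch below. The function name `confusion_counts` and the boolean encoding (`True` means Fraud) are assumptions made for illustration, not part of the actual system.

```python
from collections import Counter


def confusion_counts(actual_fraud, predicted_fraud):
    """Tally TP, TN, FP, FN for parallel lists of booleans (True = Fraud).

    Hypothetical helper: the encoding and names are illustrative only.
    """
    counts = Counter()
    for actual, predicted in zip(actual_fraud, predicted_fraud):
        if predicted and actual:
            counts["TP"] += 1  # model correctly predicts a Fraudulent Location
        elif predicted and not actual:
            counts["FP"] += 1  # model incorrectly predicts a Fraudulent Location
        elif not predicted and actual:
            counts["FN"] += 1  # model incorrectly confirms the stated Location
        else:
            counts["TN"] += 1  # model correctly confirms the stated Location
    return counts
```

A `Counter` returns 0 for any outcome that never occurs, so callers can read all four keys without checking for their presence.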
RMSE
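Root mean square error: a measure of the typical size of the error between predicted (mean isotope) values and the observed ground-truth values of the isotope ratios. Because errors are squared before averaging, large errors count more heavily than small ones.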
RMSE = sqrt(mean( (predicted - observed)^2 ))
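A minimal sketch of the formula in plain Python follows. The function name `rmse` and the list-based inputs are assumptions for illustration.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, `rmse([0.0, 0.0], [3.0, 4.0])` is `sqrt((9 + 16) / 2)`, roughly 3.54. This is larger than the mean absolute error of 3.5 because the bigger error dominates.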
Precision
Ratio of true positives to total classified positives, i.e., how often a fraud prediction is correct. True positives are invalid DOFs labeled “invalid”. See https://en.wikipedia.org/wiki/Precision_and_recall, which includes a good picture.
Precision = (number of correctly predicted Fraud) / (total predicted Fraud)
(True Predicted Positives / Total Predicted Positives)
Recall
Ratio of true positives to total actual positives, i.e., how much of the fraud we detect. See https://en.wikipedia.org/wiki/Precision_and_recall
Recall = (number of correctly identified invalid DOFs) / (total Fraud in the set)
(True Predicted Positives / Total Known Positives)
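Both ratios can be computed directly from outcome counts, as in the sketch below. The function names are hypothetical, and returning `None` when a denominator is zero is an illustrative choice rather than documented behavior.

```python
def precision(true_positives, false_positives):
    """True Predicted Positives / Total Predicted Positives."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return None  # undefined: the model flagged nothing as Fraud
    return true_positives / predicted_positives


def recall(true_positives, false_negatives):
    """True Predicted Positives / Total Known Positives."""
    known_positives = true_positives + false_negatives
    if known_positives == 0:
        return None  # undefined: the set contains no Fraud
    return true_positives / known_positives
```

With 8 correctly flagged frauds, 2 false alarms, and 8 missed frauds, precision is 8 / 10 = 0.8 while recall is 8 / 16 = 0.5. Most flags are right, but half of the fraud goes undetected.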
P-value threshold
The p-value is the probability of observing a difference at least as large as the one measured if the two groups of observations actually came from the same distribution. A low p-value in our t-test indicates that the two distributions (the ground truth and the sample being tested) are unlikely to be the same, which should cause a positive (fraud) result.
As we decrease the p-value threshold, we will flag fewer samples as fraud, with the intention of increasing precision at the expense of recall.
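The sketch below illustrates how the threshold affects the number of flags. It assumes p-values have already been computed by the t-test. The function name `flag_fraud`, the strict `<` comparison, and the example p-values are all assumptions made for illustration.

```python
def flag_fraud(p_values, threshold):
    """Return True (fraud) for each p-value strictly below the threshold.

    Assumption: strict comparison; the real system may use <= instead.
    """
    return [p < threshold for p in p_values]


# Hypothetical p-values for four tested samples.
example_p_values = [0.001, 0.02, 0.04, 0.30]

flags_at_005 = flag_fraud(example_p_values, 0.05)  # three samples flagged
flags_at_001 = flag_fraud(example_p_values, 0.01)  # one sample flagged
```

Lowering the threshold from 0.05 to 0.01 keeps only the strongest evidence of mismatch, which is why precision tends to rise while recall falls.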
Last Known Good Isoscape
The isoscape GeoTIFF, per element, that is currently considered the highest quality and is intended for use in higher-level analytics and production use cases.
Simulated Fraud Percent
The percentage of the total Test Set that should be simulated as Fraud.
Simulated Radius
The simulated radius controls how far from the ground-truth sample location the validation test may place the fraudulent locations it generates. We currently simulate fraud from anywhere in the entire Amazon, but have designed the system so that it can be adjusted to match how real fraud occurs.
Allowed Radius
The allowed radius defines a buffer area within which the null hypothesis (the sample’s stated location is truthful) is confirmed. It gives each timber sample a permitted area to match rather than a single point.
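One way to picture the buffer is as a great-circle distance check, sketched below. This assumes the radius is measured in kilometres between the stated location and some reference location. The glossary specifies neither the unit nor the reference point, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_allowed_radius(stated, reference, allowed_radius_km):
    """True if the stated (lat, lon) lies inside the buffer around reference."""
    return haversine_km(*stated, *reference) <= allowed_radius_km
```

One degree of latitude is roughly 111 km, so a stated location one degree away from the reference falls outside a 100 km allowed radius but inside a 150 km one.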