Abstract
This appendix provides analysis of several common detectors against the synthetic feature alphabets described in Chapter 7. The complete source code, shell scripts, and the alphabet image sets are available from Springer Apress at: http://www.apress.com/source-code/ComputerVisionMetrics
This appendix contains:
- Background on the analysis, methodology, goals, and expectations.
- Synthetic alphabet ground truth image summary.
- List of detector parameters used for standard OpenCV methods: SIFT, SURF, BRISK, FAST, HARRIS, GFTT, MSER, ORB, STAR, SIMPLEBLOB. Note: No feature descriptors are computed or used; only the detector portions of BRISK, SURF, SIFT, ORB, and STAR are used in the analysis.
- Test 1: Interest point alphabets.
- Test 2: Corner point alphabets.
- Test 3: Synthetic alphabet overlays onto real images.
- Test 4: Rotational invariance of detectors against synthetic alphabets.
Background, Goals, and Expectations
The main goals for the analysis are:
- To develop some simple intuition about human vs. machine detection of interest points and corners, to observe detector behavior on the synthetic alphabets, and to develop some understanding of the problems involved in designing and tuning feature detectors.
- To measure detector anomalies among white, black, and gray versions of the alphabets. A human would recognize the same pattern easily whether or not the background and foreground are changed; however, detector design and parameter settings influence detector invariance to background and foreground polarity.
- To measure detector sensitivity to slight pixel interpolation artifacts under rotation.
Note
Experienced practitioners with well-developed intuition regarding capabilities of interest point and corner detector methods may not find any surprises in this analysis.
The analysis uses several well-known detector methods as implemented in the OpenCV library; see Table A-1. The analysis provides detector information only, with no intention to compare detector goodness against any criteria. Details on which features from the synthetic alphabets are recognized by the various detectors are shown in summary tables, which count the number of times a feature is detected within each grid cell. For some applications, the synthetic interest point alphabet approach could be useful, assuming that an application-specific alphabet is designed and that detectors are designed and tuned for the application, such as a factory inspection application to identify manufactured objects or parts.
Test Methodology and Results
The images in the ground truth data set are used as input for a few modified OpenCV tests:
- opencv_test_features2d (BRISK, FAST, HARRIS, GFTT, MSER, ORB, STAR, SIMPLEBLOB)
- opencv_test_nonfree (SURF, SIFT)
The tuning parameters used for each detector are shown in Table A-1; see the OpenCV documentation for more information. Note: no attempt is made to tune the detector parameters for the synthetic alphabets. Parameter settings are reasonable defaults; however, the maximum keypoint feature count is bumped up in some cases to allow all the detected features to be recorded.
Each test produces a variety of results, including:
1. Annotated images showing location and orientation (if provided) for detected features.
2. Summary count of each detected synthetic feature across the grid in text files, including interest point coordinates, detector response strength, orientation if provided by the detector, and the number of total detected synthetic features found.
3. 2D histograms showing bin count for each feature in the alphabet.
Detector Parameters Are Not Tuned for the Synthetic Alphabets
No feature detector tuning is attempted here. Why? In summary, feature detector tuning has very limited value in the absence of (1) a specific feature descriptor to use the keypoints, and (2) an intended application and use-cases. Some objections may be raised to this approach, since detectors are designed to be tuned and must be tuned to get best results for real applications. However, the test results herein are only a starting point, intended to allow for simple observations of detector behavior compared to human expectations.
In some cases, a keypoint is not suitable for producing a useful feature descriptor, even if the keypoint has a high score and high response. If the feature descriptor computed at the keypoint produces a descriptor that is too weak, the keypoint and corresponding descriptor should both be rejected. Each detector is designed to be useful for a different class of interest points, and tuned accordingly to filter the results down to a useful set of good candidates for a specific feature extractor.
Since we are not dealing with any specific feature descriptor methods here, tuning the keypoint detectors has limited value: detector parameter tuning in the absence of a specific feature descriptor is ambiguous. Furthermore, detector tuning will be different for each detector-descriptor pair, different for each application, and potentially different for each image.
Tuning detectors is not simple. Each detector has different parameters to tune for best results on a given image, and each image presents different challenges for lighting, contrast, and image pre-processing. For typical applications, detected keypoints are culled and discarded based on some filtering criteria. OpenCV provides several novel methods for tuning detectors; however, none are used here. The OpenCV tuning methods include:
- The DynamicAdaptedFeatureDetector class tunes supported detectors using an adjusterAdapter() to keep only a limited number of features. It iterates the detector parameters several times and re-detects features to try to find the best parameters, keeping only the requested number of best features. Several OpenCV detectors have an adjusterAdapter() provided while some do not, and the API allows for adjusters to be created.
- The AdjusterAdapter class implements the criteria for culling and keeping interest points. Criteria may include KNN nearest matching, detector response or strength, radius distance to the nearest other detected points, removal of keypoints for which a descriptor cannot be computed, or others.
- The PyramidAdaptedFeatureDetector class can be used to adapt detectors that do not use a scale-space pyramid; this adapter creates a Gaussian pyramid and detects features over the pyramid.
- The GridAdaptedFeatureDetector class divides an image into grid cells and adapts the detector to find the best features within each grid cell.
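The grid-based culling idea can be illustrated with a minimal sketch. This is not OpenCV's implementation: the function `keep_best_per_cell`, its parameters, and the `(x, y, response)` tuple layout are hypothetical, and the sketch only culls an already-detected keypoint list by response strength within each grid cell.

```python
from collections import defaultdict

def keep_best_per_cell(keypoints, image_w, image_h, grid_cols, grid_rows, max_per_cell):
    """Keep the strongest keypoints in each grid cell.

    keypoints: iterable of (x, y, response) tuples (hypothetical layout).
    Returns the kept keypoints, ordered by cell and then by descending response.
    """
    cell_w = image_w / grid_cols
    cell_h = image_h / grid_rows
    cells = defaultdict(list)
    for x, y, response in keypoints:
        # Clamp so points on the right/bottom edge fall into the last cell.
        col = min(int(x // cell_w), grid_cols - 1)
        row = min(int(y // cell_h), grid_rows - 1)
        cells[(row, col)].append((x, y, response))

    kept = []
    for key in sorted(cells):
        strongest = sorted(cells[key], key=lambda kp: kp[2], reverse=True)
        kept.extend(strongest[:max_per_cell])
    return kept

# Example: a 1024x1024 image split into a 2x2 grid, keeping two points per cell.
points = [(10, 10, 0.5), (20, 20, 0.9), (30, 30, 0.1), (600, 600, 0.3)]
best = keep_best_per_cell(points, 1024, 1024, 2, 2, 2)
```

Culling per cell spreads the surviving keypoints across the image instead of letting one high-contrast region dominate, which is the motivation for a grid adapter.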
Expectations for Test Results
The reader should treat these tests as information only to develop intuition about feature detection. The test results do not prove the merits of any detector. Interpretation of the test results should be done with the following information in mind:
1. One set of detector tuning parameters is used for all images, and detector results will vary widely based on tuning parameters. In fact, the parameters are deliberately set to over-sensitive values for ORB, SURF, and other detectors to generate the maximum number of possible keypoints that can be found.
2. Sometimes an alphabet feature generates multiple detections; for example, a single corner alphabet feature may actually contain several corner features.
3. The detection results may not be repeatable over the distribution of replicated features in the image feature grid. In other words, identical patterns, which look about the same to a human, are sometimes not recognized at different locations. Without looking in detail at each algorithm, it is hard to say what is happening.
4. Detectors that use an image pyramid such as SIFT, SURF, ORB, STAR, and BRISK may identify keypoints in a scale space that are offset or in between the actual alphabet features. This is expected, since the detector is using features from multiple scales.
Summary of Synthetic Alphabet Ground Truth Images
The ground truth dataset is summarized here. Note that rotated versions of each image file in the set are provided from 0 to 90 degrees at 10-degree intervals. The 0-degree image in each set is 1024x1024 pixels, and the rotated images in each set are slightly larger to contain the entire rotated 1024x1024 pixel grid.
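As a rough illustration of why the rotated images need a larger canvas, the following sketch computes the smallest axis-aligned canvas that contains a rotated rectangle. The function name `rotated_canvas_size` is hypothetical, and the sizes actually used in the ground truth set may include extra padding and so may differ from these minimum values.

```python
import math

def rotated_canvas_size(width, height, degrees):
    """Smallest axis-aligned canvas (w, h) containing a rotated width x height image."""
    a = math.radians(degrees)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    # Round before ceil so tiny floating-point residue (e.g. cos(90 deg) != 0)
    # does not push the result up by a pixel.
    new_w = math.ceil(round(width * c + height * s, 6))
    new_h = math.ceil(round(width * s + height * c, 6))
    return new_w, new_h

# Minimum canvas sizes for the 0 to 90 degree set at 10-degree intervals.
sizes = {deg: rotated_canvas_size(1024, 1024, deg) for deg in range(0, 100, 10)}
```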
Synthetic Interest Point Alphabet
The synthetic interest point alphabet contains multiples of the 83 unique patterns, as shown in Figure A-2. A total of 7 x 7 sets of the 83 features fit within the 1024 x 1024 image. Total feature count for the image is 7 x 7 x 83 = 4067, with 7 x 7 = 49 instances of each feature. The features are laid out on a grid of 14 x 14 pixel cells composed of 10 rows and 10 columns, including several empty grid locations. Gray image pixel values are 0x40 and 0xc0; black and white pixel values are 0x0 and 0xff.
Synthetic Corner Point Alphabet
The synthetic corner point alphabet contains multiples of the 63 unique patterns, as shown in Figure A-3. A total of 8 x 12 sets of the 63 features fit within the 1024 x 1024 image. Total feature count for the image is 8 x 12 x 63 = 6048, with 8 x 12 = 96 instances of each feature. Each feature is arranged on a grid of 14 x 14 pixel rectangles, including 9 rows and 6 columns of features. Gray image pixel values are 0x40 and 0xc0; black and white pixel values are 0x0 and 0xff.
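The layout arithmetic for both alphabets can be checked with a short sketch. It assumes that alphabet sets are tiled edge to edge from the image origin with no gaps, which the text does not state explicitly; `alphabet_layout` and its parameter names are hypothetical.

```python
def alphabet_layout(image_size, cell, rows, cols, patterns):
    """Return (sets_down, sets_across, instances_per_pattern, total_features).

    image_size: width and height of the square image in pixels.
    cell: side of one alphabet cell in pixels.
    rows, cols: cells per alphabet set.
    patterns: unique patterns per alphabet set.
    """
    sets_down = image_size // (rows * cell)     # whole sets that fit vertically
    sets_across = image_size // (cols * cell)   # whole sets that fit horizontally
    instances = sets_down * sets_across
    return sets_down, sets_across, instances, instances * patterns

# Interest point alphabet: 10 x 10 cells of 14 pixels, 83 patterns.
interest = alphabet_layout(1024, 14, 10, 10, 83)
# Corner point alphabet: 9 rows x 6 columns of 14-pixel cells, 63 patterns.
corner = alphabet_layout(1024, 14, 9, 6, 63)
```

Under these assumptions the interest point set tiles 7 x 7 times and the corner point set tiles 8 x 12 times, matching the set counts given in the text.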
Synthetic Alphabet Overlays
A set of images with the synthetic alphabets overlaid is provided, including rotated versions of each image, as shown in Figure A-4.
Test 1: Synthetic Interest Point Alphabet Detection
Table A-2 provides the total detected synthetic interest points. Note: total detector counts include features computed at each scale of an image pyramid. For detectors that report feature detections at each level of an image pyramid, individual pyramid level detections are shown in Table A-3.
The total number of features detected in each alphabet cell is provided in summary tables from the annotated images. Note that several features may be detected within each 14x14 cell, and the detectors often provide non-repeatable results, which are discussed at the end of this appendix. The counts show the total number of alphabet features detected across the entire image, as shown in Figure A-5.
Annotated Synthetic Interest Point Detector Results
For ORB and SURF detectors, the annotated renderings using the drawKeypoints() function are too dense to be useful for visualization, but are included in the online test results.
The diameter of the circle drawn at each detected keypoint corresponds to the “diameter of the meaningful keypoint neighborhood,” according to the OpenCV KeyPoint class definition, which varies in size according to the image pyramid level where the feature was detected. Some detectors do not use a pyramid, so the diameter is always the same. The position of the detected features is normalized to the full resolution image, and all detected keypoints are drawn.
Entire Images Available Online
To better understand the detector results for each test, the entire image should be viewed to see the anomalies, such as where detectors fail to recognize identical patterns. Figure A-5 is an entire image showing BRISK detector results, while others are available online. Test results shown in Figures A-6 through A-15 only show a portion of the images.
Test 2: Synthetic Corner Point Alphabet Detection
Table A-4 provides the total detected synthetic corner points at all pyramid levels; some detectors do not use pyramids. Note: for detectors that report features separately over image pyramid levels, individual pyramid-level detections are shown in Table A-5.
Each feature exists within a 14x14 pixel region, and the total number of features detected in each cell is provided in summary tables with the annotated images. Note that several features may be detected within each 14 x 14 cell, and the detectors often provide non-repeatable results, which are discussed at the end of this appendix.
Annotated Synthetic Corner Point Detector Results
The annotated results for Test 2 follow the same conventions as the interest point detector results in Test 1. As such, for ORB and SURF detectors, the annotated renderings using the drawKeypoints() function are too dense to be useful, but are included in the online test results.
The diameter of the circle drawn at each detected keypoint corresponds to the “diameter of the meaningful keypoint neighborhood,” according to the OpenCV KeyPoint class definition, which varies in size according to the image pyramid level where the feature was detected. Some detectors do not use a pyramid, so the diameter is always the same. The position of the detected features is normalized to the full resolution image, and all detected keypoints are drawn.
Entire Images Available Online
To better understand the detector results for each test, the entire image should be viewed to see the anomalies, such as where detectors fail to recognize identical patterns. Test results shown in Figures A-16 through A-25 only show a portion of the images.
Test 3: Synthetic Alphabets Overlaid on Real Images
Table A-6 provides the total detected synthetic features found in the test images of little girls, shown in Figure A-4. Note that only the 0-degree version is used (no rotations), and both the black versions and the white versions of each alphabet are overlaid. In general, the white feature overlays produce more interest point and corner point detections.
Annotated Detector Results on Overlay Images
Annotated images are available online.
Test 4: Rotational Invariance for Each Alphabet
This section provides results showing the rotational invariance of detector response across the full 0 to 90 degree rotated image sets of black, white, and gray alphabets. Key observations:
- Black on white, white on black: Rotational invariance is generally lower for the black and white images with the current set of detectors and parameters, mainly owing to (1) the maxima and minima values of 0x0 and 0xff used for pixel values, and (2) un-optimized detector tuning parameters. The detectors each seem to operate in a similar manner on images at orientations of 0 degrees and 90 degrees, which contain no rotational anti-aliasing artifacts on each alphabet pattern; however, for the other rotations of 10 to 80 degrees, pixel artifacts combine to reduce rotational invariance for these alphabet patterns, and each detector behaves differently.
- Light gray on dark gray: Rotational invariance is generally better for the detectors using the reduced-range gray scale image alphabet sets with pixel values of 0x40 and 0xc0, rather than the full maxima and minima range used in the black and white image sets. The gray alphabet detector results generally show the most well-recognized alphabet characters under rotation. This may be due to the less pronounced local curvature of closer range gray values in the local region at the interest point or corner.
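The difference between the 0 and 90 degree images and the intermediate rotations can be reproduced on a single synthetic cell. The sketch below rotates a 14 x 14 cell with bilinear interpolation, which is an assumption: the text does not say which interpolation was used to build the rotated image sets. The helper names `rotate_bilinear` and `gray_levels` and the test pattern are hypothetical.

```python
import math

def rotate_bilinear(img, degrees, background=0):
    """Rotate a square image about its center using bilinear interpolation."""
    n = len(img)
    c = (n - 1) / 2.0
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def pixel(x, y):
        # Samples outside the image read as background.
        return img[y][x] if 0 <= x < n and 0 <= y < n else background

    out = [[background] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse mapping: find the source position for each output pixel.
            # Rounding removes floating-point residue at exact right angles.
            sx = round(cos_a * (x - c) + sin_a * (y - c) + c, 9)
            sy = round(-sin_a * (x - c) + cos_a * (y - c) + c, 9)
            x0, y0 = math.floor(sx), math.floor(sy)
            fx, fy = sx - x0, sy - y0
            value = ((1 - fx) * (1 - fy) * pixel(x0, y0)
                     + fx * (1 - fy) * pixel(x0 + 1, y0)
                     + (1 - fx) * fy * pixel(x0, y0 + 1)
                     + fx * fy * pixel(x0 + 1, y0 + 1))
            out[y][x] = int(round(value))
    return out

def gray_levels(img):
    """Sorted list of distinct pixel values in the image."""
    return sorted({v for row in img for v in row})

# A white 6x6 square centered in a black 14x14 cell (hypothetical pattern).
cell = [[255 if 4 <= x <= 9 and 4 <= y <= 9 else 0 for x in range(14)]
        for y in range(14)]

levels_0 = gray_levels(rotate_bilinear(cell, 0))
levels_30 = gray_levels(rotate_bilinear(cell, 30))
levels_90 = gray_levels(rotate_bilinear(cell, 90))
```

At 0 and 90 degrees every output pixel maps exactly onto a source pixel, so only the two original values survive. At 30 degrees the edges pick up intermediate gray values, which is the kind of interpolation artifact that changes what a detector sees.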
Methodology for Determining Rotational Invariance
The methodology for determining rotational invariance is illustrated in Figures A-26 through A-30, and summarized in the following pseudo-code:
For (degree = 0; degree < 100; degree += 10)
    Rotate image (degree)
    For each detector (SURF, SIFT, BRISK, ...):
        Compute interest point locations
        Annotate rotated image showing interest point locations
        Compute bin count (# of times) each alphabet feature is detected
        Create bin count image: pixel value = bin count for each alphabet character
Figures A-26 and A-30 show the summary bin counts of synthetic corner point detections across 0 to 90 degree rotations. The ten columns in each image show, left to right, the 0 to 90 degree rotated image final bin counts displayed as images.
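The bin-count step can be sketched for the unrotated (0-degree) interest point image as follows. This is a minimal illustration, not the code used to produce the figures: it assumes the alphabet sets are tiled from the image origin, it does not undo rotation, and the names `bin_counts` and `bin_count_image` are hypothetical.

```python
from collections import Counter

CELL = 14        # alphabet cell size in pixels (from the text)
GRID_ROWS = 10   # rows per interest point alphabet set
GRID_COLS = 10   # columns per interest point alphabet set

def bin_counts(keypoints, cell=CELL, rows=GRID_ROWS, cols=GRID_COLS,
               sets_x=7, sets_y=7):
    """Count detections per alphabet cell, folded over all tiled sets.

    keypoints: iterable of (x, y) pixel coordinates at full resolution.
    Returns a Counter keyed by (row, col) within one alphabet set.
    """
    counts = Counter()
    for x, y in keypoints:
        if x < 0 or y < 0:
            continue
        col, row = int(x) // cell, int(y) // cell
        if col >= cols * sets_x or row >= rows * sets_y:
            continue  # keypoint lies in the margin outside the tiled sets
        counts[(row % rows, col % cols)] += 1
    return counts

def bin_count_image(counts, rows=GRID_ROWS, cols=GRID_COLS):
    """Render counts as a small image: pixel value = bin count (capped at 255)."""
    return [[min(counts.get((r, c), 0), 255) for c in range(cols)]
            for r in range(rows)]

# Two detections of the feature at cell (0, 0), one at cell (2, 1),
# and one keypoint in the image margin that is ignored.
detections = [(3, 3), (143, 5), (20, 30), (1000, 1000)]
counts = bin_counts(detections)
image = bin_count_image(counts)
```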
Analysis of Results and Non-Repeatability Anomalies
Complete analysis results are online, including annotated images showing detected keypoint locations and text files containing summary information on each detected keypoint.
Caveats
There are deliberate reasons why each interest point detector is designed differently; no detector may be considered superior in all cases by any absolute measure. A few arguments against loosely interpreting these test results are as follows:
1. Unpredictability: Interest point detectors find features that are often unpredictable from the human visual system standpoint, and they are not restricted by design into the narrow boundaries of synthetic interest points and corner points shown here. Often, the interest point detectors find features that a human would not choose.
2. Pixel aliasing artifacts: The aliasing artifacts affect detection and are most pronounced for the rotated images using maxima and minima alphabets, such as black on white or white on black, and are less pronounced for light gray on dark gray alphabets.
3. Scale space: Not all the detectors use scale space, and this is a critical point. For example, SIFT, SURF, and ORB use a scale-space pyramid in the detection process. The scale-space approach filters out synthetic alphabet features that are not visible in some levels of a scale-space pyramid.
4. Binary vs. scalar values: FAST uses binary value comparisons to detect corners, while other methods use scalar values such as gradients. Binary value methods, such as FAST, will detect the same feature regardless of polarity or gray value range; however, scalar detectors based on gradients are more sensitive to pixel value polarity and pixel value ranges.
5. Pixel region size: FAST uses a 7x7 patch to look for connected circle perimeter regions, while other detectors such as SIFT, SURF, and ORB use larger pixel regions that bleed across alphabet grid cells, resulting in interest points being centered between alphabet features, rather than on them.
6. Region shape: Features such as MSER and SIMPLEBLOB are designed to detect larger connected regions with no specific shape, rather than smaller local features such as the interest point alphabets. An affine-invariant detector, such as SIFT, may detect features in an oval or oblong region corresponding to affine scale and rotation transformations, while a non-affine detector, such as FAST, may only detect the same feature as a template in a circular or square region with some rotational invariance at scale.
7. Offset regions from image boundary: Some detectors, such as ORB, SURF, and SIFT, begin detector computations at an offset from the image boundaries, so features are not computed across the entire image.
8. Proven value: Each detector method used here has proved useful and valuable for real applications.
With these caveats in mind, the test results can be allowed to speak for themselves.
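The binary-comparison caveat can be made concrete with a simplified segment test on a 7x7 patch, in the spirit of FAST. This is a sketch under stated assumptions and not OpenCV's FAST implementation: the arc length of 9, the threshold values, and the helper names `is_segment_corner` and `corner_patch` are chosen here for illustration.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3 around the patch center.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_segment_corner(patch, threshold=20, arc=9):
    """True if `arc` contiguous circle pixels are all brighter than
    center + threshold or all darker than center - threshold."""
    center = patch[3][3]
    labels = []
    for dx, dy in CIRCLE:
        p = patch[3 + dy][3 + dx]
        if p > center + threshold:
            labels.append(1)       # brighter than the center
        elif p < center - threshold:
            labels.append(-1)      # darker than the center
        else:
            labels.append(0)       # similar to the center
    doubled = labels + labels      # lets a run wrap around the circle
    for sign in (1, -1):
        run = 0
        for label in doubled:
            run = run + 1 if label == sign else 0
            if run >= arc:
                return True
    return False

def corner_patch(fg, bg):
    """7x7 patch whose upper-left quadrant (including the center) is foreground."""
    return [[fg if (x <= 3 and y <= 3) else bg for x in range(7)] for y in range(7)]

white_on_black = is_segment_corner(corner_patch(0xff, 0x00))
black_on_white = is_segment_corner(corner_patch(0x00, 0xff))
gray = is_segment_corner(corner_patch(0xc0, 0x40))
gray_high_threshold = is_segment_corner(corner_patch(0xc0, 0x40), threshold=150)
```

With a threshold below the pattern contrast, the test responds identically to both polarities and to the reduced gray range. Once the threshold exceeds the gray contrast (0xc0 - 0x40 = 128), the gray corner is missed while the black and white corner is still found, which illustrates how parameter settings interact with pixel value range.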
Non-Repeatability in Tests 1 and 2
One interesting anomaly visible in Tests 1 and 2 appears in the annotated images, illustrating that detector results are not repeatable on the synthetic interest point and corner alphabets. In some cases, the non-repeatability is striking; see the annotated images for Tests 1 and 2. The expectation of a human is that identical interest points should be equally well recognized. Here are some observations:
1. A human would recognize the same pattern easily whether or not the background and foreground are changed; however, some detectors do not have much invariance to extreme background and foreground polarity. The anomalies between detector behavior across white, black, and gray versions of the alphabets are less expected and harder to explain without looking deeper into each algorithm.
2. Some detectors compute over larger region boundaries than the 14x14 alphabet grid, so detectors virtually ignore the alphabet feature grid and use adjacent pieces of alphabet features.
3. Some detectors use scale space, so individual alphabet features are missed in some cases at higher scale levels, and detectors such as SIFT DoG use multiple scales together.
In summary, interest point detection and parameter tuning are analogous to image processing operators and their parameters: there are endless variations available to achieve the same goals. It is hoped that, by studying the test results here, intuition will be increased and new approaches can be devised.
Other Non-Repeatability in Test 3
We note non-repeatability anomalies with Test 3 using little girl images with synthetic overlays, but there is less expectation of repeatability in this test. Some analysis of the differences between the positive (white) and negative (black) feature overlays can be observed in the annotated synthetic overlay images online.
Test Summary
Take-away analysis for all tests includes the following:
1. Non-repeatability: some non-repeatability anomalies appear when detecting nearly identical features that differ only by local pixel interpolation artifacts under rotation. Some detectors also detect the black, white, and gray alphabets differently.
2. Gray level alphabets (light gray on dark gray) are generally detected in the way most similar to human expectations. The results show that detectors, with the current tuning parameters, respond more uniformly across rotation with gray level patterns, rather than maxima black and white patterns.
3. Tests on real images overlaid with synthetic alphabets provide interesting information for developing intuition about detector behavior, for illustration purposes only.
Future Work
Additional analysis should include devising and using alternative alphabets suited for a given type of application, including a larger range of pixel sizes and scales, especially alphabets with closer gray level value polarity, rather than extreme maxima and minima pixel values. Detector tuning should also be explored across the alphabets.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits any noncommercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this chapter or parts of it.
The images or other third party material in this chapter are included in the chapter’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2014 Scott Krig
About this chapter
Cite this chapter
Krig, S. (2014). Synthetic Feature Analysis. In: Computer Vision Metrics. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4302-5930-5_9
DOI: https://doi.org/10.1007/978-1-4302-5930-5_9
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4302-5929-9
Online ISBN: 978-1-4302-5930-5
eBook Packages: Professional and Applied Computing, Professional and Applied Computing (R0), Apress Access Books