MANTIS Ph.D. Candidate Bradley Koskowich successfully defended his dissertation, titled “An Assessment of Methods for Effective Single Camera Resection Solutions to the Cross-view Geo-localization Problem,” on Nov. 5, 2024. Bradley’s research focused on blending remote sensing products, platforms, and digital reality tools with AI techniques to connect the physical world directly with data. Bradley has developed several full-stack software applications for the Conrad Blucher Institute over the years.
The abstract of his presentation read as follows:
Typical multi-view stereo (MVS) photogrammetry problems have both traditional and deep learning solutions that use collections of overlapping imagery to solve for multiple camera positions simultaneously. Structure-from-motion (SfM) workflows achieve this with bundle adjustment, while simultaneous localization and mapping (SLAM) solutions use a similar, pipelined adjustment method. More recent deep learning research, such as neural radiance fields and Gaussian splatting, can further enhance typical MVS photogrammetry results, but all of these approaches still lean on one crucial operation: accurate estimation of camera position and orientation, collectively called camera pose. Camera pose information can be collected with external hardware such as global navigation satellite system (GNSS) receivers and inertial measurement units (IMUs), derived in a post-processing phase from known ground control points, or estimated in a relative fashion between images. Anything other than relative estimation generally introduces additional cost, complexity, and potential points of failure that can render the collected pose information useless. This dissertation addresses the challenge of using computer vision alone to accurately compute camera pose, independent of typical recording systems, and focuses on a specific photogrammetry sub-problem: determining camera pose between single image pairs consisting of one georeferenced aerial image and one terrestrial perspective image with unknown priors. This problem, also called monoplotting, single camera resectioning, or cross-view geo-localization, is technically a simpler camera configuration to solve than MVS photogrammetry, but it lacks the information density that MVS methods usually leverage and is extremely sensitive to initial conditions, which makes it difficult to solve automatically.
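For readers unfamiliar with the terminology, single camera resectioning amounts to recovering a camera's position and orientation from correspondences between image pixels and known world coordinates. The short sketch below illustrates that core operation with OpenCV's solvePnP; the correspondences, camera intrinsics, and coordinate values are illustrative placeholders, not data or code from the dissertation.

```python
# Minimal sketch of single camera resectioning (pose from 2D-3D
# correspondences) using OpenCV. All numbers are illustrative placeholders.
import numpy as np
import cv2

# World coordinates of landmarks visible in the terrestrial image, e.g.
# measured from a georeferenced aerial image (local metric frame, meters).
world_pts = np.array([
    [0.0, 0.0, 0.0],
    [12.0, 0.0, 0.0],
    [12.0, 8.0, 0.0],
    [0.0, 8.0, 0.0],
    [6.0, 4.0, 2.5],
    [3.0, 7.0, 1.0],
], dtype=np.float64)

# Pixel locations of the same landmarks in the terrestrial image.
image_pts = np.array([
    [410.0, 620.0],
    [980.0, 640.0],
    [1020.0, 300.0],
    [380.0, 290.0],
    [700.0, 420.0],
    [520.0, 330.0],
], dtype=np.float64)

# Assumed pinhole intrinsics (focal length and principal point in pixels).
K = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion

# Solve for camera pose; RANSAC guards against bad correspondences,
# which is where automatic cross-view matching tends to struggle.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(world_pts, image_pts, K, dist)
if ok:
    R, _ = cv2.Rodrigues(rvec)       # rotation matrix (world -> camera)
    camera_position = -R.T @ tvec    # camera center in world coordinates
    print("Camera position:", camera_position.ravel())
```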
This dissertation demonstrates potential applications that can be built atop accurate monoplotting solutions and explores enhancements to both algorithmic methods and deep learning architectures for solving the monoplotting problem. First, a practical application demonstrates real-time monitoring of vehicular traffic in a parking lot from an existing security camera installation, powered by monoplotting; this application also illustrates the problem's extreme sensitivity to initial conditions. Second, an algorithmic approach with a purpose-built feature matching method, supported by GPU-accelerated feature extraction and data processing, was developed and tested across a variety of environments to gauge its ability to mitigate that sensitivity. Finally, insight into the behavior of deep learning architectures that can partially solve the monoplotting problem was obtained by investigating the effects of replacing dense training collections of georeferenced and pose-tracked terrestrial imagery with historical aerial image collections, achieving comparable or better results than prior studies with a fraction of the training data. A hybrid approach is proposed that combines deep learning for partial initialization with the algorithmic method, using less training data to improve computed pose accuracy in full 3D space. The broader impact of this research is that systems relying on camera pose estimation could use vision-only pose computation as a validation or recovery mechanism, independent of typical GNSS/IMU systems, in the event of catastrophic failure.
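To make the parking lot example concrete: once a monoplotting solution relates ground pixels in a fixed security camera to georeferenced map coordinates, tracked objects can be placed on a map with a simple planar projection. The sketch below shows one common way to do this with OpenCV; the correspondences, map coordinates, and detections are hypothetical, and the snippet illustrates the general idea rather than the dissertation's implementation.

```python
# Illustrative sketch: projecting detections from a fixed camera onto a
# georeferenced map using a ground-plane homography. All coordinates are
# hypothetical and stand in for the output of a monoplotting solution.
import numpy as np
import cv2

# Pixel coordinates of ground points in the security camera view ...
camera_pts = np.array([[212.0, 710.0],
                       [1580.0, 735.0],
                       [1490.0, 402.0],
                       [305.0, 390.0]], dtype=np.float64)

# ... and the corresponding map coordinates (e.g. UTM easting/northing, meters).
map_pts = np.array([[654210.2, 3061875.4],
                    [654262.8, 3061874.1],
                    [654260.5, 3061902.7],
                    [654212.9, 3061903.9]], dtype=np.float64)

# Estimate the planar mapping from image pixels to map coordinates.
H, _ = cv2.findHomography(camera_pts, map_pts)

# Detected vehicle footprints (e.g. bottom-center of bounding boxes) in pixels.
detections = np.array([[[760.0, 655.0]], [[1102.0, 548.0]]], dtype=np.float64)

# Project the detections into georeferenced map coordinates.
geo_positions = cv2.perspectiveTransform(detections, H)
for easting, northing in geo_positions.reshape(-1, 2):
    print(f"Vehicle at easting {easting:.1f} m, northing {northing:.1f} m")
```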