We present Forge, a volumetric video processing platform runningon the cloud that works with different camera configurations andadapts to the user needs. From high-end studios with dozens of syn-chronised cameras to scenes casually captured with smartphones,Forge allows creators to produce high-quality 3D human contentautonomously with the equipment they already have, in an afford-able way, unlocking new and accessible ways to create 3D humancontent at scale.
This demo paper describes a project that engages cutting-edge free viewpoint video (FVV) techniques for developing content for an augmented reality prototype. The article traces the evolutionary process from concept, through narrative development, to completed AR prototypes for the HoloLens and handheld mobile devices. It concludes with some reflections on the affordances of the various hardware formats and posits future directions for the research.
This poster describes a reinterpretation of Samuel Beckett’s theatrical text Play for virtual reality (VR). It is an aesthetic reflection on practice that follows up an a technical project description submitted to ISMAR 2017 [O’Dwyer et al. 2017]. Actors are captured in a green screen environment using free-viewpoint video (FVV) techniques, and the scene is built in a game engine, complete with binaural spatial audio and six degrees of freedom of movement. The project explores how ludic qualities in the original text help elicit the conversational and interactive specificities of the digital medium. The work affirms the potential for interactive narrative in VR, opens new experiences of the text, and highlights the reorganisation of the author-audience dynamic.
Since the early years of the twenty-first century, the performing arts have been party to an increasing number of digital media projects bring renewed attention to questions about, on one hand, new working processes involving capture and distribution techniques, and on the other hand, how particular works – with bespoke hard and software – can exert an efficacy over how work is created by the artist/producer or received by the audience. The evolution of author/audience criteria demand that digital arts practice modify aesthetic and storytelling strategies, to types that are more appropriate to communicating ideas over interactive digital networks, wherein AR/VR technologies are rapidly becoming the dominant interface. This project explores these redefined criteria through a reimagining of Samuel Becketts Play (1963) for digital culture. This paper offers an account of the working processes, the aesthetic and technical considerations that guide artistic decisions and how we attempt to place the overall work in the state of the art.
This paper addresses the problem of detecting and segmenting human instances in a point cloud. Both fields have been well studied during the last decades showing impressive results, not only in accuracy but also in computational performance. With the rapid use of depth sensors, a resurgent need for improving existing state-of-the-art algorithms, integrating depth information as an additional constraint became more ostensible. Current challenges involve combining RGB and depth information for reasoning about location and spatial extend of the object of interest. We make use of an improved deformable part model algorithm, allowing to deform the individual parts across multiple scales, approximating the location of the person in the scene and a conditional random field energy function for specifying the object’s spatial extent. Our proposed energy function models up to pairwise relations defined in the RGBD domain, enforcing label consistency for regions sharing similar unary and pairwise measurements. Experimental results show that our proposed energy function provides a fairly precise segmentation even when the resulting detection box is imprecise. Reasoning about the detection algorithm could potentially enhance the quality of the detection box allowing capturing the object of interest as a whole.
Deformable Part Models, RGBD Data, Conditional Random Fields, Graph Cuts, Human Recognition
Human monitoring and tracking has been a prominent research area for many scientists around the globe. Several algorithms have been introduced and improved over the years, eliminating false positives and enhancing monitoring quality. While the majority of approaches are restricted to the 2D and 2.5D domain, 3D still remains an unexplored field. Microsoft Kinect is a low cost commodity sensor extensively used by the industry and research community for several indoor applications. Within this framework, an accurate and fast-to-implement pipeline is introduced working in two main directions: pure 3D foreground extraction of moving people in the scene and interpretation of the human movement using an ellipsoid as a mathematical reference model. The proposed work is part of an industrial transportation research project whose aim is to monitor the behavior of people and make a distinction between normal and abnormal behaviors in public train wagons. Ground truth was generated by the OpenNI human skeleton tracker and used for evaluating the performance of the proposed method.
Calibration, Bundle Adjustment, 3D Object Extraction, Ellipsoid, 3D Human, Tracking
Human Tracking in Computer Vision is a very active up-going research area. Previous works analyze this topic by applying algorithms and features extraction in 2D, while 3D tracking is quite an unexplored filed, especially concerning multi–camera systems. Our approach discussed in this paper is focused on the detection and tracking of human postures using multiple RGB–D data together with stereo cameras. We use low–cost devices, such as Microsoft Kinect and a people counter, based on a stereo system. The novelty of our technique concerns the synchronization of multiple devices and the determination of their exterior and relative orientation in space, based on a common world coordinate system. Furthermore, this is used for applying Bundle Adjustment to obtain a unique 3D scene, which is then used as a starting point for the detection and tracking of humans and extract significant metrics from the datasets acquired. In this article, the approaches are described for the determination of the exterior and absolute orientation. Subsequently, it is shown how a common point cloud is formed. Finally, some results for object detection and tracking, based on 3D point clouds, are presented.
Multi Camera System, Bundle Adjustment, 3D Similarity Transformation, 3D Fused Human Cloud
Human detection and tracking has been a prominent research area for several scientists around the globe. State of the art algorithms have been implemented, refined and accelerated to significantly improve the detection rate and eliminate false positives. While 2D approaches are well investigated, 3D human detection and tracking is still an unexplored research field. In both 2D/3D cases, introducing a multi camera system could vastly expand the accuracy and confidence of the tracking process. Within this work, a quality evaluation is performed on a multi RGB-D camera indoor tracking system for examining how camera calibration and pose can affect the quality of human tracks in the scene, independently from the detection and tracking approach used.
After performing a calibration step on every Kinect sensor, state of the art single camera pose estimators were evaluated for checking how good the quality of the poses is estimated using planar objects such as an ordinate chessboard. With this information, a bundle block adjustment and ICP were performed for verifying the accuracy of the single pose estimators in a multi camera configuration system. Results have shown that single camera estimators provide high accuracy results of less than half a pixel forcing the bundle to converge after very few iterations. In relation to ICP, relative information between cloud pairs is more or less preserved giving a low score of fitting between concatenated pairs. Finally, sensor calibration proved to be an essential step for achieving maximum accuracy in the generated point clouds, and therefore in the accuracy of the produced 3D trajectories, from each sensor.
Calibration, Single Camera Orientation, Multi Camera System, Bundle Block Adjustment, ICP, 3D human tracking
Human extraction and tracking is an undergoing field were many researchers have been working on for more than 20 years. Although several approaches in the 2D domain have been introduced, 3D literature is limited, requiring further investigation. Within this framework, an accurate and fast-to-implement pipeline is introduced working in two main directions: pure 3D foreground extraction of moving people in the scene and interpretation of the human movement using an ellipsoid as a mathematical reference model. The proposed work is part of an industrial transportation research project whose aim is to monitor the behaviour of people and make a distinction between normal and abnormal behaviours in public train wagons using a network of low cost commodity sensors such as Microsor Kinect sensor.
Die Erkennung und Verfolgung von Menschen mit Kamerasystemen ist ein sehr interessantes und sich schnell entwickelndes Forschungsgebiet und spielt gerade für die Sicherheitsforschung eine große Rolle. Bisherige Arbeiten konzentrieren sich auf 2D-Algorithmen, wobei die Erkennung, Extraktion und Verfolgung in 3D ein noch ziemlich unerforschtes Gebiet, vor allem in Bezug auf Multi-Kamera-Systeme ist. Unser Ansatz konzentriert sich auf die Erkennung und Verfolgung von Personen im öffentlichen Nahverkehr aus den Daten mehrerer Stereo und RGB-D-Systeme (RGB-D bezeichnet die Kombination aus Grau-/Farb- und Distanzinformationen, wie z. B. bei der Microsoft Kinect). Wesentliche Punkte des hier beschriebenen Ansatzes beziehen sich auf die Synchronisierung mehrerer Aufnahmesysteme und die Bestimmung ihrer Orientierungsparameter im Raum. Darüber hinaus wird mit Hilfe eines Bündelblockausgleichs geometrisch eine einheitliche 3D-Szene erzeugt, die dann einen Ausgangspunkt für die Erkennung und Verfolgung von Menschen im Beobachtungsraum bildet. Dazu werden signifikante Kennzahlen aus den erfassten Datensätzen ermittelt. In dem Beitrag wird eine Übersicht über die von mehreren RGB-D und Stereosensoren erzeugten Punktwolken und daraus abgeleiteten Daten erläutert und diskutierted.