EE392Y: Vision Sensor Networks Lab, Winter 2007
- A Multi-Touch Surface Using Multiple Cameras
- Kevin Gabayan
- Itai Katz
- A Multi-Touch Surface Using Multiple Cameras, K. Gabayan, I. Katz, H. Aghajan, Advanced Concepts for Intelligent Vision Systems (ACIVS) August 2007.
- A LQR Spatiotemporal Fusion Technique for Camera Selection with the Best View in Smart Camera Surveillance
- Chung-Ching Chang
- Spatiotemporal Fusion Framework for Multi-Camera Face Orientation Analysis, C. Chang and H. Aghajan, Advanced Concepts for Intelligent Vision Systems (ACIVS) August 2007.
- A LQR Spatiotemporal Fusion Technique for Face Profile Collection in Smart Camera Surveillance, C. Chang and H. Aghajan, Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.
- Selection of Camera with the Best View in a Multi-Camera Environment
- Olutosin Olatunbosun
- Faith Davis
- Probabilistic Models of Hand Gesture Contour Prediction in a Multi-Camera Network
- Jia-Yu Chen
- Head Tracking and Face Detection for Cognitive Vision
- Sudeepto Chakraborty
- Role Selection in Distributed Vision Systems
- Jingshen Jimmy Zhang
In this project we present a framework for a multi-touch surface using multiple cameras. With an overhead camera and a side-mounted camera, we determine the x-y coordinates of the fingertips and detect touch events. We interpret these events as hand gestures, which can be generalized into commands for manipulating applications. As an example application, we offer a multi-touch finger-painting program.
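The two-camera touch test above can be sketched as follows. This is a hypothetical illustration, not the project's implementation: it assumes the overhead camera supplies the fingertip's (x, y) position, the side-mounted camera supplies its apparent height above the surface, and a touch fires when that height drops below a small threshold.

```python
# Hypothetical two-camera touch test: the overhead camera gives the
# fingertip (x, y); the side camera gives its height above the surface.
# The threshold value is an illustrative assumption.
TOUCH_THRESHOLD_PX = 3  # max fingertip height (pixels) to count as a touch

def detect_touch(xy, height_px, threshold=TOUCH_THRESHOLD_PX):
    """Return an (x, y) touch event, or None if the finger is hovering."""
    if height_px <= threshold:
        return xy
    return None

# Example: a finger descending toward the surface; only the last sample
# is close enough to register as a touch event.
events = [detect_touch((120, 80), h) for h in (15, 7, 2)]
```

A stream of such events, grouped per finger over time, is what a gesture recognizer would then interpret as drag, pinch, or paint commands.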
In this project, we propose a joint face-orientation estimation technique for selecting the camera with the best view in smart camera networks, without having to localize the cameras in advance. The system is composed of in-node coarse estimation and joint refined estimation across cameras. The in-node signal processing algorithms are intentionally kept simple and general to reduce the required computation, yielding coarse estimates that may be erroneous. The proposed model-based technique determines the orientation and angular motion of the face from two features, namely the hair-face ratio and the head optical flow. These features yield estimates of the face orientation and angular velocity through Least Squares (LS) analysis. In the joint refined estimation, a discrete-time linear dynamical system is first modeled. The spatiotemporal consistency between cameras is measured by a cost function: a weighted quadratic sum of spatial inconsistency, input energy, and in-node estimation error. Minimizing this cost function through Linear Quadratic Regulation (LQR) provides a robust closed-loop feedback system that estimates the face orientation, the angular motion, and the relative angular difference to the face between cameras. Based on the face orientation estimates, the camera with the best view can be selected according to a policy determined by the target application.
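The LQR machinery referenced above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' formulation: the state is taken to be [orientation; angular velocity], the dynamics are a constant-angular-velocity model, the weights Q and R stand in for the paper's spatial-inconsistency and input-energy terms, and the gain is found by iterating the discrete algebraic Riccati equation.

```python
# Minimal discrete-time LQR sketch (illustrative assumptions throughout):
# x_{k+1} = A x_k + B u_k, cost = sum of x'Qx + u'Ru, feedback u = -Kx.
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Iterate the discrete algebraic Riccati equation; return gain K."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

dt = 1.0 / 30.0                        # one frame at an assumed 30 fps
A = np.array([[1.0, dt], [0.0, 1.0]])  # constant-angular-velocity model
B = np.array([[0.0], [dt]])            # control input enters the velocity
Q = np.diag([1.0, 0.1])                # penalize orientation error most
R = np.array([[0.01]])                 # small penalty on input energy

K = dlqr(A, B, Q, R)

# Closed-loop simulation: the orientation error decays toward zero.
x = np.array([[0.5], [0.0]])           # 0.5 rad initial orientation error
for _ in range(300):
    u = -K @ x
    x = A @ x + B @ u
```

The closed-loop property is what makes the fusion robust: even when an in-node coarse estimate is noisy, the feedback term pulls the joint estimate back toward spatiotemporal consistency.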
Many computer vision algorithms, like face detection and face tracking, assume the face is centered within the frame. This is not a valid assumption in general, particularly when the locations of the subject and cameras are unknown a priori. This report investigates the problem of selecting the camera with the best view in a multi-camera environment. An algorithm is developed that takes the frames obtained from an array of cameras and selects the camera with the best view. The algorithm hinges on the eye index and the skin index, which are combined into a weighted composite metric reflecting the relative centering of the face.
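The selection step can be sketched as below. The weights and the per-camera scores are illustrative assumptions, not the report's calibrated values; the point is only the mechanism: combine the eye index and skin index into one weighted score per camera and pick the maximizer.

```python
# Hypothetical weighted composite metric for best-view camera selection.
# W_EYE and W_SKIN are assumed relative importances of the two cues.
W_EYE, W_SKIN = 0.6, 0.4

def composite_score(eye_index, skin_index, w_eye=W_EYE, w_skin=W_SKIN):
    """Weighted combination of the eye index and skin index for one frame."""
    return w_eye * eye_index + w_skin * skin_index

def select_best_camera(indices):
    """indices: list of (eye_index, skin_index) per camera; return best id."""
    scores = [composite_score(e, s) for e, s in indices]
    return max(range(len(scores)), key=scores.__getitem__)

# Three cameras: camera 1 sees the most centered face.
best = select_best_camera([(0.2, 0.5), (0.9, 0.8), (0.4, 0.1)])
```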
Hand gesture recognition is widely investigated in the pattern recognition area, and various types of probabilistic models have been applied to capture the structure of gestures. At the same time, pattern recognition problems involving multi-sensor networks have also drawn attention in recent years. Combining these two issues, in this paper we propose novel probabilistic approaches to model the dependencies in hand gesture contours in both the spatial and temporal domains. In the discrete models, a self-organizing map (SOM) is used as the connection between the feature vectors and the probabilistic model. In addition, continuous models are used to predict current motion vectors from previous frames by an MMSE (minimum mean square error)-like approach. Experiments show that the continuous models yield better visual results than the discrete models.
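The MMSE-like prediction idea can be sketched with a minimal linear predictor. This is an illustrative stand-in, not the paper's continuous model: the current motion value is predicted as a linear combination of the previous p values, with coefficients fit by least squares on an observed trajectory.

```python
# MMSE-style linear prediction sketch (illustrative assumption): fit
# coefficients a so that v[t] ~= sum_k a[k] * v[t-1-k] over a trajectory,
# then predict the next sample from the most recent history.
import numpy as np

def fit_predictor(trajectory, p=2):
    """Least-squares fit of an order-p linear predictor."""
    X = np.array([trajectory[t - p:t][::-1] for t in range(p, len(trajectory))])
    y = np.array(trajectory[p:])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def predict(history, a):
    """Predict the next motion value from the most recent p samples."""
    p = len(a)
    return float(np.dot(a, history[-p:][::-1]))

# Constant-velocity toy trajectory: the predictor recovers the next value.
traj = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = fit_predictor(traj, p=2)
next_v = predict(traj, a)
```

For 2-D motion vectors, the same fit would simply be applied per component (or with vector-valued regressors).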
In this report, a cognitive vision system is discussed that tracks the movement of a person's head, detects face visibility, and will eventually use cognitive memory for face recognition. Dynamic contour tracking is used to perform head tracking, and skin color segmentation is used for face detection. Selected images containing just the face are then used either for training the neural networks in the cognitive memory system, or for retrieving patterns stored in long-term memory for face recognition. This project report extends these ideas to multi-camera network systems, where algorithms are needed that effectively exploit the spatiotemporal nature of the data.
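Skin color segmentation of the kind mentioned above can be sketched with a classic rule-based RGB test. The thresholds are a common heuristic from the literature, not the report's actual values, so treat them as assumptions.

```python
# Illustrative rule-based skin segmentation in RGB space; the thresholds
# below are a widely used heuristic, assumed here for the sketch.
import numpy as np

def skin_mask(rgb):
    """Boolean skin mask over an H x W x 3 uint8 RGB image."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (
        (r > 95) & (g > 40) & (b > 20)
        & (r - np.minimum(g, b) > 15)
        & (np.abs(r - g) > 15)
        & (r > g) & (r > b)
    )

# A 2x2 test image: one skin-toned pixel, three non-skin pixels.
img = np.array(
    [[[200, 120, 90], [30, 30, 30]],
     [[90, 200, 90], [255, 255, 255]]], dtype=np.uint8)
mask = skin_mask(img)
```

In the full system, the largest skin-colored region inside the tracked head contour would be cropped and passed on to the cognitive memory stage.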
In light of the changing paradigm for distributed vision systems, in which camera networks continue to gain capabilities such as infrared detection and mobility in both degrees of freedom and space, we present an algorithm that helps each camera choose its processing role by collaborating with the other cameras in the network, while also providing a straightforward way to tailor the distribution of roles to specific tasks. Initial tests show advantages over algorithms that simply choose the best camera view and discard the others.
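One way to picture the contrast with best-view-only selection is a greedy role assignment, sketched below. The roles, scores, and greedy policy are all illustrative assumptions, not the project's algorithm: each camera reports a suitability score per role, and pairs are assigned from the highest score down so that every camera keeps a job instead of all but the best view being discarded.

```python
# Hypothetical greedy role assignment: scores[c][r] is camera c's
# suitability for role r; each camera gets exactly one role.
def assign_roles(scores, roles):
    assignment = {}
    taken = set()
    # Visit (camera, role) pairs from highest score down.
    pairs = sorted(
        ((s, c, r) for c, row in enumerate(scores)
         for r, s in enumerate(row)),
        reverse=True)
    for s, c, r in pairs:
        if c not in assignment and r not in taken:
            assignment[c] = roles[r]
            taken.add(r)
    return assignment

roles = ["track", "detect", "standby"]
scores = [[0.9, 0.2, 0.1],   # camera 0: best tracker
          [0.8, 0.7, 0.1],   # camera 1: good at tracking and detection
          [0.3, 0.4, 0.6]]   # camera 2: best suited to standby
assignment = assign_roles(scores, roles)
```

Note that camera 1 scores well as a tracker too, but the collaborative assignment routes it to detection rather than discarding it, which is the qualitative advantage claimed over best-view-only schemes.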