A method for generating a three-dimensional (3D) model comprises: providing a plurality of images of a scene captured by a plurality of image capturing devices (101); providing silhouette information of at least one object in the scene (102); generating a point cloud for the scene in 3D space using the plurality of images (103); extracting an object point cloud from the generated point cloud, the object point cloud being a point cloud associated with the at least one object in the scene (104); estimating a 3D shape volume of the at least one object from the silhouette information (105); and combining the object point cloud and the shape volume of the at least one object to generate a 3D model (106). An apparatus for generating a 3D model and a computer-readable medium for generating the 3D model are also provided.
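The following is a minimal, non-authoritative sketch of how steps 104 to 106 might fit together. It assumes the scene point cloud (103) already exists, that each image capturing device is calibrated and given as a 3x4 projection matrix, and that each silhouette (102) is a set of foreground pixel coordinates. It uses voxel carving (a standard visual-hull technique) for the shape volume and a simple "measured points first, fill the gaps with voxels" union for the combination step. These choices and all function names are hypothetical illustrations, not the method disclosed above.

```python
from typing import List, Optional, Sequence, Set, Tuple

Point3 = Tuple[float, float, float]
Matrix34 = Sequence[Sequence[float]]  # assumed 3x4 camera projection matrix
Mask = Set[Tuple[int, int]]  # assumed silhouette: set of (u, v) foreground pixels


def project(P: Matrix34, X: Point3) -> Optional[Tuple[int, int]]:
    """Project a 3D point with a projection matrix and round to a pixel.

    Returns None for points at or behind the camera (non-positive depth).
    """
    x, y, z = X
    h = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    if h[2] <= 0:
        return None
    return round(h[0] / h[2]), round(h[1] / h[2])


def inside_all_silhouettes(X: Point3, cameras: Sequence[Matrix34],
                           silhouettes: Sequence[Mask]) -> bool:
    """True if X projects into the foreground of every silhouette."""
    return all(project(P, X) in mask for P, mask in zip(cameras, silhouettes))


def extract_object_points(scene_points: Sequence[Point3], cameras: Sequence[Matrix34],
                          silhouettes: Sequence[Mask]) -> List[Point3]:
    """Hypothetical step 104: keep scene points consistent with all silhouettes."""
    return [X for X in scene_points if inside_all_silhouettes(X, cameras, silhouettes)]


def estimate_shape_volume(cameras: Sequence[Matrix34], silhouettes: Sequence[Mask],
                          grid: Sequence[Point3]) -> List[Point3]:
    """Hypothetical step 105: visual hull by carving away voxels outside any silhouette."""
    return [V for V in grid if inside_all_silhouettes(V, cameras, silhouettes)]


def combine(object_points: Sequence[Point3], volume: Sequence[Point3],
            voxel_size: float) -> List[Point3]:
    """Hypothetical step 106: keep measured points, then add hull voxels
    whose cell does not already contain a measured point."""
    def cell(p: Point3) -> Tuple[int, ...]:
        return tuple(round(c / voxel_size) for c in p)

    occupied = {cell(X) for X in object_points}
    fill = [V for V in volume if cell(V) not in occupied]
    return list(object_points) + fill
```

For example, with two affine cameras looking along the z and x axes, a 2x2 square silhouette in each view, and a 3x3x3 voxel grid, the carved hull is the 8-voxel cube in the corner; a background point at (2, 2, 2) is rejected in step 104. The union rule is only one option: a real system could instead use the hull to fill holes where stereo matching failed, or use the measured points to carve concavities the silhouettes cannot capture.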