1 Introduction

With the recent availability of stereoscopic displays such as 3D monitors, 3D televisions and stereo camera phones, there is an increasing need for stereo image retargeting techniques. Stereo image retargeting aims to resize a stereo image pair to fit stereoscopic displays of various aspect ratios and sizes. An ideal stereo image retargeting method should (1) ensure scene consistency, (2) preserve geometric/structural details, and (3) preserve the stereoscopic properties between the left and right images. More specifically, scene consistency properties [1, 2] include zero object distortion, correct scene occlusion and correct semantic connectedness (consistent physical contacts between objects and their environment in the retargeted image).

Retargeting using simple operators such as scaling and cropping [3] does not guarantee the protection of important objects, leading to distortion of objects or loss of important content. Therefore, content-aware stereo retargeting approaches, which can be categorized into discrete, continuous and hybrid approaches, are gaining more research attention. Discrete approaches [4, 5] can produce impressive results for stereo images with simple backgrounds, but due to their discontinuous nature, object distortion and artifacts are unavoidable for images with complex backgrounds. Continuous approaches [6, 7] utilize warping to retarget stereoscopic images. Due to their continuous nature, these approaches preserve semantic connectedness well, but distortion of objects and structural details is unavoidable in extreme retargeting cases. Lee et al. [8] propose a hybrid retargeting approach that segments the objects into object layers, performs warping on the background layer and pastes the objects back onto the warped background to produce the retargeted image. This approach ensures object protection and better preserves structural details, but it does not guarantee semantic connectedness between a segmented object and its background.

To better preserve scene consistency while minimizing artifacts and preserving stereoscopic properties, we propose a novel stereo image retargeting technique based on the tearable image warping technique [2]. Conceptually, in tearable image warping, an object boundary is divided into tearable and non-tearable segments. Tearable segments correspond to where depth discontinuity occurs; these segments are allowed to break away from their environment, allowing the background warping to be distributed more evenly and thereby reducing distortion of structural details. Non-tearable segments correspond to the parts of the object boundary that have actual physical contact with the environment or other objects. These segments help to preserve semantic connectedness by constraining the object to maintain its real contacts in the 3D world. This approach preserves semantic connectedness very well. However, it is designed for a single image and is thus unable to preserve the stereoscopic properties of a retargeted stereo image.

The main contribution of this work is to extend the tearable image warping method [2] to stereo image retargeting. The proposed method retargets both the left and right images of the stereo image pair simultaneously to preserve scene consistency and minimize distortion, through a revised optimization algorithm. In addition, it successfully maintains consistent stereoscopic properties in the resulting stereo image pair to avoid visual discomfort during viewing. Besides maintaining stereoscopic consistency and guaranteeing semantic connectedness, experimental results show that our approach preserves structural details better than stereoscopic seam carving, preserves the global image context better than stereo cropping, and protects objects better than traditional stereoscopic warping.

2 Related Works

State-of-the-art content-aware stereo image retargeting can be categorized into discrete, continuous and hybrid approaches. Discrete methods include scene carving [1] and patch-based [5] approaches. As with its 2D counterpart, the geometrically consistent seam carving approach [4] for stereo image retargeting produces good results for non-complex images but fails on complex images and in extreme retargeting cases. The shift-map-based approach [5] characterizes the geometric rearrangement of a stereo image pair. It preserves foreground objects well but does not consider the preservation of semantic features such as ripples, shadows, symmetry and texture.

Continuous approaches generally utilize warping to retarget a stereo image pair. Niu et al. [7] extend traditional image warping to stereo images with the objective of preserving prominent objects and 3D structure. The warping-based approach of Chang et al. [6] aims to avoid diplopia (double vision) by optimizing two stereoscopic constraints: vertical alignment, to avoid vertical artifacts, and horizontal disparity consistency. However, these warping-based approaches cannot avoid object distortion in extreme retargeting cases. In addition, the stereoscopic quality of the retargeted images can be reduced because the whole image is warped in a continuous manner, which prevents proper occlusions and depth discontinuities from being created.

To address the limitations of the above warping-based approaches, Lee et al. [8] propose scene warping, a hybrid approach that decomposes the input stereo image into several layers according to depth order and warps each layer with its own mesh deformation. The warped layers are then composited together in depth order to produce the retargeted image. This method produces better stereoscopic quality and ensures object protection, but it cannot guarantee the preservation of semantic connectedness between a foreground object and its background environment, such as shadows and ripples.

3 Our Approach

Our approach extends the tearable warp algorithm [2] for single image retargeting to the stereoscopic domain. The challenge lies in maintaining the stereoscopic properties of the retargeted stereo image pair and ensuring that the object extraction process for the left and right images is simple and coherent. Figure 1 illustrates an overview of our approach. Similar to the scene warping approach [8], our tearable-warp-based approach first separates an input stereo image pair into background and object layers. Then, the background layers are warped and the objects are pasted back onto the background layers to produce the retargeted results. The core difference in our approach is that for each object, an object handle that represents the connection between the object and its background is defined. During the warping process, we constrain the object handles to be kept as rigid as possible, and we ensure that each object is pasted onto the warped background so as to coincide with its object handle. This technique guarantees the preservation of semantic connectedness.

In the following sub-sections, we provide a detailed description of our algorithm in three main steps: (1) pre-processing, (2) warping, and (3) image compositing.

Fig. 1. Overview of our stereo image retargeting approach

3.1 Pre-processing

In the pre-processing phase, given a stereo image pair, we first compute the disparity map using the sum of absolute differences (SAD). The disparity values should be preserved in the retargeted image pair to ensure a comfortable 3D viewing experience. Next, we provide a simple, semi-automatic interface, powered by GrabCut [9], that allows users to select objects in the left image and define their respective handles with just a few clicks. The corresponding object segments and object handles in the right image are then automatically inferred from the left image based on the disparity map. Exemplar-based inpainting [10] is then used to fill the holes in the background layers. The disparity map for a given stereo image pair, the object segment, and the inpainted background layer for the left image are illustrated in the pre-processing box of Fig. 1. Although the inpainting result is imperfect, the artifacts do not appear in the retargeted results illustrated in Fig. 1.
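For a rectified stereo pair, SAD disparity estimation amounts to block matching along scanlines. The following numpy sketch illustrates the idea; the window size and disparity search range are illustrative assumptions, and a practical implementation would use a vectorized or library-based matcher.

```python
import numpy as np

def sad_disparity(left, right, max_disp=64, win=7):
    """Block-matching disparity via sum of absolute differences (SAD).

    left, right: grayscale images as float arrays of equal shape,
                 assumed to form a rectified stereo pair.
    max_disp, win: search range and window size (illustrative values).
    Returns a per-pixel disparity map for the left image.
    """
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best, best_d = np.inf, 0
            # Search along the same scanline (rectification assumed).
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```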

3.2 Warping

Next, we perform triangle-based warping on the inpainted background layer pair to retarget it to the desired target size. Let L denote the left image and R denote the right image. Given the source left and right triangle meshes, \(M_L\) and \(M_R\), and the respective object handles, the warping process is the problem of mapping \(M_L\) and \(M_R\) to their target meshes, \({M}'_L\) and \({M}'_R\). During the warping process, we aim to preserve the stereoscopic properties and keep the object handles as rigid as possible. The warping energy minimizes a set of errors consisting of the scale transformation error, the smoothness error and the stereoscopic quality error, subject to a set of constraints. Figure 2 shows the warped background layers, where the object handles are kept as rigid as possible.

Fig. 2. Triangular mesh of the (left) inpainted background layer and (right) the warped background layer (width reduced by 20 %). The shape of the object handle (represented by the yellow lines) in the warped image is preserved rigidly (Color figure online).

Scale Transformation Error. Let T be the set of all triangles in the input meshes, \(M_L\cup M_R\). For each triangle \(t \in T\), we constrain the transformation to non-uniform scaling [2, 11], denoted by,

$$G_t=\begin{pmatrix} S_t^x & 0 \\ 0 & S_t^y \end{pmatrix}$$
(1)

where \(S_t^x\) and \(S_t^y\) are the scale of triangle t in the x and y dimensions respectively. The scale transformation error, \(E_w\) is then defined as,

$$E_{w}=\sum_{t \in T} A_t \left\| J_t - G_t \right\|_F^2$$
(2)

where \(A_t\) is the area of triangle t, \(\left\| \cdot \right\|_F\) denotes the Frobenius norm, and \(J_t\) is a \(2\times 2\) Jacobian matrix that represents the linear portion of the affine mapping that maps a triangle to its corresponding triangle in \(M'_L\cup M'_R\).
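To make Eqs. (1) and (2) concrete, the sketch below evaluates \(E_w\) for given source and target meshes. The mesh representation (a vertex array plus a triangle index array) and the per-triangle scales passed as inputs are our assumptions; in the actual optimization, the scales and target vertex positions are unknowns solved for jointly.

```python
import numpy as np

def triangle_area(p):
    # p: (3, 2) array of triangle vertex coordinates.
    e1, e2 = p[1] - p[0], p[2] - p[0]
    return 0.5 * abs(e1[0] * e2[1] - e1[1] * e2[0])

def jacobian(src, dst):
    """Linear part J_t of the affine map taking triangle src to dst.

    Built from the 2x2 edge matrices: D = J @ S, so J = D @ inv(S).
    """
    S = np.column_stack([src[1] - src[0], src[2] - src[0]])
    D = np.column_stack([dst[1] - dst[0], dst[2] - dst[0]])
    return D @ np.linalg.inv(S)

def scale_error(V, Vp, tris, scales):
    """E_w = sum_t A_t * ||J_t - G_t||_F^2  (Eq. 2).

    V, Vp:  (n, 2) source / target vertex positions.
    tris:   (m, 3) vertex indices per triangle.
    scales: (m, 2) per-triangle (S_t^x, S_t^y); treated here as given,
            though the paper optimizes them jointly with the vertices.
    """
    E = 0.0
    for t, (sx, sy) in zip(tris, scales):
        J = jacobian(V[t], Vp[t])
        G = np.diag([sx, sy])
        E += triangle_area(V[t]) * np.sum((J - G) ** 2)
    return E
```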

Smoothness Error. The smoothness error tries to avoid discontinuity by minimizing the scale difference between neighboring triangles,

$$E_{s}=\sum_{s,t \in T} A_{st} \left\| G_t - G_s \right\|_F^2$$
(3)

where \(A_{st}=(A_s+A_t)/2\) and s, t are adjacent triangles.
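A corresponding sketch for Eq. (3), assuming the list of edge-adjacent triangle pairs has been precomputed from the mesh:

```python
import numpy as np

def smoothness_error(scales, areas, adjacency):
    """E_s = sum over adjacent (s, t) of A_st * ||G_t - G_s||_F^2  (Eq. 3).

    scales:    (m, 2) per-triangle (S^x, S^y) values.
    areas:     (m,) triangle areas.
    adjacency: iterable of (s, t) index pairs of edge-adjacent triangles.
    """
    E = 0.0
    for s, t in adjacency:
        A_st = 0.5 * (areas[s] + areas[t])
        # For diagonal G, the Frobenius norm reduces to the difference
        # of the two scale components.
        E += A_st * np.sum((scales[t] - scales[s]) ** 2)
    return E
```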

Stereoscopic Quality Error. To ensure the stereoscopic properties are preserved, we minimize the change in two stereoscopic properties [8]: (1) the disparity between the input and output stereo mesh pairs, and (2) the vertical drift between the left and right output meshes. Let \((p_i^L, p_i^R)\) and \((q_i^L, q_i^R)\) denote the sets of corresponding points in the disparity maps of the input and output images respectively, where \(p_i^L\) and \(p_i^R\) are the corresponding points of the input meshes, \(M_L\cup M_R\), and \(q_i^L\) and \(q_i^R\) are the corresponding points of the output meshes, \({M}'_L\cup {M}'_R\). The stereoscopic quality error is then defined as

$$E_t=\sum_{(q_i^L, q_i^R) \in {M}'_L \cup {M}'_R} \left( E_d(q_i^L, q_i^R)+E_v(q_i^L, q_i^R) \right)$$
(4)

where \(E_d\) indicates the disparity consistency, and \(E_v\) ensures zero vertical drift.

$$E_d(q_i^L, q_i^R)=\left( (q_i^R(x) -q_i^L(x))-(p_i^R(x) -p_i^L(x)) \right)^2$$
(5)
$$E_v(q_i^L, q_i^R)=\left( q_i^R(y) -q_i^L(y) \right)^2$$
(6)

where (x) and (y) refer to the x and y coordinates of the given point.
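Equations (4)-(6) can be evaluated directly over the sets of corresponding points. A vectorized sketch, assuming the correspondences are stored as coordinate arrays:

```python
import numpy as np

def stereo_quality_error(pL, pR, qL, qR):
    """E_t = sum_i E_d + E_v over corresponding point pairs (Eqs. 4-6).

    pL, pR: (k, 2) corresponding points in the input left/right meshes.
    qL, qR: (k, 2) the same correspondences in the output meshes.
    """
    # Disparity consistency: output horizontal disparity should match input.
    E_d = ((qR[:, 0] - qL[:, 0]) - (pR[:, 0] - pL[:, 0])) ** 2
    # Vertical drift: corresponding points should stay on the same scanline.
    E_v = (qR[:, 1] - qL[:, 1]) ** 2
    return float(np.sum(E_d + E_v))
```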

Fig. 3. Analysis of disparity preservation. (left) Original stereo image, (middle) retargeted results of single-image tearable warping, without consideration for stereoscopic properties preservation, and (right) results of our approach, which minimizes the stereoscopic properties error. Yellow boxes and black dots highlight the comparison of disparity preservation. Green boxes highlight larger violations in disparity preservation (Color figure online).

Total Warping Error. The total warping energy function E can be formulated as the weighted sum of the scale transformation error, the smoothness error and the stereoscopic quality error,

$$E=\alpha E_w+\beta E_s+\gamma E_t$$
(7)

where \(\alpha \), \(\beta \), and \(\gamma \) are the corresponding weights.

Constraints. To preserve semantic connectedness, we define the handle shape constraint [2] to rigidly preserve the shape and orientation of the object handles. In addition, image boundary and object boundary constraints are added to ensure that the original image boundary remains on the boundary and that user-defined objects do not move out of the image boundary.

3.3 Image Compositing

At this stage, each object in both images is scaled by its respective scale factor \(S_k\) and inserted so as to coincide with its corresponding object handle in the warped background image. Objects are inserted into the background image in the depth order acquired from the depth map.
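A minimal sketch of this compositing step, under our own assumptions that each object layer carries an alpha channel, a paste position aligned with its warped handle, and a scalar depth value (larger meaning farther away):

```python
def composite(background, objects):
    """Paste scaled object layers back onto the warped background.

    background: (H, W, 3) warped background layer.
    objects:    list of dicts with keys 'rgba' (an (h, w, 4) layer,
                already scaled by S_k), 'pos' (top-left paste position
                aligned with the warped object handle) and 'depth'
                (assumed: larger = farther). Layers are assumed to fit
                inside the image.
    """
    out = background.astype(float)
    # Paste far-to-near (painter's algorithm) so nearer objects
    # correctly occlude farther ones.
    for obj in sorted(objects, key=lambda o: o['depth'], reverse=True):
        y, x = obj['pos']
        layer = obj['rgba']
        h, w = layer.shape[:2]
        alpha = layer[..., 3:4] / 255.0
        out[y:y + h, x:x + w] = (alpha * layer[..., :3]
                                 + (1 - alpha) * out[y:y + h, x:x + w])
    return out
```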

3.4 Optimization Details

We use the CVX Matlab toolbox [12] to find the solution to the convex quadratic function defined in Eq. (7). The weights of the total energy, \(\alpha\), \(\beta\) and \(\gamma\), are set to 1, 0.5 and 0.1 respectively; this set of weights was obtained empirically. Notably, \(\gamma\) is set to a much lower value (0.1) than \(\alpha\) and \(\beta\) in order to avoid penalizing smoothness. Our experiments show that a higher weight for \(\gamma\) reduces smoothness and causes obvious discontinuity and severe compression in certain parts of the retargeted image. The handle shape and image boundary constraints are set as hard constraints, while the object boundary is set as an inequality constraint. The scale factor \(S_k\) for each object is set to 1.
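The sketch below reconstructs the same kind of convex quadratic program in cvxpy, a Python analogue of the CVX toolbox cited above, with the handle shape and image boundary terms as hard constraints. The variable layout, the per-triangle scale variables, and the omission of the stereoscopic term \(E_t\) are simplifications of ours, not the authors' exact formulation.

```python
import cvxpy as cp
import numpy as np

def retarget_mesh(V, tris, adjacency, handle_idx, left_idx, right_idx,
                  target_w, alpha=1.0, beta=0.5):
    """Sketch of Eq. (7) as a convex QP (the stereo term is omitted).

    V:          (n, 2) source mesh vertices.
    tris:       (m, 3) triangle vertex indices.
    adjacency:  (s, t) index pairs of edge-adjacent triangles.
    handle_idx: object-handle vertex indices (rigid hard constraint).
    left_idx, right_idx: vertices pinned to the left/right image borders.
    """
    n, m = len(V), len(tris)
    Vp = cp.Variable((n, 2))               # output vertex positions
    s = cp.Variable((m, 2))                # per-triangle scales (S^x, S^y)

    E_w, areas = 0, np.zeros(m)
    for t, (i0, i1, i2) in enumerate(tris):
        S = np.column_stack([V[i1] - V[i0], V[i2] - V[i0]])
        areas[t] = 0.5 * abs(np.linalg.det(S))
        # Jacobian of the affine map, linear in the unknown vertices Vp.
        D = cp.vstack([Vp[i1] - Vp[i0], Vp[i2] - Vp[i0]]).T
        E_w += areas[t] * cp.sum_squares(D @ np.linalg.inv(S) - cp.diag(s[t]))

    E_s = 0
    for a, b in adjacency:                 # smoothness between neighbours
        E_s += 0.5 * (areas[a] + areas[b]) * cp.sum_squares(s[a] - s[b])

    cons = [Vp[left_idx, 0] == 0,          # image boundary (hard)
            Vp[right_idx, 0] == target_w]
    h0 = handle_idx[0]
    for h in handle_idx[1:]:               # handle shape kept rigid (hard)
        cons.append(Vp[h] - Vp[h0] == V[h] - V[h0])

    cp.Problem(cp.Minimize(alpha * E_w + beta * E_s), cons).solve()
    return Vp.value
```

In the full method, the left and right meshes are stacked into one problem so that the stereoscopic term can couple corresponding points across the two views.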

4 Results and Discussion

The images used to test our approach were collected from Flickr [13] and Yury Golubinsky's stereo blog [14]. The results were generated on a desktop with an Intel i7 3.40 GHz CPU and 12 GB of memory. The computation time depends on the number of triangles in the triangular meshes used to represent the stereo image pair. Excluding the time taken for inpainting, which is performed at the pre-processing stage, our algorithm produces retargeted results in about 3.5 to 6 s. For example, it takes 5.5 s to produce the result for a 720 \(\times \) 480 stereo image represented by 1552 triangles.

Fig. 4. Comparison of our results with stereo cropping [3]. Reducing width by 30 %: (left) input left stereo image, (middle) stereo cropping [3], and (right) our method. The yellow box highlights an important object that is cropped off (Color figure online).

To illustrate the effectiveness of our approach in preserving the stereoscopic properties, we compare it against a naive adoption of single-image tearable warping in the stereoscopic domain. From Fig. 3, it is obvious that the tearable image warping approach designed for single images [2] fails to preserve disparity consistency. In contrast, our algorithm successfully minimizes the variance of disparity between the original and retargeted stereo images to preserve the stereoscopic properties. Therefore, our results do not cause visual discomfort to the user during 3D viewing of the retargeted stereo image.

Fig. 5. Extreme retargeting results, reducing width by 40 %: (column 1) input left stereo image with the object handle in yellow, (column 2) stereo seam carving [4], (column 3) traditional stereo warping [11], and (column 4) our approach. Yellow boxes highlight object distortion (Color figure online).

Fig. 6. Our method better preserves the geometric structure of the arch shape of the pillars. (left) Left image of our result and (right) left image of the result of stereo warping [11].

Fig. 7. Extreme retargeting results, reducing height by 40 %: (row 1, columns 1, 3) input left stereo image with the object handle in yellow, (row 1, columns 2, 4) original stereo image, (row 2) stereo seam carving [4], (row 3) traditional stereo warping [11], and (row 4) our approach. Yellow boxes highlight object distortion (Color figure online).

Next, we compare our approach with state-of-the-art stereo retargeting approaches. We compare our results with stereo cropping [3], stereo seam carving [4], traditional stereo warping based on non-homogeneous scaling [11], and scene warping [8], where each selected approach corresponds to a category of retargeting approaches: simple operator, discrete, continuous and hybrid, respectively. Compared to stereo cropping [3], our approach better preserves the global image context and reduces the loss of important content, as shown in Fig. 4.

Figures 5 and 7 compare our approach with stereo seam carving [4] and traditional stereo warping [11] in cases of extreme retargeting, where the width or height is reduced by 40 %. Severe object distortion/compression can be observed in most results of seam carving and traditional warping. In contrast, our approach achieves zero object distortion because the important objects are not warped during the optimization process. Due to its discrete nature, the seam carving approach also fails to prevent distortion of geometric structures. Compared to traditional warping [11], as illustrated in Fig. 6, our approach better preserves geometric structures because in our tearable-warping-based approach, only the object handles need to be preserved rigidly. Therefore, the compression that occurs during warping can be distributed more evenly across other parts of the image. In the traditional warping approach, on the other hand, the whole area containing the object must be preserved, leaving less room for compression during warping.

Fig. 8. Comparison of results with scene warping, reducing width by 40 %: (left) input left stereo image, (middle) scene warping [8], and (right) our approach.

Fig. 9. More results. (column 1, rows 1, 3) Input left stereo image with the object handle marked in yellow, (column 1, rows 2, 4) original stereo image, (column 2) reducing width by 40 %, (column 3) reducing width by 20 %, and (column 4) increasing width by 20 % (Color figure online).

Lastly, we compare our results with scene warping [8]. Because both adopt a similar layer-based approach, both scene warping and our approach avoid object distortion and preserve structural details better than traditional warping and seam carving. In addition, as illustrated in Fig. 8, both approaches allow overlapping based on depth order and can thus support extreme image retargeting. However, our approach should preserve semantic connectedness better than scene warping. Our optimization algorithm constrains the object handle to be as rigid as possible and thus guarantees the preservation of the semantic relationship between an object and its background, as shown by the perfect shadow connection in our retargeted results in Fig. 9. We could not include scene warping results in the comparison on preservation of semantic connectedness because its source code is unavailable. However, based on theoretical analysis, shadows may not be well preserved by scene warping, as it has no energy term or constraint that guarantees the preservation of semantic connectedness.

Analysis of our results shows that our approach is quite robust against background distortion owing to the more even distribution of compression throughout the retargeted stereo image. However, for images with very complex geometric structures, background distortion is still unavoidable; in such cases, an additional constraint is needed to preserve the geometric details. Another potential problem with our approach is the possibility of incorrect propagation of object segments and their respective handles from the left image to the right image of the stereo image pair, due to inaccurate disparity information. In our experiments, however, this problem seldom crops up. Finally, our approach requires inpainting, which remains an open problem in computer vision. Although inpainting artifacts are inevitable, in most of our retargeting results the artifacts are covered up when objects are pasted onto the warped background during the image compositing stage. For cases where inpainting artifacts are visible, we allow users to interactively touch up the artifacts.

5 Conclusion

This paper has successfully extended the tearable image warping method [2] to stereoscopic image retargeting. The proposed method retargets both the left and right images of the stereo image pair simultaneously using a revised, global optimization algorithm. Experiments show that our approach is able to preserve the stereoscopic properties in the retargeted images and compares favourably with existing methods. Since the warping process does not involve the foreground objects, our method ensures object protection and produces less severe compression than stereo seam carving and traditional stereo warping methods, particularly in extreme retargeting cases. The core strength of our method is its ability to guarantee the preservation of semantic connectedness between an object and its background without distorting the objects, while preserving the stereoscopic properties of the stereo image.