1 Introduction

With the recent availability of stereoscopic displays such as 3D monitors, 3D televisions and stereo camera phones, there is an increasing need for stereo image retargeting techniques. Stereo image retargeting aims to resize a stereo image pair to fit stereoscopic displays of various aspect ratios and sizes. An ideal stereo image retargeting method should (1) ensure scene consistency, (2) preserve geometric/structural details, and (3) preserve the stereoscopic properties between the left and right images. More specifically, scene consistency properties [1, 2] include zero object distortion, correct scene occlusion and correct semantic connectedness (consistent physical contacts between objects and their environment in the retargeted image).

Retargeting using simple operators such as scaling and cropping [3] does not guarantee the protection of important objects, leading to distortion of objects or loss of important content. Therefore, content-aware stereo retargeting approaches, which can be categorized into discrete, continuous and hybrid approaches, are gaining more research attention. Discrete approaches [4, 5] can produce impressive results for stereo images with simple backgrounds, but due to their discontinuous nature, object distortion and artifacts are unavoidable for images with complex backgrounds. Continuous approaches [6, 7] utilize warping to retarget stereoscopic images. Due to their continuous nature, these approaches preserve semantic connectedness well, but distortion of objects and structural details is unavoidable in extreme retargeting cases. Lee et al. [8] propose a hybrid retargeting approach that segments the objects into object layers, performs warping on the background layer and pastes the objects back onto the warped background to produce the retargeted image. This approach ensures object protection and better preserves structural details, but it does not guarantee semantic connectedness between a segmented object and its background.

To better preserve scene consistency while minimizing artifacts and preserving stereoscopic properties, we propose a novel stereo image retargeting technique based on the tearable image warping technique [2]. Conceptually, in tearable image warping, an object boundary is divided into tearable and non-tearable segments. Tearable segments correspond to where depth discontinuity occurs; these segments are allowed to break away from their environment, allowing the background warping to be distributed more evenly and thereby reducing distortion of structural details. Non-tearable segments correspond to the parts of the object boundary that have actual physical contact with the environment or other objects. These segments help to preserve semantic connectedness by constraining the object to maintain its real contacts in the 3D world. This approach preserves semantic connectedness very well. However, it is designed for a single image and is thus unable to preserve the stereoscopic properties of a retargeted stereo image.

The main contribution of this work is to extend the tearable image warping method [2] to stereo image retargeting. The proposed method retargets both the left and right images of the stereo image pair simultaneously to preserve scene consistency and minimize distortion, through a revised optimization algorithm. In addition, it successfully maintains consistent stereoscopic properties in the resulting stereo image pair to avoid visual discomfort during viewing. Besides maintaining stereoscopic consistency and guaranteeing semantic connectedness, experimental results show that our approach preserves structural details better than stereoscopic seam carving, preserves the global image context better than stereo cropping, and protects objects better than traditional stereoscopic warping.

2 Related Works

State-of-the-art content-aware stereo image retargeting can be categorized into discrete, continuous and hybrid approaches. Discrete methods include scene carving [1] and patch-based [5] approaches. As with its 2D counterpart, the geometrically consistent seam carving approach [4] for stereo image retargeting produces good results for non-complex images but fails on complex images and in extreme retargeting cases. The shift-map-based approach [5] characterizes the geometric rearrangement of a stereo image pair. It preserves foreground objects well but does not consider the preservation of semantic features such as ripples, shadows, symmetry and texture.

Continuous approaches generally utilize warping to retarget a stereo image pair. Niu et al. [7] extend traditional image warping to stereo images with the objective of preserving prominent objects and 3D structure. The warping-based approach of Chang et al. [6] aims to avoid diplopia (double vision) by optimizing two stereoscopic constraints: vertical alignment, to avoid vertical artifacts, and horizontal disparity consistency. However, these warping-based approaches cannot avoid object distortion in extreme retargeting cases. In addition, the stereoscopic quality of the retargeted images can be reduced because the whole image is warped in a continuous manner, which prevents proper occlusions and depth discontinuities from being created.

To address the limitations of the above warping-based approaches, Lee et al. [8] propose scene warping, a hybrid approach that decomposes the input stereo image into several layers according to depth order and warps each layer with its own mesh deformation. The warped layers are then composited together in depth order to produce the retargeted image. This method produces better stereoscopic quality and ensures object protection, but it cannot guarantee the preservation of semantic connectedness between a foreground object and its background environment, such as shadows and ripples.

3 Our Approach

Our approach extends the tearable warp algorithm [2] for single image retargeting to the stereoscopic domain. The challenge lies in maintaining the stereoscopic properties of the retargeted stereo image pair and ensuring that the object extraction process for the left and right images is simple and coherent. Figure 1 illustrates an overview of our approach. Similar to the scene warping approach [8], our tearable-warp-based approach first separates an input stereo image pair into background and object layers. Then, the background layers are warped and the objects are pasted back onto the background layers to produce the retargeted results. The core difference in our approach is that for each object, an object handle that represents the connection between the object and its background is defined. During the warping process, we constrain the object handles to be kept as rigid as possible, and we ensure that each object is pasted onto the warped background so as to coincide with its object handle. This technique guarantees the preservation of semantic connectedness.

In the following sub-sections, we provide a detailed description of our algorithm in three main steps: (1) pre-processing, (2) warping, and (3) image compositing.

Fig. 1. Overview of our stereo image retargeting approach

3.1 Pre-processing

In the pre-processing phase, given a stereo image pair, we first compute the disparity map using the sum of absolute differences (SAD). The disparity values should be preserved in the retargeted image pair to ensure a comfortable 3D viewing experience. Next, we provide a simple, semi-automatic interface, powered by GrabCut [9], that allows users to select objects in the left image and define their respective handles with just a few clicks. The corresponding object segments and object handles in the right image are then automatically inferred from the left image based on the disparity map. Exemplar-based inpainting [10] is then used to fill the holes in the background layers. The disparity map for a given stereo image pair, the object segment, and the inpainted background layer for the left image are illustrated in the pre-processing box of Fig. 1. Although the inpainting result is imperfect, the artifacts do not appear in the retargeted results illustrated in Fig. 1.
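For a rectified stereo pair, SAD disparity estimation amounts to block matching along scanlines. The following numpy sketch illustrates the idea; the window size and disparity search range are illustrative assumptions, and a practical implementation would use a vectorized or library-based matcher.

```python
import numpy as np

def sad_disparity(left, right, max_disp=64, win=7):
    """Block-matching disparity via sum of absolute differences (SAD).

    left, right: grayscale images as float arrays of equal shape,
                 assumed to form a rectified stereo pair.
    max_disp, win: search range and window size (illustrative values).
    Returns a per-pixel disparity map for the left image.
    """
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best, best_d = np.inf, 0
            # Search along the same scanline (rectification assumed).
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```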

3.2 Warping

Next, we perform triangle-based warping on the inpainted background layer pair to retarget it to the desired target size. Let L denote the left image and R denote the right image. Given the source left and right triangle meshes, \(M_L\) and \(M_R\), and the respective object handles, the warping process is the problem of mapping \(M_L\) and \(M_R\) to their target meshes, \({M}'_L\) and \({M}'_R\). During the warping process, we aim to preserve the stereoscopic properties and keep the object handles as rigid as possible. The warping energy minimizes a set of errors consisting of the scale transformation error, the smoothness error and the stereoscopic quality error, subject to a set of constraints. Figure 2 shows the warped background layers, where the object handles are kept as rigid as possible.

Fig. 2. Triangular mesh of the (left) inpainted background layer and (right) the warped background layer (width reduced by 20 %). The shape of the object handle (represented by the yellow lines) in the warped image is preserved rigidly (Color figure online).

Scale Transformation Error. Let T be the set of all triangles in the input meshes, \(M_L\cup M_R\). For each triangle \(t \in T\), we constrain the transformation to non-uniform scaling [2, 11], denoted by,

$$G_t=\begin{pmatrix} S_t^x & 0 \\ 0 & S_t^y \end{pmatrix}$$
(1)

where \(S_t^x\) and \(S_t^y\) are the scale of triangle t in the x and y dimensions respectively. The scale transformation error, \(E_w\) is then defined as,

$$E_{w}=\sum_{t \in T} A_t \left\| J_t - G_t \right\|_F^2$$
(2)

where \(A_t\) is the area of triangle t, \(\left\| \cdot \right\|_F\) denotes the Frobenius norm, and \(J_t\) is a \(2\times 2\) Jacobian matrix that represents the linear portion of the affine mapping that maps a triangle to its corresponding triangle in \(M'_L\cup M'_R\).
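To make Eqs. (1) and (2) concrete, the sketch below evaluates \(E_w\) for given source and target meshes. The mesh representation (a vertex array plus a triangle index array) and the per-triangle scales passed as inputs are our assumptions; in the actual optimization, the scales and target vertex positions are unknowns solved for jointly.

```python
import numpy as np

def triangle_area(p):
    # p: (3, 2) array of triangle vertex coordinates.
    e1, e2 = p[1] - p[0], p[2] - p[0]
    return 0.5 * abs(e1[0] * e2[1] - e1[1] * e2[0])

def jacobian(src, dst):
    """Linear part J_t of the affine map taking triangle src to dst.

    Built from the 2x2 edge matrices: D = J @ S, so J = D @ inv(S).
    """
    S = np.column_stack([src[1] - src[0], src[2] - src[0]])
    D = np.column_stack([dst[1] - dst[0], dst[2] - dst[0]])
    return D @ np.linalg.inv(S)

def scale_error(V, Vp, tris, scales):
    """E_w = sum_t A_t * ||J_t - G_t||_F^2  (Eq. 2).

    V, Vp:  (n, 2) source / target vertex positions.
    tris:   (m, 3) vertex indices per triangle.
    scales: (m, 2) per-triangle (S_t^x, S_t^y); treated here as given,
            though the paper optimizes them jointly with the vertices.
    """
    E = 0.0
    for t, (sx, sy) in zip(tris, scales):
        J = jacobian(V[t], Vp[t])
        G = np.diag([sx, sy])
        E += triangle_area(V[t]) * np.sum((J - G) ** 2)
    return E
```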

Smoothness Error. The smoothness error tries to avoid discontinuity by minimizing the scale difference between neighboring triangles,

$$E_{s}=\sum_{s,t \in T} A_{st} \left\| G_t - G_s \right\|_F^2$$
(3)

where \(A_{st}=(A_s+A_t)/2\) and s, t are adjacent triangles.
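A corresponding sketch for Eq. (3), assuming the list of edge-adjacent triangle pairs has been precomputed from the mesh:

```python
import numpy as np

def smoothness_error(scales, areas, adjacency):
    """E_s = sum over adjacent (s, t) of A_st * ||G_t - G_s||_F^2  (Eq. 3).

    scales:    (m, 2) per-triangle (S^x, S^y) values.
    areas:     (m,) triangle areas.
    adjacency: iterable of (s, t) index pairs of edge-adjacent triangles.
    """
    E = 0.0
    for s, t in adjacency:
        A_st = 0.5 * (areas[s] + areas[t])
        # For diagonal G, the Frobenius norm reduces to the difference
        # of the two scale components.
        E += A_st * np.sum((scales[t] - scales[s]) ** 2)
    return E
```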

Stereoscopic Quality Error. To ensure the stereoscopic properties are preserved, we minimize the change in two stereoscopic properties [8]: (1) the disparity between the input and output stereo mesh pairs, and (2) the vertical drift between the left and right output meshes. Let \((p_i^L, p_i^R)\) and \((q_i^L, q_i^R)\) denote the sets of corresponding points in the disparity maps of the input and output images respectively, where \(p_i^L\) and \(p_i^R\) are the corresponding points of the input meshes, \(M_L\cup M_R\), and \(q_i^L\) and \(q_i^R\) are the corresponding points of the output meshes, \({M}'_L\cup {M}'_R\). The stereoscopic quality error is then defined as

$$E_t=\sum_{(q_i^L, q_i^R) \in {M}'_L \cup {M}'_R} \left( E_d(q_i^L, q_i^R)+E_v(q_i^L, q_i^R) \right)$$
(4)

where \(E_d\) indicates the disparity consistency, and \(E_v\) ensures zero vertical drift.

$$E_d(q_i^L, q_i^R)=\left( (q_i^R(x) -q_i^L(x))-(p_i^R(x) -p_i^L(x)) \right)^2$$
(5)
$$E_v(q_i^L, q_i^R)=\left( q_i^R(y) -q_i^L(y) \right)^2$$
(6)

where (x) and (y) refer to the x and y coordinates of the given point.
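Equations (4)-(6) can be evaluated directly over the sets of corresponding points. A vectorized sketch, assuming the correspondences are stored as coordinate arrays:

```python
import numpy as np

def stereo_quality_error(pL, pR, qL, qR):
    """E_t = sum_i E_d + E_v over corresponding point pairs (Eqs. 4-6).

    pL, pR: (k, 2) corresponding points in the input left/right meshes.
    qL, qR: (k, 2) the same correspondences in the output meshes.
    """
    # Disparity consistency: output horizontal disparity should match input.
    E_d = ((qR[:, 0] - qL[:, 0]) - (pR[:, 0] - pL[:, 0])) ** 2
    # Vertical drift: corresponding points should stay on the same scanline.
    E_v = (qR[:, 1] - qL[:, 1]) ** 2
    return float(np.sum(E_d + E_v))
```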

Fig. 3. Analysis of disparity preservation. (left) Original stereo image, (middle) retargeted results of single-image tearable warping, without consideration for stereoscopic properties preservation, and (right) results of our approach, which minimizes the stereoscopic properties error. Yellow boxes and black dots highlight the comparison of disparity preservation. Green boxes highlight larger violations in disparity preservation (Color figure online).

Total Warping Error. The total warping energy function E can be formulated as the weighted sum of the scale transformation error, the smoothness error and the stereoscopic quality error,

$$E=\alpha E_w+\beta E_s+\gamma E_t$$
(7)

where \(\alpha \), \(\beta \), and \(\gamma \) are the corresponding weights.

Constraints. To preserve semantic connectedness, we define the handle shape constraint [2] to rigidly preserve the shape and orientation of the object handles. In addition, image boundary and object boundary constraints are added to ensure that the original image boundary remains on the boundary and that user-defined objects do not move out of the image boundary.

3.3 Image Compositing

At this stage, each object in both images is scaled by its respective scale factor \(S_k\) and inserted so as to coincide with its corresponding object handle in the warped background image. Objects are inserted into the background image in the depth order acquired from the depth map.
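A minimal sketch of this compositing step, under our own assumptions that each object layer carries an alpha channel, a paste position aligned with its warped handle, and a scalar depth value (larger meaning farther away):

```python
def composite(background, objects):
    """Paste scaled object layers back onto the warped background.

    background: (H, W, 3) warped background layer.
    objects:    list of dicts with keys 'rgba' (an (h, w, 4) layer,
                already scaled by S_k), 'pos' (top-left paste position
                aligned with the warped object handle) and 'depth'
                (assumed: larger = farther). Layers are assumed to fit
                inside the image.
    """
    out = background.astype(float)
    # Paste far-to-near (painter's algorithm) so nearer objects
    # correctly occlude farther ones.
    for obj in sorted(objects, key=lambda o: o['depth'], reverse=True):
        y, x = obj['pos']
        layer = obj['rgba']
        h, w = layer.shape[:2]
        alpha = layer[..., 3:4] / 255.0
        out[y:y + h, x:x + w] = (alpha * layer[..., :3]
                                 + (1 - alpha) * out[y:y + h, x:x + w])
    return out
```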

3.4 Optimization Details

We use the CVX Matlab toolbox [12] to find the solution to the convex quadratic function defined in Eq. (7). The weights of the total energy, \(\alpha\), \(\beta\) and \(\gamma\), are set to 1, 0.5 and 0.1 respectively; this set of weights was obtained empirically. Notably, \(\gamma\) is set to a much lower value (0.1) than \(\alpha\) and \(\beta\) in order to avoid penalizing smoothness. Our experiments show that a higher weight for \(\gamma\) reduces smoothness and causes obvious discontinuity and severe compression in certain parts of the retargeted image. The handle shape and image boundary constraints are set as hard constraints, while the object boundary is set as an inequality constraint. The scale factor \(S_k\) for each object is set to 1.
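The sketch below reconstructs the same kind of convex quadratic program in cvxpy, a Python analogue of the CVX toolbox cited above, with the handle shape and image boundary terms as hard constraints. The variable layout, the per-triangle scale variables, and the omission of the stereoscopic term \(E_t\) are simplifications of ours, not the authors' exact formulation.

```python
import cvxpy as cp
import numpy as np

def retarget_mesh(V, tris, adjacency, handle_idx, left_idx, right_idx,
                  target_w, alpha=1.0, beta=0.5):
    """Sketch of Eq. (7) as a convex QP (the stereo term is omitted).

    V:          (n, 2) source mesh vertices.
    tris:       (m, 3) triangle vertex indices.
    adjacency:  (s, t) index pairs of edge-adjacent triangles.
    handle_idx: object-handle vertex indices (rigid hard constraint).
    left_idx, right_idx: vertices pinned to the left/right image borders.
    """
    n, m = len(V), len(tris)
    Vp = cp.Variable((n, 2))               # output vertex positions
    s = cp.Variable((m, 2))                # per-triangle scales (S^x, S^y)

    E_w, areas = 0, np.zeros(m)
    for t, (i0, i1, i2) in enumerate(tris):
        S = np.column_stack([V[i1] - V[i0], V[i2] - V[i0]])
        areas[t] = 0.5 * abs(np.linalg.det(S))
        # Jacobian of the affine map, linear in the unknown vertices Vp.
        D = cp.vstack([Vp[i1] - Vp[i0], Vp[i2] - Vp[i0]]).T
        E_w += areas[t] * cp.sum_squares(D @ np.linalg.inv(S) - cp.diag(s[t]))

    E_s = 0
    for a, b in adjacency:                 # smoothness between neighbours
        E_s += 0.5 * (areas[a] + areas[b]) * cp.sum_squares(s[a] - s[b])

    cons = [Vp[left_idx, 0] == 0,          # image boundary (hard)
            Vp[right_idx, 0] == target_w]
    h0 = handle_idx[0]
    for h in handle_idx[1:]:               # handle shape kept rigid (hard)
        cons.append(Vp[h] - Vp[h0] == V[h] - V[h0])

    cp.Problem(cp.Minimize(alpha * E_w + beta * E_s), cons).solve()
    return Vp.value
```

In the full method, the left and right meshes are stacked into one problem so that the stereoscopic term can couple corresponding points across the two views.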

4 Results and Discussion

The images used to test our approach were collected from Flickr [13] and Yury Golubinsky's stereo blog [14]. The results were generated on a desktop with an Intel i7 3.40 GHz CPU and 12 GB of memory. The computation time depends on the number of triangles in the triangular meshes used to represent the stereo image pair. Excluding the time taken for inpainting, which is performed at the pre-processing stage, our algorithm produces retargeted results in about 3.5 to 6 s. For example, it takes 5.5 s to produce the result for a 720 \(\times \) 480 stereo image represented by 1552 triangles.

Fig. 4. Comparison of our results with stereo cropping [3]. Reducing width by 30 %: (left) input left stereo image, (middle) stereo cropping [3], and (right) our method. The yellow box highlights an important object that is cropped off (Color figure online).

To illustrate the effectiveness of our approach in preserving the stereoscopic properties, we compare it against a naive adoption of single-image tearable warping in the stereoscopic domain. From Fig. 3, it is obvious that the tearable image warping approach designed for single images [2] fails to preserve disparity consistency. In contrast, our algorithm successfully minimizes the variance of disparity between the original and retargeted stereo images to preserve the stereoscopic properties. Therefore, our results do not cause visual discomfort to the user during 3D viewing of the retargeted stereo image.

Fig. 5. Extreme retargeting results, reducing width by 40 %: (column 1) input left stereo image with the object handle in yellow, (column 2) stereo seam carving [4], (column 3) traditional stereo warping [11], and (column 4) our approach. Yellow boxes highlight object distortion (Color figure online).

Fig. 6. Our method better preserves the geometric structure of the arch shape of the pillars. (left) Left image of our result and (right) left image of the result of stereo warping [11].

Fig. 7. Extreme retargeting results, reducing height by 40 %: (row 1, columns 1, 3) input left stereo image with the object handle in yellow, (row 1, columns 2, 4) original stereo image, (row 2) stereo seam carving [4], (row 3) traditional stereo warping [11], and (row 4) our approach. Yellow boxes highlight object distortion (Color figure online).

Next, we compare our approach with state-of-the-art stereo retargeting approaches. We compare our results with stereo cropping [3], stereo seam carving [4], traditional stereo warping based on non-homogeneous scaling [11], and scene warping [8], where each selected approach corresponds to a category of retargeting approaches: simple operator, discrete, continuous and hybrid, respectively. Compared to stereo cropping [3], our approach better preserves the global image context and reduces the loss of important content, as shown in Fig. 4.

Figures 5 and 7 compare our approach with stereo seam carving [4] and traditional stereo warping [11] in cases of extreme retargeting, where the width or height is reduced by 40 %. Severe object distortion/compression can be observed in most results of seam carving and traditional warping. In contrast, our approach achieves zero object distortion because the important objects are not warped during the optimization process. Due to its discrete nature, the seam carving approach also fails to prevent distortion of geometric structures. Compared to traditional warping [11], as illustrated in Fig. 6, our approach better preserves geometric structures because in our tearable-warping-based approach, only the object handles need to be preserved rigidly. Therefore, the compression that occurs during warping can be distributed more evenly across other parts of the image. In the traditional warping approach, on the other hand, the whole area containing the object must be preserved, leaving less room for compression during warping.

Fig. 8. Comparison of results with scene warping, reducing width by 40 %: (left) input left stereo image, (middle) scene warping [8], and (right) our approach.

Fig. 9. More results. (column 1, rows 1, 3) Input left stereo image with the object handle marked in yellow, (column 1, rows 2, 4) original stereo image, (column 2) reducing width by 40 %, (column 3) reducing width by 20 %, and (column 4) increasing width by 20 % (Color figure online).

Lastly, we compare our results with scene warping [8]. Because both adopt a similar layer-based approach, both scene warping and our approach avoid object distortion and preserve structural details better than traditional warping and seam carving. In addition, as illustrated in Fig. 8, both approaches allow overlapping based on depth order and can thus support extreme image retargeting. However, our approach should preserve semantic connectedness better than scene warping. Our optimization algorithm constrains the object handle to be as rigid as possible and thus guarantees the preservation of the semantic relationship between an object and its background, as shown by the perfect shadow connection in our retargeted results in Fig. 9. We could not include scene warping results in the comparison on preservation of semantic connectedness because its source code is unavailable. However, based on theoretical analysis, shadows may not be well preserved by scene warping, as it has no energy term or constraint that guarantees the preservation of semantic connectedness.

Analysis of our results shows that our approach is quite robust against background distortion owing to the more even distribution of compression throughout the retargeted stereo image. However, for images with very complex geometric structures, background distortion is still unavoidable; in such cases, an additional constraint is needed to preserve the geometric details. Another potential problem with our approach is the possibility of incorrect propagation of object segments and their respective handles from the left image to the right image of the stereo image pair, due to inaccurate disparity information. In our experiments, however, this problem seldom crops up. Finally, our approach requires inpainting, which remains an open problem in computer vision. Although inpainting artifacts are inevitable, in most of our retargeting results the artifacts are covered up when objects are pasted onto the warped background during the image compositing stage. For cases where inpainting artifacts are visible, we allow users to interactively touch up the artifacts.

5 Conclusion

This paper has successfully extended the tearable image warping method [2] to stereoscopic image retargeting. The proposed method retargets both the left and right images of the stereo image pair simultaneously using a revised, global optimization algorithm. Experiments show that our approach is able to preserve the stereoscopic properties in the retargeted images and compares favourably with existing methods. Since the warping process does not involve the foreground objects, our method ensures object protection and produces less severe compression than stereo seam carving and traditional stereo warping methods, particularly in extreme retargeting cases. The core strength of our method is its ability to guarantee the preservation of semantic connectedness between an object and its background without distorting the objects, while preserving the stereoscopic properties of the stereo image.