Privacy-Preserving Split Learning via Pareto Optimal Search

  • Conference paper
Computer Security – ESORICS 2023 (ESORICS 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14347)


Abstract

With the rapid development of deep learning, it has become common for clients to perform split learning with an untrusted cloud server: the model is split into a client end and a server end, with features transmitted in between. However, these features are typically vulnerable to attribute inference attacks on the input data. Most existing schemes protect data privacy at the inference stage, but not at the training stage. It remains a significant challenge to remove private information from the features while accomplishing the learning task with high utility.

We find that the fundamental issue is that utility and privacy are largely conflicting objectives, which the linear scalarization commonly used in previous works can hardly handle. We therefore resort to the multi-objective optimization (MOO) paradigm, seeking a Pareto optimal solution with respect to the utility and privacy objectives. The privacy objective is formulated as the mutual information between the feature and the sensitive attributes, and is approximated by Gaussian models. In each training iteration, we select a direction that balances the dual goals of moving toward the Pareto front and toward the user's preference while keeping the privacy loss under a preset threshold. With a theoretical guarantee, the privacy of sensitive attributes is well preserved throughout training and at convergence. Experimental results on image and tabular datasets show that our method is superior to the state of the art in terms of both utility and privacy.


References

  1. Doersch, C.: Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908 (2016)

  2. Duong, T., Hazelton, M.L.: Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation. J. Multivar. Anal. 93(2), 417–433 (2005)

  3. Gupta, O., Raskar, R.: Distributed learning of deep neural network over multiple agents. J. Netw. Comput. Appl. 116, 1–8 (2018)

  4. Hershey, J.R., Olsen, P.A.: Approximating the Kullback-Leibler divergence between Gaussian mixture models. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), vol. 4, pp. IV-317. IEEE (2007)

  5. Hillermeier, C.: Nonlinear Multiobjective Optimization: A Generalized Homotopy Approach, vol. 135. Springer Science & Business Media, Cham (2001)

  6. Jia, J., Gong, N.Z.: AttriGuard: a practical defense against attribute inference attacks via adversarial machine learning. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 513–529 (2018)

  7. Kosinski, M., Stillwell, D., Graepel, T.: Private traits and attributes are predictable from digital records of human behavior. Proc. Natl. Acad. Sci. 110(15), 5802–5805 (2013)

  8. Li, A., Duan, Y., Yang, H., Chen, Y., Yang, J.: TIPRDC: task-independent privacy-respecting data crowdsourcing framework for deep learning with anonymized intermediate representations. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 824–832 (2020)

  9. Liu, S., Du, J., Shrivastava, A., Zhong, L.: Privacy adversarial network: representation learning for mobile data privacy. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3(4), 1–18 (2019)

  10. Mahapatra, D., Rajan, V.: Multi-task learning with user preferences: gradient descent with controlled ascent in Pareto optimization. In: International Conference on Machine Learning, pp. 6597–6607. PMLR (2020)

  11. Mahapatra, D., Rajan, V.: Exact Pareto optimal search for multi-task learning: touring the Pareto front. arXiv preprint arXiv:2108.00597 (2021)

  12. Moyer, D., Gao, S., Brekelmans, R., Galstyan, A., Ver Steeg, G.: Invariant representations without adversarial training. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  13. Osia, S.A., Taheri, A., Shamsabadi, A.S., Katevas, K., Haddadi, H., Rabiee, H.R.: Deep private-feature extraction. IEEE Trans. Knowl. Data Eng. 32(1), 54–66 (2018)

  14. Pasquini, D., Ateniese, G., Bernaschi, M.: Unleashing the tiger: inference attacks on split learning. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 2113–2129 (2021)

  15. Pittaluga, F., Koppal, S., Chakrabarti, A.: Learning privacy preserving encodings through adversarial training. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 791–799. IEEE (2019)

  16. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)

  17. Salamatian, S., et al.: Managing your private and public data: bringing down inference attacks against your privacy. IEEE J. Sel. Top. Signal Process. 9(7), 1240–1255 (2015)

  18. Sener, O., Koltun, V.: Multi-task learning as multi-objective optimization. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  19. Song, C., Shmatikov, V.: Overlearning reveals sensitive attributes. In: 8th International Conference on Learning Representations, ICLR 2020 (2020)

  20. Weinsberg, U., Bhagat, S., Ioannidis, S., Taft, N.: Blurme: inferring and obfuscating user gender based on ratings. In: Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 195–202 (2012)

  21. D Xiaochen, Z.: Feature inference attacks on split learning with an honest-but-curious server. Ph.D. thesis, National University of Singapore (2022)

  22. Xie, Q., Dai, Z., Du, Y., Hovy, E., Neubig, G.: Controllable invariance through adversarial feature learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  23. Yang, T.Y., Brinton, C., Mittal, P., Chiang, M., Lan, A.: Learning informative and private representations via generative adversarial networks. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 1534–1543. IEEE (2018)

  24. Yao, A.C.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pp. 160–164. IEEE (1982)

  25. Zheng, T., Li, B.: Infocensor: an information-theoretic framework against sensitive attribute inference and demographic disparity. In: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, pp. 437–451 (2022)


Acknowledgements

This work was supported in part by NSF China (62272306, 62136006, 62032020), and a specialized technology project for the pre-research of generic information system equipment (31511130302).

Author information

Corresponding author

Correspondence to Liyao Xiang.


Appendices

A Notations

We list all notations used in this paper in Table 6.

Table 6. Notations used in this paper

B Proof for Loss Formulation

Proof

\(\textbf{Lower} \, \textbf{bound} \, \textbf{of} \, \textbf{I}(\textbf{z};\textbf{y}).\) According to [13], for any conditional distribution q(y|z), it holds that

$$\begin{aligned} I(z;y) \ge H(y)+\mathbb {E}_{z,y}\log q(y|z). \end{aligned}$$
(13)
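This is the standard variational bound: it follows from the non-negativity of the KL divergence between the true posterior p(y|z) and any q(y|z), since

$$\begin{aligned} I(z;y) = H(y)+\mathbb {E}_{z,y}\log p(y|z) = H(y)+\mathbb {E}_{z,y}\log q(y|z) + \mathbb {E}_{z} KL[p(y|z)||q(y|z)] \ge H(y)+\mathbb {E}_{z,y}\log q(y|z). \end{aligned}$$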

Hence the lower bound of \(I(z;y)\) can be defined as

$$\begin{aligned} \mathcal {L}=H(y)+\max _{\phi } \mathbb {E}_{z,y} \log q_{\phi }(y|z). \end{aligned}$$
(14)

The model with parameters \(\phi \) is exactly the classifier performing the original task on the uploaded feature z. For a fixed z, the better the classifier is trained, the better the estimate it achieves. Since H(y) is a constant, we only need to optimize the second term of \(\mathcal {L}\), which translates into minimizing the cross-entropy loss. Denoting the cross-entropy loss by \(CE(\cdot )\), the utility loss is defined as:

$$ L_u = CE(g(f(x)),y). $$
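As an illustration only (not the paper's implementation), the utility loss \(L_u = CE(g(f(x)),y)\) can be sketched with hypothetical stand-ins for the client-side encoder f and the server-side classifier g, here simple linear maps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the split model: f runs on the client,
# g runs on the server; only the feature z = f(x) is transmitted.
W_f = rng.normal(size=(4, 8))   # raw input (dim 8) -> feature z (dim 4)
W_g = rng.normal(size=(3, 4))   # feature z (dim 4) -> logits for 3 classes

def f(x):
    """Client: compute the feature z uploaded to the server."""
    return np.tanh(W_f @ x)

def g(z):
    """Server: class probabilities from the uploaded feature z (softmax)."""
    logits = W_g @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()

def utility_loss(x, y):
    """L_u = CE(g(f(x)), y): cross-entropy on the original task."""
    p = g(f(x))
    return -np.log(p[y])

x = rng.normal(size=8)
print(utility_loss(x, y=1))   # a non-negative scalar
```

In a real deployment, f and g would be the two halves of a neural network trained jointly; the sketch only shows where the cross-entropy enters.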

\(\textbf{Upper} \, \textbf{Bound} \, \textbf{of} \, \textbf{I}(\textbf{z};\textbf{s}).\) Assuming s is a discrete variable, the mutual information can be written as

$$\begin{aligned} I(z;s)=\sum _{a}p(s_a) \int p(z|s_a)\log \frac{p(z|s_a)}{\sum _{b}p(s_b)p(z|s_b)} \text {d} z. \end{aligned}$$
(15)

By Jensen’s inequality, \(I(z;s)\) is upper bounded by:

$$\begin{aligned} \mathcal {U} =\sum _{a}\sum _{b \ne a}p(s_a)p(s_b)KL[p(z|s_a)||p(z|s_b)]. \end{aligned}$$
(16)
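Since the paper approximates the conditionals with Gaussian models, the bound \(\mathcal {U}\) in (16) admits a closed form. The following sketch (an illustration under the assumption of diagonal Gaussians fitted per attribute value, not the paper's code) computes it for a discrete sensitive attribute s:

```python
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    """Closed-form KL[N(mu0, diag var0) || N(mu1, diag var1)]."""
    return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def mi_upper_bound(z, s):
    """U = sum_{a != b} p(s_a) p(s_b) KL[p(z|s_a) || p(z|s_b)],
    fitting each conditional p(z|s_a) as a diagonal Gaussian."""
    values = np.unique(s)
    prior = {a: np.mean(s == a) for a in values}
    stats = {a: (z[s == a].mean(axis=0), z[s == a].var(axis=0) + 1e-6)
             for a in values}
    U = 0.0
    for a in values:
        for b in values:
            if a != b:
                U += prior[a] * prior[b] * kl_diag_gauss(*stats[a], *stats[b])
    return U

rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=400)           # binary sensitive attribute
z_private = rng.normal(size=(400, 4))      # features independent of s
z_leaky = z_private + 3.0 * s[:, None]     # features shifted by s
# the leaky features admit a much larger bound than the private ones
print(mi_upper_bound(z_private, s), mi_upper_bound(z_leaky, s))
```

Features that carry information about s produce well-separated conditional Gaussians and hence a large bound, which is what the privacy objective penalizes.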

C Proof of Lemma 2

Proof

We prove by contradiction. The formula in (10) can be written as

$$\begin{aligned} ||C\boldsymbol{\beta } - \boldsymbol{a}||_2^2 = ||G^T \boldsymbol{d}||_2^2 + ||\boldsymbol{a}||_2^2 - 2\boldsymbol{a}^T G^T \boldsymbol{d}. \end{aligned}$$
(17)

If \(\boldsymbol{a}^T G^T \boldsymbol{d} < 0\) at convergence, then \(||C\boldsymbol{\beta } - \boldsymbol{a}||_2^2 > ||\boldsymbol{a}||_2^2\), i.e., greater than the objective value at \(\boldsymbol{d} = \boldsymbol{0}\). This contradicts the fact that the current solution is optimal. Thus \(\boldsymbol{a}^T G^T \boldsymbol{d} \ge 0\) holds when solving (10).

D Proof of Lemma 3

Proof

When rank(C) = rank(G) = m, we can always find \(\boldsymbol{\beta }^{*}\) such that \(C\boldsymbol{\beta }^{*} = \boldsymbol{l}\). Due to the range constraint on \(\boldsymbol{\beta }\), we need to scale \(\boldsymbol{\beta }^{*}\) into the range \([-1,1]^m\) without altering the direction of \(C\boldsymbol{\beta }^{*}\). Hence we get \(G^T \boldsymbol{d} = C \boldsymbol{\beta } = k \boldsymbol{l} \succ \boldsymbol{0}\) \((k>0)\), and \(\boldsymbol{a}^T G^T \boldsymbol{d} = k \boldsymbol{a}^T \boldsymbol{l} = 0\). The last equality holds by our claim that \(\boldsymbol{a}^T \boldsymbol{l} = 0\).

E Proof of Theorem 1

Proof

If \(\theta \) is a Pareto optimal solution of (12), then by Pareto criticality ([5], Ch. 4), G has rank \(m-1\). Hence rank(C) = rank(G) = \(m-1\). Note that \(\boldsymbol{a} \ne \boldsymbol{0}\) at this point, as otherwise the optimization would terminate with \(\boldsymbol{d} = \boldsymbol{0}\).

Let \( \mathcal {S} = \{ C \boldsymbol{\beta } | \boldsymbol{\beta } \in [-1,1]^m\}\), Col(C) be the column space of C, and Null(C) be the null space of C. It is clear that \(\mathcal {S} \subseteq Col(C)\), and Col(C) and Null(C) are orthogonal complements. By minimizing the objective of (12), \(C \boldsymbol{\beta }\) becomes the best approximation of \(\boldsymbol{a}\) on \(\mathcal {S}\). Now we prove by contradiction. Assuming \(\boldsymbol{d} = \boldsymbol{0}\), we have \(C \boldsymbol{\beta } = G^{T} \boldsymbol{d} = \boldsymbol{0}\). Thus the orthogonal projection of \(\boldsymbol{a}\) on \(\mathcal {S}\) is \(\boldsymbol{0}\). Hence the orthogonal projection of \(\boldsymbol{a}\) on Col(C) is \(\boldsymbol{0}\), which means \(\boldsymbol{a} \in Null(C)\). Since rank(C) = \(m-1\), Null(C) is one-dimensional and thus spanned by \(\boldsymbol{a}\); by the orthogonality of \(\boldsymbol{a}\) and \(\boldsymbol{l}\), it follows that \(\boldsymbol{l} \in Col(C)\), which means \(\exists \boldsymbol{\alpha } \ \mathrm {s.t.} \ C \boldsymbol{\alpha } = G^T G \boldsymbol{\alpha } = \boldsymbol{l} \succeq \boldsymbol{0}\). So we can find a direction \(\boldsymbol{d} = G \boldsymbol{\alpha }\) that decreases all losses, which contradicts Pareto optimality. Therefore, \(\boldsymbol{d} \ne \boldsymbol{0}\).

According to Lemmas 1, 2 and 3, the direction \(\boldsymbol{d}\) found by (12) always satisfies \(\boldsymbol{a}^T G^T \boldsymbol{d} \ge 0\), so with a proper step size, \(\mu \) keeps decreasing.
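The direction search behind these lemmas can be sketched numerically. The following illustration (a sketch under the assumption that (12) reduces to a box-constrained least-squares problem with \(C = G^T G\) and \(\boldsymbol{d} = G\boldsymbol{\beta }\), as suggested by Eq. (17); G and a are hypothetical stand-ins for the gradient matrix and anchor vector) solves it by projected gradient descent and checks the property guaranteed by Lemma 2:

```python
import numpy as np

def find_direction(G, a, iters=2000):
    """Solve min_beta ||C beta - a||^2 with C = G^T G and beta in [-1, 1]^m
    by projected gradient descent, then return d = G beta."""
    C = G.T @ G
    L = 2.0 * np.linalg.norm(C, 2) ** 2   # Lipschitz constant of the gradient
    beta = np.zeros(G.shape[1])           # start at beta = 0 (objective ||a||^2)
    for _ in range(iters):
        grad = 2.0 * C.T @ (C @ beta - a)
        # step 1/L keeps the objective non-increasing under projection
        beta = np.clip(beta - grad / L, -1.0, 1.0)
    return G @ beta

rng = np.random.default_rng(0)
G = rng.normal(size=(10, 3))    # 10 parameters, m = 3 objectives (columns)
a = np.array([1.0, 0.5, 0.0])   # anchor vector (hypothetical)
d = find_direction(G, a)
# Lemma 2: at the solution, a^T G^T d >= 0
print(a @ G.T @ d >= 0)
```

Because the iteration starts at \(\boldsymbol{\beta } = \boldsymbol{0}\) and the objective never increases, the final value is at most \(||\boldsymbol{a}||_2^2\); by Eq. (17) this forces \(\boldsymbol{a}^T G^T \boldsymbol{d} \ge 0\), mirroring the contradiction argument of Lemma 2.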

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yu, X., Xiang, L., Wang, S., Long, C. (2024). Privacy-Preserving Split Learning via Pareto Optimal Search. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14347. Springer, Cham. https://doi.org/10.1007/978-3-031-51482-1_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-51482-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51481-4

  • Online ISBN: 978-3-031-51482-1

  • eBook Packages: Computer Science, Computer Science (R0)
