Introduction

Most people dream of owning an aesthetically pleasing home, and living in such a home can make its occupants feel cheerful [1]. Creating an aesthetically pleasing home usually requires a professional designer, who must apply their aesthetic sensibility and professional skills to complete the design for the client. However, designers face two significant problems when designing interiors. On the one hand, they must create interior designs in different decoration styles for customers to choose from; the huge workload and continuous design revisions lead to low efficiency [2,3,4,5,6]. On the other hand, creating aesthetically pleasing designs within a limited time is itself a significant challenge [3,7]. The key, therefore, is to address both the low efficiency and the lack of aesthetic appeal in interior design.

The diffusion model has developed rapidly in recent years [8,9,10,11,12]. Owing to its excellent image-generation ability, it has become the mainstream generative model [8,9,10]. A diffusion model is trained on a large corpus of paired text descriptions and images [12,13,14,15], and can then batch-generate high-quality, diverse images from input text descriptions [10,16,17,18,19].

While diffusion models perform well in most domains, there is still room for improvement in the technically demanding field of interior design, in two areas. On the one hand, the traditional diffusion model does not consider aesthetic factors [1,20], so most of the interior designs it generates lack aesthetic appeal. On the other hand, traditional diffusion models are trained on big data collected from the internet, most of which lacks professional annotations [21,22,23]. For example, such data rarely carry accurate annotations of interior decoration style and spatial function, so diffusion models trained on them confuse decoration styles and spatial functions in the designs they generate. It is therefore necessary to improve the diffusion model for the design field by adding aesthetic, decoration-style, and spatial-function control.

This research enhances the traditional diffusion model by introducing a fresh, comprehensively annotated indoor dataset with aesthetic scores, decoration styles, and space functions. It further proposes a unique compound loss function that injects aesthetics, decoration style, and spatial function into the model during retraining. The enhanced model can generate interior designs that are aesthetically pleasing and that match a specified decoration style and spatial function, effectively addressing the prevalent issues of insufficiently aesthetic designs and low productivity in interior design. Figure 1 compares the interior designs generated by our technique with those of the current mainstream diffusion models, including DALL·E 2 [24], Midjourney [25], and Stable Diffusion [26].

Figure 1

Comparison between mainstream diffusion models and our method for generative living-room design. The images generated by DALL·E 2 (left) exhibit incorrect spatial sizes, furniture placement, and decoration styles. Midjourney (second from left) produces erroneous lighting fixtures and unrealistic images. Stable Diffusion (third from left) struggles with decoration style and furniture generation. None of these images meet the requirements for interior design. In contrast, our proposed method (far right) addresses these issues. (Prompt: "Realistic, Chinese-style living room with a touch of modernity, featuring a sofa and a table").

Further explanation of our method. To address the lack of training data, we first collected a dataset called the aesthetic decoration style and spatial function interior dataset (i.e., ADSSFID-49). In this dataset, the aesthetic score of each image is annotated automatically by an aesthetic evaluation model, while the interior decoration style and spatial function of each image are annotated manually. We then proposed a compound loss function that comprehensively considers aesthetics, decoration style, and spatial function; it enables the diffusion model to learn decoration-style and spatial-function information during training and ensures that the generated designs are aesthetically pleasing. We trained the model by fine-tuning, which requires less data than retraining the entire model and significantly reduces training time and cost. The trained model is called the aesthetic interior design diffusion model (i.e., AIDDM). The AIDDM can automatically generate batches of interior designs with aesthetic appeal and correct decoration styles and spatial functions for designers to choose from. It increases interior design efficiency, reduces the difficulty of achieving aesthetically pleasing designs, and revolutionizes the design workflow. The framework of this research is shown in Fig. 2.

Figure 2

Research framework. We first collected over 20,000 indoor design images and annotated them with aesthetic scores, decoration styles, and spatial functionality to create the ADSSFID-49 dataset. We then fine-tuned a diffusion model to generate aesthetically pleasing interior designs. Designers can input design requirements, including decoration styles and functional needs, in the form of text into the model to obtain their desired designs.

The AIDDM model proposed in this research can generate aesthetically pleasing interior designs with specified decoration styles and spatial functions. Figure 3 demonstrates the effects of generating interior designs with different decoration styles and spatial functions.

The main contributions of this research are as follows:

  1.

    Proposing an integrated diffusion model with aesthetic control that can generate aesthetically pleasing designs.

  2.

    Proposing an innovative workflow for generating interior designs based on text.

  3.

    Proposing a new composite loss function that improves the generation effectiveness of the diffusion model.

  4.

    Creating a new interior dataset with aesthetic scores, decoration styles, and spatial functions.

  5.

    Demonstrating the advantages of our method in generating interior designs by comparing it with other popular diffusion models.

Figure 3

Our diffusion model generates designs for different decorative styles and spatial functions. The x-axis represents seven distinct decoration styles, while the y-axis represents seven different spatial functionalities. The combination of these variables allows for the creation of 49 common types of interior designs. Our method incorporates aesthetic considerations to ensure that the generated images have a certain level of aesthetic appeal.

Literature review

Challenges of traditional interior design

Interior design refers to the creation of spaces within a building by designers [3,27]. It is a challenging task: designers must consider regulations, spatial-function planning, color schemes, and material selection to shape the decoration style [6,7]. Beyond ensuring that the spatial layout and decoration style satisfy clients, designers must also ensure that the design is aesthetically pleasing, since an aesthetically pleasing design brings emotional delight [3].

A key challenge in interior design lies in the inefficiency of the process. The traditional workflow is highly complex, involving multiple steps such as requirements analysis, client communication, conceptual design, spatial layout, material and furniture selection, and rendering [3,27]. Because of this complexity, even minor modifications often require the designer to repeat the entire process; this linear workflow produces repetitive work and lowers efficiency [27]. Furthermore, clients often request multiple design options for the same space in order to choose the most satisfactory one, which significantly increases the designers' workload, especially within a limited time frame. As a result, designers often work overtime to meet deadlines.

Another significant challenge in interior design is the attainment of aesthetically pleasing designs [1,4]. Interior designers must factor in spatial layout, color harmony, material choice, furniture arrangement, and lighting design, among other elements. This requires a blend of creativity, artistic sensibility, and a practical understanding of design, and designers must continually research and master cutting-edge design trends and technologies to sustain their innovation and competitiveness, which poses further challenges [3,6,7].

Hence, enabling designers to efficiently produce aesthetically pleasing interior designs in bulk is crucial in tackling the aforementioned challenges.

Text-to-image diffusion model

Diffusion models for image generation have recently gained substantial attention

Methodology

In recent years, significant breakthroughs have been made in text-to-image generation using diffusion models. Models such as DALL·E 2 [24], Midjourney [25], Stable Diffusion [26], and DreamBooth [40] have emerged as prominent image-generation models and have demonstrated remarkable performance across various application scenarios. However, their performance can still be improved, particularly for generating aesthetically pleasing interior designs with specified decoration styles, which is especially relevant in the field of interior design.

In this research, we propose an improved aesthetic diffusion model for generating batches of aesthetically pleasing interior designs. Our method comprises a self-created dataset, the aesthetic decoration style and spatial function interior dataset (i.e., ADSSFID-49), which includes information on aesthetic scores, decoration styles, and spatial functionality in interior design. Additionally, we introduce a novel composite loss function that combines aesthetic scores, decoration styles, and spatial functionality as loss terms. By fine-tuning the model with this dataset and loss function, we achieve the capability to generate, in bulk, interior designs that are aesthetically pleasing and aligned with specified decoration styles and spatial functionality. This enhances the practicality of diffusion models in interior design: designers obtain corresponding design results simply by inputting their requirements as text, giving interior designers a fresh way to design.

The proposed method follows a four-stage process. The first stage involves establishing the dataset. In the second stage, a new loss function is designed. The third stage focuses on fine-tuning the model using the dataset and the new loss function. Finally, in the fourth stage, designers utilize the model to generate and modify designs according to their requirements.

In the data collection stage, to address the need for interior design datasets with aesthetic scores, this research involved professional designers gathering over 20,000 high-quality interior design images from renowned interior design websites. Next, we employed state-of-the-art aesthetic score models to automatically rate these images and mapped the score distribution to integers ranging from 1 to 10. Subsequently, professional designers annotated each image with decoration styles and spatial functionality. Through this process, we successfully established an interior dataset, named ADSSFID-49, which includes aesthetic scores, decoration styles, and spatial functionality annotations.
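The score-mapping step described above can be sketched as follows. This is a minimal illustration under assumed details (z-normalisation followed by normal-CDF binning), since the exact mapping used to produce the 1–10 integer scores is not specified here:

```python
import math

def map_scores(raw_scores, n_bins=10):
    """Map raw aesthetic-model outputs to integer scores 1..n_bins.

    Hypothetical scheme: z-normalise the raw scores, squash them with the
    normal CDF into (0, 1), then bin, so the integer labels keep a roughly
    normal shape like the score distribution reported for ADSSFID-49.
    """
    mean = sum(raw_scores) / len(raw_scores)
    var = sum((s - mean) ** 2 for s in raw_scores) / len(raw_scores)
    std = math.sqrt(var) or 1.0  # guard against a constant score list
    bins = []
    for s in raw_scores:
        u = 0.5 * (1.0 + math.erf((s - mean) / (std * math.sqrt(2.0))))
        bins.append(min(n_bins, int(u * n_bins) + 1))
    return bins
```

Because the CDF is monotonic, higher raw scores never receive lower integer labels, which preserves the ranking produced by the aesthetic model.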

During the loss-function construction phase, this study introduces a novel composite loss function. Building on the diffusion model's conventional loss function (Eq. 1), it incorporates aesthetic scores, decorative styles, and spatial functions as additional losses (Eq. 2). Training aims to minimize this loss so that the model attains the specified capabilities and produces interior designs exhibiting the predetermined aesthetic scores, decorative styles, and spatial functionalities.

The basic diffusion model is given by Eq. (1):

$$\begin{aligned} {\mathbb {L}}_{\varvec{{Y}},\varvec{{h}},\varvec{{\epsilon }},\varvec{{t}}}[w_t||{\hat{Y}}_\theta (\alpha _tY+\sigma _t\epsilon ,\varvec{{h}})-Y||_2^2] \end{aligned}$$
(1)

In Eq. (1), \({\mathbb {L}}\) denotes the average loss, which training seeks to decrease; a lower loss indicates better image-generation quality. \({\hat{Y}}_\theta \) is the evolving diffusion model, which continuously receives a noisy image vector \(\alpha _tY+\sigma _t\epsilon \) and a text \(\varvec{{h}}\) and produces a predicted image. This prediction is compared with the ground-truth image Y, and their difference, measured by the squared error, constitutes the loss. \(w_t\) is a weight parameter that controls the contribution of the loss at different timesteps. The subscripts of \({\mathbb {L}}\) indicate that the loss is averaged over the images Y, text conditions \(\varvec{{h}}\), noise samples \(\epsilon \), and timesteps t. During training, the diffusion model adjusts its parameters to reduce the discrepancy between generated and ground-truth images, ultimately minimizing \({\mathbb {L}}\).
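To make Eq. (1) concrete, the following sketch computes a Monte-Carlo estimate of the loss on a toy one-dimensional "image". The cosine schedule and the `model` callable are illustrative stand-ins, not the actual implementation:

```python
import math
import random

def schedule(t, T=1000):
    """Toy noise schedule satisfying alpha_t^2 + sigma_t^2 = 1."""
    alpha = math.cos(0.5 * math.pi * t / T)
    return alpha, math.sqrt(max(0.0, 1.0 - alpha * alpha))

def diffusion_loss(model, images, texts, w=lambda t: 1.0, T=1000):
    """Monte-Carlo estimate of Eq. (1): for each (image Y, text h), draw a
    timestep t and noise eps, form the noisy input alpha_t*Y + sigma_t*eps,
    and accumulate the weighted squared error of the model's prediction."""
    total = 0.0
    for y, h in zip(images, texts):
        t = random.randrange(T)
        alpha, sigma = schedule(t, T)
        eps = [random.gauss(0.0, 1.0) for _ in y]
        noisy = [alpha * yi + sigma * ei for yi, ei in zip(y, eps)]
        y_hat = model(noisy, h, t)  # predicted (denoised) image
        total += w(t) * sum((p - q) ** 2 for p, q in zip(y_hat, y))
    return total / len(images)  # average loss over the batch
```

A model that always recovers the ground truth drives the estimate to zero, which is exactly the behaviour training pushes toward.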

The composite loss function proposed in this research is given by Eq. (2):

$$\begin{aligned} {\mathbb {L}}_{\varvec{{Y}},\varvec{{h}},\varvec{{\epsilon }},\varvec{{\epsilon ^{'}}},\varvec{{t}}}[w_t||{\hat{Y}}_\theta (\alpha _tY+\sigma _t\epsilon ,\varvec{{h}})-Y||_2^2+\lambda {w_{t^{'}}}||{\hat{Y}}_\theta (\alpha _{t^{'}}Y_{pr}+\sigma _{t^{'}}\epsilon ^{'},\varvec{{h_{pr}}})-Y_{pr}||_2^2] \end{aligned}$$
(2)

The improved loss function (Eq. 2) addresses the limitations of the traditional diffusion model in generating aesthetically pleasing designs with different decoration styles. Equation (2) combines aesthetic score, decoration style, spatial functionality, and prior knowledge as components of the loss function, building upon Eq. (1). Equation (2) consists of two main components. The first component measures the discrepancy between the images generated by the trained model and the ground truth images. \({\hat{Y}}_\theta \) represents the new diffusion model, which incorporates aesthetic score, decoration style, and spatial functionality losses. The difference between the images generated by this model and the ground truth images Y contributes to the loss of the first component. The second component is the prior knowledge loss, which compares the images generated by the new diffusion model (i.e., \({\hat{Y}}_\theta (\alpha _{t^{'}}Y_{pr}+\sigma _{t^{'}}\epsilon ^{'},\varvec{{h_{pr}}}\))) with those generated by the pre-trained diffusion model (i.e., \(Y_{pr}\)). A smaller difference between these images indicates that the newly trained model retains the general knowledge of the base model. \(\lambda {w_{t^{'}}}\) is a weight that can be automatically learned to adjust the contributions of these two components, aiming to achieve better generation results. The combination of the first and second component losses allows the new diffusion model to retain the general knowledge of the pre-trained model while learning aesthetic, decoration style, and spatial functionality knowledge. As a result, the fine-tuned diffusion model can generate aesthetically pleasing interior designs with specified decoration styles and spatial functionality.

We used Stable Diffusion V1.5 as the foundational model in the fine-tuning phase and as the baseline for the subsequent qualitative and quantitative comparisons. We fine-tuned the improved diffusion model on the ADSSFID-49 dataset: using the new composite loss function, the model continuously reduced its loss during training and thereby acquired knowledge of aesthetic score, decoration style, and spatial functionality. The result is a new diffusion model for aesthetic interior design, known as the aesthetic interior design diffusion model (i.e., AIDDM).

During the model utilization stage, designers can use the AIDDM for design generation and modification. In the generation phase, users need only input a textual description of the desired decoration style and spatial functionality to generate an interior design, enabling the rapid, batch generation of designs in different decoration styles. Compared with traditional methods, our method eliminates cumbersome workflow steps such as drawing 2D plans, creating 3D models, texturing, and rendering, thereby significantly improving design efficiency. Traditional design processes often take several days to complete a single design; in contrast, our method generates a design in approximately two seconds on a computer with 24 GB of graphics memory, or around 30 designs per minute. In the modification stage, our method requires only changing the design prompts to regenerate a design, without repeating the entire design process. It therefore optimizes the design workflow and enhances design efficiency, and by generating designs in bulk it reduces the difficulty of creative design and accelerates design decision-making. Figure 4 illustrates the differences between traditional design methods and our proposed method.
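The prompt-only modification loop described above can be sketched as follows. `generate` stands in for any text-to-image call (e.g., the AIDDM), and the rule of appending each requested change to the prompt is an assumption for illustration:

```python
def redesign(generate, base_prompt, revisions):
    """Design-modification workflow: each client revision edits only the
    text prompt and regenerates, instead of redoing 2D plans, 3D models,
    texturing, and rendering."""
    prompt = base_prompt
    designs = [generate(prompt)]          # initial design
    for change in revisions:
        prompt = f"{prompt}, {change}"    # append the requested change
        designs.append(generate(prompt))  # regenerate from the new prompt
    return designs
```

Each round trip costs one generation call (about two seconds in the reported setup) rather than a full redesign cycle.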

Figure 4

Comparison of the design process between different design methods. Conventional methods in the design stage require drawing 2D plans, creating 3D models, applying materials to the models, and rendering visualizations. In contrast, our method only requires textual descriptions to generate design visualizations directly. In the modification stage, conventional methods require repeating the entire design process, while our method only requires modifying the textual prompts to regenerate the design.

Experiments and results

Implementation details

The diffusion model was trained on a computer running Windows 10 with 64 GB of RAM and an NVIDIA RTX 3090 graphics card with 24 GB of memory. Training used PyTorch, with each image undergoing 100 iterations. For preprocessing, input images were resized proportionally so that the longer side was at most 512 pixels, and data augmentation was performed with horizontal flipping. The learning rate was set to 0.000001 and the batch size to 24. xFormers and FP16 were used to accelerate computation. The total training time for fine-tuning the diffusion model was 20 hours.
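The preprocessing described above can be sketched as follows. The resize rule (longer side capped at 512 pixels, aspect ratio preserved) follows the text; the flip probability of 0.5 is an assumed value, since only the use of horizontal flips is stated:

```python
import random

def resize_longer_side(width, height, max_size=512):
    """Proportionally resize so the longer side is at most max_size pixels."""
    longer = max(width, height)
    if longer <= max_size:
        return width, height  # already small enough; leave unchanged
    scale = max_size / longer
    return round(width * scale), round(height * scale)

def horizontal_flip(image_rows, p=0.5):
    """Horizontal-flip augmentation: reverse each pixel row with probability p
    (p = 0.5 is an assumption; the text states only that flips were used)."""
    if random.random() < p:
        return [row[::-1] for row in image_rows]
    return image_rows
```

In a real pipeline these steps would typically be expressed with `torchvision.transforms` (`Resize`, `RandomHorizontalFlip`); the pure-Python version above just makes the arithmetic explicit.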

ADSSFID-49 dataset

This research aimed to generate large quantities of aesthetically pleasing interior designs in specified decoration styles from text. Because no interior dataset with aesthetic scores existed, this study created the aesthetic decoration style and spatial function interior dataset (ADSSFID-49). Expert interior designers curated this dataset from reputable websites such as "3d66" [46], "om" [47], and "znzmo" [48]. Initially, the designers procured over 40,000 free, high-quality images from these sources. They then meticulously evaluated each image, excluding those displaying incongruent decoration styles or unclear details; after this stringent selection process, more than 20,000 images met the established criteria. The designers also manually annotated the decoration styles and spatial functionalities depicted in these images. Finally, an open-source aesthetic evaluation model was employed to score each image automatically.

Figure 5

Distribution of aesthetic scores on the ADSSFID-49 dataset. The dataset uses an aesthetic scoring model to label the aesthetic score of each image automatically and maps all the scores to integers between 1 and 10, conforming to a normal distribution through a normalization method.

We enlisted the expertise of professional designers to manually annotate the decoration styles and spatial functions of the ADSSFID-49 dataset. The decoration style annotations encompass seven categories: “Contemporary style”, “Chinese style”, “Nordic style”, “Japanese style”, “European style”, “Industrial style”, and “American style”. The spatial function annotations also consist of seven categories: “Children’s room”, “Study room”, “Bedroom”, “Bathroom”, “Living room”, “Dining room”, and “Kitchen”. The distribution of the different categories of images is shown in Table 1.

Table 1 Image distribution of each decoration style and spatial function corresponding to the ADSSFID-49 dataset.

From Table 1, we can observe that in the ADSSFID-49 dataset, when sorted by decoration style, the "Contemporary style" has the most images (5153), while the "Japanese style" has the fewest (2108). When sorted by spatial function, the "Living room" category has the most images (5161), while the "Kitchen" category has the fewest (1490). In total, the dataset contains 22,403 images. Figure 6 shows some training data samples.
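The 7 × 7 style/function grid underlying Table 1 also defines the prompt space used to drive generation. The sketch below enumerates all 49 combinations; the prompt template is a hypothetical example, not the exact wording used in the experiments:

```python
STYLES = ["Contemporary", "Chinese", "Nordic", "Japanese",
          "European", "Industrial", "American"]
FUNCTIONS = ["Children's room", "Study room", "Bedroom", "Bathroom",
             "Living room", "Dining room", "Kitchen"]

def build_prompts(template="Realistic, {style}-style {function}"):
    """Enumerate the 49 style/function combinations as text prompts."""
    return [template.format(style=s, function=f.lower())
            for s in STYLES for f in FUNCTIONS]
```

Feeding each of these prompts to the model yields one design per cell of the grid shown in Fig. 3.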

Figure 6

ADSSFID-49 training data sample display.

Evaluation metrics

The evaluation of interior design involves subjective and objective assessments. Conventional objective evaluation methods employ computerized techniques to assess image clarity and compositional coherence. However, our evaluation focuses not solely on clarity or composition but on the aesthetic appeal of the generated designs, the consistency of decoration styles, and the rationality of spatial functions, all of which require subjective evaluation by professional designers. We therefore did not employ conventional objective evaluation methods [50,51].

Assessing generative architectural design images poses a significant challenge, as traditional automated image-evaluation methods fail to evaluate the design content effectively [50,51]. Consequently, this study invited experienced industry designers to collaboratively formulate a series of evaluation metrics tailored to professional interior design. These metrics encompass eight categories: "aesthetically pleasing," "decoration style," "spatial function," "design details," "object integrity," "object placement," "realistic," and "usability."

We next explain the content and significance of these evaluation metrics. "Aesthetically pleasing" and "usability" are the pivotal indicators: the former signifies that the generated design possesses aesthetic appeal, a crucial aspect of interior design, while the latter indicates that a comprehensive observation of the generated image reveals no apparent errors, validating the image's usability. Among the other indicators, "decoration style" refers to the consistency between the generated design's decorative style and the provided cues; "spatial function" to the appropriateness of the generated space's size and its alignment with the described spatial function; "design details" to the richness and complexity of design elements in the generated image; "object integrity" to the absence of defects in generated objects; "object placement" to the rationality of the generated furniture positioning; and "realistic" to how closely the generated image resembles a photograph taken by a camera. Together, these metrics enable a comprehensive assessment of design quality and show the practical value of the generated interior designs.

Visual assessment

In this research, we visually compared our diffusion model with other popular diffusion models. We selected several mainstream diffusion models for comparison, including Disco Diffusion [52], DALL·E 2

Conclusions

Traditional interior design methods require designers to possess aesthetic awareness and professional knowledge while also handling tedious design tasks, leading to difficulty in achieving aesthetically pleasing designs and to low design efficiency. To address these challenges, we proposed the aesthetic diffusion model: from input text descriptions, it generates batches of visually pleasing interior designs, transforming a labor-intensive design process into a computer-generated one. To overcome the problem of limited training data, we first created an interior design dataset annotated with aesthetic labels. We then proposed a composite loss function that incorporates aesthetic scores, interior decoration styles, and spatial functions into the loss, and retrained the model using this dataset and the new loss function. The trained model can generate aesthetically pleasing interior designs in batches from text descriptions while allowing the decoration style and spatial function of the design to be specified. Experimental results demonstrate that the proposed method can, to a certain extent, replace the laborious creative design and drawing tasks required in traditional design, transforming the design process and significantly improving design efficiency.

This research also has some limitations. Firstly, for the generated interior designs, it is challenging to establish comprehensive quantitative evaluation metrics. Currently, we have referred to the literature and expert opinions to formulate some quantitative evaluation indicators, but further development of more evaluation dimensions is needed to achieve a more quantified assessment of subjective perceptions. Specifically, there is a need for the development of automated evaluation algorithms and benchmarks to achieve this goal. Secondly, our understanding of decoration styles may be limited by personal cultural influences, which may prevent us from fully objectively understanding the decoration styles of other countries. It would be beneficial to involve more designers from diverse cultural backgrounds in the data collection process to reduce cultural bias. Additionally, there is room for improvement in the level of detail in the generated images. More design details can be obtained by increasing the amount of training data and using higher image resolutions for training.

Future research can explore the following directions:

  1.

    Hiring designers with diverse cultural backgrounds to establish a more comprehensive aesthetic interior design dataset to reduce the impact of cultural bias on the understanding of decoration styles.

  2.

    In addition to relying solely on text guidance for image generation, it is possible to incorporate additional control mechanisms to achieve more precise control over the generated image results in the aesthetic diffusion model.

  3.

    Researching the accuracy of dataset annotations. Currently, dataset annotations are often performed using either automated or manual methods, both of which have limitations. By combining these two annotation methods, the quality of the dataset can be improved, thereby enhancing the final generated design results.

  4.

    Interior design evaluation relies heavily on manual assessments of aesthetics, decoration styles, and spatial functions. There is a pressing need to develop automated quantitative evaluation methods for assessing the generated interior designs.