1 Introduction

An increasing number of consumers now purchase fashion products from online platforms. Forrester Research predicts that the fashion e-commerce market will amount to USD 911 billion, about 36% of global fashion retail sales, by 2022, making fashion the world's largest e-commerce market [1]. To achieve a competitive advantage, fashion e-commerce companies are continuously seeking out innovative technologies.

On e-commerce websites, users aim to find fashion products that satisfy their preferences, but they often cannot easily locate specific items because these sites list an enormous number of products. E-commerce websites therefore offer a variety of technologies to help users find items of interest, and recommendation systems are one of them [2, 3]. In an e-commerce environment, a recommendation system typically learns a recommendation model from user behaviors, such as purchasing and clicking on items, and then proposes to each user the items predicted to be most preferred. Collaborative filtering-based recommenders (CFRs) and content-based recommenders (CBRs) are the most conventional recommendation approaches [25]. CBRs, in particular, include attributes of fashion items in creating recommendations [25, 26]. This use of item attributes implies that CFRs and CBRs are better suited than other recommendation systems for incorporating fashion attributes into recommendation models.

When browsing a web page, users usually select fashion products one after another, and each selection implicitly reveals a preference for the selected product. 'Coordinates', i.e., items worn together as an outfit, are one of the major concepts used in recommendation systems for fashion items [15, 22, 23, 27]. Together, these observations imply that user preference activity contains significant sequential patterns. To reflect the sequential behavioral patterns in which users interact with fashion products on a website, we propose recommenders that exploit these patterns when making recommendations.

However, neither CFRs nor CBRs capture the sequential patterns of items and attributes created by user behavior. Some temporal recommenders perform better than conventional CFRs and CBRs because they include a time factor when building recommendation models [28]. A temporal recommender aims to provide recommendations to users at a suitable time and thus uses time in making the final decision to obtain accurate predictions [28]. Early temporal recommenders considered the time factor only to weight the importance of user interactions with items, whereas much recent research has focused on directly modeling the sequential structure of those interactions [29]. This is discussed in more detail in the next section.

2.2 SAR with RNN

Recently, deep learning has been adopted to extend SARs owing to its significant success in sequence modeling [30,31,32]. Bernhardsson [33] uses an RNN to predict the next track a user may want to play on a music streaming site. Hidasi et al. [14] propose a SAR based on the Gated Recurrent Unit (GRU). The GRU addresses the vanishing gradient problem by controlling the internal state of the RNN with small neural networks called gates, which learn when and by how much to update the hidden state. Devooght and Bersini [13, 34] and Wu et al. [35] adopt a unidirectional, single-layered Long Short-Term Memory (LSTM) network to develop a SAR. They demonstrate that the proposed SAR outperforms other recommendation methods, such as k-NN and MCR, and excels in short-term prediction and item coverage. Table 1 summarizes the basic ideas and characteristics of the RNN, LSTM, and GRU used in the development of SARs.

Table 1 Comparison of RNN, LSTM and GRU

Tan et al. [39] improve SARs by considering data augmentation and temporal shifts in user behavior. Jannach and Ludewig [40] combine the GRU4REC proposed by Hidasi et al. [14] with a k-NN based CFR and demonstrate that this hybrid approach outperforms other recommenders. These studies support the conclusion that SARs outperform conventional recommenders. We adopt the GRU4REC of Hidasi et al. [14] because it addresses the vanishing gradient problem and is therefore suitable for developing SARs for fashion products, whose click histories usually form long sequences.

2.3 SAR with multiple sequence datasets

If several types of sequence datasets can be obtained from the problem domain, it is desirable to combine them when developing recommenders. Based on this intuition, several studies have attempted to improve SARs using multiple sequence datasets; we refer to such recommenders as MSARs. Zhang et al. [41] use an RNN for sequential click prediction in the sponsored search of online search engines. They construct datasets from advertising features (e.g., advertising identifier, location of the advertising display, and advertising text related to the query), user features (e.g., user identifier and query), and sequential features (e.g., time interval since the last impression). Liu et al. [42] propose an MSAR that combines heterogeneous attributes with item sequence data; they introduce a hierarchical attribute combination method to handle text attributes of varying lengths, and an architecture named Heterogeneous Attribute RNN, which learns recommendation models from item and attribute sequence datasets. Smirnova and Vasile [43] suggest a context sequence modeling method for MSARs that combines context embeddings and item embeddings by parameterizing hidden unit transitions as a function of context information, and they show that this method improves prediction of the next event. Wu et al. [44] propose an MSAR that combines ratings, text reviews, and temporal patterns, using an IMDb dataset to show that their model performs better than conventional methods. Hidasi et al. [45] combine image and text sequence datasets with item sequence datasets, proposing two approaches: feature concatenation and parallel processing. The feature concatenation approach concatenates the different sequence datasets and then uses them to train a single MSAR model, whereas the parallel processing approach trains an MSAR on each dataset individually and combines the models using weighted aggregation.

Fashion products have various attributes, and previous studies have utilized these attributes to develop CBRs. This study aims to reflect the attributes of fashion products in the development of MSARs. We generate multiple attribute sequence datasets from the fashion item click history and, to develop multiple-sequence MSARs for fashion products, build on the concepts proposed by Hidasi et al. [45].

3 Method

3.1 Data preparation

The sequence datasets used in our recommenders are generated from a Korean e-commerce website, referred to as Store K in this paper. Store K sells various fashion products manufactured by a leading Korean fashion manufacturer. The sequence datasets are created as follows:

First, we collected log data from the operational database that records which fashion products users click on at Store K. The log data consist of a user identifier, a product identifier, and a click time. An item session dataset is created from this log data, where a session consists of the products that one user clicks sequentially during a day. Within a session, the product click logs are sorted in ascending order of click time. Because sequence data require one input and one corresponding output, a session must have at least two click records. Session lengths vary because the number of clicks per day differs across users and, even for the same user, from day to day.

Second, attribute session datasets are created from the item session data. Because each product has attributes such as brand, season, category, price, and product planning group, a session dataset for each attribute is generated by arranging the corresponding attribute values of the products in the product session dataset. In this way, one session dataset is generated for each product attribute considered in this study.

Finally, session-parallel mini-batches are generated from the session datasets following the method proposed by Hidasi et al. [45]. Figure 1 shows, as an example, how session-parallel mini-batches are generated from five sessions. The first elements of the first \(N\) sessions, where \(N\) is the mini-batch size, form the input of the first mini-batch, and the corresponding next elements form its output. In the example of Fig. 1, the mini-batch size is 3, so the first mini-batch is generated from the first elements of \({Session}_{1}\), \({Session}_{2}\), and \({Session}_{3}\): the inputs are \({\text{a}}_{\text{1,1}}\), \({\text{a}}_{\text{2,1}}\), and \({\text{a}}_{\text{3,1}}\), and the outputs are \({\text{a}}_{\text{1,2}}\), \({\text{a}}_{\text{2,2}}\), and \({\text{a}}_{\text{3,2}}\). The next mini-batch is formed from the next elements of each session in the previous mini-batch. If a session has no next element, the next available session takes its place. For example, when creating mini-batch 3 in Fig. 1, \({\text{a}}_{\text{2,3}}\) has no next element to serve as the output, so the next session, \({Session}_{4}\), comes in instead. The sessions are assumed to be independent, and the corresponding hidden states are therefore reset when such a switch occurs. In this way, mini-batches are generated from both the product sequence and the attribute sequence datasets. We use attribute session data because we expect that combining product and attribute session data will improve the recommender's performance; in particular, attribute session data allow the recommender to address the cold-start problem.

Fig. 1 Session parallel mini-batch generation method
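The session-switching logic described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: sessions are plain lists of element identifiers, the function name is our own, and hidden-state resets (which a full SAR would perform at each switch) are only noted in a comment.

```python
# A minimal sketch of session-parallel mini-batch generation.
# Sessions are lists of element IDs (items or attribute values).

def session_parallel_minibatches(sessions, batch_size):
    """Yield (inputs, outputs) mini-batches over sessions in parallel.

    Each of the `batch_size` slots walks one session; when a session
    is exhausted, the next unused session takes over that slot (and,
    in a full SAR, the corresponding hidden state would be reset).
    """
    slots = list(range(batch_size))   # which session each slot is reading
    cursors = [0] * batch_size        # position within each slot's session
    next_session = batch_size         # next session waiting to be assigned

    while True:
        inputs, outputs = [], []
        for s in range(batch_size):
            sess = sessions[slots[s]]
            # If this session has no next element, switch to a new session.
            while cursors[s] + 1 >= len(sess):
                if next_session >= len(sessions):
                    return            # no sessions left: stop generating
                slots[s] = next_session
                cursors[s] = 0
                next_session += 1
                sess = sessions[slots[s]]
            inputs.append(sess[cursors[s]])
            outputs.append(sess[cursors[s] + 1])
            cursors[s] += 1
        yield inputs, outputs

sessions = [
    ["a11", "a12", "a13", "a14"],
    ["a21", "a22", "a23"],
    ["a31", "a32"],
    ["a41", "a42", "a43"],
    ["a51", "a52"],
]
batches = list(session_parallel_minibatches(sessions, 3))
```

With a mini-batch size of 3, the first batch pairs the first elements of the first three sessions with their successors; when a session runs out, the next session fills its slot, mirroring the behavior around mini-batch 3 in the text.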

3.2 GRU

This research uses a special RNN variant called the GRU [46]. The GRU uses update and reset gates to address the vanishing gradient problem of the standard RNN; these two gates decide what information should be passed to the output.

The update gate \({\text{z}}_{t}\) for time step \(t\) is calculated by:

$${\text{z}}_{t} = ~\sigma \left( {{\text{W}}_{z} x_{t} + {\text{U}}_{z} {\text{h}}_{{t - 1}} } \right)$$
(1)

where \(\sigma\) is a smooth, bounded function such as the logistic sigmoid, \({x}_{t}\) is the input of the unit at time step \(t\), \({W}_{z}\) is the weight matrix for \({x}_{t}\), \({h}_{t-1}\) is the activation from the previous time step, and \({U}_{z}\) is the weight matrix for \({h}_{t-1}\). The update gate helps the model determine how much of the information from previous time steps should be passed along to future steps; because the sigmoid outputs values between 0 and 1, it acts as a soft switch, which is why it is used as the activation function of the update gate.

The reset gate \({r}_{t}\) for time step \(t\) is calculated by:

$${\text{r}}_{t} = \sigma ({\text{W}}_{r} x_{t} + {\text{U}}_{r} {\text{h}}_{{t - 1}} )$$
(2)

where the symbols are defined as in Eq. (1), with \({W}_{r}\) and \({U}_{r}\) the weight matrices for \({x}_{t}\) and \({h}_{t-1}\), respectively. The reset gate is used by the model to decide how much of the past information to forget.

The candidate activation \({\hat{h}}_{t}\) for time step \(t\) is calculated by:

$${\hat{\text{h}}}_{t} = {\text{tanh}}({\text{W}}x_{t} + r_{t} \odot {\text{Uh}}_{{t - 1}} )$$
(3)

where tanh is the nonlinear activation function, \(W\) and \(U\) are the weight matrices for \({x}_{t}\) and \({h}_{t-1}\), respectively, and \(r_{t} \odot U{h}_{t-1}\) denotes the Hadamard (element-wise) product between the reset gate and \(U{h}_{t-1}\). Finally, the activation of the GRU is calculated by:

$${\text{h}}_{t} = \left( {1 - z_{t} } \right){\text{h}}_{{t - 1}} + z_{t} {\hat{\text{h}}}_{t}$$
(4)
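Equations (1)-(4) can be sketched as a single forward step in NumPy. This is an illustrative implementation under our own naming: weight matrices are randomly initialized rather than learned, and bias terms are omitted, matching the equations above.

```python
import numpy as np

# One GRU step implementing Eqs. (1)-(4) of the text.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)           # update gate, Eq. (1)
    r = sigmoid(Wr @ x_t + Ur @ h_prev)           # reset gate, Eq. (2)
    h_cand = np.tanh(W @ x_t + r * (U @ h_prev))  # candidate activation, Eq. (3)
    return (1 - z) * h_prev + z * h_cand          # new hidden state, Eq. (4)

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
# Alternate input-to-hidden (d_h x d_in) and hidden-to-hidden (d_h x d_h)
# matrices in the order Wz, Uz, Wr, Ur, W, U.
params = [rng.standard_normal((d_h, d_in)) if i % 2 == 0
          else rng.standard_normal((d_h, d_h)) for i in range(6)]

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):          # run 5 time steps
    h = gru_step(x, h, *params)
```

Because Eq. (4) is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays within (-1, 1) when initialized at zero.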

3.3 SAR architectures

This work proposes four SAR architectures with GRU layers, as described in Fig. 2. Figure 2(a) shows the Item-GRU4REC (I-GRU4REC) architecture, which is built using an item sequence dataset. Mini-batches of the item sequence dataset are fed to the embedding layer, and the GRU layer processes the output of the embedding layer. The SoftMax layer combines the outputs of the GRU layer to generate a set of candidate recommendation items. Negative sampling and ranking loss calculations are performed to evaluate the correctness of the recommendations. Finally, the weights of the embedding, GRU, and SoftMax layers are updated for the next iteration.

Fig. 2 Modeling architectures of different SARs

Figure 2(b) illustrates the Attribute-GRU4REC (A-GRU4REC) architecture, constructed from attribute sequence datasets. Mini-batches of the attribute sequence datasets (AttSeq 1, AttSeq 2, …, AttSeq N) are processed by separate embedding layers (Embedding Layer 1, Embedding Layer 2, …, Embedding Layer N). The outputs of the embedding layers are combined by the feature combining layer, whose output is processed by the GRU layer and then by the SoftMax layer to generate recommendations.

Figure 2(c) shows the Concatenate MSAR (C-MSAR) architecture, which concatenates the item and attribute sequence datasets at the input. Whereas A-GRU4REC combines only attribute sequence data, C-MSAR combines attribute sequence data with item sequence data. The concatenated embeddings are processed by the GRU layer, whose output has N + 1 components, where N is the number of attributes. Each component is forwarded independently to its own SoftMax layer, and the SoftMax combining layer linearly combines the N + 1 SoftMax results.

Figure 2(d) shows the Parallel MSAR (P-MSAR) architecture, which builds I-GRU4REC and A-GRU4REC independently using the item and attribute sequence datasets. The weight combining layer then learns the optimal combination of the two models' outputs; before this combination is learned, I-GRU4REC and A-GRU4REC are first trained with their own optimal parameters.
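The two combination ideas can be contrasted with toy data. This sketch uses random vectors in place of trained networks: the embedding sizes, the mixing weight, and all variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy illustration of the C-MSAR and P-MSAR combination ideas.

rng = np.random.default_rng(1)
n_items = 6

# C-MSAR idea: the embeddings of an item and its attributes are
# concatenated before being fed to a single GRU layer.
item_emb = rng.standard_normal(8)                       # item embedding
attr_embs = [rng.standard_normal(4),                    # e.g. brand embedding
             rng.standard_normal(4)]                    # e.g. category embedding
gru_input = np.concatenate([item_emb] + attr_embs)      # one vector to one GRU

# P-MSAR idea: separately trained models each score the items,
# and a weight-combining layer mixes the score vectors.
scores_item_model = rng.standard_normal(n_items)        # from I-GRU4REC
scores_attr_model = rng.standard_normal(n_items)        # from A-GRU4REC
w = 0.7                                                 # learned mixing weight
combined = w * scores_item_model + (1 - w) * scores_attr_model
top5 = np.argsort(-combined)[:5]                        # Top-5 item indices
```

The design difference is where fusion happens: C-MSAR fuses at the input (one model sees everything), while P-MSAR fuses at the output (each model specializes, then scores are aggregated).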

4 Experiment

4.1 Dataset

We collected item click log data, consisting of a user identifier, an item identifier, and a date-time, from Store K over four weeks. From this data, we constructed the session datasets summarized in Table 2. First, we constructed a dataset for parameter setting using the full click log data, consisting of 52,556 sessions with 18,637 items. From the click log data for the first three weeks, we constructed a training dataset of 40,371 sessions with 17,525 items. We then constructed the first test dataset, named Test I, from the last week of click data; its sessions include new items that do not appear in the training data, and it consists of 9,137 sessions with 11,310 items. We also constructed a second test dataset, named Test II, from the same week of click log data; its sessions contain no items outside the training data, and it consists of 3,048 sessions with 11,267 items. The Training, Test I, and Test II datasets are used to compare the different SARs.

Table 2 Dataset summary

The fashion products used in this research have five attributes: brand, season, category, price level, and product planning group. The products are provided by 20 fashion brands owned by the e-commerce company, and there are a total of 192 product categories across five seasons (spring, summer, fall, winter, and non-seasonal). Because we assumed that people tend to buy products within particular price levels, we used categorized price values instead of actual numerical values: the price of each fashion product is classified into one of 10 levels using equal-width binning within its product category, because products in different categories have different price ranges. Product planning groups are assigned according to the year and season for which the product was planned; there are a total of 175 product planning groups. The attributes used in this study and the number of values for each attribute are summarized in Table 3.
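The within-category equal-width binning can be sketched as follows. The prices, the function name, and the bin count argument are illustrative; the paper's actual preprocessing pipeline is not published.

```python
import numpy as np

# Equal-width binning of prices into 10 levels within one product
# category (each category would be binned separately, since price
# ranges differ across categories).

def price_levels(prices, n_bins=10):
    prices = np.asarray(prices, dtype=float)
    lo, hi = prices.min(), prices.max()
    edges = np.linspace(lo, hi, n_bins + 1)   # equal-width bin edges
    # Interior edges map each price to a 1..n_bins level; clip keeps
    # the minimum and maximum prices inside the valid range.
    return np.clip(np.digitize(prices, edges[1:-1], right=True) + 1,
                   1, n_bins)

prices = [12000, 15000, 19000, 52000, 99000, 101000]  # made-up KRW prices
levels = price_levels(prices)
```

With these toy prices the range 12,000-101,000 is split into ten 8,900-wide bins, so the three cheapest items fall in level 1 and the two most expensive in level 10.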

Table 3 Feature Summary

4.2 Baseline recommender

We compared the proposed SARs with a k-NN CFR, which recommends similar items measured by the cosine similarity between session vectors. For two given sessions A and B, the cosine similarity \(cosine\_similarity(\text{A},\text{B})\) is defined as the dot product of the session vectors divided by the product of their Euclidean norms.

$$ cosine\_similarity (\text{A},\text{B}) = \frac{{\sum\nolimits_{{i = 1}}^{n} {{\text{A}}_{i} } {\text{B}}_{i} }}{{\sqrt {\sum\nolimits_{{i = 1}}^{n} {\left( {{\text{A}}_{i} } \right)^{2} } } \sqrt {\sum\nolimits_{{i = 1}}^{n} {\left( {{\text{B}}_{i} } \right)^{2} } } }}$$

The k-NN CFR is one of the most common item-to-item solutions in practical recommenders and provides recommendations in the "others who viewed this item also viewed these" setting. Despite its simplicity, the k-NN CFR is generally considered a reliable baseline for comparing the performance of recommenders [47, 48].
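The similarity formula above can be computed directly. In this sketch, sessions are encoded as binary vectors over a toy five-item catalog (the encoding is our assumption; the formula itself is as defined in the text).

```python
import numpy as np

# Cosine similarity between two session vectors, as used by the
# k-NN CFR baseline.

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy catalog of 5 items; a 1 means the session contains that item.
session_a = [1, 1, 0, 1, 0]
session_b = [1, 0, 0, 1, 1]
sim = cosine_similarity(session_a, session_b)  # 2 / (sqrt(3) * sqrt(3))
```

Here the sessions share two of their three items, giving a similarity of 2/3.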

4.3 Performance measures

Because each session in the test dataset contains multiple sequence items, multiple evaluations are performed per session. For the \(i{\text{th}}\) item of a particular test session, the recommender proposes Top-N recommendations sorted by predicted preference, so the items predicted to be most preferred come first. If the \((i+1){\text{th}}\) session item is among the recommendations, the recommendation is considered successful; otherwise, it is considered a failure. Figure 3 shows an example in which the recommender proposes the Top-N recommendations (\({r}_{x,1}\), \({r}_{x,2}\), …, \({r}_{x,N}\)) based on the sequence of items \({a}_{x,1}\), \({a}_{x,2}\), and \({a}_{x,3}\). If the next item in the session, \({a}_{x,4}\), is among the Top-N recommendations, the recommendation is considered successful; otherwise, it is considered unsuccessful.

Fig. 3 Testing example

Because we focus on the number of successful recommendations in the test, we selected the Hit Ratio and the Mean Reciprocal Rank (MRR) over the Top-N recommendations to measure recommender performance. The Hit Ratio measures the percentage of successful recommendations among the Top-N recommendations provided and is defined as:

$${\text{Hit}}\;{\text{Ratio}} = \frac{{\left| {{\text{hitset}}} \right|}}{{\left| {{\text{test}}} \right|}}$$
(5)

where \(\left| {{\text{test}}} \right|\) is the number of test evaluations and \(\left| {{\text{hitset}}} \right|\) is the number of successful recommendations.

The MRR measures recommendation performance by considering the ranking of the recommended items. Whereas the Hit Ratio simply measures whether the test item appears among the recommendations, the MRR considers the rank at which the test item appears. The MRR is defined as:

$${\text{MRR}} = \frac{1}{N}\sum\limits_{{i = 1}}^{N} {\frac{1}{{{\text{rank}}_{i} }}}$$
(6)

where \({rank}_{i}\) is the rank of the correct item in the \(i{\text{th}}\) recommendation list and \(N\) is the number of evaluations.

For each session, the number of performance evaluations is one less than the session length. For example, if a session has four items, as in Fig. 3, the Hit Ratio and MRR are calculated three times, the last item serving only as a target. Finally, the overall performance is measured by averaging the Hit Ratio and MRR over all sessions in the test dataset.
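The evaluation procedure above can be sketched as a loop over session positions. The recommender here is a stand-in that always returns the same ranked list, so the session items and recommendation list are illustrative, not real model output.

```python
# Per-session evaluation: for each position, the next item must
# appear in the Top-N list; a hit contributes 1/rank to the MRR.

def evaluate_session(session, recommend, top_n):
    """Return per-step (hit, reciprocal_rank) pairs for one session."""
    results = []
    for i in range(len(session) - 1):        # last item has no "next"
        recs = recommend(session[: i + 1])[:top_n]
        target = session[i + 1]
        if target in recs:
            rank = recs.index(target) + 1    # 1-based rank of the hit
            results.append((1, 1.0 / rank))
        else:
            results.append((0, 0.0))
    return results

# Toy recommender: always proposes the same ranked Top-5 list.
recommend = lambda prefix: ["i3", "i7", "i2", "i9", "i5"]

session = ["i1", "i3", "i2", "i8"]           # 4 items -> 3 evaluations
steps = evaluate_session(session, recommend, top_n=5)
hit_ratio = sum(h for h, _ in steps) / len(steps)
mrr = sum(rr for _, rr in steps) / len(steps)
```

In this example, "i3" is hit at rank 1, "i2" at rank 3, and "i8" is missed, giving a Hit Ratio of 2/3 and an MRR of (1 + 1/3 + 0)/3 = 4/9.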

5 Results

5.1 Parameter settings

It is important to set appropriate parameters for SARs, and this section describes experimental results under different parameter settings. The SARs have six parameters: learning rate, loss function, hidden activation function, final activation function, dropout, and RNN size. As the hidden activation function, tanh exhibited consistently high performance, so it is used throughout the following parameter settings without further discussion.

5.1.1 Parameter settings for I-GRU4REC

We consider the learning rate, loss function, final activation function, and dropout as the hyperparameters of I-GRU4REC. Two learning rates are compared: 0.001 and 0.0005. Bayesian Personalized Ranking (BPR) [49] and TOP1 [14] are considered as loss functions; BPR is a pairwise ranking loss originating in matrix factorization, whereas TOP1 is a regularized approximation of the relative rank of the relevant item. Linear and Rectified Linear Unit (ReLU) functions are considered for the final activation. Finally, dropout is set to 0.5 for all comparisons. Performance is evaluated using the Hit Ratio and MRR of the Top-N recommendations, and the results are summarized in Table 4. I-GRU4REC performs best when the learning rate is 0.0005, the loss function is TOP1, the RNN size is 1,000, and the final activation function is Linear. For the Top-5 recommendations, the Hit Ratio is 29.20% and the MRR is 20.30%; for the Top-20 recommendations, the Hit Ratio is 43.50% and the MRR is 21.80%.

Table 4 Experimental results of I-GRU4REC

5.1.2 Parameter settings for attribute embedding size

Because the number of values differs across attributes, the appropriate embedding size for each attribute must be determined before modeling A-GRU4REC. With the learning rate set to 0.001, the dropout to 0.5, the loss function to BPR, and the final activation to Linear, the Hit Ratio and MRR are measured for different embedding sizes and different numbers of recommendations for each attribute. Because the number of values differs across attributes, the set of embedding sizes evaluated also differs across attributes. The results are summarized in Table 5, where the best-performing embedding size is shown in bold.

Table 5 Experimental results on Attribute Embedding Size

5.1.3 Parameter settings for A-GRU4REC

After selecting the appropriate embedding size for each attribute, we compare the performance of A-GRU4REC under different loss functions (BPR and TOP1), final activation functions (Linear and ReLU), learning rates (0.001 and 0.0005), and dropout rates (0.2, 0.5, and 0.8). The results are summarized in Table 6, where the Hit Ratio and MRR are measured over the Top-N recommendations and the best settings are marked in bold. The Hit Ratio and MRR are optimal when the loss function is BPR, the final activation function is Linear, the learning rate is 0.001, and the dropout is 0.5. For the Top-5 recommendations, the Hit Ratio is 9.90% and the MRR is 5.20%; for the Top-20 recommendations, the Hit Ratio is 20.80% and the MRR is 6.30%.

Table 6 Experimental results of A-GRU4REC

5.2 Performance comparison of recommenders

We compared the recommenders using the Training, Test I, and Test II datasets. Table 7 summarizes the experimental results on the Test I dataset, which contains new items not present in the training dataset. All SARs proposed in this study significantly outperform the conventional k-NN CFR. When the top 5 items are recommended, the Hit Ratio of the k-NN CFR is only 3.5%, while those of I-GRU4REC, A-GRU4REC, C-MSAR, and P-MSAR are 14.8%, 8.2%, 15.1%, and 17.4%, respectively. Although the SARs show significantly higher performance, the Hit Ratio improvement decreases as the number of recommendations increases. Most SARs also exhibit higher MRR than the k-NN CFR: for the top 5 recommendations, the MRRs of the k-NN CFR, I-GRU4REC, A-GRU4REC, C-MSAR, and P-MSAR are 5.1%, 9.5%, 4.2%, 9.8%, and 10.5%, respectively. The MRR improvement also decreases as the number of recommendations increases. Among the SARs, P-MSAR shows the largest improvement in Hit Ratio and MRR. I-GRU4REC, based solely on the item sequence dataset, performs much better than A-GRU4REC, based on the attribute sequence datasets, which suggests that item sequences contribute more to SAR performance than attribute sequences. However, this does not mean that attribute sequences are meaningless: C-MSAR and P-MSAR, which consider item and attribute sequences together, perform better than I-GRU4REC.

Table 7 Recommendation results of test I

Table 8 summarizes the experimental results when new items are excluded from the test dataset (Test II). For the top 5 recommendations on Test II, the Hit Ratio and MRR of the k-NN CFR are higher than on Test I, and the Hit Ratio and MRR of all RNN-based SARs also improve significantly. When the top 5 items are recommended, the Hit Ratio of the k-NN CFR is only 8.1%, while those of I-GRU4REC, A-GRU4REC, C-MSAR, and P-MSAR are 23.7%, 12.1%, 24.7%, and 27.8%, respectively; the corresponding MRR values are reported in Table 8. Although the Hit Ratio and MRR improve significantly on the Test II dataset, the relative improvement over the baseline is smaller than on Test I.

For example, the Hit Ratio of P-MSAR on Test I is 4.97 times that of the k-NN CFR, while on Test II it is 3.43 times that of the k-NN CFR. As on Test I, P-MSAR shows the largest improvement in Hit Ratio and MRR among the SARs, and it improves significantly over the other MSARs. The performance improvement is largely due to item sequences, because I-GRU4REC performs significantly better than A-GRU4REC; however, as in the earlier results, C-MSAR and P-MSAR, which use both item and attribute sequences, perform better than I-GRU4REC.

Table 8 Recommendation results of test II

6 Discussion

This research proposes SARs for fashion product recommendation. Unlike existing recommendation algorithms such as CFRs and CBRs, SARs explicitly consider the sequence of user behaviors when constructing recommenders. The sequence of user behavior implicitly reflects its temporal aspect, similar to temporal recommenders [28, 50]; however, while temporal recommenders use temporal information to weight ratings, SARs focus on the sequence itself. Most SARs in e-commerce use a single sequence dataset of items, whereas this study proposes three SARs, namely A-GRU4REC, C-MSAR, and P-MSAR, that use multiple sequence datasets.

This research demonstrates that P-MSAR outperforms the k-NN CFR and the other SARs. All SARs outperform the k-NN CFR in Hit Ratio and MRR, consistent with previous studies [51,52,53,54]. Even A-GRU4REC, which shows the lowest performance among the SARs, achieves significantly higher performance than the k-NN CFR. Since CFRs generally outperform CBRs, it is an interesting result that A-GRU4REC, using only attribute sequences, can provide better recommendations for cold-start items than both CFRs and CBRs.

I-GRU4REC improves performance significantly; its performance is almost twice that of A-GRU4REC. This result is consistent with the fact that CFRs generally outperform CBRs. However, I-GRU4REC cannot suggest recommendations for cold-start items, which do not exist in the training dataset, suggesting that there is room for improvement through hybrid SARs. The performance of I-GRU4REC is best when there are no new items in the test dataset (Test II), which is natural because I-GRU4REC can only recommend items that exist in the training dataset. Interestingly, the performance of A-GRU4REC is also enhanced when there are no new items in the test dataset (Test II).

Both C-MSAR and P-MSAR combine item and attribute sequences and outperform the k-NN CFR, I-GRU4REC, and A-GRU4REC, with P-MSAR showing the larger improvement. This result is consistent with previous studies that use attributes in fashion recommendation [15, 18, 23, 55, 56].

7 Conclusion

This study advocates SARs for recommendation in the e-commerce domain, particularly for the fashion product recommendation problem. Consistent with previous studies, I-GRU4REC outperforms the existing recommender, the k-NN CFR. However, I-GRU4REC has an inherent limitation: it cannot suggest recommendations for new items, for which no sequence data exist. Therefore, this study combined item and attribute sequence data. Our experimental results show that the proposed hybrid SARs, C-MSAR and P-MSAR, outperform I-GRU4REC as well as the k-NN CFR. This result confirms that SARs using both item and attribute sequence data can improve recommendation performance.

Despite these results, our current approach has the following limitations. First, although our work has demonstrated that SARs are suitable for fashion product recommendation, additional effort is required to apply them in the real world. The attributes we consider are catalog attributes of fashion products; the unique design attributes of fashion products (e.g., color and style) are not reflected. As suggested in previous studies [16,17,18, 23, 57], such visual attributes can play an important role in recommending fashion products, so it is necessary to develop SARs that include them. Second, this study proved the usefulness of SARs through offline evaluation on historical data, but user studies are required for more reliable performance evaluation, as are simulations on actual e-commerce sites. Third, owing to the nature of RNNs, the proposed SARs make recommendations based on short-term behaviors rather than long-term preferences, so it is necessary to explore ways to reflect long-term preferences. Finally, our current model considers only the next item in the sequence, although the current sequence can also affect subsequent sequences. For this problem, attention-based RNN algorithms are promising and have already been applied to other domains [6, 58, 59]; it is therefore possible to extend this study by adopting them.