Abstract
The term “serendipity” has been understood narrowly in the recommender systems field. With a user-centered approach, user-friendly serendipitous recommender systems can be developed on the basis of a sound understanding of serendipity. In this paper, we introduce CHESTNUT, a memory-based collaborative filtering system for movies that aims to improve serendipity performance. Relying on a previously proposed Information Theory-based algorithm and earlier studies, we demonstrate a method of injecting insight, unexpectedness and usefulness, which are key metrics for a more comprehensive understanding of serendipity, into a practical serendipitous recommender system. Through lightweight experiments, we revealed a few runtime issues and optimized the system to address them. We have evaluated CHESTNUT for both practicability and effectiveness, and the results show that it is fast, scalable and improves serendipity performance significantly compared with mainstream memory-based collaborative filtering. The source code of CHESTNUT is available online at https://github.com/unnc-ucc/CHESTNUT.
1 Introduction
In an era of an increasing need for personalized recommendations, serendipity has become an important metric for achieving such a goal. Serendipitous recommender systems have been investigated and developed, to generate such results for their customers. Such systems can now be found in certain applications, such as in music recommendation [21].
However, as a user-centric concept, serendipity has been understood narrowly within the recommender systems field, where previous research has defined it as “receiving an unexpected and fortuitous item recommendation” [20]. The understanding of serendipity as a user-centered concept has therefore remained a gap for some time. Only recently has an awareness of this gap led to a conceptual bridge, which introduced serendipity from Information Research into recommender systems by proposing an Information Theory-based algorithm [36]. To investigate this algorithm further, it needs to be implemented as an end-to-end recommender system, but doing so is difficult.
The challenges of turning this conceptual bridge into a real-world implementation are two-fold. Firstly, it is demanding to inject the understanding appropriately, since building a run-time system may force the implementation to depart from the algorithm design. Secondly, even when the implementation can recommend serendipitous information, it is demanding to ensure an enhanced overall user experience. For example, a slow system response time may compromise the user’s experience, since serendipity is a very sensitive feeling.
Thus, it is important that serendipitous systems are designed with an accurate understanding of the concept while delivering a high level of performance. Hence, we present CHESTNUT, a state-of-the-art memory-based movie recommender system that aims to improve serendipity performance. Whereas prior research has produced many serendipitous frameworks, it has focused on applying algorithmic techniques rather than on transferring a basic understanding of serendipity into system development (Sect. 3).
We have addressed the issues of developing serendipitous systems by following a user-centered understanding of serendipity (Sect. 3) and by focusing on runtime failures while making predictions (Sect. 5). Furthermore, we have optimized CHESTNUT by revisiting and statistically updating significance weighting to ensure a high level of system performance.
More specifically, we have made three main contributions here:
(1) CHESTNUT Movie Recommender System. CHESTNUT applies an Information Theory-based algorithm, which aims to combine three key metrics based on a user-centered understanding of serendipity: insight, unexpectedness and value [36]. Corresponding to these metrics, CHESTNUT has three key functional units: 1) cInsight performs “making connections” to expand a target user’s profile and collect all target-user-related items (Sect. 3.1); 2) cUnexpectedness filters out all expected items from the target-user-related items, with the help of a primitive prediction model (Sect. 3.2); and 3) cUsefulness evaluates the potential value of the remaining candidate items through prediction, and generates a list of recommendations by sorting them from high to low (Sect. 3.3). In addition, we describe the key implementation details revealed while developing CHESTNUT (Sect. 4). The source code of CHESTNUT is available online at https://github.com/unnc-ucc/CHESTNUT.
(2) Optimizations of CHESTNUT. During system development, we observed that implementations following conventional methods could cause runtime failures in CHESTNUT. We have formulated this problem (Sect. 5.1) and optimized CHESTNUT in two ways: first, we adjusted the conventional design for generating predictions in memory-based collaborative filtering (Sect. 5.2); second, we revisited the conventional optimization method, significance weighting, and updated it on the basis of statistical analysis to further improve the performance and effectiveness of CHESTNUT (Sect. 5.3).
(3) Qualitative Evaluation of CHESTNUT. We conducted an experimental study to assess the performance of CHESTNUT, both in its unoptimized version and in its optimized versions (Sect. 6). We have also benchmarked CHESTNUT against two mainstream memory-based collaborative filtering techniques from Apache Mahout, namely item-based collaborative filtering and K-Nearest-Neighbor user-based collaborative filtering. The results show that CHESTNUT is fast, scalable and extremely serendipitous.
2 Background
CHESTNUT builds on a series of works that aimed to understand serendipity, to quantify it in many use cases and to introduce this understanding into recommender systems (illustrated in detail below). We have also drawn inspiration from the implementation and optimization of memory-based collaborative filtering techniques to enhance system performance [7,8,9, 27].
Within the recommender systems field, serendipity has been understood as “receiving an unexpected and fortuitous item recommendation” [20]. Many efforts have been made to develop and investigate serendipitous recommender systems [1,2,3, 5, 6, 10,11,12, 14, 15, 23, 24, 28, 29, 31,32,33]. Until recently, the development of serendipitous recommender systems has centered on the algorithmic techniques being deployed. However, no existing system aims to deliver an optimal serendipitous user experience by applying a user-centered approach to its development.
Unlike accuracy or other metrics, serendipity is a user-centric concept, and it is inappropriate to take such a narrow view of it within this field. Understanding serendipity has raised considerable interest and has long been investigated in multiple disciplines [18, 19, 25, 30]. For instance, a number of theoretical models have been established to study serendipity [16, 17, 26]. More recently, research has highlighted that “making connections” is an important point for serendipitous engineering [13]. Based on these outcomes from Information Research, an Information Theory-based algorithm has been proposed to better understand serendipity in recommender systems [36]. Furthermore, a systematic context-based study among Chinese scholars has been conducted and supports the effectiveness of the proposed algorithm [35].
This proposed conceptual bridge, which rests on a more comprehensive understanding of serendipity that merges insight, unexpectedness and usefulness, has been partly developed and studied in a movie scenario with early tryouts [34]. To bring these aspects together, the system is expected to work sequentially in three steps: it first expands the user’s profile by “making connections”; it then removes expected items, so that only unexpected items remain, according to the expanded profile and the original one; finally, it predicts ratings to estimate the value of all selected items to the target user, and then makes appropriate recommendations.
However, it is still unclear how the proposed algorithm could be developed into an end-to-end recommender system that is practical, effective and suitable for deployment in a real-world scenario. Based on previous investigations, we have implemented CHESTNUT in a movie recommendation scenario. Below, we present a comprehensive overview of the three major components that ensure and balance the three given metrics: insight, unexpectedness and usefulness (Sect. 3). In addition, we present the implementation details (Sect. 4) and the optimization choices made during the development of CHESTNUT, which were employed to improve its reliability and practicality in the real world (Sect. 5).
3 CHESTNUT Overview
Before explaining the details of the implementation, we introduce the three major functional units of CHESTNUT: cInsight, cUnexpectedness and cUsefulness. They were developed with due consideration of the three metrics of serendipity mentioned above. These units function sequentially, and each ensures its corresponding metric in turn.
3.1 cInsight
The design of cInsight aims to stimulate the “making connections” process, which is a serendipitous design from Information Research, to expand the profile of target users.
The functional process of making connections is as follows. With the users’ profiles uploaded (Note 1), and according to a referencing attribute (Note 2), making connections directs target users from their own information towards the users who are most similar in this selected attribute (Note 3). This whole process is denoted as a level. The process is repeated, each time starting from the output of the previous level, and ends with an active user or a set of active users once the similarity between the active user and the target user reaches the threshold.
cInsight is not parameter-free: two parameters need to be set in advance. The first is the referencing attribute, which serves as the metric for making connections and should be related information, such as side information categories (Note 4). The second is the threshold that determines when the repetition ends. The more levels are formed by making connections, the larger the distance between active users and target users becomes, and this threshold aims to make sure that active users are not too “far” from the target user. Here, the threshold could be a mathematical abstraction of similarity (Note 5). cInsight performs the making connections process by starting with the target user’s profile. The repetition over multiple levels terminates and forms a direction from target users to active users. cInsight finally re-organizes all active users’ profiles (Note 6) for further processing. Assuming that the referencing attribute is the director of movies, the following example briefly explains the making connections process:
For a target user who is to receive serendipitous recommendations, cInsight analyzes his or her profile and selects the corresponding information from it as the starting point, depending on which attribute has been selected as the reference.
As Fig. 1 shows, the movie Director D1, who received the most movie ratings from User A, is selected as the attribute in this example. Then, according to D1, another User B, can be selected who is a super fan of D1 and who contributes the largest number of movie ratings for D1 throughout the whole movie database. If User A and User B satisfy the defined threshold on similarity, then User B is considered as the active user to recommend movies to User A. Otherwise, the algorithm continues to find another User C, by selecting another Director D2, on the basis of User B’s profile, until User Z is found to meet the threshold between User Z and the Target user A.
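The walk from User A to an active user can be sketched as a short loop. The Python below is only an illustrative sketch (CHESTNUT itself is written in Java and may differ): the function name, the data layout, the `similarity` callable, the `visited` set and the `max_levels` cap are all assumptions introduced here so that the example terminates and stays self-contained.

```python
from collections import Counter

def make_connections(target, ratings, director_of, similarity,
                     threshold=0.3, max_levels=10):
    """Walk from the target user towards an active user, one level per hop.

    ratings:     {user: {movie: rating}}
    director_of: {movie: director}
    similarity:  callable (user_a, user_b) -> float, assumed to be supplied
    Returns (active_user, path), or (None, path) if nobody meets the threshold.
    """
    current = target
    visited = {target}          # assumption: never revisit a user
    path = [target]
    for _ in range(max_levels):  # assumption: cap on the number of levels
        # Director with the most ratings in the current user's profile.
        counts = Counter(director_of[m] for m in ratings[current]
                         if m in director_of)
        if not counts:
            break
        director = counts.most_common(1)[0][0]
        # "Super fan": the unvisited user with the most ratings for that director.
        fans = Counter()
        for user, profile in ratings.items():
            if user in visited:
                continue
            n = sum(1 for m in profile if director_of.get(m) == director)
            if n:
                fans[user] = n
        if not fans:
            break
        current = fans.most_common(1)[0][0]
        visited.add(current)
        path.append(current)
        # Stop as soon as the candidate is similar enough to the target user.
        if similarity(target, current) >= threshold:
            return current, path
    return None, path
```

The returned path records one user per level, which makes it easy to inspect how far the active user is from the target user.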
3.2 cUnexpectedness
After cInsight, all relevant items, generated by making connections, have been passed forward to cUnexpectedness. The design of cUnexpectedness aims to make sure all remaining items are indeed unexpected by the target user.
The functional process of cUnexpectedness proceeds in two steps. Firstly, it identifies which items a target user expects, based on a broader view of the results from cInsight. Here, applying the primitive prediction model, cUnexpectedness expands the original target user’s profile into a target-user-would-expect profile. Secondly, cUnexpectedness removes from the items passed on by cInsight (Note 7) every item that also appears among the expected items generated in the first step.
Here, we illustrate how the first step can be abstracted. The expected movie list (EXP) consists of two parts, namely those movies that could be expected by the user (Eu) and those produced by a primitive prediction model (PM) (e.g. movies that have been rated very highly on average). This is described in Eq. (1).
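Assuming that the two parts are combined by a plain set union, which is the natural reading of “consists of two parts”, the first step can be written as

\[
EXP = E_{u} \cup PM
\]

and the second step then amounts to a set difference. The symbols CAND (the items passed on by cInsight) and UNEXP (the remaining unexpected items) are shorthand introduced here for illustration only:

\[
UNEXP = CAND \setminus EXP
\]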
Through cUnexpectedness, items from cInsight have been confirmed as being unexpected by the target user, which satisfies the need of unexpectedness.
3.3 cUsefulness
Following the guarantees of cInsight and cUnexpectedness, the final task is to identify which items are valuable to target users, and cUsefulness has been developed to achieve this goal. To evaluate the value of potential movies to the target user(s), CHESTNUT generates prediction scores, a task carried out by cUsefulness: it quantifies the value of each unexpected movie by predicting how the target user would rate it.
Since the development plan is collaborative-filtering based, the following equation, which is a conventional approach for prediction, is used to calculate the movie prediction score in cUsefulness.
In Eq. (2), \(\bar{r_{a}}\) and \(\bar{r_{u}}\) are the average ratings for the user a and user u on all other rated items, and \(W_{a,u}\) is the weight calculated by the similarity values between the user a and user u. The summations are over all the users \(u \in U\) who have rated the item i.
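Assuming that Eq. (2) takes the standard mean-centered form used in memory-based collaborative filtering, \(p_{a,i} = \bar{r_{a}} + \frac{\sum_{u \in U}(r_{u,i} - \bar{r_{u}}) \cdot W_{a,u}}{\sum_{u \in U}|W_{a,u}|}\) (the symbol \(p_{a,i}\) for the predicted score is our own naming), the computation can be sketched in Python as below. The function name and data layout are hypothetical, and the sketch assumes every user has at least one rating.

```python
def predict_rating(target, item, ratings, weights):
    """Mean-centered weighted prediction of `target`'s rating for `item`.

    ratings: {user: {item: rating}}, every user assumed to have >= 1 rating
    weights: {user: W_{a,u}}, similarity of each neighbor to the target
    Returns None when no weighted neighbor has rated the item.
    """
    def mean_excluding(user):
        # Average over the user's other rated items; if the item is the
        # user's only rating, fall back to all ratings (an assumption).
        others = [r for i, r in ratings[user].items() if i != item]
        pool = others or list(ratings[user].values())
        return sum(pool) / len(pool)

    numerator = 0.0
    denominator = 0.0
    for u, w in weights.items():
        if u == target or item not in ratings.get(u, {}):
            continue  # only users who rated the item contribute
        numerator += (ratings[u][item] - mean_excluding(u)) * w
        denominator += abs(w)
    if denominator == 0:
        return None
    return mean_excluding(target) + numerator / denominator
```

Returning `None` instead of a default score makes it visible when a prediction has no supporting neighbors, which is the situation discussed in Sect. 5.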
4 Implementation Details
After giving an overview of CHESTNUT’s architecture and exploring the functionalities of its major components, in this section we introduce some implementation details from developing CHESTNUT that enhanced its performance and practicality. CHESTNUT was developed in approximately 6,000 lines of Java code.
4.1 Similarity Metrics
During the development of CHESTNUT, the Pearson Correlation Coefficient was selected as the similarity metric, as described in Eq. (3).
In Eq. (3), the \(i \in I\) summations are over the items that both users u and v have rated, \(r_{u,i}\) is the rating of u-th user on the i-th item and \(\bar{r_{u}}\) is the average rating of the co-rated items of the u-th user.
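Assuming that Eq. (3) is the usual Pearson form over co-rated items, \(W_{u,v} = \frac{\sum_{i \in I}(r_{u,i} - \bar{r_{u}})(r_{v,i} - \bar{r_{v}})}{\sqrt{\sum_{i \in I}(r_{u,i} - \bar{r_{u}})^{2}}\sqrt{\sum_{i \in I}(r_{v,i} - \bar{r_{v}})^{2}}}\), a minimal Python sketch is given below. The function name is hypothetical, and returning 0 when the correlation is undefined is an assumption of this sketch, not a documented CHESTNUT behavior.

```python
from math import sqrt

def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items.

    ratings_u, ratings_v: {item: rating}
    Returns (similarity, number_of_co_rated_items).
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    n = len(common)
    if n == 0:
        return 0.0, 0
    # Means are taken over the co-rated items only, as stated for Eq. (3).
    mean_u = sum(ratings_u[i] for i in common) / n
    mean_v = sum(ratings_v[i] for i in common) / n
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0, n  # undefined correlation; treated as 0 (assumption)
    return cov / (sqrt(var_u) * sqrt(var_v)), n
```

Returning the co-rated count alongside the coefficient is a deliberate choice: with only two co-rated items the coefficient is always +1, -1 or undefined, so the count is needed to judge how much the value can be trusted.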
4.2 cInsight
After collecting the target user’s profile, cInsight expands it through the connection-making process, which relies on the referencing attribute of this target user. According to the number of movies rated by the user with respect to this attribute and the user’s effective ratings, the most related one (Note 8) is selected. With this selection, a further profile can be generated that covers all the users who have rated movies with this referencing attribute. These users are sorted by their number of effective scores on this director, and the user with the largest number is chosen as the next user. This process is repeated until the similarity between the target user and the selected user reaches a threshold that has been set in advance.
In CHESTNUT, the referencing attribute has been set as the director of movies, and the effective scores refer to those ratings above 4.0 (Note 9). Moreover, the threshold has been set at 0.3 (Note 10). These settings are based on previous “cInsight”-related studies [34].
4.3 cUnexpectedness
cUnexpectedness preserves the unexpected items by excluding any expected items from all active users’ items. Generating such expected items relies on the primitive prediction model.
In CHESTNUT, through the primitive prediction model, cUnexpectedness expanded the target user’s profile in two respects: first, it added all series movies, if any of those had appeared within the target user’s profile. Second, it also added the Top Popular Movies.
Fig. 2 demonstrates the workflow for generating the target-user-expected movies. We implemented it as follows. In the first step, cUnexpectedness determines whether a movie belongs to a film series by comparing titles; to speed up this process, we applied a dynamic programming approach. In the second step, we selected the top two hundred movies because, when sorting movies from high to low by the number of ratings they received across the whole data set, we observed an obvious break at this number.
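The two steps can be sketched in Python as follows. The exact title-comparison rule is not given in the text, so the longest-common-substring test and its `min_ratio` cut-off are a hypothetical stand-in chosen only because they are a simple dynamic programming instance; all function names are likewise assumptions of this sketch.

```python
from collections import Counter

def longest_common_substring(a, b):
    """Length of the longest common substring, via dynamic programming."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def same_series(title_a, title_b, min_ratio=0.6):
    """Hypothetical heuristic: a long shared substring suggests one series."""
    a, b = title_a.lower(), title_b.lower()
    shorter = min(len(a), len(b))
    return shorter > 0 and longest_common_substring(a, b) / shorter >= min_ratio

def expected_movies(profile, all_titles, rating_counts, top_n=200):
    """Own movies, their series siblings, and the top-N most-rated movies."""
    series = {t for t in all_titles if any(same_series(t, p) for p in profile)}
    popular = {m for m, _ in rating_counts.most_common(top_n)}
    return set(profile) | series | popular

def unexpected_movies(candidates, expected):
    """Keep only the candidate movies that are not expected."""
    return set(candidates) - expected
```

A substring heuristic of this kind will also match unrelated titles that happen to share a long phrase, so a real deployment would need a stricter rule than this sketch uses.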
4.4 cUsefulness
cUsefulness is responsible for examining the potential value of all movies that have passed through cUnexpectedness. In the very first prototype, cUsefulness functioned in the same way as other memory-based collaborative filtering techniques, by exploring the target user’s neighbors, finding the one with the highest similarity and generating predictions according to the method described in Sect. 3.3. However, through lightweight tests, we observed that this caused run-time failures. We discuss this in Sect. 5.
4.5 User Interface
For user interactions, a website has been developed as a user interface for CHESTNUT. After logging in, the user is able to view their rated movies, as shown in Fig. 3. For each movie, the interface would offer an image of the movie poster, the title, the published year, the director and the rating from this user.
The follow-up pages, which enable users to view results and give feedback, are organized very similarly. However, when viewing the results, users are able to gather more information via their IMDB links (e.g. for more details or trailers), to present their own ratings, to answer the designed questionnaire and to leave comments.
5 Optimization
In this section, we introduce some key insights behind the optimization of CHESTNUT. Through lightweight tests, we found that CHESTNUT could only produce one to two results for almost every user. To improve the system’s overall performance and deployability, we optimized CHESTNUT by reforming the prediction mechanism and applying a new significance weighting. We first formulate the problem and then introduce the two optimizations in turn.
5.1 Problem Formulation
After breakdown evaluations of each component in CHESTNUT, we found that for every target user in the test set, only two to three items were predicted via cUsefulness, when the recommendation list was set to 1,000.
We believe this problem is two-fold. First, memory-based collaborative filtering relies on users’ existing profiles to assist the prediction, and this method was directly conducted by searching co-rated items within the users’ neighbors. However, with CHESTNUT, neighbor users are very unlikely to have co-rated items. From our observations, almost every user could not be supported by their top two hundred neighbors in CHESTNUT.
The second issue is more interesting. Owing to the characteristics of the Pearson Correlation Coefficient, the smaller the intersection between two users, the more likely the value is to be high. In other words, some similarities are not trustworthy, and these led indirectly to CHESTNUT’s runtime failures.
5.2 Mechanism Adjustment
Rather than searching a target user’s neighbors from high similarity to low, cUsefulness applies a greedy approach to ensure that the prediction process can proceed. Each time cUsefulness needs to make a prediction, it first selects all users who have rated the item to be predicted. Then, within this group, cUsefulness cross-checks whether any of them are neighbors of the target user. If so, cUsefulness regroups them and ranks them from high to low according to their similarity. With these settings, cUsefulness can proceed and make predictions for as many items as possible.
This mechanism adjustment demonstrated its benefits. First, it optimized the overall system performance. Since prediction is the most time-consuming element of CHESTNUT, this adjustment ensured that the prediction would not reach a dead end, when finding predictable neighbors. Second, since it guaranteed the co-rated item in advance, it ensured that CHESTNUT would not have any runtime failures, caused by prediction interruptions.
However, this mechanism has intensified the problem formulated previously. Since the computing sample size is smaller, owing to the features of serendipitous recommendation, the reliability of the similarity values inevitably affects the overall recommendation quality.
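The adjusted selection order described in this section can be sketched in Python as below. The function name, the dictionary layout and the optional cut-off `k` are assumptions made for illustration; the actual Java implementation may organize this differently.

```python
def select_neighbors(target, item, ratings, similarities, k=None):
    """Pick neighbors for predicting `item`, starting from the users who rated it.

    ratings:      {user: {item: rating}}
    similarities: {user: similarity to target}, holding only the target's neighbors
    Returns [(user, similarity)] sorted from most to least similar.
    """
    # Step 1: everyone except the target who has rated the item to be predicted.
    raters = [u for u, profile in ratings.items()
              if u != target and item in profile]
    # Step 2: keep only the raters that are also neighbors of the target.
    neighbors = [(u, similarities[u]) for u in raters if u in similarities]
    # Step 3: rank by similarity, highest first.
    neighbors.sort(key=lambda pair: pair[1], reverse=True)
    return neighbors if k is None else neighbors[:k]
```

Starting from the raters rather than from the most similar users means that every returned neighbor can contribute to the prediction, at the cost of sometimes relying on neighbors with low similarity.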
5.3 Similarity Correction
We are not the first to recognize the necessity of similarity correction. Previous research has identified this kind of issue and has offered a solution known as significance weighting [8]. With a threshold set, every similarity value computed from fewer co-rated items than this threshold is scaled down to correct it, while the remaining similarity values are kept exactly as they are.
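In its commonly used form, significance weighting multiplies a similarity by n divided by the threshold when the two users share n co-rated items and n is below the threshold. A minimal Python sketch under that assumption follows; the function name is hypothetical, and the default threshold of 50 is the conventional value rather than one of the values derived later in this section.

```python
def significance_weight(similarity, n_co_rated, threshold=50):
    """Devalue a similarity that rests on fewer than `threshold` co-rated items."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Below the threshold the factor is n/threshold; at or above it the factor is 1.
    return similarity * min(n_co_rated, threshold) / threshold
```

For example, a perfect correlation of 1.0 that rests on only two co-rated items is reduced to 0.04, so it can no longer outrank a moderate similarity backed by many shared ratings.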
In previous trials, 50 was selected as the threshold for significance weighting to optimize the prediction process. However, the existing literature gives no explanation of how this number was obtained, and it appears to be a threshold derived from previous experience. Since this threshold could be quite sensitive to the data set, we decided to explore and analyze its usage from a statistical perspective. As previously explained, the Pearson Correlation Coefficient can be too extreme when co-rated items are very limited (e.g. only one or two). Therefore, we assumed that the distribution is normal and took advantage of the Confidence Ratio to illustrate this problem.
All Pearson-Correlation values are computed and collected. All the values are then clustered and plotted on a new graph, with the average co-rated movie counters as y-axis and these values as x-axis. As shown in Fig. 4, it is evident that this nonlinear curve can be fitted into a GaussAMP model, which illustrates that the global Pearson-Correlation Coefficients approximate a normal distribution.
Inspired by the Confidence Ratio in a Normal Distribution, we defined the quantity of the edge areas as the unlikelihood, which aims to quantify the unreliability of similarity values from a global view. Based on the results presented in Fig. 5, the Reliability, or the Confidence Ratio, could be abstracted mathematically as an integral. We then selected four confidence ratios for comparison with the initial value of 50. For each ratio of the complete area, we determined the corresponding height in reverse, applied it to M and calculated the corresponding n, which gives Table 1:
We then used each of the obtained values in turn as the significance weighting threshold, and applied this similarity correction in all related components of CHESTNUT to improve the reliability of the similarity values.
6 Experimental Study
In this section, we introduce details of CHESTNUT’s experimental study. The HeteRec 2011 data set was selected as the source data for this experimental evaluation. It contains 855,598 ratings on 10,197 movies from 2,113 users [4]. In addition, all users’ k-nearest neighbors’ data are also prepared in advance.
The experiment began by initializing the database and supplementing it, via a web crawler, with information about the directors of all the movies. Bearing in mind that some movies have more than one director, and that there is no publicly recognized rule for distinguishing between them, only the first director was chosen during this process. After the data preparation was complete, CHESTNUT was run with different correction levels for each user in the database in turn.
Since CHESTNUT is a memory-based collaborative filtering system, to examine overall performances, we chose mainstream memory-based collaborative filtering techniques, namely: item-based and user-based collaborative filtering from Mahout as the benchmark [22].
All the implementations were written in Java and all the experiments were run on a Dell Precision 3620 workstation running Windows 10 Pro for Workstations, with an Intel Xeon E3-1225 processor (Quad Core 3.3 GHz, 3.7 GHz Turbo, 8 MB, w/HD Graphics P530) and 32 GB of RAM (\(4\times 8\) GB, 2400 MHz, DDR4).
Our experimental study aimed to answer the following three questions:
(1) How much performance improvement can be achieved with CHESTNUT, compared with mainstream memory-based collaborative filtering techniques?
(2) How many performance benefits have been gained with CHESTNUT, when different optimization levels are deployed?
(3) What tradeoffs are caused if CHESTNUT is optimized with significance weighting?
6.1 Recommendation Performance
We first demonstrate that CHESTNUT can significantly improve the unexpectedness of recommendation results while maintaining its scalability. For this purpose, we varied the number of items in the recommendation lists from 5 to 1000, in increments of 5. As shown in Fig. 5, CHESTNUT achieved unexpectedness between 0.9 and 1.0, whereas item-based and user-based collaborative filtering only achieved unexpectedness within the ranges 0.75 to 0.8 and 0.43 to 0.6, respectively. This is because unexpectedness was one of the major goals set during the design and development of CHESTNUT (Fig. 7).
Figure 6 shows that CHESTNUT could continue its dominant performance in serendipity, which follows the same experiment settings. As benchmark systems, item-based and user-based systems perform serendipity within the ranges of 0.05 to 0.08 and 0.3 to 0.4, respectively. Nevertheless, CHESTNUT still outperformed these conventional systems in serendipity performance. There are two interesting observations within this series of experiments. One is that, although the item-based approach could produce more unexpected results than the user-based, the user-based approach provided more serendipitous recommendations.
The other interesting fact is that serendipity performance degraded gradually when CHESTNUT was applied without optimization, whereas the optimized versions of CHESTNUT scaled better. More details of this observation are discussed in Sect. 6.2.
As for time consumption, more details are provided in Fig. 9. It is necessary to highlight that, in the item-based case, approximately 10,000 ms were required, on average. However, the user-based approach did achieve very good performance, by consuming 17.24 ms on average. As for CHESTNUT, although it is slightly slower than the user-based approach, it is still much faster than item-based implementation. All versions of CHESTNUT could finish the service between 59.85 and 74.34 ms on average, which supports the assertion that CHESTNUT’s performance is very competitive.
Last but not least, we explored the accuracy of the recommendation results among the three systems. In line with their design goals, the item-based and user-based approaches achieved 0.4804 and 0.4439 in MAE, which implies that they produce quite accurate results. The results for CHESTNUT, however, with or without the optimization, are less accurate than those of the benchmark systems.
6.2 Performance Breakdown
Based on Sect. 6.1, we saw the need for a performance breakdown analysis. We first examined the unexpectedness evaluations in detail. Unlike the previous settings, we took a closer view of unexpectedness performance by narrowing the range of recommendation list sizes from 5 to 1,000 down to 5 to 200. The most interesting observation is that the unexpectedness results were independent of the optimization levels of CHESTNUT. As Fig. 9 shows, although there are variations in this metric, unexpectedness remains above 0.992. Moreover, significance weighting did not affect the unexpectedness performance at all, which indicates that the levels of optimization did not affect the performance of cInsight. This is because the threshold in cInsight serves as a lower bound (Note 11), whereas our optimization mainly aims to correct extremely high similarities caused by too small an intersection between users.
However, optimizations do play a role in cUsefulness. To examine this in more detail, we took a very narrow view by varying the recommendation list size from 5 to 50. When the recommendation list size is smaller than 15, all optimized versions produce more serendipitous results than the original version, although the results were already very serendipitous. When the size is between 15 and 50, the situation is reversed. However, if we combine Fig. 10 with Fig. 6, the overall scalability of unoptimized CHESTNUT is much weaker than that of the optimized versions.
This performance variation can be explained as follows. CHESTNUT can only make predictions within a small group of users compared with the other systems, so without optimization the predicted values could be artificially high, which led to an obvious drift, as illustrated in Fig. 6 (the blue line). We believe that the most important benefit of optimization is that it stabilizes serendipity performance and improves the scalability of the whole system by improving the reliability of the similarity values.
6.3 The Tradeoff Caused by Optimization
Here, we focus mainly on the tradeoff caused by Similarity Correction, since the other optimization aims to make CHESTNUT runnable. There are two main tradeoffs to discuss.
First, there are some runtime overheads when values are corrected. As Fig. 9 shows, all optimized versions show a slight increase in service time. As for the variations among these optimized versions, a correction rate that is too high or too low increases the computational difficulty and thus causes overheads.
Second, we observed a very interesting situation. In the early investigations of significance weighting, researchers claimed that this approach was able to improve the accuracy of recommendations, and further investigation has supported the effectiveness of this setting [7,8,9]. However, the optimized versions of CHESTNUT conflict with this. Figure 8 reveals a slight trend of accuracy loss as the optimization level increases. We believe this is because of CHESTNUT’s characteristics. What this optimization improves is the trustworthiness of the similarity values. Unlike in accuracy-oriented systems, in serendipitous systems this trustworthiness cannot be equated with accuracy.
7 Discussion
Our experimental study revealed two main points for further discussion. First, CHESTNUT has shown that the Information Theory-based algorithm can be deployed as an end-to-end recommender system that produces serendipitous recommendations. In particular, when the recommendation size is less than 50, CHESTNUT dominated in serendipity performance, coming close to the upper bound in the evaluations. Second, during the system implementation, we observed that CHESTNUT still needs optimization via value correction to improve the overall recommendation quality. By revisiting and updating the concept of significance weighting, CHESTNUT has been optimized to improve overall scalability and serendipitous recommendation performance, because the reliability of the similarity values has been greatly improved.
8 Conclusion and Future Work
In this paper, we have presented CHESTNUT, a state-of-the-art memory-based collaborative filtering system that aims to improve serendipitous recommendation performance in the context of movie recommendation. We implemented CHESTNUT as three main functional blocks, corresponding to the three main metrics of serendipity: insight, unexpectedness and usefulness. We optimized CHESTNUT by revisiting and updating a conventional method, “significance weighting”, which significantly enhanced the overall performance of CHESTNUT. The experimental study demonstrated that, compared with mainstream memory-based collaborative filtering systems, CHESTNUT is a fast and scalable system that provides highly serendipitous recommendations. To the best of our knowledge, CHESTNUT is the first collaborative filtering system rooted in a serendipitous algorithm that was built on the user-centred understanding of serendipity from Information Research. The source code of CHESTNUT is available online at https://github.com/unnc-ucc/CHESTNUT.
Future work on CHESTNUT will focus on its extensibility. On the one hand, although CHESTNUT is not parameter-free, it would not be difficult to extend it to different usage contexts (e.g. shopping, mailing, etc.), since its parameters could be obtained from our previous implementation experience. On the other hand, as mentioned in Sect. 4, the levels of connection-making still rely on our previous experience and function as thresholds, which is the major limitation for extending the system. We will further study CHESTNUT’s effectiveness and extensibility through a series of large-scale user studies and experiments.
Notes
1. Those users denoted as target users.
2. Attribute(s) to guide making connections.
3. Those users denoted as active users.
4. In movie recommendations, for instance, it could be directors, genres and so on.
5. For example, Pearson Correlation Similarity, and so on.
6. More specifically, their items.
7. Those items from active users, generated by the target user.
8. Information with regard to the referencing attribute.
9. In this rating scale, the full mark is 5.0.
10. Here, the similarity refers to Pearson-Correlation Similarity.
11. When the value is less than it, making connections terminates.
References
Abbassi, Z., Amer-Yahia, S., Lakshmanan, L.V.S., Vassilvitskii, S., Cong, Y.: Getting recommender systems to think outside the box. In: Proceedings of the 2009 ACM Conference on Recommender Systems, RecSys 2009, New York, NY, USA, 23–25 October 2009, pp. 285–288 (2009)
Adamopoulos, P., Tuzhilin, A.: On unexpectedness in recommender systems or how to better expect the unexpected. ACM TIST 5(4), 1–32 (2014)
Bhandari, U., Sugiyama, K., Datta, A., Jindal, R.: Serendipitous recommendation for mobile apps using item-item similarity graph. In: Banchs, R.E., Silvestri, F., Liu, T.-Y., Zhang, M., Gao, S., Lang, J. (eds.) AIRS 2013. LNCS, vol. 8281, pp. 440–451. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45068-6_38
Cantador, I., Brusilovsky, P., Kuflik, T.: Second workshop on information heterogeneity and fusion in recommender systems (HetRec2011). In: Proceedings of the 2011 ACM Conference on Recommender Systems, RecSys 2011, Chicago, IL, USA, 23–27 October 2011, pp. 387–388 (2011)
de Gemmis, M., Lops, P., Semeraro, G., Musto, C.: An investigation on the serendipity problem in recommender systems. Inf. Process. Manage. 51(5), 695–717 (2015)
Ge, M., Delgado-Battenfeld, C., Jannach, D.: Beyond accuracy: evaluating recommender systems by coverage and serendipity. In: Proceedings of the 2010 ACM Conference on Recommender Systems, RecSys 2010, Barcelona, Spain, 26–30 September 2010, pp. 257–260 (2010)
Ghazanfar, M.A., Prügel-Bennett, A.: Novel significance weighting schemes for collaborative filtering: generating improved recommendations in sparse environments. In: Proceedings of the 2010 International Conference on Data Mining, DMIN 2010, Las Vegas, Nevada, USA, 12–15 July 2010, pp. 334–342 (2010)
Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In: SIGIR 1999: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999, pp. 230–237 (1999)
Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. SIGIR Forum 51(2), 227–234 (2017)
Ito, H., Yoshikawa, T., Furuhashi, T.: A study on improvement of serendipity in item-based collaborative filtering using association rule. In: IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2014, Beijing, China, 6–11 July 2014, pp. 977–981 (2014)
Kamahara, J., Asakawa, T., Shimojo, S., Miyahara, H.: A community-based recommendation system to reveal unexpected interests. In: 11th International Conference on Multi Media Modeling, (MMM 2005), Melbourne, Australia, 12–14 January 2005, pp. 433–438 (2005)
Kawamae, N.: Serendipitous recommendations via innovators. In: Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, Geneva, Switzerland, 19–23 July 2010, pp. 218–225 (2010)
Kefalidou, G., Sharples, S.: Encouraging serendipity in research: designing technologies to support connection-making. Int. J. Hum. Comput. Stud. 89, 1–23 (2016)
Lee, K., Lee, K.: Using experts among users for novel movie recommendations. JCSE 7(1), 21–29 (2013)
Lee, K., Lee, K.: Escaping your comfort zone: a graph-based recommender system for finding novel recommendations among relevant items. Expert Syst. Appl. 42(10), 4851–4858 (2015)
Luo, J., Rongjun, Y.: Follow the heart or the head? The interactive influence model of emotion and cognition. Front. Psychol. 6, 573 (2015)
Makri, S., Blandford, A.: Coming across information serendipitously - Part 1: a process model. J. Documentation 68(5), 684–705 (2012)
Makri, S., Blandford, A., Woods, M., Sharples, S., Maxwell, D.: “Making my own luck”: serendipity strategies and how to support them in digital information environments. JASIST 65(11), 2179–2194 (2014)
McCay-Peet, L., Toms, E.G.: Investigating serendipity: how it unfolds and what may influence it. JASIST 66(7), 1463–1476 (2015)
McNee, S.M., Riedl, J., Konstan, J.A.: Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: Extended Abstracts Proceedings of the 2006 Conference on Human Factors in Computing Systems, CHI 2006, Montréal, Québec, Canada, 22–27 April 2006, pp. 1097–1101 (2006)
Murakami, T., Mori, K., Orihara, R.: Metrics for evaluating the serendipity of recommendation lists. In: Satoh, K., Inokuchi, A., Nagao, K., Kawamura, T. (eds.) JSAI 2007. LNCS (LNAI), vol. 4914, pp. 40–46. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78197-4_5
Musselman, A.: Apache Mahout. In: Encyclopedia of Big Data Technologies (2019)
Oku, K., Hattori, F.: Fusion-based recommender system for improving serendipity. In: Proceedings of the Workshop on Novelty and Diversity in Recommender Systems, DiveRS 2011, at the 5th ACM International Conference on Recommender Systems, RecSys 2011, Chicago, Illinois, USA, 23 October 2011, pp. 19–26 (2011)
Onuma, K., Tong, H., Faloutsos, C.: TANGENT: a novel, ‘surprise me’, recommendation algorithm. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009, pp. 657–666 (2009)
Pontis, S., et al.: Academics’ responses to encountered information: context matters. JASIST 67(8), 1883–1903 (2016)
Rubin, V.L., Burkell, J.A., Quan-Haase, A.: Facets of serendipity in everyday chance encounters: a grounded theory approach to blog analysis. Inf. Res. 16(3) (2011)
Sarwar, B.M., Karypis, G., Konstan, J.A., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, 1–5 May 2001, pp. 285–295 (2001)
Schedl, M., Hauger, D., Schnitzer, D.: A model for serendipitous music retrieval. In: Proceedings of the 2nd Workshop on Context-awareness in Retrieval and Recommendation, CaRR 2012, Lisbon, Portugal, 14–17 February 2012, pp. 10–13 (2012)
Semeraro, G., Lops, P., de Gemmis, M., Musto, C., Narducci, F.: A folksonomy-based recommender system for personalized access to digital artworks. JOCCH 5(3), 1–22 (2012)
Sun, T., Zhang, M., Mei, Q.: Unexpected relevance: an empirical study of serendipity in retweets. In: Proceedings of the Seventh International Conference on Weblogs and Social Media, ICWSM 2013, Cambridge, Massachusetts, USA, 8–11 July 2013 (2013)
Taramigkou, M., Bothos, E., Christidis, K., Apostolou, D., Mentzas, G.: Escape the bubble: guided exploration of music preferences for serendipity and novelty. In: Seventh ACM Conference on Recommender Systems, RecSys 2013, Hong Kong, China, 12–16 October 2013, pp. 335–338 (2013)
Yamaba, H., Tanoue, M., Takatsuka, K., Okazaki, N., Tomita, S.: On a serendipity-oriented recommender system based on folksonomy and its evaluation. In: 17th International Conference in Knowledge Based and Intelligent Information and Engineering Systems, KES 2013, Kitakyushu, Japan, 9–11 September 2013, pp. 276–284 (2013)
Zhang, Y.C., Séaghdha, D.Ó., Quercia, D., Jambor, T.: Auralist: introducing serendipity into music recommendation. In: Proceedings of the Fifth International Conference on Web Search and Web Data Mining, WSDM 2012, Seattle, WA, USA, 8–12 February 2012, pp. 13–22 (2012)
Zhou, X.: Understanding serendipity and its application in the context of information science and technology. Ph.D. thesis, University of Nottingham, UK (2018)
Xiaosong Zhou, X., Sun, Q.W., Sharples, S.: A context-based study of serendipity in information research among Chinese scholars. J. Documentation 74(3), 526–551 (2018)
Zhou, X., Xu, Z., Sun, X., Wang, Q.: A new information theory-based serendipitous algorithm design. In: Yamamoto, S. (ed.) HIMI 2017, Part II. LNCS, vol. 10274, pp. 314–327. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58524-6_26
Acknowledgement
We thank our group members and the anonymous reviewers for their valuable feedback and suggestions, which have substantially improved the overall quality of this paper. This research is generously supported by National Natural Science Foundation of China Grant No. 71301085 and Hefeng Creative Industrial Park in Ningbo, China.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, X., Zhang, H., Zhou, X., Wang, S., Sun, X., Wang, Q. (2020). CHESTNUT: Improve Serendipity in Movie Recommendation by an Information Theory-Based Collaborative Filtering Approach. In: Yamamoto, S., Mori, H. (eds) Human Interface and the Management of Information. Interacting with Information. HCII 2020. Lecture Notes in Computer Science, vol 12185. Springer, Cham. https://doi.org/10.1007/978-3-030-50017-7_6
DOI: https://doi.org/10.1007/978-3-030-50017-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50016-0
Online ISBN: 978-3-030-50017-7
eBook Packages: Computer Science, Computer Science (R0)