Abstract
In many e-commerce platforms, user communities share product information in the form of reviews and ratings to help other consumers make their choices. This study develops a new theoretical framework that generates a bipartite network of products sold by Amazon.com in the category “musical instruments”, linking products through their reviews. We analyze product rating and the perceived helpfulness of online customer reviews, as well as the relationship between review centrality, product rating, and review helpfulness, using clustering, regression tree, and random forest algorithms to, respectively, classify and find patterns in 2214 reviews. Results demonstrate: (1) that a high number of reviews does not imply a high product rating; (2) that when reviews are helpful for consumer decision-making, we observe an increase in the number of reviews; (3) a clear positive relationship between product rating and the helpfulness of reviews; and (4) a weak relationship between the centrality measures (betweenness and eigenvector), which capture the importance of a product in the network, and the quality measures (product rating and helpfulness of reviews) for musical instruments. These results suggest that products may be central to the network while having low ratings and reviews of little help to consumers. The findings of this study provide several important contributions that can help e-commerce businesses improve their review service management, supporting customers’ experiences and online customers’ decision-making.
1 Introduction
E-commerce has been booming in recent years. The pandemic drove an enormous uptick in e-commerce’s share of total retail spending around the world. The Global Commerce Forecast estimated that by 2025, digital shoppers will spend $7.391 trillion online. To put that in context, 10 years ago, total worldwide retail spending amounted to just a little over $16 trillion [32]. The pandemic has brought new e-commerce opportunities to consumer goods. In e-marketplaces, customers are increasingly finding and sharing useful information about products, such as photos, recommendations, reviews, and opinions, to help others make buying decisions.
Sharing information about products and services is one of the greatest potentials of e-commerce, seen as a data-driven type of business. Information systems provide tools that make suggestions for customized products or services, such as books, music, transportation, or even people, based on information about products and users. Moreover, online reviews play an increasingly important role in consumers’ online shopping decisions [25]. Reviews and recommendations are used by companies that deal with large amounts of information, like Google News, Twitter, LinkedIn, Netflix [75], Amazon.com, and Alibaba [52], which provide customized recommendations for products of interest based on historical customer data. When used effectively, this information provides suggestions to users based on the matched preferences of other users, on the customer profile built from previous purchases, or on the search history collected from websites.
Research indicates that these systems increase sales and consumer satisfaction [35]. Thus, a small improvement in this type of system can leverage larger revenues [19, 52] and minimize sale risks [54]. However, academics and companies keep researching ways to increase their effectiveness to provide users with better recommendations [41, 45, 51].
We can delineate two main ways of communicating suggestions in e-marketplaces [18]: recommendation systems and reviews (in the form of scores, ratings, etc.). Recommendation systems are important tools in e-commerce as they provide users with machine-generated personalized recommendations and allow them to discover new products through simplified search [14, 51]. Although recommendation systems usually achieve good results, the core recommendation strategies must also be adapted to deal with unique user tastes and with trusted peer influence, providing truly personalized recommendations. Moreover, Filieri [34] states that consumer involvement and experience, as well as the type of website, affect the way consumers assess trustworthiness in online reviews.
Reviews are comments on products posted by users in e-marketplaces, and they have become increasingly important in the consumer’s decision-making process. Online product review websites help consumers make informed decisions about purchasing new products and have become a major driving force in new product sales, making effective e-marketing a critical success factor for new product launches [24]. For instance, e-marketplaces such as Amazon.com allow users to comment on the products available on the platform, providing feedback to other users about product attributes, quality, or performance [62]. Noticeably, about 93% of U.S. adults read reviews before making online purchases, suggesting that the vast majority of consumers understand the benefits of reliable reviews and would be motivated to write accurate ones themselves [86].
The awareness of the reviews may influence the behavior of the users because they assume that the criticisms exposed in the respective products/services are written by the consumers themselves [76]. For this reason, users rely on messages of other users and prefer them to those created commercially [16]. There are a number of reasons why people read reviews, from getting information, to building relationships in an online community or, affecting other readers’ behavior [15, 46].
For e-retail managers, consumer reviews are a valuable asset [36], as demonstrated by the positive relationship between positive consumer reviews and willingness to pay [4] and increased sales [21, 39], and by how managers proactively use and organize these “free assistants” in marketing initiatives [40], reducing the costs of customer assistance and support.
The literature on rating and on the determinants of review helpfulness has made gradual progress in designing and validating algorithms for calculating review helpfulness scores and product ratings [28]. Given the great impact of product reviews on consumer purchases [64], companies can manipulate reviews to increase sales, posting favorable reviews and/or eliminating negative ones [17]. Due to these constraints, it is relevant to study the helpfulness of reviews.
In this research, we develop a new theoretical framework to analyze and explain product rating and perceived helpfulness (the terms helpfulness and usefulness are used interchangeably) of online customer reviews from a social network analysis perspective. We consider the analysis of complex networks an appropriate technique in this context, as it captures the importance of reviews not merely through their absolute number but through centrality measures of the products reviewed. Centrality measures assess the importance of a product in the network, since they help us identify nodes (products) that are important either because they have many connections or because they are located in strategic positions. These nodes might be key products in the network or have a disproportionate influence on the network’s behavior. In addition, by identifying products that are important because they are central, we can make predictions about how a network might behave under different conditions. The more central a product is in the review network, the more important it is. The network is originally a bipartite network [6], containing two types of nodes: reviewers and products. By projecting the bipartite network onto a one-mode network of products, it is possible to measure the products’ centrality. Network science has been used to deal with reviews and ratings, as discussed in the related work in Sect. 2.2. However, very little is known about the interplay between product rating, the number of reviews, and the helpfulness of reviews in a network of products [83]. Our research aims to fill this gap by analyzing the importance of products through the network centrality of reviews. Thus, one of the most innovative aspects of this work is its focus on the relationship between centrality and rating/helpfulness.
Our motivation is that centrality may reflect an increase in rating and helpfulness of product reviews.
We focus on centrality measures as a way of capturing the importance of the products being reviewed, and we relate centrality to the helpfulness and the rating of the reviewed products. For that purpose, we use a data set openly available from Amazon.com [42] and generate a network of products of the category “musical instruments” by linking products with posted reviews. The network is based on the principle that two products are linked if they are reviewed by the same reviewer. In total, 2214 reviews, originating 5562 relationships between 717 different products, have been analyzed using machine learning algorithms such as clustering and regression trees.
The paper is structured as follows: In Sect. 2, we discuss related work concerning reviews, ratings, and helpfulness, and we describe the research questions and hypotheses. Section 3 presents the methodology and data set and introduces the first concepts of network science and the mathematical formulations of the main metrics of network centrality and modularity. Section 4 presents the analysis of the results, and Sect. 5 contains a final discussion of the main findings with conclusions. The limitations of the study and future research challenges are also presented.
2 Theoretical framework
2.1 Customer reviews, helpfulness and rating
Most e-platform interactions involve a variety of entities, often heterogeneous, such as customers, vendors, and public/private institutions that generally have no relationship history. These relationships are built in the form of feedback: reviews, helpfulness, and ratings.
2.1.1 Reviews
Online customer reviews provide new potential customers with relevant information about a product or service [3]. Online consumer reviews are popular sources of information about products and services: 72% of consumers aged 25–34 seek information for recommendations and opinions before buying goods and services (Mintel, 2015). Reviews can be defined as any comment on a product or service written by a consumer [34]. It has been empirically shown that the type of online consumer reviews assigned to a product significantly impacts its future sales.
According to [1, 2], customer reviews help customers learn more about a product and decide whether it is the right product for them. Consumers rely more and more on reviews to assess product quality when making purchasing decisions, and the criticisms set forth are seen as an unbiased reflection of product quality. The literature indicates that the quality, reliability, and helpfulness of reviews are critical factors in their impact on sales volume, and that the negative effect of reviews is greater than the positive effect [24].
Thus, a large number of researchers agree that reviews influence decision-making processes and affect individuals’ behavior [3, 31, 48, 58]. However, several authors [37, 67] found that users give more importance to the criticisms written by real clients than to statistical summaries [16]. This finding highlights the importance of truthful and unbiased peer-to-peer information when consumers rely on reviews to make wise buying decisions.
Reviews also have an impact on advertising. Hollenbeck et al. [47] studied the relationship between online reviews and advertising spending in the hotel industry. They have combined a data set of TripAdvisor reviews with other data sets describing these hotels’ advertising expenditures, and show that online ratings have a causal demand-side effect on ad spending. Some researchers also stress that fake reviews can result in unfair competition, where a product’s ranking is artificially inflated or deflated [43], and the usefulness of online reviews is impeded by false reviews that give an untruthful picture of product quality [74].
Therefore, helpful reviews could be a signal of truthful reviews, as sincere consumers write reviews to share their experiences, either positive or negative, that help other consumers in their buying decision-making [74]. This suggests that the helpfulness measure is of utmost importance in online marketplaces.
More often, decision-making is carried out within a social networking framework, in which individuals rely on the opinions and support of their closest friends or people with similar interests. For this purpose, the reviews are published in electronic portals that are intended to collect opinions to aid decision making [65].
However, the proliferation of reviews and the wealth of information available generates a great information overload [60], making it difficult for consumers to orient themselves and determine the most useful information for them. As useful reviews can increase sales [38], several e-commerce organizations allow consumers to vote on the helpfulness of each review, signaling to other consumers which reviews are the most useful for assessing the performance of the product.
In the following, we elaborate on the importance of review helpfulness from both the customers’ and the companies’ points of view.
2.1.2 Helpfulness
Numerous studies have examined the determinants, outcomes, and influencing factors of the helpfulness of online reviews. Kim et al. [51] looked at the association of different online product review features (i.e., review valence, length, pros and cons, helpfulness, authorship, and product recommendation) with purchase probabilities and offer theoretical contributions to the literature on information processing, as well as managerial insights regarding how advertisers can use reviews and how firms should manage their online recommendation systems to better serve existing and potential consumers.
A helpful review is diagnostic, i.e., it enables other consumers to better understand the quality and performance of the product or service [50]. The helpfulness measure plays a critical role in reviews and recommendations [38], and its importance arises from the fact that a popular product usually has many reviews for the consumer to read. Therefore, reviews need to be ranked and recommended to consumers. For example, Amazon.com asked readers to vote on the helpfulness of product reviews, with the ultimate goal of influencing consumer decisions by offering more useful reviews. According to Yang et al. [87], Amazon.com raises profits annually by $2.7 billion with this question: “Was this review useful?”.
Therefore, the helpfulness of a review relies on reliable and unbiased customer-based information, helping consumers in the online buying decision-making process. From a business point of view, it is important to implement a scale to classify the helpfulness of user assessments to understand their perceptions of products and/or services [7].
Other studies have examined how online consumer review features influence the level of usefulness or helpfulness (or utility) of online reviews. Mudambi and Schuff [65] investigated review helpfulness using data from Amazon.com. The authors showed that extremity and word count positively affect consumers’ perceptions of review helpfulness. They also demonstrated that the positive outcomes of helpful online customer reviews seem to reduce the damaging impact of malicious reviews for vendors, increasing reliability and usefulness for consumers by alleviating decision risk and uncertainty in obtaining needed information, which is otherwise time- and energy-consuming. Thus, actively providing helpful reviews can help consumers make quick purchase decisions and improve their shopping experience. In addition, they showed that product type plays a mediating role in influencing review helpfulness.
Subsequently, further studies on review helpfulness have identified its determinants [83], including the severity of the language used in the review, reviewers’ identities and backgrounds [20], balance and presentation order [71], and truthfulness [72]. Recently, Cui and Wang [25] demonstrated that the review presentation format (e.g., product videos and images) is also an influential factor in review helpfulness, as it allows consumers to obtain product details that are difficult to describe in text-based reviews, such as color, movement, and sound.
The literature on the determinants of review helpfulness has presented gradual developments in designing and validating algorithms to calculate review helpfulness scores and product classifications. Some researchers examined what makes reviews useful and found that the source of the review (e.g., characteristics of the reviewer) influences a consumer’s decision to vote on the helpfulness of a review [38, 68]. Moreover, Chua and Banerjee [23] found that the relationship between information quality and review helpfulness varies according to the product category and the review type (e.g., favorable, unfavorable, and mixed). These findings indicate that the type of product under study is also an important factor when analyzing the helpfulness of product reviews.
Some authors study how helpfulness may prioritize online product reviews by quality. Du et al. [30] proposed a deep neural architecture to learn the explicit content-rating interaction (ECRI) for automatic helpfulness prediction. Experimental results demonstrate that exploiting the explicit content-rating interaction improves automatic helpfulness prediction.
2.1.3 Rating
Rating is defined by Steck [75] as a measure of the quality of a product. Typically, customers who purchase a product or use a service are invited to leave a review or rating based on their experience. These ratings are usually expressed on a scale of 1 to 5, with 1 being the lowest and 5 the highest. The rating system serves several purposes [21, 69, 90]: (i) provide feedback to the seller; (ii) inform potential buyers, who can use the ratings and reviews to make informed decisions about whether to purchase a product or service; (iii) establish credibility for the seller or platform; and (iv) provide a sense of community among buyers and sellers.
According to Bonchi et al. [10], rating is a key measure to ensure the long-term success of e-commerce and to manage Customer Relationship Management (CRM) activities. In addition to reviews, positive ratings can change people’s attitudes about the related product review [48].
In most e-marketplaces, customers can leave comments, feedback, and ratings after an order from a third-party seller. This lets other customers know about experiences with products and services. According to Amazon [2], customers can rate third-party sellers from one to five stars, with five stars being the best. The seller’s average rating appears beside their name on Amazon’s site.
Some authors explore the relationship between product rating and reviews for predicting helpfulness, without introducing network concepts. For example, Dash et al. [27] introduced P2R2 (Product feature based Personalized Review Ranking), a framework to predict review helpfulness for individual consumers based on their preferences for product features, using a latent class regression model. Ping et al. [70] developed a methodology for enhancing the quality and usefulness of online reviews using a machine learning approach. From a different perspective, Lee et al. [56] analyzed online reviews on Amazon.com to identify review types and key drivers of the perceived usefulness of reviews.
We will see later why ratings are important to our research and how we link them to the centrality of a product in a social network.
2.2 Network centrality and rating/helpfulness
To define a framework to relate centrality of reviews (an essential metric in network science) with rating/helpfulness, it is important to refer to the existing related work.
Other authors introduce and explore network concepts in this context. For example, Wang [84] examines the association between centrality and reviews by analyzing the differences in reviewer characteristics by network structural position. In other words, the author identified a relationship between the centrality of reviewers and reviewer characteristics. Lee et al. [55] studied the relationship between the herding effect and ratings, where users’ ratings are influenced by prior ratings depending on movie popularity. Wang et al. [82] exploited the temporal sequence of social-networking events and ratings. They conclude that rating similarity between friends is significantly higher after the formation of the friend relationship, indicating that with social-networking functions, online rating contributors are socially nudged when giving their ratings. Li et al. [57] explored social influence in online restaurant reviews and concluded that the prior average review rating exerts a positive influence on subsequent review ratings for the same restaurant, although the effect is reduced by the variance in existing review ratings. Su et al. [77] showed that complex networks of user relationships could be used with their proposed similarity measure to design a rating prediction algorithm for recommender systems (using MovieLens and Netflix data). De Meo et al. [62] studied helpfulness-based reputation (HBR) scores and centrality-based reputation (CBR) scores. As they mention in their research, the identification of users featuring large HBR scores is one of the most important research issues in the field of social networks and a critical success factor of many Web-based platforms. The authors conclude that CBR scores allow for predicting HBR ones, with eigenvector centrality found to be the most important predictor. It is thus important to exploit trust relationships to spot the users producing the most helpful reviews for the whole community.
In Table 1 we summarize the main contributions around the links between centrality, rating and helpfulness.
Some authors also relate rating and helpfulness, such as Chua and Banerjee [22], who examined review helpfulness as a function of reviewer reputation, review rating, and review depth. They conclude that helpfulness is positively related to reviewer reputation and review depth but negatively related to review rating.
Although the previous literature offers some associations between the concepts we address in our work, we could not find simultaneous links between centrality, rating, and helpfulness. Our rationale is that centrality may reflect an increase in rating and helpfulness. In other words, the centrality of a product in a network of products can lead to a positive feedback loop, where increased visibility and exposure lead to higher ratings and perceived helpfulness, which in turn lead to even greater visibility and exposure within the network. The relationship found in the literature points to an influence of centrality on rating and on helpfulness taken separately. In Fig. 1 we summarize the most important contributions of the literature on the relationship between centrality, rating, and helpfulness.
2.3 Research questions and hypotheses
As introduced above, the originality of this work lies in relating centrality to rating/helpfulness. We use a network of reviews connecting the products to each other. It is therefore a network in which the centrality of a product reflects its importance, because the product has been commented on by many users. Two centrality measures obtained from the network of products are used: betweenness centrality and eigenvector centrality. Modularity is also used to measure the strength of the division of the network into modules. The other variables of analysis, the “quality” measures rating and helpfulness, are obtained directly as product features in the Amazon dataset. The relationship between the centrality measures, rating, and helpfulness is explored in this work. Our central question is the following: do centrality measures improve product rating and the helpfulness of reviews? This central research question is supported by three research hypotheses:
H1
Correlation between the centrality of reviews, product rating and the helpfulness of reviews is significant.
H2
The centrality of reviews (and the rating) has an impact on the helpfulness of the reviews.
H3
The centrality of reviews (and the helpfulness) has an impact on product rating.
Figure 2 illustrates the research framework and hypotheses and positions our work relating Centrality, Rating, and Helpfulness.
To assess H1, we first identify the products with higher levels of centrality in the network through a correlation matrix. In order to test H2 and H3, we apply regression tree models (Breiman, 1984) and Random Forests.
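As a purely illustrative sketch (not the paper's actual pipeline), the shape of the H2/H3 tests can be mimicked with scikit-learn on synthetic data: two centrality scores and a rating act as predictors of helpfulness. All variable names and the generating relationship below are hypothetical assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500

# Hypothetical per-product variables standing in for the paper's features.
betweenness = rng.random(n)
eigenvector = rng.random(n)
rating = rng.integers(1, 6, n).astype(float)  # 1..5 stars

# Toy assumption: helpfulness rises mainly with rating, plus noise.
helpfulness = 0.6 * rating + 0.1 * eigenvector + rng.normal(0, 0.3, n)

X = np.column_stack([betweenness, eigenvector, rating])

# A single regression tree (interpretable) and a random forest (robust).
tree = DecisionTreeRegressor(max_depth=3).fit(X, helpfulness)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, helpfulness)

# Feature importances indicate which predictor drives helpfulness;
# with this synthetic data, rating (column index 2) should dominate.
assert int(np.argmax(forest.feature_importances_)) == 2
```

In the same spirit, swapping the target and predictors (helpfulness and centrality predicting rating) mirrors H3.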
3 Methodology
In this section, we introduce the measures of centrality in social networks, namely eigenvector centrality and betweenness centrality. Afterward, in Sect. 3.3, we introduce bipartite networks and explain how the projection of a bipartite network onto a one-mode network can be used to produce a network of products.
3.1 Measures of centrality in social networks
Social networks are important systems for spreading flows through edge connectivity. Social Media is based on social networks, and involves publishing flows of information and shared content. Different types of flows include products or services recommendations, sharing posts about specific issues, or ratings. For example, in Spotify, users can recommend songs to their friends, including Facebook friends [51].
Generally speaking, a social network is a group of individuals or groups connected by some relationships. Links can be created online or offline [26]. Individuals are linked together by social bonds (often called relationships), which may be formal or informal [13]. In theory, a simple network or graph G is defined as a set of discrete social entities (called nodes or vertices), represented by V(G), and links (also called edges), represented by E(G). In symbolic terms, we may write a graph as a whole entity G, such as a tuple G = (V(G), E(G)). Entities or nodes in a social network are often called actors, and these can be individuals, organizations, etc. [80]. Two nodes (vertices) connected to each other are considered neighbors, and the number of elements of the system corresponds to the number of nodes [6].
For example, in Fig. 3 there are eight nodes or vertices, V(G)\(=\left\{v1,v2,v3,v4,v5,v6,v7,v8\right\}\), and eight links or edges: E(G)\(=\left\{e1,e2,e3,e4,e5,e6,e7,e8\right\}.\)
The information about the existence (or not) of links between the nodes is often encoded in an adjacency matrix A, with entries \({a}_{ij}\) such that \({a}_{ij}=1\) if node i is connected to node j and \({a}_{ij}=0\) otherwise. The adjacency matrix is therefore a square |V| × |V| matrix:

\({a}_{ij}=\left\{\begin{array}{ll}1, & \text{if nodes } i \text{ and } j \text{ are connected}\\ 0, & \text{otherwise}\end{array}\right.\)
The diagonal elements of A are zero, since edges from a node to itself (loops) are not allowed in simple graphs.
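As a minimal sketch of this definition (using the networkx library, which is assumed tooling here rather than the paper's own software), one can build a small simple graph and inspect its adjacency matrix:

```python
import numpy as np
import networkx as nx

# Toy undirected simple graph: 4 nodes, 4 edges.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3), (0, 2)])

# Adjacency matrix A: a_ij = 1 if nodes i and j are connected, 0 otherwise.
A = nx.to_numpy_array(G, nodelist=[0, 1, 2, 3])

# A simple undirected graph yields a symmetric A with a zero diagonal
# (no loops), exactly as stated in the text.
assert np.allclose(A, A.T)
assert np.trace(A) == 0
```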
3.2 Eigenvector centrality and betweenness centrality
One of the most popular measures of centrality is degree centrality. The degree centrality of a node is simply its degree, i.e., the number of edges it has. The higher the degree, the more central the node is. However, degree centrality does not capture the real importance of a node in a network. While degree centrality only considers the number of connections a node has, eigenvector centrality also considers the quality of those connections by taking into account the centrality of neighboring nodes. Betweenness centrality, on the other hand, focuses on the node’s position in the network and its ability to control information flow. These measures provide a more nuanced understanding of node importance by considering both local and global network characteristics. For these reasons, we consider eigenvector centrality and betweenness centrality more complete and adequate than degree centrality. We introduce them in the following.
Eigenvector centrality is based on notions of influence, ranking, and prestige of the neighbors of the node under analysis [6]. That is to say, the centrality of a node is measured by the importance of the neighbors to which the node is connected, since they have easy access to information and sources of influence. This centrality index describes the general influence of a node throughout the network, which makes its importance and its impact on reviews worth further study [66].
The Bonacich [9] approach is quite adequate for the calculation of centrality, since it takes into account not only the number of neighbors but also their quality. The eigenvector centrality of node i, \({x}_{i}\), is given by:

\({x}_{i}=\frac{1}{\lambda }\sum_{j=1}^{n}{a}_{ij}{x}_{j}\)

where \({a}_{ij}\) is 1 or 0 depending on whether there is (or is not) a link between nodes i and j, \({x}_{j}\) represents the centrality of neighbor j, and \(\lambda\) is a constant, wisely chosen as the largest eigenvalue (in absolute value) of matrix A. This measure of centrality is based on the fact that a node is important if its neighbors are important too. We assume that the cardinality of V(G) is n.
Eigenvector centrality has been used because it proves to be an important measure of centrality, since it considers the centrality of the neighbors [9].
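Since the definition above is the leading-eigenvector equation of A, eigenvector centrality can be computed by power iteration. The toy adjacency matrix below is an illustrative assumption, not data from the paper:

```python
import numpy as np

# Adjacency matrix of a small undirected graph; node 0 is the hub
# (connected to everyone) and nodes 0-1-2 form a triangle.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

# Power iteration: repeatedly apply x <- A x and renormalize; x converges
# to the leading eigenvector, satisfying x_i = (1/lambda) * sum_j a_ij x_j.
x = np.ones(A.shape[0])
for _ in range(200):
    x = A @ x
    x = x / np.linalg.norm(x)

# The hub has the most (and best-connected) neighbors, so it scores highest.
assert int(np.argmax(x)) == 0
```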
Betweenness centrality (also called intermediation centrality) is one of the most used measures of centrality [85]. It measures the extent to which a node is an important intermediary between other nodes in the network, that is, it reflects the number of shortest paths connecting pairs of nodes that pass through a specific node. Nodes with high betweenness often connect communities that would otherwise not be linked [33]. The betweenness centrality \({C}_{B}(x)\) of a vertex x in the network is given by:

\({C}_{B}(x)=\sum_{s\ne x\ne t}\frac{{\sigma }_{\mathrm{st}}(x)}{{\sigma }_{\mathrm{st}}}\)

where \({\sigma }_{\mathrm{st}}(x)\) denotes the number of shortest paths between s and t containing x, and \({\sigma }_{\mathrm{st}}\) denotes the number of all the shortest paths between s and t in the network.
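The claim that high-betweenness nodes bridge otherwise disconnected communities can be checked on a toy graph (a sketch using networkx, which is assumed tooling here, not the paper's):

```python
import networkx as nx

# Two triangles joined through a single bridge node 3: every shortest
# path between the two communities must pass through it.
G = nx.Graph([(0, 1), (1, 2), (0, 2),   # left triangle
              (2, 3), (3, 4),           # bridge via node 3
              (4, 5), (5, 6), (4, 6)])  # right triangle

bc = nx.betweenness_centrality(G, normalized=True)

# The bridge node intermediates all cross-community shortest paths,
# so its betweenness dominates the rest of the network.
assert bc[3] == max(bc.values())
```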
3.3 Bipartite networks
In this research there are two different types of nodes, products and reviewers, as shown in Fig. 4.
This is the case of bipartite networks, or bipartite graphs, whose vertices can be divided into two disjoint and independent subsets V1 and V2. In a bipartite graph, every edge in E links one node of V1 to one node of V2 [6]. Mathematically, the definition can be stated as follows:
Definition 1
(Adapted from Banerjee et al. [5]). G(V1, V2, E) will be called a bipartite graph if V(G) = V1(G) ∪ V2(G) and V1(G) ∩ V2(G) = \(\varnothing\), and each edge connects two nodes (v1, v2) ∈ E(G). G will be a complete bipartite graph if ∀v1 ∈ V1(G) and ∀v2 ∈ V2(G), (v1, v2) ∈ E(G).
In the case of a bipartite network, assuming that r = #V1(G) and s = #V2(G), the full adjacency matrix takes the block form:

\(A=\left(\begin{array}{cc}{0}_{r,r}& B\\ {B}^{T}& {0}_{s,s}\end{array}\right)\)

where B is the r × s bi-adjacency matrix, and \({0}_{r,r}\) and \({0}_{s,s}\) represent the r × r and the s × s zero matrices.
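A minimal sketch of a reviewer-product bipartite network and its bi-adjacency block B, using networkx's bipartite module (the reviewer and product names below are hypothetical):

```python
import networkx as nx
from networkx.algorithms import bipartite

# Layer V1 = reviewers, layer V2 = products; an edge means
# "this reviewer posted a review of this product".
B = nx.Graph()
reviewers = ["r1", "r2", "r3"]
products = ["guitar", "drumsticks", "tuner"]
B.add_nodes_from(reviewers, bipartite=0)
B.add_nodes_from(products, bipartite=1)
B.add_edges_from([("r1", "guitar"), ("r1", "tuner"),
                  ("r2", "guitar"), ("r2", "drumsticks"),
                  ("r3", "drumsticks")])

assert bipartite.is_bipartite(B)

# Bi-adjacency block B (r x s): rows = reviewers, columns = products.
M = bipartite.biadjacency_matrix(B, row_order=reviewers,
                                 column_order=products).toarray()
```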
3.3.1 Projection of a bipartite network into a one-mode network
Since we are interested in analyzing the importance of the products, measured by the corresponding centrality, we need to project the original bipartite network into a one-mode network. Bipartite networks can be transformed into unipartite networks through one-mode projections [5, 81, 85]. This means that the resultant network contains nodes of only one set: in our case, the products’ network. Application of a one-mode projection to a bipartite network generates two unipartite networks, one for each layer, G1 and G2, so that vertices with common neighbors are connected by edges in their respective projection.
Definition 2
(Adapted from Banerjee et al. [5]). Let G(V1, V2, E) be a bipartite graph with #V1(G) or |V1(G)| = r, #V2(G) or |V2(G)| = s, and #E(G) or |E(G)| = m. Projecting the bipartite graph G onto the vertex set V1 with respect to the vertex set V2 amounts to constructing a unipartite or one-mode network G1(V1, E′) where V(G1) = V1 and (v1i, v1j) ∈ E(G1) if N(v1i) ∩ N(v1j) ≠ \(\varnothing .\) The same applies for the projection of G onto the vertex set V2 with respect to V1: it amounts to constructing a unipartite or one-mode network G2(V2, E″) where V(G2) = V2 and (v2i, v2j) ∈ E(G2) if N(v2i) ∩ N(v2j) ≠ \(\varnothing .\)
Here, the cardinality of the neighborhood of a vertex (i.e., its degree) is denoted by deg(vi) = |N(vi)|. Figure 5a and b show, respectively, the one-mode projections of V1 (i.e., G1) and V2 (i.e., G2), taken from the same example of Fig. 3, based on the two types of nodes of the dataset used in this work.
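Definition 2 can be illustrated with a short Python sketch; the reviewer and product labels below are hypothetical, not taken from the data set:

```python
def project(bipartite_edges, keep_side):
    """One-mode projection (Definition 2): two nodes of the kept side are
    linked when their neighbourhoods on the other side intersect."""
    neighbours = {}
    for v1, v2 in bipartite_edges:
        kept, other = (v1, v2) if keep_side == 1 else (v2, v1)
        neighbours.setdefault(kept, set()).add(other)
    nodes = sorted(neighbours)
    edges = set()
    for i, u in enumerate(nodes):
        for w in nodes[i + 1:]:
            if neighbours[u] & neighbours[w]:   # N(u) and N(w) intersect
                edges.add((u, w))
    return edges

# Hypothetical data: reviewer r1 rated products p1 and p2; r2 rated p2 and p3
edges = [("p1", "r1"), ("p2", "r1"), ("p2", "r2"), ("p3", "r2")]
product_net = project(edges, keep_side=1)
# p1-p2 share reviewer r1 and p2-p3 share r2; p1 and p3 share no reviewer
```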
3.4 Modeling and analysis
3.4.1 Centrality measures
To identify the most important products in terms of centrality, two measures of centrality are used: betweenness centrality and eigenvector centrality. To this end, a projected, one-mode network of products has been created, in which the products (network nodes) are connected (through edges) whenever the same reviewer comments on two or more of them. It is important to note that the projected network of products is also the projected network of reviews, because it is built from the reviewers. In this sense, the names “network of products” and “network of reviews” are used interchangeably.
To test the research hypotheses identified in Section 2, we first identify the products with higher levels of centrality in the network, using the Gephi software [8]. A correlation matrix and linear dependency models are used to test H1.
3.4.2 Cluster analysis
A cluster analysis is performed to find patterns of nodes in the data. Detecting patterns of nodes based on their connectivity provides an idea of how node communities emerge in the network. On the other hand, it is also important to detect patterns of nodes based on their attributes. In network science, clustering is related to the consistency of certain patterns based on the similarity of the nodes. A cluster is a subset of nodes in which the products are similar to each other, and distinct from the products in other clusters. In this work we started by computing the number of clusters appropriate for our data, using the elbow rule and the scree plot, which takes into account the variance (within-group sum of squares): as the number of clusters increases, the variance decreases. The elbow at five clusters represents the most parsimonious balance between minimizing the number of clusters and minimizing the variance within each cluster. It is important to consider coherent sizes for the clusters, since a larger cluster increases redundancy, which makes each loop less important and compromises network conciseness [66].
For this purpose, we used K-means [59], a method for quantitative variables that iteratively groups n observations into k clusters, where each observation belongs to the cluster with the closest centroid. The objective is to minimize the sum of the squared errors of the several groups generated; that is, the smaller the sum, the more homogeneous the groups will be. This method implies a prior choice of the number of initial points (centroids), giving rise to a number of groups predefined by the analyst [44]. After selecting the five initial centroids, the program repeats the algorithm until it reaches the minimum of the established criterion: (a) form five clusters, associating each new product with the nearest centroid; and (b) recalculate the centroid of each cluster. The variables chosen for analysis are: Betweenness Centrality, Eigenvector Centrality, Rating and Helpfulness. Modularity was not included, since it is a measure of global interconnectedness and does not provide important information at the level of individual nodes (except for the modularity class, which is a qualitative variable).
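The study runs K-means in R; as a sketch of the procedure described above, here is a minimal Python version of Lloyd's algorithm on hypothetical two-dimensional data:

```python
def kmeans(points, centroids, iters=20):
    """Minimal Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d2.index(min(d2))].append(p)   # nearest centroid wins
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical points in a (rating, helpfulness) plane: two obvious groups
pts = [(1.0, 0.1), (1.2, 0.2), (5.0, 0.9), (4.8, 0.8)]
cents, groups = kmeans(pts, centroids=[(0.0, 0.0), (6.0, 1.0)])
```

After a few iterations the two centroids settle at the means of the two groups.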
3.4.3 Regression trees and random forests
In order to test H2 and H3, we applied the Regression Tree and Random Forest algorithms to study the impact of centrality on the other variables under study (rating and helpfulness).
Regression trees are increasingly used as predictive modeling approaches in statistics, data mining and machine learning. They are supervised learning algorithms with a tree structure in which each internal (non-leaf) node is labeled with an input feature and each leaf predicts a value of the target variable. The algorithm we use is CART (Classification And Regression Trees), first introduced by Breiman et al. [11]. Regression trees, including the CART algorithm, do not directly measure causal effects: they are primarily used for predictive modeling and for identifying relationships between predictor variables and the target variable, although they may suggest possible causal effects.
The difference between trees used for regression and trees used for classification is the type of target variable (quantitative or qualitative, respectively). In this work, we use regression trees, as the variables to be predicted are quantitative. This research uses four variables: two centrality measures (Betweenness and Eigenvector) and two quality measures (Rating and Helpfulness). The latter will be used as dependent in the models to be presented later on.
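To make the splitting criterion concrete, here is a minimal Python sketch of a single CART-style regression split (the study itself uses the rpart implementation in R; the data below are hypothetical):

```python
def sse(values):
    """Sum of squared deviations from the mean (within-leaf error)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """CART-style search for the single split threshold on x that minimises
    the total SSE of the target y over the two resulting leaves."""
    best = None
    for threshold in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost,
                    sum(left) / len(left), sum(right) / len(right))
    return best   # (threshold, cost, left-leaf mean, right-leaf mean)

# Hypothetical data: helpfulness (x) against rating (y)
x = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
y = [2.0, 2.5, 2.2, 4.5, 4.8, 4.6]
threshold, cost, low_mean, high_mean = best_split(x, y)
```

On this toy sample the best split falls at x = 0.6, with each leaf predicting the mean rating of its side; a full tree applies the same search recursively within each leaf.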
3.5 Data set
To build the products’ network, we used a data set containing information about products, reviews, and reviewers (users) provided by Amazon.com. The data set is openly available [42] and contains a great number of reviews in several product categories [61]. Being considered one of the Big Five companies in the U.S. information technology industry, Amazon is an American multinational technology company focusing on e-commerce, cloud computing, digital streaming, and artificial intelligence.
The data set contains more than 150 million reviews on products in various categories, ranging from “books and technology” to “beauty articles”, registered from May 1996 to July 2014. For the sake of practicality, we selected the category of musical instruments. The selection of this category is due to the fact that it includes several subcategories of products with technical characteristics that require previous information to assist buying decisions, and consequently, the reviews are likely to be very useful for the future buyers to make their choices.
At the time of this study, there are 15 subcategories of musical instruments on the platform, among which we can find guitars, bass guitars, ukuleles, keyboards, microphones, strings, and accessories. The musical instruments category accounts for about 10,261 reviews and 500,176 ratings, corresponding to 717 products, and for each review the data set provides the attributes described in Table 2.
The initial data set includes about 10,261 product reviews (see Table 3). In the data cleansing step, records with missing values have been removed (e.g., “helpful” non-information, “undefined” fields generating errors in the imported database); as such, the effective sample accounts for 2214 reviews (Footnote 1). A primary analysis of these reviews generates 5562 relations among 717 different products, between 2010 and 2014.
4 Results
In this section, we present the main results, namely the centrality measures, cluster analysis and classification.
4.1 Network and centrality analysis
The original bipartite network contains the links between the reviewers and the products and has been compressed into a one-mode projection. A new data set was created for further analysis, with aggregate data of network measures by product/review, as well as the rating and helpfulness. The network measures (Betweenness Centrality and Eigenvector Centrality) give us complementary perspectives on the importance of each product. Eigenvector centrality measures the transitive influence of nodes: a node with higher eigenvector centrality is connected to many nodes which themselves have high scores. Betweenness centrality, on the other hand, measures the extent to which a node is an important intermediary between the links of other nodes in the network. Both centrality measures have been operationalized using igraph, a popular R package for network analysis and visualization.
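As an illustration of how eigenvector centrality can be computed (the study uses igraph in R), here is a minimal Python power-iteration sketch on a hypothetical star graph:

```python
def eigenvector_centrality(graph, iters=100):
    """Power iteration for eigenvector centrality. Iterating with A + I
    (adding each node's own score) keeps the eigenvectors of A while
    avoiding oscillation on bipartite-like graphs; scores are rescaled so
    that the maximum is 1 after every step."""
    score = {v: 1.0 for v in graph}
    for _ in range(iters):
        new = {v: score[v] + sum(score[w] for w in graph[v]) for v in graph}
        top = max(new.values())
        score = {v: s / top for v, s in new.items()}
    return score

# Hypothetical star graph: a hub connected to three leaves
g = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = eigenvector_centrality(g)
# The hub ends up with the highest score; the three leaves tie below it
```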
As we will see in the Regression Tree results and in the conclusions, Eigenvector Centrality will be much more important than Betweenness Centrality for explaining Rating and Helpfulness. The reason is that more connected products, measured by the importance of the neighbors to which the nodes are connected, tend to have a higher impact on Rating and Helpfulness. Betweenness Centrality does not have the same impact on these quality measures.
Based on the one-mode projection network, it is possible to proceed with further analyses: a Cluster analysis and Regression-based analysis to establish relationships between Helpfulness, Rating and Centrality, that we present in the next sections.
4.2 Cluster analysis
Software R [73] and kmeans, the R function used to perform cluster analysis, have been used for this task. Data have been standardized beforehand since, otherwise, machine learning algorithms such as clustering would be dominated by the variables measured on larger scales, adversely affecting model performance. We used the R function scale, which standardizes each variable by subtracting its mean and dividing by its standard deviation. At the end, each cluster can be described by the corresponding means of the different attributes (see Table 4).
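As an illustration of this preprocessing step, here is a minimal Python sketch of z-score standardization, the default behaviour of R's scale function (the values are hypothetical):

```python
def standardise(values):
    """Z-score standardisation, as R's scale() does by default: subtract
    the mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

z = standardise([1.0, 3.0, 5.0])   # a standardized variable has mean 0
```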
Products in Cluster 1 have the highest average Rating and Helpfulness values. Cluster 2 contains the highest Betweenness and Eigenvector Centrality means, together with higher Rating and Helpfulness. Cluster 3 contains mean values that are relatively low for all attributes. Cluster 4 contains higher mean values for Helpfulness and Betweenness Centrality. Cluster 5 seems to be residual, as no attribute stands out compared to the other clusters (with the exception of Rating).
Attribute “Rating” is well represented in all clusters. Betweenness centrality and Helpfulness are combined to form clusters 2 and 4. Higher values of Rating and Helpfulness also emerge together in the clustering process, namely in clusters 1 and 2.
4.3 Testing research hypotheses
4.3.1 Relationship between helpfulness, rating and centrality
We start by computing correlation between variables in order to capture the strength of the relationship between Helpfulness, Rating and Centrality. Values are shown in Table 5.
The highest correlation (0.63) is between the two measures of centrality: Betweenness and Eigenvector. Additionally, the correlation between Helpfulness and Rating is also relatively high (0.35) when compared to the other values of the correlation matrix. This means that these two attributes are associated, suggesting that the reviews of higher-rated products are also the most useful ones. On the other hand, the correlation between the centrality measures (Betweenness and Eigenvector) and the quality measures (Rating and Helpfulness) is very low.
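The entries of Table 5 are Pearson correlation coefficients; as a minimal illustration in Python (with hypothetical data, not the study's):

```python
def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-product values, not the study's data
rating      = [4.0, 5.0, 3.0, 2.0, 4.5]
helpfulness = [0.6, 0.9, 0.4, 0.2, 0.7]
r = pearson(rating, helpfulness)   # strongly positive for this toy sample
```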
Previous research (Landherr et al., 2010) also finds a weak relationship between the centrality of reviews and the rating, revealing empirical evidence that users publish reviews no matter whether products obtain high or low ratings. However, what is revealed is that more reviews (corresponding to higher centrality) are not synonymous with better quality. Products may be central in the network, although with low ratings.
As discussed in the literature review, the question “Was this review useful?” has been playing an increasingly important role in helping consumer decision-making, so that the user receives information from someone who has already used the product and decided to share their experience spontaneously and free of charge. However, our results show that products may have high centrality, although their reviews provide little helpfulness to users.
4.3.2 Using regression trees and random forests to assess the impact of the centrality measures on the rating and helpfulness.
To answer hypotheses H2 and H3, we have developed a regression tree using the rpart implementation of the CART algorithm, available in the R package rpart [79]. The rpart algorithm (recursive partitioning and regression trees) is an implementation of CART by Breiman et al. [11]. Although rpart is not exactly the same as CART, they share the same underlying principles of recursive binary splitting, building decision trees, and pruning to balance model complexity and prediction accuracy. The package rpart.plot has been used for the plots.
4.3.2.1 Rating
We start by measuring the impact of the Centrality variables on Rating. For that purpose, all variables have been used as explanatory and Rating has been used as dependent.
In Fig. 6a, we can see from rows (4) and (5) that when Helpfulness is higher (>= 0.3541667), then Rating is also higher, on average (78.99027). It is also possible to see that when Eigenvector Centrality is higher, then it has a positive impact on Rating. This means that more connected products, measured by the importance of the neighbors to which nodes are connected, tend to have a higher Rating. We also computed the feature importance measure provided by rpart, based on the mean decrease in node impurity (Gini index or deviance) caused by a particular predictor variable.
Higher values indicate greater importance, suggesting that the variable has a stronger impact on the target variable within the tree. Eigenvector Centrality is the most important variable to predict Rating, followed by Helpfulness.
A pruning procedure has been used with the prune() function, as a way to reduce complexity and the size of the tree by removing parts (branches) that do not provide power to classify instances. The tree was indeed very much reduced but, as a consequence, the outcome is almost uninterpretable. We then calculated model accuracy by creating a procedure based on the Holdout Method. Model evaluation aims to estimate the generalization accuracy of a model on future (unseen/out-of-sample) data. We took the usual procedure of splitting the data using 70% of the original data as training data and the remaining as test data. Test data has been used to get predictions from the model trained on the training data. To evaluate the differences between the predictions from the model and the original data, we compute two measures of accuracy: mean absolute error (MAE) and root mean square error (RMSE).
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|{P}_{i}-{T}_{i}\right| \qquad \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({P}_{i}-{T}_{i}\right)}^{2}}$$

where n is the dimension of the test data, Pi is the i-th predicted value of the test data and Ti is the i-th original value. After 100 model iterations, we obtained an average MAE of 0.522 and an RMSE of 0.707.
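The two accuracy measures can be sketched in a few lines of Python (the predicted and observed values below are hypothetical):

```python
def mae(pred, true):
    """Mean absolute error: (1/n) * sum of |P_i - T_i|."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean square error: sqrt((1/n) * sum of (P_i - T_i)^2)."""
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5

# Hypothetical predictions against held-out observations
predicted = [4.2, 3.8, 5.0, 2.5]
observed  = [4.0, 4.0, 4.5, 3.0]
```

Because RMSE squares the residuals, it penalizes large errors more heavily than MAE and is never smaller than it.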
In order to explore an alternative to regression trees, we tested Random Forests [12]. This is an ensemble learning method, also used for regression and other tasks, that works by creating many regression trees at training time and outputting the mean prediction of the individual trees.
We used the R package randomForest and ran the model 100 times, for which we obtained an average accuracy of MAE = 0.366 and RMSE = 0.522. Random Forest also helps to understand how much the accuracy increases when an explanatory (independent) variable is included, in terms of its Mean Square Error (%IncMSE). A second measure is based on the decrease of the residual sum of squares of impurity when a variable is chosen to split a node (IncNodePurity).
Therefore, we can state from the results that Helpfulness can be a (weak) predictor of Rating, as it increases the corresponding predictive capacity by 0.13%. Betweenness is a poorer predictor of Rating, and Eigenvector centrality does not work well as a predictor at all, since the increase in prediction is negative.
4.3.2.2 Helpfulness
We ran the model again, but now taking all variables as explanatory and Helpfulness as dependent.
Using the regression tree above, we learn that when Rating is higher than 3.3, Helpfulness is also higher on average. The same type of relationship (though weaker) occurs with Eigenvector centrality, from which we can conclude that these variables have a positive association. After running the evaluation procedure, we obtained MAE = 3.458 and RMSE = 3.535.
Again, we computed the variable importance provided by rpart based on the mean decrease in node impurity (Gini index or deviance) caused by a particular predictor variable.
Eigenvector Centrality is once more the most important variable, this time to predict Helpfulness.
Using Random Forests, we obtained an average accuracy of: MAE = 3.460 and RMSE = 3.537.
The impact of both centrality measures (Betweenness and Eigenvector) on Helpfulness, seen from the perspective of the mean increase in accuracy is very low (see Tables 6, 7, 8, 9). A summary of the accuracy measures (MAE and RMSE) for Regression Trees and Random Forests is presented in Table 10.
When Rating is used as the target (dependent) variable, the accuracy is higher than with Helpfulness. On the other hand, Random Forests are more accurate (MAE and RMSE are smaller) than CART for predicting Rating, but not for predicting Helpfulness (Fig. 7).
4.3.3 Hypotheses outcomes
Having reached this stage, and after checking the research hypotheses, we are able to recapitulate the following conclusions:
H1
Our results present significant evidence that there is a clear relationship between product Rating and the Helpfulness of the reviews. That is, the higher the Rating of a product, the higher the Helpfulness of the corresponding review. In addition, there is also a strong association between the two measures of centrality, Betweenness and Eigenvector centrality, meaning that influence, ranking, and prestige of the neighbors of a product are also important in placing it as an intermediary between the links of other products in the network. We could not find any significant correlation between centrality measures and quality measures.
H2
Although Betweenness Centrality has a low impact on Rating, it may nevertheless be a predictor of Rating. Eigenvector centrality has a positive impact on Rating but cannot be considered a predictor of Rating.
H3
Measures of centrality (Betweenness Centrality and Eigenvector Centrality) have a positive (weak) impact on Helpfulness, although they cannot be considered good predictors of Helpfulness.
5 Discussion and conclusions
Online customer reviews provide new potential customers with relevant information about a product or service, helping them in complex and risky buying decisions. In this research, we used an original bipartite network containing the links between reviewers and products, which has been compressed into a one-mode projection, corresponding to a single-mode network of products linked by their respective reviewers. We used centrality measures to assess the importance of reviews, not by their sheer number, but by the centrality, in the network, of the products reviewed.
Our results present significant evidence that there is a clear relationship between product Rating and the Helpfulness of the reviews. That is, the higher the Rating of a product, the higher the Helpfulness of the corresponding review. This relationship operates both ways, meaning that Helpfulness and Rating can each be used to predict the other. This result is in line with previous research that identifies a similar positive relationship regarding the impact of review rating on review helpfulness. More specifically, reviews with two-star ratings are the most helpful, while helpfulness drops dramatically for three-star reviews and increases slightly again for four- and five-star ones [70]. Lee et al. [56] have shown that reviews with both higher star ratings and longer texts are usually perceived to be more helpful to potential customers and therefore have positive impacts on the purchase decision, particularly for experience goods.
Furthermore, we also found that Betweenness and Eigenvector centrality are also correlated, meaning that influence, ranking, and prestige of the neighbors of a product are also important in placing it as an intermediary between the links of other products in the network.
On the other hand, our results also show significant evidence that there is no clear relationship between the measures of centrality (Betweenness and Eigenvector) and product quality (Rating and Helpfulness). In other words, consumers comment on a product regardless of its quality. Therefore, a high centrality of reviews does not imply a highly rated product, for several potential reasons, such as customer dissatisfaction with product performance, or heterogeneous customer value expectancy. Additionally, Ping et al. [70] indicate that review volume becomes much less important to review helpfulness after some initial reviews have accumulated. Moreover, not all customer reviews provide valuable and credible feedback, and the sheer volume of online reviews also creates the problem of information overload.
Despite the relationship between the herding effect and ratings, our results do not show a similar effect of review centrality on the quality measures: rating and review helpfulness. This finding suggests that products may be central (e.g., most popular) in the network while attracting low review ratings or review helpfulness, offering no valuable information or credible feedback to help consumer decision-making.
Even so, measures of centrality can be used as predictors of the helpfulness of the reviews. This means that the more central products stand in the network of reviews, the more useful their reviews can be. This does not mean that Betweenness Centrality and Eigenvector Centrality are good predictors of Helpfulness (they are not), but the relationship does exist, as confirmed by the positive correlation and by the patterns found in the cluster analysis. For Rating, the particular influence of Betweenness Centrality is interesting: the power of intermediation of the product reviews is somewhat connected to higher-rated products. We know this relationship is weak and not significant, but it opens up new possibilities for exploring relationships between centrality and quality measures.
5.1 Theoretical contributions
In this research we develop a new theoretical framework to analyze product rating and perceived helpfulness of online customer reviews. The study provides a one-mode projection-based approach to a bipartite network of products sold by the Amazon.com e-Marketplace in the category “musical instruments”, linking products through the reviews and simultaneously containing two types of nodes: reviewers and products. The results of this study contribute to the current understanding of review centrality measures (betweenness and eigenvector) and quality measures (rating and helpfulness) within a network of product reviews. The main findings are the existence of a clear relationship between product rating and the helpfulness of the reviews, and a weak relationship between the centrality measures (betweenness and eigenvector) and the quality measures (rating and helpfulness). This explains that a high number of reviews does not necessarily imply a high product rating. On the other hand, when reviews are helpful for consumer decision-making, we observe an increase in the number of reviews. In other words, products may be central to the network, although with low ratings and with reviews providing little usefulness to consumers.
5.2 Practical contributions
The findings in this study have many important implications for e-commerce businesses’ improvement of the review service management to support customers’ experiences and online customers’ decision-making.
First, online firms need to surface the most pertinent reviews to help ensure that customers find the most relevant information for their needs. Review helpfulness makes potential consumers perceive other consumers’ reviews as more informative, being an important factor in assisting consumers’ decision-making and in mitigating the information overload problem [70].
It is important to leverage product rating, as this measure is considered the best way to exhibit product quality information, grab consumer attention, reduce decision-making risk and persuade consumption; actively providing helpful reviews can support consumers in making quick purchase decisions and satisfy their shopping experiences.
Second, motivating and rewarding reviewers to post credible ratings and long reviews, which contain clear definitions, specific explanations, and precise descriptions of the reviewers’ experiences with the product, is of great help to potential customers of experience products in making their purchase decisions.
Most e-commerce websites today give reviewers the opportunity to post product videos and images, from which consumers can obtain product details that are difficult to describe in text-based reviews, such as color, movement, and sound [88]; this is of critical relevance for assisting the buying decisions of experience products when compared with search products [25].
Online vendors may encourage and reward their consumers in different ways (e.g., cashback, vouchers, and member points) to write reviews with images or videos for marketing. This suggestion is in line with Woolley and Sharif [86], who conclude that “Simply knowing you’ll receive a reward for writing a review makes the process more enjoyable, which makes you more likely to write a positive review”. Therefore, offering incentives can be an effective strategy for improving customers’ review-writing experience and increasing the positive and helpful content in product reviews. Additionally, review helpfulness can be used to monitor reviewer qualifications, conferring a badge to top-quality reviewers.
Third, this study suggests redesigning review sorting interfaces and displaying the consumer rating distribution by helpfulness on the product page, fostering consumer trust, which is instrumental in supporting consumer decision-making. Such a service design is especially vital for online retailers, such as the Amazon marketplace, where online customer reviews are extremely voluminous and overwhelming.
Finally, this study might inspire online businesses to provide diverse review tools while understanding the impact of online reviews and social networks on the brand reputation and reliability of the seller. They can build a customer reviewer community and develop effective strategies to strengthen their relationships with customers, with long-term effects on revisits to the site and product/service repurchases [56].
5.3 Limitations and future research
Although this research is yet another step in the review of reviews, we recognize that this approach has some limitations. First, the main one concerns the analysis of a single product category rather than the full product range available on the Amazon e-Marketplace. Therefore, as a challenge for future work, we suggest analyzing a different category of “search products”—such as music or books—to see if they show similar results to those obtained in this research, and how these relationships vary between search products and experience products. Notably, Mudambi and Schuff [65] showed that the product type plays a mediating role in influencing review helpfulness.
Second, this study examined only online reviews posted on Amazon.com. This limitation provides opportunities for future research to explore other factors including online/offline retailers advertising and specific situations faced by potential customers, which can affect the helpfulness of online reviews. For example, future studies can investigate the dominant determinants of review helpfulness and examine its implicit dependency on various review tools (e.g. text/description length, photos, video), reviewer characteristics (e.g. cultural, technical) and product/service category (e.g. search versus experience goods).
Another limitation relates to the focus of our study: it does not distinguish true from false reviews. We do not approach fake reviews and false scores, although we assume they may exist. A substantial body of research highlights the importance of truthful and unbiased peer-to-peer information when consumers rely on reviews to make wise buying decisions. However, malicious consumer reviews and fake reviews provided by vendor-disguised “consumers” become major problems that interfere with consumers making the right choices [72]. This is a future research area calling for the use of Machine Learning algorithms to detect both computer- and human-generated fake reviews.
Notes
The difference is due to the exclusion of product reviews with null data for the “helpful” indicator, before the year 2010, which caused errors and “undefined” fields when importing the data.
References
Amazon (2021a). Customer reviews. Retrieved April 19, 2021, from https://www.amazon.com/gp/help/customer/display.html?nodeId=G3UA5WC5S5UUKB5.
Amazon (2021b). Comments, feedback, and ratings about sellers. Retrieved April 19, 2021, from https://www.amazon.com/gp/help/customer/display.html?nodeId=G5T39MTBJSEVYQWW.
Ali, M. M., Doumbouya, M. B., Louge, T., Rai, R., & Karray, M. H. (2020). Ontology-based approach to extract product’s design features from online customers’ reviews. Computers in Industry, 116, 103175. https://doi.org/10.1016/j.compind.2019.103175
Ba, S., & Pavlou, P. (2002). Evidence of the effect of trust building technology in electronic markets: Price premiums and buyer behavior. MIS Quarterly, 26(3), 243–268.
Banerjee, S., Jenamani, M., & Pratihar, D. K. (2017). Properties of a projected network of a bipartite network. In 2017 International Conference on Communication and Signal Processing (ICCSP), pp. 0143–0147, https://doi.org/10.1109/ICCSP.2017.8286734.
Barabasi, A.-L. (2013). Network science. Philosophical Transactions of the Royal Society, Vol. 371, Issue 1987. https://doi.org/10.1098/rsta.2012.0375.
Bartosiak, M. L., & Piccoli, G. (2016). Presentation format and online reviews persuasiveness: The effect of computer-synthesized speech. In 2016 International conference on information systems, ICIS 2016, pp. 1–11.
Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: An open source software for exploring and manipulating networks. In Third international AAAI conference on weblogs and social media, 361–362. https://doi.org/10.1136/qshc.2004.010033.
Bonacich, P. (2007). Some unique properties of eigenvector centrality. Social Networks, 29(4), 555–564. https://doi.org/10.1016/j.socnet.2007.04.002
Bonchi, F., Castillo, C., Gionis, A., & Jaimes, A. (2011). Social network analysis and mining for business applications. ACM Transactions on Intelligent Systems and Technology, 2(3), 1–17.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, Inc.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Bulte, C., & Stremersch, S. (2004). Social contagion and income heterogeneity in new product diffusion: A meta-analytic test. Marketing Science, 23(4), 530–544. https://doi.org/10.1287/mksc.1040.0054
Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4), 331–370. https://doi.org/10.1023/A:1021240730564
Burton, J., & Khammash, M. (2013). Why do people read reviews posted on consumer-opinion portals? Journal of Marketing Management, 1376, 51–76. https://doi.org/10.4324/9780203722381
Chakravarty, A., Liu, Y., & Mazumdar, T. (2010). The differential effects of online word-of-mouth and critics’ reviews on pre-release movie evaluation. Journal of Interactive Marketing, 24(3), 185–197. https://doi.org/10.1016/j.intmar.2010.04.001
Chen, L.-S., & Lin, J.-Y. (2013). A study on review manipulation classification using decision tree. In IEEE 2013 10th international conference on service systems and service management (ICSSSM).
Chen, J., Teng, L., Yu, Y., & Yu, X. (2016). The effect of online information sources on purchase intentions between consumers with high and low susceptibility to informational influence. Journal of Business Research, 69(2), 467–475. https://doi.org/10.1016/j.jbusres.2015.05.003
Chen, L.-S., Hsu, F.-H., Chen, M.-C., & Hsu, Y.-C. (2008). Developing recommender systems with the consideration of product profitability for sellers. Information Sciences, 178(4), 1032–1048. https://doi.org/10.1016/j.ins.2007.09.027
Cheung, M. Y., Sia, C. L., & Kuan, K. K. Y. (2012). Is this review believable? A study of factors affecting the credibility of online consumer reviews from an elm perspective. Journal of the Association for Information Systems, 13(8), 618–635.
Chevalier, J., & Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research, 43(3), 345–354.
Chua, A. Y. K., & Banerjee, S. (2014). Understanding review helpfulness as a function of reviewer reputation, review rating, and review depth. Journal of the Association for Information Science and Technology, 66(2), 354–362.
Chua, A., & Banerjee, S. (2016). Helpfulness of user-generated reviews as a function of review sentiment, product type and information quality. Computers in Human Behavior, 54, 547–554. https://doi.org/10.1016/j.chb.2015.08.057
Cui, G., Lui, H.-K., & Guo, X. (2012). The effect of online consumer reviews on new product sales. International Journal of Electronic Commerce, 17(1), 39–58. https://doi.org/10.2753/JEC1086-4415170102
Cui, Y., & Wang, X. (2022). Investigating the role of review presentation format in affecting the helpfulness of online reviews. Electronic Commerce Research. https://doi.org/10.1007/s10660-022-09590-4
Das, K., Samanta, S., & Pal, M. (2018). Study on centrality measures in social networks: A survey. Social Network Analysis and Mining. https://doi.org/10.1007/s13278-018-0493-2
Dash, A., Zhang, D., & Zhou, L. (2021). Personalized ranking of online reviews based on consumer preferences in product features. International Journal of Electronic Commerce, 25(1), 29–50.
Dellarocas, C., Awad, N., & Xiaoquan, Z. (2004). Exploring the value of online reviews to organizations: Implications for revenue forecasting and planning. In ICIS 2004 proceedings, paper 30.
Dellarocas, C., Xiaoquan, Z., & Awad, N. F. (2007). Exploring the value of online product reviews in forecasting sales: The case of motion pictures. Journal of Interactive Marketing, 21(4), 23–45.
Du, J., Rong, J., Wang, H., & Zhang, Y. (2019). Helpfulness prediction for online reviews with explicit content-rating interaction. In R. Cheng, N. Mamoulis, Y. Sun, & X. Huang (Eds.), Web information systems engineering – WISE 2019. Lecture notes in computer science, vol. 11881. Springer, Cham. https://doi.org/10.1007/978-3-030-34223-4_50
Duan, W., Gu, B., & Whinston, A. B. (2008). Do online reviews matter?—An empirical investigation of panel data. Decision Support Systems, 45(4), 1007–1016. https://doi.org/10.1016/j.dss.2008.04.001
eMarketer. (2022). Global ecommerce forecast 2022: As 2-year boom subsides, plenty of bright spots remain. Retrieved February 20, 2023, from https://www.emarketer.com/content/global-ecommerce-forecast-2022.
Everett, M., & Valente, T. (2016). Bridging, brokerage and betweenness. Social Networks, 44, 202–208.
Filieri, R. (2016). What makes an online consumer review trustworthy? Annals of Tourism Research, 58, 46–64. https://doi.org/10.1016/j.annals.2015.12.019
Fleder, D., & Hosanagar, K. (2009). Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management Science, 55(5), 697–712. https://doi.org/10.1287/mnsc.1080.0974
Forman, C., Ghose, A., & Goldfarb, A. (2008). Examining the relationship between reviews and sales: The role of reviewer identity disclosure in electronic markets. Information Systems Research, 19(3), 291–313.
Gerani, S., Mehdad, Y., Carenini, G., Ng, R., & Nejat, B. (2014). Abstractive summarization of product reviews using discourse structure. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1602–1613.
Ghose, A., & Ipeirotis, P. (2011). Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering, 23(10), 1498–1512. https://doi.org/10.1109/TKDE.2010.188
Godes, D., & Mayzlin, D. (2004). Using online conversations to study word-of-mouth communication. Marketing Science, 23(4), 545–560.
Godes, D., & Mayzlin, D. (2009). Firm-created word-of-mouth communication: Evidence from a field test. Marketing Science, 28(4), 721–739.
Ha, T., & Wasserman, S. (2017). Item-network-based collaborative filtering: A personalized recommendation method based on a user’s item network. Information Processing and Management, 53(5), 1171–1184. https://doi.org/10.1016/j.ipm.2017.05.003
He, R., & McAuley, J. (2016). Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th international conference on World Wide Web (WWW ’16). https://doi.org/10.1145/2872427.2883037
He, S., Hollenbeck, B., & Proserpio, D. (2021). The market for fake reviews. In EC ’21: Proceedings of the 22nd ACM conference on economics and computation, p. 588. Association for Computing Machinery. https://doi.org/10.1145/3465456.3467589
Hennig, C., Meila, M., Murtagh, F., & Rocci, R. (2015). Handbook of cluster analysis. In Handbooks of modern statistical methods (1st ed.). Chapman and Hall/CRC.
Hennig-Thurau, T., Marchand, A., & Marx, P. (2012). Can automated group recommender systems help consumers make better choices? Journal of Marketing, 76(5), 89–109. https://doi.org/10.1509/jm.10.0537
Hennig-Thurau, T., & Walsh, G. (2003). Electronic word-of-mouth: Motives for and consequences of reading customer articulations on the internet. International Journal of Electronic Commerce, 8(2), 51–74.
Hollenbeck, B., Moorthy, S., & Proserpio, D. (2019). Advertising strategy in the presence of reviews: An empirical analysis. Marketing Science, 38(5), 793–811. https://doi.org/10.1287/mksc.2019.1180
Hong, S., & Park, H. (2012). Computer-mediated persuasion in online reviews: Statistical versus narrative evidence. Computers in Human Behavior, 28(3), 906–919. https://doi.org/10.1016/j.chb.2011.12.011
Jackson, M. O. (2008). Social and economic networks. Princeton University Press.
Jiang, Z., & Benbasat, I. (2007). Investigating the influence of the functional mechanisms of online product presentations. Information Systems Research, 18(2), 1–17. https://doi.org/10.1287/isre.1070.0124
Kim, H., Ghiasi, B., Spear, M., Laskowski, M., & Li, J. (2017). Online serendipity: The case for curated recommender systems. Business Horizons, 60(5), 613–620. https://doi.org/10.1016/j.bushor.2017.05.005
Kong, D., Tang, J., Zhu, Z., Cheng, J., & Zhao, Y. (2017). De-biased dart ensemble model for personalized recommendation. In Proceedings - IEEE international conference on multimedia and expo, pp. 553–558. https://doi.org/10.1109/ICME.2017.8019536
Landherr, A., Friedl, B., & Heidemann, J. (2010). A critical review of centrality measures in social networks. Business & Information Systems Engineering, 2, 371–385. https://doi.org/10.1007/s12599-010-0127-3
Lee, M., Hirose, A., Hou, Z.-G., & Kil, R. M. (Eds.). (2013). Neural information processing. Lecture Notes in Computer Science, vol. 8226. Springer.
Lee, Y.-J., Hosanagar, K., & Tan, Y. (2015). Do I follow my friends or the crowd? Information cascades in online movie ratings. Management Science, 61(9), 2241–2258.
Lee, S. G., Trimi, S., & Yang, C. G. (2018). Perceived usefulness factors of online reviews: A study of amazon.com. Journal of Computer Information Systems, 58(4), 344–352. https://doi.org/10.1080/08874417.2016.1275954
Li, H., Meng, F., Jeong, M., & Zhang, Z. (2020). To follow others or be yourself? Social influence in online restaurant reviews. International Journal of Contemporary Hospitality Management, 32(3), 1067–1087. https://doi.org/10.1108/IJCHM-03-2019-0263
Litvin, S. W., Goldsmith, R. E., & Pan, B. (2008). Electronic word-of-mouth in hospitality and tourism management. Tourism Management, 29(3), 458–468. https://doi.org/10.1016/j.tourman.2007.05.011
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–297). University of California Press.
McAuley, J., & Leskovec, J. (2013). Hidden factors and hidden topics: Understanding rating dimensions with review text. In RecSys ’13 proceedings of the 7th ACM conference on recommender systems, pp. 165–172. https://doi.org/10.1145/2507157.2507163
McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015). Image-based recommendations on styles and substitutes. In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval (SIGIR ’15). https://doi.org/10.1145/2766462.2767755
Meo, P., Musial-Gabrys, K., Rosaci, D., Sarnè, G., & Aroyo, L. (2017). Using centrality measures to predict helpfulness-based reputation in trust networks. ACM Transactions on Internet Technology, 17(1), 1–20. https://doi.org/10.1145/2981545
Mintel (2015). Social networking. Retrieved from http://academic.mintel.com/display/739944/
Mo, Z., Li, Y.-F., & Fan, P. (2015). Effect of online reviews on consumer purchase behavior. Journal of Service Science and Management, 8, 419–424. https://doi.org/10.4236/jssm.2015.83043
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful online review? A study of customer reviews on amazon.com. MIS Quarterly, 34(1), 185–200. https://doi.org/10.2307/20721420
Muller, E., & Peres, R. (2019). The effect of social networks structure on innovation performance: A review and directions for research. International Journal of Research in Marketing. https://doi.org/10.1016/j.ijresmar.2018.05.003
Nguyen, T.-S., Lauw, H., & Tsaparas, P. (2015). Review synthesis for micro-review summarization. In Proceedings of the eighth ACM international conference on web search and data mining (WSDM ’15), pp. 169–178. https://doi.org/10.1145/2684822.2685321
Park, S., & Nicolau, J. (2015). Asymmetric effects of online consumer reviews. Annals of Tourism Research, 50, 67–83. https://doi.org/10.1016/j.annals.2014.10.007
Pavlou, P. A., & Gefen, D. (2004). Building effective online marketplaces with institution-based trust. Information Systems Research, 15(1), 37–59.
Ping, Y., Buoye, A., & Vakil, A. (2023). Enhanced review facilitation service for C2C support: Machine learning approaches. Journal of Services Marketing. https://doi.org/10.1108/JSM-01-2022-0005
Purnawirawan, N., Pelsmacker, P. D., & Dens, N. (2012). The perceived usefulness of online review sets: The role of balance and presentation order. Advances in Advertising Research, 3, 177–190.
Racherla, P., & Friske, W. (2012). Perceived ‘usefulness’ of online consumer reviews: An exploratory investigation across three services categories. Electronic Commerce Research and Applications, 11(6), 548–559.
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Salminen, J., Kandpal, C., Kamel, A. M., Jung, S., & Jansen, B. J. (2022). Creating and detecting fake reviews of online products. Journal of Retailing and Consumer Services, 64, 102771.
Steck, H. (2013). Evaluation of recommendations: Rating-prediction and ranking. In Proceedings of the 7th ACM conference on recommender systems (RecSys ’13), pp. 213–220. https://doi.org/10.1145/2507157.2507160
Steffes, E., & Burgee, L. (2009). Social ties and online word of mouth. Internet Research, 19(1), 42–59. https://doi.org/10.1108/10662240910927812
Su, Z., Lin, Z., Ai, J., & Li, H. (2021). Rating prediction in recommender systems based on user behavior probability and complex network modeling. IEEE Access, 9, 30739–30749. https://doi.org/10.1109/ACCESS.2021.3060016
Tang, J., Gao, H., Hu, X., & Liu, H. (2013). Context-aware review helpfulness rating prediction. In Proceedings of the 7th ACM conference on recommender systems (RecSys ’13). Association for Computing Machinery, New York, NY, USA, 1–8. https://doi.org/10.1145/2507157.2507183
Therneau, T., & Atkinson, B. (2018). rpart: Recursive partitioning and regression trees. R package version 4.1-13. https://CRAN.R-project.org/package=rpart
Torres, A., & Martins, F. (2014). Online social networks: Recommendation diffusion and co-consumption influence. Handbook of research on enterprise 2.0: Technological, social, and organizational dimensions (Vol. 2, pp. 466–485). IGI Global, USA.
Valejo, A., Ferreira, V., Filho, G. P. R., Oliveira, M. C. F., & Lopes, A. A. (2017). One-mode projection-based multilevel approach for community detection in bipartite networks. In 4th Annual international symposium on information management and big data, 2017, Lima, Peru.
Wang, C. A., Zhang, X. M., & Hann, I.-H. (2018). Socially nudged: A quasi-experimental study of friends’ social influence in online product ratings. Information Systems Research, 29(3), 641–655.
Wang, Y., Wang, T., & Yao, T. (2019). What makes a helpful online review? A meta-analysis of review characteristics. Electronic Commerce Research, 19(10), 257–284.
Wang, H. J. (2022). Understanding reviewer characteristics in online reviews via network structural positions. Electronic Markets, 32, 1311–1325. https://doi.org/10.1007/s12525-022-00561-z
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge University Press.
Woolley, K., & Sharif, M. A. (2021). What happens when companies pay customers to write reviews? Harvard Business Review. Retrieved February 20, 2023, from https://hbr.org/2021/06/what-happens-when-companies-pay-customers.
Xu, P., Chen, L., & Santhanam, R. (2015). Will video be the next generation of e-commerce product reviews? Presentation format and the role of product type. Decision Support Systems, 73, 85–96.
Yang, Y., Yan, Y., Qiu, M., & Bao, F. S. (2015). Semantic analysis and helpfulness prediction of text for online product reviews. In Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers), pp. 38–44. https://doi.org/10.3115/v1/P15-2007
Zhang, L. (2015). Online reviews: The impact of power and incidental similarity. Journal of Hospitality Marketing and Management, 24(6), 633–651. https://doi.org/10.1080/19368623.2014.929550
Zhang, J., Ackerman, M. S., & Adamic, L. A. (2007). Expertise networks in online communities: Structure and algorithms. In Proceedings of the 16th international conference on World Wide Web, pp. 221–230.
Zhao, Q., Zhang, Y., Friedman, D., & Tan, F. (2015). E-commerce recommendation with personalized promotion. In Proceedings of the 9th ACM conference on recommender systems - RecSys ’15, pp. 219–226. https://doi.org/10.1145/2792838.2800178
Funding
Open access funding provided by FCT|FCCN (b-on).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Campos, P., Pinto, E. & Torres, A. Rating and perceived helpfulness in a bipartite network of online product reviews. Electron Commer Res (2023). https://doi.org/10.1007/s10660-023-09725-1