---
title: "Goodreader Quick Start Guide"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Goodreader Quick Start Guide}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  out.width = '\\textwidth',
  fig.height = 4,
  fig.width = 5,
  fig.align = 'center'
)
```

## Installing and loading the package

Install the package:

```{r, eval = FALSE}
install.packages("Goodreader")
```

And load the package:

```{r}
library(Goodreader)
```

## Searching for Books on Goodreads

The `search_goodreads()` function allows you to search for books on Goodreads based on various criteria. The code below searches for books that include the term "parenting" in the title and returns 10 books sorted by readers' ratings:

```{r, eval = FALSE}
parent_df <- search_goodreads(search_term = "parenting", search_in = "title", num_books = 10, sort_by = "ratings")
```

```{r, eval = FALSE}
summary(parent_df)
##     title              author            book_id
##  Length:10          Length:10          Length:10
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
##      url               ratings
##  Length:10          Min.   : 8427
##  Class :character   1st Qu.:11744
##  Mode  :character   Median :13662
##                     Mean   :19757
##                     3rd Qu.:13784
##                     Max.   :69591
```

You can also search by author name:

```{r, eval = FALSE}
search_goodreads(search_term = "J.K. Rowling", search_in = "author", num_books = 5, sort_by = "ratings")
```

The `search_goodreads()` function includes a `sort_by` argument that sorts the results by either `ratings` or `published_year`:

```{r, eval = FALSE}
search_goodreads(search_term = "J.K. Rowling", search_in = "author", num_books = 5, sort_by = "published_year")
```

## Scrape book metadata and reviews

After the books are found, save their IDs to a text file.
These IDs are used for extracting book metadata and reviews:

```{r, eval = FALSE}
get_book_ids(input_data = parent_df, file_name = "parent_books.txt")
# The book IDs are now stored in a text file named "parent_books.txt"
```

Book metadata can then be scraped:

```{r, eval = FALSE}
parent_bookinfo <- scrape_books(book_ids_path = "parent_books.txt", use_parallel = FALSE)
```

To speed up the scraping process:

* Turn on the parallel process: `use_parallel = TRUE`
* Specify the number of cores for the parallel process (e.g., `num_cores = 8`)

Reviews are scraped with `scrape_reviews()`:

```{r, eval = FALSE}
parent_bookreviews <- scrape_reviews(book_ids_path = "parent_books.txt", num_reviews = 10, use_parallel = FALSE)
# Set use_parallel = TRUE to speed up the process
```

## Conduct sentiment analysis

The `analyze_sentiment()` function calculates the sentiment score of each review based on the lexicon chosen by the user. Available options for `lexicon` are `afinn`, `bing`, and `nrc`. Basic negation scope detection is implemented (e.g., "not happy" is labeled as a negative emotion and assigned a negative score).

```{r, eval = FALSE}
sentiment_results <- analyze_sentiment(parent_bookreviews, lexicon = "afinn")
```

The `average_book_sentiment()` function calculates the average sentiment score for each book.

```{r, eval = FALSE}
ave_sentiment <- average_book_sentiment(sentiment_results)
summary(ave_sentiment)
##    book_id          avg_sentiment
##  Length:10          Min.   : 4.40
##  Class :character   1st Qu.: 7.25
##  Mode  :character   Median :12.86
##                     Mean   :12.95
##                     3rd Qu.:14.65
##                     Max.   :27.30
```

The sentiment scores can be plotted as a histogram:

```{r, eval=FALSE}
sentiment_histogram(sentiment_results)
```

```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/sentiment_hist.png')
```

Or as a trend of average sentiment scores over time:

```{r, eval=FALSE}
sentiment_trend(sentiment_results, time_period = "year")
```

```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/sentiment_trend.png')
```

## Perform topic modeling

Apply topic modeling to the reviews data:

```{r, eval = FALSE}
reviews_topic <- model_topics(parent_bookreviews, num_topics = 3, num_terms = 10, english_only = TRUE)
## Topic 1:
## parent, children, need, one, way, good, get, work, dont, give
##
## Topic 2:
## parent, child, book, emot, feel, help, also, can, children, use
##
## Topic 3:
## book, just, kid, think, read, like, time, say, realli, much
```

Plot the top terms by topic:

```{r, eval=FALSE}
plot_topic_terms(reviews_topic)
```

```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/topic_terms.png')
```

Create a word cloud for each topic:

```{r, eval=FALSE}
gen_topic_clouds(reviews_topic)
```

Topic 1:

```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/Topic1.png')
```

Topic 2:

```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/Topic2.png')
```

Topic 3:

```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/Topic3.png')
```

## Other utility functions

The following table lists other utility functions for extracting book-related information:

```{r echo=FALSE}
library(dplyr)
data.frame(
  Function = c("get_book_ids()", "get_book_summary()", "get_author_info()",
               "get_genres()", "get_published_time()", "get_num_pages()",
               "get_format_info()", "get_rating_distribution()"),
  Output = c("Text file", "List", "List", "List", "List", "List", "List", "List"),
  Description = c("Retrieve the book IDs from the input data and save to a text file",
                  "Retrieve the summary for each book",
                  "Retrieve the author information for each book",
                  "Extract the genres for each book",
                  "Retrieve the published time for each book",
                  "Retrieve the number of pages for each book",
                  "Retrieve the format information for each book",
                  "Retrieve the rating distribution for each book")
) %>%
  knitr::kable()
```
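
Most of the utilities above return a list, which is often easier to work with as a data frame. The sketch below is a minimal base-R illustration of that conversion. It assumes the list is named by book ID and holds a single value per book, which this guide does not confirm. The `num_pages` list is hand-built, hypothetical data standing in for such output, and `list_to_df()` is a hypothetical helper, not part of Goodreader.

```{r, eval = FALSE}
# Hypothetical stand-in for a list returned by one of the utilities above.
# Assumption: names are book IDs and each element is a single value.
num_pages <- list("101" = 320, "102" = 256, "103" = NA)

# Hypothetical helper (not part of Goodreader): flatten a named list
# with one value per element into a two-column data frame.
list_to_df <- function(x, value_name) {
  out <- data.frame(
    book_id = names(x),
    value = unlist(x, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(out)[2] <- value_name
  out
}

pages_df <- list_to_df(num_pages, "num_pages")
pages_df
```

If the real output has this shape, the resulting data frame could be joined to `parent_df` on `book_id`. Utilities that return several values per book (for example, genres) would need a different reshaping step.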