writing
Field Notes

In-Depth Product-Level Analysis Using R for Advanced Market Insights

Performing a detailed product-level analysis requires careful data manipulation, precise categorization, and insightful text analytics. In this blog post, we'll explore a structured, privacy-aware approach to analyzing product advertising data using R and essential packages such as tidyverse, lubridate, and tidytext.

Preparing the Analysis Environment

We start by loading the necessary libraries to ensure an efficient and powerful data analysis environment:

pacman::p_load(tidyverse, janitor, lubridate, anytime, RClickhouse, DBI, RMySQL, tidytext)

These packages collectively offer extensive capabilities for data import, manipulation, cleaning, text analytics, and database connectivity.

Data Retrieval from Databases

Our analysis involves fetching specific product-related data using targeted SQL queries directly from databases, such as ClickHouse:

wayfair_pla <- DBI::dbGetQuery(conn,"
  SELECT dt, id, title, brand, sale_price, product_category
  FROM feed_pla_master
  WHERE lower(advertiser_name) LIKE 'wayfair%'
    AND dt BETWEEN toDate('2022-01-01') AND toDate('2023-08-29')
    AND lower(product_category) LIKE '%furniture > chair%'
    AND availability NOT LIKE 'out of stock'
    AND title LIKE '%Wingback Chair - Winston Porter Abem 23.03%'
") %>% as_tibble() %>% mutate_if(bit64::is.integer64, as.numeric)

This ensures precise data retrieval, specifically targeting relevant products and timeframes.

Comprehensive Data Cleaning

Subsequent steps involve thorough cleaning, including normalizing product identifiers and categories:

ch_wayfair_cleansed <- ch_wayfair %>% mutate(id = gsub('^0+{3}', '', id))

Normalization facilitates effective joins and comparisons between datasets from different sources.

Advanced Data Integration

We integrate external advertising performance data by performing detailed join operations:

joined_macys_bing_data <- bing_macys_data %>%
  left_join(ch_macy_cleansed, by = c('merchant_product_id' = 'id', 'title' = 'title')) %>%
  distinct()

This approach helps in evaluating product visibility and advertising performance comprehensively.

Product Category Analysis

We perform granular category analyses to identify characteristics of descriptive product titles across main, middle, and lower categories:

main_category_descriptive_titles <- joined_macys_bing_data %>%
  separate(product_category, into = c('main_category', 'middle_category', 'lower_category'), sep = ' > ') %>%
  drop_na() %>%
  group_by(main_category) %>%
  summarise(
    count = n(),
    avg_title_length = mean(nchar(title)),
    avg_word_count = mean(str_count(title, "\\S+"))
  ) %>%
  arrange(desc(avg_title_length))

Such detailed analyses provide actionable insights into product marketing and content strategy.

Identifying Gaps in Advertising

To reveal potential gaps in advertising coverage, we systematically analyze products without recorded advertising impressions:

missing_macys_bing_data <- ch_macy_cleansed %>%
  left_join(bing_macys_data, by = c('id' = 'merchant_product_id', 'title' = 'title')) %>%
  distinct() %>%
  filter(is.na(account_name))

This highlights products that might need enhanced advertising attention.

Text Analytics for Keyword Optimization

Utilizing the power of tidytext, we conduct n-gram analyses to identify the most impactful keywords in product titles:

ngram_titles <- unique_titles %>%
  unnest_ngrams(ngram, title, n = 2)

top_ngrams <- ngram_titles %>%
  count(main_category, ngram, sort = TRUE) %>%
  group_by(main_category) %>%
  slice_max(n, n = 25)

Such keyword analyses inform strategic decisions about product naming and marketing language.

Analyzing Cost-Per-Click (CPC)

A comprehensive CPC analysis segmented by product pricing provides crucial insights into advertising efficiency:

summary_df <- joined_macys_bing_data %>%
  mutate(Price_Bucket = case_when(
    sale_price < 25 ~ "Under $25",
    sale_price >= 25 & sale_price < 50 ~ "$25 - $50",
    sale_price >= 50 & sale_price < 100 ~ "$50 - $100",
    sale_price >= 100 & sale_price < 250 ~ "$100 - $250",
    sale_price >= 250 ~ "Above $250"
  )) %>%
  group_by(Price_Bucket) %>%
  summarise(Avg_CPC = mean(avg_cpc, na.rm = TRUE), Count = n()) %>%
  drop_na()

Understanding CPC trends across price buckets allows strategic allocation of advertising budgets.

Visualization for Enhanced Insights

We leverage powerful visualizations, such as bar plots, to clearly communicate insights and facilitate decision-making:

ggplot(summary_df, aes(x = Price_Bucket, y = Avg_CPC)) +
  geom_bar(stat = "identity", aes(fill = Price_Bucket)) +
  geom_text(aes(label = paste("Count: ", Count)), vjust = -0.5) +
  ggtitle("Average CPC by Product Price Bucket") +
  xlab("Product Price Bucket") +
  ylab("Average CPC")

Final Thoughts

This detailed, privacy-conscious workflow demonstrates the powerful capabilities of R for conducting sophisticated market analyses. Leveraging robust text analytics, precise database querying, and insightful visualization, we provide clear, actionable insights essential for strategic business decisions.