Retail supply chain analysts may want to treat Twitter traffic by geography as one more signal for improving demand-based decisions when distributing goods from distribution centers to stores.
On Wednesday, March 11, 2020 the World Health Organization (WHO) declared the 2019 novel coronavirus and the disease it causes, COVID-19, to be a global pandemic. On Friday, March 13, US President Trump declared a national emergency.
To assess how recent COVID-19 news is affecting the retail sector (with emphasis on the grocery segment), the Appriss Retail Data Science team is using data from the Johns Hopkins CSSE COVID-19 repository, the CDC, the US Census Bureau (for state population estimates), Twitter data, and the aggregated and de-identified Appriss global retail transaction database.
This analysis illustrates the strong relationship currently observed between daily grocery sales volume and daily tweet volume. This implies that social media has a strong influence on sales of essential items in times of national emergency. We intend to publish multiple analyses as the coronavirus pandemic evolves; this piece is the first.
Part 1: COVID-19 Statistics
The first confirmed US case of COVID-19 was on January 22, 2020. According to CDC data, the case involved a patient who had traveled to Washington state from Wuhan, China. The patient first presented with symptoms on January 14.
On February 29, the CDC reported the first COVID-19 related death. CDC data show that 3,975 tests had been conducted by that date, with the majority (82.77%) having been administered by the CDC. As of March 13, the last day public health laboratories using the CDC assay were required to submit samples to the CDC for confirmation, that percentage had reversed: of the 34,856 tests conducted, 87.09% had been administered by public health laboratories.
Somewhat reassuringly, when comparing the number of COVID-19 related deaths in the Johns Hopkins dataset to the number of confirmed cases, the mortality rate appears to be declining, down to 1.28% by March 20.
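The rate described above is simply deaths divided by confirmed cases. A minimal sketch of that arithmetic follows; the function name and the counts are illustrative placeholders, not figures from the Johns Hopkins dataset.

```python
def crude_fatality_rate(deaths: int, confirmed: int) -> float:
    """Return deaths as a percentage of confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return 100.0 * deaths / confirmed


# Placeholder counts for illustration only (assumed, not real data).
example_rate = crude_fatality_rate(deaths=5, confirmed=400)
print(f"{example_rate:.2f}%")
```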
As of March 17, Washington and New York led the country with the most confirmed cases per 500K population followed by Louisiana, Massachusetts, and Washington, DC. The five least impacted states were Kentucky, Arizona, Idaho, Missouri, and West Virginia.
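Ranking states by cases per 500K population means normalizing each state's confirmed count by its population estimate. The sketch below shows one way to do this; the state names, counts, and populations are hypothetical and stand in for the Johns Hopkins and Census Bureau inputs.

```python
def cases_per_500k(confirmed_cases: int, population: int) -> float:
    """Scale a confirmed-case count to a rate per 500,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return confirmed_cases / population * 500_000


# Hypothetical inputs: state -> (confirmed cases, population estimate).
states = {
    "State A": (1_000, 7_500_000),
    "State B": (50, 4_000_000),
}

rates = {name: cases_per_500k(c, p) for name, (c, p) in states.items()}

# Most affected states first.
ranked = sorted(rates, key=rates.get, reverse=True)
for name in ranked:
    print(f"{name}: {rates[name]:.2f} cases per 500K")
```

Normalizing by population keeps large states from dominating the ranking purely because of their size.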
Part 2: COVID-19, Twitter, and Sales
Using Twitter data between March 4 and March 17 based on COVID-19 related hashtags, we identified an upward swing in tweet volume beginning on March 9, reaching a peak on March 13 (the date of the US national emergency declaration). The tweet activity tapered the following day but has remained high; as of March 18, it was 302% higher than March 10, the day prior to the WHO global pandemic declaration.
While COVID-19 tweet volume declined somewhat following March 13, social media anecdotes of grocery store product outages for essential items proliferated. On March 17, Amazon confirmed stressors in the supply chain when it announced that it was suspending shipments of non-essential items into its warehouses, stating: “We are seeing an increase in online shopping and as a result some products such as household staples and medical supplies are out of stock. With this in mind, we are temporarily prioritizing household staples, medical supplies, and other high-demand products coming into our fulfillment centers so we can more quickly receive, restock, and ship these products to customers.”
Appriss Retail has more than 300 large, global retail clients, including many in the grocery, chain drug, and convenience sectors. Using de-identified metadata, we analyzed the sales transaction volume in this segment of retailers in the US for the period of March 4 through March 14 (the last aggregated data available at the time of this writing).
Much like the Twitter data, grocery data showed a peak in sales transaction counts on Friday, March 13 with an overall growth rate of 32.40%.
For each US state, we compared March 13 sales data to that of the prior Friday, March 6, and found that all states saw a rise in the number of sales processed, with the minimum increase being 14.2%. The median increase was 33.72%.
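The Friday-over-Friday comparison can be sketched as a percent change per state, followed by the minimum and median across states. The transaction counts below are hypothetical; the actual Appriss data are not reproduced here.

```python
from statistics import median


def pct_growth(current: float, baseline: float) -> float:
    """Percent change from baseline to current."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (current - baseline) / baseline


# Hypothetical transaction counts: state -> (March 6, March 13).
counts = {
    "State A": (1_000, 1_150),
    "State B": (2_000, 2_700),
    "State C": (500, 750),
}

growth = {s: pct_growth(after, before) for s, (before, after) in counts.items()}

min_growth = min(growth.values())
median_growth = median(growth.values())
print(f"minimum increase: {min_growth:.1f}%, median increase: {median_growth:.1f}%")
```

Comparing the same weekday in consecutive weeks avoids mixing in the normal day-of-week pattern in grocery traffic.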
Interestingly, while Washington had the most confirmed cases per 500K population, it also had the smallest rate of growth in sales volume (14.2%). The states with the highest rates of increase in sales, ranging from 43.9% to 58.3%, were Utah, Delaware, Louisiana, New Jersey, and Texas. The states with the lowest rates of growth, after Washington, were Alaska, Hawaii, Washington DC, and Iowa, with increases ranging from 14.9% to 18.9%.
Lastly, we examined the relationship between the number of confirmed cases per 500K population and grocery sales growth, using March 13 data from each state. We found an R-squared value of 0.01, essentially indicating no correlation between cases and sales (R-squared is the share of variation in one variable explained by a linear fit to the other, on a scale from 0 to 1).
When we examined the relationship between grocery sales volume each day and tweet volume that day, a much stronger relationship was observed – an R-squared value of 0.92.
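The article does not state how the R-squared values were computed. Assuming a simple linear fit between two variables, R-squared equals the square of the Pearson correlation coefficient, which can be sketched as follows. The function name and the sample series are illustrative only and are not the tweet or sales data.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R-squared of a simple linear fit, computed as squared Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined for a constant series")
    return sxy ** 2 / (sxx * syy)


# Illustrative series only (assumed values).
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])    # y is an exact multiple of x
unrelated = r_squared([1, 2, 3, 4], [1, -1, -1, 1])  # no linear trend
print(perfect, unrelated)
```

A value near 1 means the points lie close to a straight line, while a value near 0 means a straight line explains almost none of the variation.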
While confirmed cases and the number of deaths continue to climb, tweet volume and grocery sales have tapered slightly from their peak on the 13th. As of this writing, it remains to be seen how much of the grocery sales decline is a result of out-of-stocks vs. an actual decrease in demand.
For the purposes of this analysis, we did not examine the impact on different purchasing channels (BOPIS, e-commerce) or the specific merchandise areas affected. The analysis also did not examine the effect (if any) of COVID-19 on return behaviors.
For this analysis, big box grocery sales were not included, due to the risk of conflation with non-grocery sales. We may address these aspects in the future as the evolving situation warrants.
As new information becomes available, we will continue to monitor and report on the trends examined above.
We’d like to thank all of our retail clients for their safety efforts during this pandemic. Our objective is to use both de-identified metadata from our own clients and data from reputable sources to present a factual representation of this once-in-a-lifetime event.
Renee DeWolf and Dr. Adi Raz, Appriss Retail
Renee DeWolf is the director, data sciences and modeling at Appriss Retail and an adjunct instructor of Data Science, Cybersecurity and Business at Utica College in New York where she also earned her MBA in Economic Crime and her MS in Data Science/Cybersecurity. She holds several industry certifications including Certified Fraud Examiner. In her free time, Renee serves as a board member of the National Association of Drug Diversion Investigators (NADDI.org) and is the resident data scientist at RI Rank. Renee is currently a doctoral student in the Information Sciences program at UALR.