Extracting Stock Price Data from Yahoo via R

Arguably one of Yahoo's most valuable assets, aside from its stake in BABA, was it's finance website. Few sites, including Google, have such readily (freely) accessible and easily digestible financial information on US equities than Yahoo does. Of course, for the financially sophisticated there are far superior sites that have advanced charting platforms and useful visualizations (TradingView and FinViz come to mind), however, when you need a quick figure on the most recent quarterly ROE for BAC, consistently one of the top sites from a Google search is finance.yahoo.com.

In this post we leverage some code from Josh Ulrich and the use the quantmod package to pull all stock price from Yahoo's servers (AMEX, NASDAQ, NYSE) to answer a question I had on price action following extreme returns.

Specifically, I'm interested to know whether extreme positive returns (arithmetic) at day t are generally followed by negative returns on day t+1. This is driven by anecdotal evidence and my belief that markets exhibit an element of mean reversion.

There are 6712 ticker symbols across all three US equity exchanges that we were able to pull, which included open, high, low, close, and volume data since 2000-01-01. For this exercise, close prices will be used. I decided to filter out stocks with a mean price outside [5,500] USD, and days where the percent of missing price data was greater than 75%.

hist

As shown above, there is a clear relationship between the returns at t+1 for stocks that achieved a maximum return at time t.

snip_20170114233935

The mean daily max return is 223%, and the mean return of the following day for the corresponding stock is 1.5%. Of course, given that this is just price data, we haven't factored in the reason for the high returns. For example, mergers have the ability to provide some certainly in price levels being maintained, whereas stocks that experience high returns from 'fake news' would likely not sustain those levels. It would be interesting to extend this analysis along more dimensions to identify strategies for trading.


###########################: PACKAGES

library(quantmod)
library(Matrix)
library(ggplot2)

##########################: GLOBAL VARIABLES

dir_in <- "C:\\Users\\dmanuge\\Documents\\stocks"
dir_out <- "C:\\Users\\dmanuge\\Documents\\stocks"
start_date <- "2000-01-01"
source_data <- "yahoo"
min_price <- 5
max_price <- 500
max_missing <- 0.75

##########################: MAIN

#get ticker list
symbols <- stockSymbols()

#initialize datasets
open_price <- xts()
high_price <- xts()
low_price <- xts()
close_price <- xts()
volume <- xts()
adjusted_price <- xts()

#get stock data for every ticker
for(i in 1:nrow(symbols)){
  
  symbol <- symbols[i,1]
  tryit <- try(getSymbols(symbol,from=start_date, src=source_data))
  if(inherits(tryit, "try-error")){ #skip
    next
  } else { #read, assign, and append stock data to datasets
    getSymbols(symbol,from=start_date, src=source_data)
    open_price <- merge(open_price, Op(get(symbol)))
    high_price <- merge(high_price, Hi(get(symbol)))
    low_price <- merge(low_price, Lo(get(symbol)))
    close_price <- merge(close_price, Cl(get(symbol)))
    volume <- merge(volume, Vo(get(symbol)))
    adjusted_price <- merge(adjusted_price, Ad(get(symbol)))
  }
  rm(list=symbol) 
  cat("Stock: #", i, ", ", floor(100*i/nrow(symbols)),"% \n", sep="")
  
}

#create arithmetic returns data
open_price_delta <- apply(open_price,2,Delt)
close_price_delta <- apply(close_price,2,Delt)
high_price_delta <- apply(high_price,2,Delt)
low_price_delta <- apply(low_price,2,Delt)
volume_delta <- apply(volume,2,Delt)
adjusted_price_delta <- apply(adjusted_price,2,Delt)

#filtering stocks
is_greater_than_min_price <- as.numeric(colMeans(close_price, na.rm = T)>min_price)
is_less_than_max_price <- as.numeric(colMeans(close_price, na.rm = T)<max_price)
candidate_dates <- apply(close_price_delta[,paste0(candidate_stocks,".Close")], 1, function(x){sum(is.na(x))/length(x)})<max_missing
candidate_stocks <- gsub( ".Close.*$", "", names(close_price)[as.logical(is_greater_than_min_price*is_less_than_max_price)])

#creation of table
max_return <- apply(close_price_delta[candidate_dates,paste0(candidate_stocks,".Close")], 1, function(x){max(x, na.rm=T)})
max_stock <- apply(close_price_delta[candidate_dates,paste0(candidate_stocks,".Close")], 1, function(x){which.max(x)})
dates <- index(close_price[candidate_dates,])[-c(length(index(close_price[candidate_dates,])))]
tickers <- candidate_stocks[max_stock[-c(length(max_stock))]]
max_return_t <- max_return[-c(length(max_return))]
next_return_t_1 <- close_price_delta[candidate_dates,paste0(candidate_stocks,".Close")][cbind(2:nrow(close_price_delta[candidate_dates,]),max_stock[-length(max_stock)])]
final_table <- data.frame(dates, tickers, max_return_t, next_return_t_1)

#stats
summary(final_table)

#plot
hist_data1 <- data.frame(val=final_table$max_return_t,id="max_return_t")
hist_data2 <- data.frame(val=final_table$next_return_t_1,id="next_return_t_1")
hist_data <- rbind(hist_data1, hist_data2)
hist_data_in <- hist_data[!hist_data$val %in% boxplot.stats(hist_data$val)$out,]
ggplot(hist_data_in, aes(val, fill = id)) + geom_density(alpha = 0.2)