# Extracting Stock Price Data from Yahoo via R

Arguably one of Yahoo's most valuable assets, aside from its stake in BABA, was it's finance website. Few sites, including Google, have such readily (freely) accessible and easily digestible financial information on US equities than Yahoo does. Of course, for the financially sophisticated there are far superior sites that have advanced charting platforms and useful visualizations (TradingView and FinViz come to mind), however, when you need a quick figure on the most recent quarterly ROE for BAC, consistently one of the top sites from a Google search is finance.yahoo.com.

In this post we leverage some code from Josh Ulrich and the use the quantmod package to pull all stock price from Yahoo's servers (AMEX, NASDAQ, NYSE) to answer a question I had on price action following extreme returns.

Specifically, I'm interested to know whether extreme positive returns (arithmetic) at day t are generally followed by negative returns on day t+1. This is driven by anecdotal evidence and my belief that markets exhibit an element of mean reversion.

There are 6712 ticker symbols across all three US equity exchanges that we were able to pull, which included open, high, low, close, and volume data since 2000-01-01. For this exercise, close prices will be used. I decided to filter out stocks with a mean price outside USD, and days where the percent of missing price data was greater than 75%.

As shown above, there is a clear relationship between the returns at t+1 for stocks that achieved a maximum return at time t.

The mean daily max return is 223%, and the mean return of the following day for the corresponding stock is 1.5%. Of course, given that this is just price data, we haven't factored in the reason for the high returns. For example, mergers have the ability to provide some certainly in price levels being maintained, whereas stocks that experience high returns from 'fake news' would likely not sustain those levels. It would be interesting to extend this analysis along more dimensions to identify strategies for trading.


###########################: PACKAGES

library(quantmod)
library(Matrix)
library(ggplot2)

##########################: GLOBAL VARIABLES

dir_in <- "C:\\Users\\dmanuge\\Documents\\stocks"
dir_out <- "C:\\Users\\dmanuge\\Documents\\stocks"
start_date <- "2000-01-01"
source_data <- "yahoo"
min_price <- 5
max_price <- 500
max_missing <- 0.75

##########################: MAIN

#get ticker list
symbols <- stockSymbols()

#initialize datasets
open_price <- xts()
high_price <- xts()
low_price <- xts()
close_price <- xts()
volume <- xts()

#get stock data for every ticker
for(i in 1:nrow(symbols)){

symbol <- symbols[i,1]
tryit <- try(getSymbols(symbol,from=start_date, src=source_data))
if(inherits(tryit, "try-error")){ #skip
next
} else { #read, assign, and append stock data to datasets
getSymbols(symbol,from=start_date, src=source_data)
open_price <- merge(open_price, Op(get(symbol)))
high_price <- merge(high_price, Hi(get(symbol)))
low_price <- merge(low_price, Lo(get(symbol)))
close_price <- merge(close_price, Cl(get(symbol)))
volume <- merge(volume, Vo(get(symbol)))
}
rm(list=symbol)
cat("Stock: #", i, ", ", floor(100*i/nrow(symbols)),"% \n", sep="")

}

#create arithmetic returns data
open_price_delta <- apply(open_price,2,Delt)
close_price_delta <- apply(close_price,2,Delt)
high_price_delta <- apply(high_price,2,Delt)
low_price_delta <- apply(low_price,2,Delt)
volume_delta <- apply(volume,2,Delt)
candidate_stocks <- gsub( ".Close.*$", "", names(close_price)[as.logical(is_greater_than_min_price*is_less_than_max_price)]) #creation of table max_return <- apply(close_price_delta[candidate_dates,paste0(candidate_stocks,".Close")], 1, function(x){max(x, na.rm=T)}) max_stock <- apply(close_price_delta[candidate_dates,paste0(candidate_stocks,".Close")], 1, function(x){which.max(x)}) dates <- index(close_price[candidate_dates,])[-c(length(index(close_price[candidate_dates,])))] tickers <- candidate_stocks[max_stock[-c(length(max_stock))]] max_return_t <- max_return[-c(length(max_return))] next_return_t_1 <- close_price_delta[candidate_dates,paste0(candidate_stocks,".Close")][cbind(2:nrow(close_price_delta[candidate_dates,]),max_stock[-length(max_stock)])] final_table <- data.frame(dates, tickers, max_return_t, next_return_t_1) #stats summary(final_table) #plot hist_data1 <- data.frame(val=final_table$max_return_t,id="max_return_t")
hist_data2 <- data.frame(val=final_table$next_return_t_1,id="next_return_t_1") hist_data <- rbind(hist_data1, hist_data2) hist_data_in <- hist_data[!hist_data$val %in% boxplot.stats(hist_data$val)$out,]