Recently, we published a comprehensive analysis on the State of Tezos Staking. Since then, we have received requests to elaborate on how we processed the underlying data. In this article, we provide a step-by-step example on how to download and analyze Tezos on-chain data using the Blockwatch Data API and our open-source R Package. You will need a free API key for that, so if you like to follow along please sign up and request access to our Tezos data here.

Calculating Case Study

One of the factors explored in our study is monthly delegator churn. Churn rate is a common metric that tracks how many customers stop using a product over a specific time period. In our case, churn expresses the rate at which delegators leave their current baker each month. Churn also implicitly tells us how long delegators remain loyal, i.e. how quickly bakers need to find new delegators to stay in business. Churn rate is calculated as

As input we need the count of new and total delegators over time which is not available on the Tezos RPC. Instead, we need to use a historic index like the one created by our Tezos Indexer. Specifically, we will use the SNAPSHOT table which stores information about all bakers and all delegators at each snapshot block, that is, every 256 blocks or approximately every 4.5 hours. With 450 bakers and about 50,000 delegators, this table is extremely large. Since R struggles to handle tabular data with over 1,000,000 rows, we will limit this example to a single baker — Tezos France. You can find other known baker addresses on TzStats or other delegation service listing sites.

How to get the data?

Blockwatch Data serves historic Tezos on-chain data via its API, which is easy to access programmatically or using the Blockwatch R-Package. For full API documentation and a complete list of available fields see here. To access the API you will need a free community license to the XTZ database and an API key that is available after sign up.

The R-Package handles most of the details of building API requests which makes it really easy to work with on-chain data. To load and initialize the package, call

install.packages("devtools")
library(devtools)
install_github("blockwatch-cc/blockwatch-r")
library(blockwatch)
blockwatch.api_key("YOUR_API_KEY")

After setting your API key you can fetch data from individual tables with the blockwatch.table() function. As dataset we use XTZ/SNAPSHOT where XTZ is the database code and SNAPSHOT the actual table name (all uppercase). To filter data by a single baker we set delegate (a synonym for the term baker) to the bakers' public address. In our example, the R call for Tezos France looks as follows:

dta = blockwatch.table(
    "XTZ/SNAPSHOT"
    ,time.lt     = "2020-07-01"
    ,delegate    = "tz1U638Z3xWRiXDDKx2S125MCCaAeGDdA896"
    ,paginate    = TRUE)

Note that in order to filter tables you can specify any table field, like time and delegate in our case. Per default, filters check for equality with a value, but you can append other operators like .lt for less-than comparisons. See the R package documentation for more info on how to build filter conditions.

The API can return up to 50,000 rows a single request. Since this is likely not enough in our example we set paginate to TRUE which instructs the R package to fetch up to the maximum number of rows.

How to calculate delegator churn?

Our downloaded dataset contains three columns we're interested in: the address of each delegator, the time a snapshot was taken, and since_time which is the time a delegator started delegating to our baker. To identify new delegators per month we look at since_time. Since each delegator appears in many rows inside the snapshot table, essentially as long as the delegation lasts, and because sometimes people jump back and forth between bakers we count unique delegator addresses only. Likewise, to count the total number of delegators for each month we look at how many unique addresses appeared in our dataset in a particular month. Sounds easy. Right!? But first, we simplify timestamps by truncating them to the start of each month. The lubridate R package comes handy here:

library(lubridate)
dta$time_rounded   	 <- floor_date(dta$time,"months")
dta$since_time_rounded 	 <- floor_date(dta$since_time,"months")

We append the two new columns to our existing data frame which keeps the relation to the rest of our data. Next we group unique addresses by month, once based on since_time for new delegators and again for total delegators by snapshot time. We use the R function by() in combination with the month as grouping factor to split our dataset into monthly subsets. For each subset we calculate the number of unique addresses and store this result month by month in a new data frame.

# Filter for newly created accounts
monthlynew_raw 	  <- dta[dta$time_rounded == dta$since_time_rounded,]
# Count observations in subsets
monthlynew_packed <- by(monthlynew_raw,
                   	as.factor(monthlynew_raw$time_rounded),
                   	function(x){n = length(unique(x$address))})
# Unpack results of by-function 
monthlynew <- unlist(lapply(monthlynew_packed, unname))

Since by() creates a nested structure we need to explicitly unpack it to have a simple data frame. Calculating the total number of delegators per month happens analog, we only exclude Tezos France' address first (we could also subtract 1 from each months total).

# Excluding baker accounts
total_raw    <- dta[!dta$is_delegate,]
# Count observations in subsets
total_packed  <- by(total_raw,
                     	as.factor(total_raw$time_rounded),
                     	function(x){n = length(unique(x$address))})
# Unpack results of by-function
total <- unlist(lapply(total_packed, unname))

Now with all inputs in place we can calculate our churn rate as

# MoM Change
monthlychange  <- c(NA, diff(total))
# Churn
churn          <- monthlynew - monthlychange
# Churn in percentage
churn_perc     <- 100*churn/total

How to present the results?

Having our results ready, we may as well go the extra mile to display them. A great way to do that is by using the gridExtra package, which converts data frames to PNG images of tables.

library(gridExtra)
grid.table(data.frame(`Tezos France Churn Rate in %` = round(churn_perc[-1], digits = 2),
                  	check.names   	= FALSE),
       	theme = ttheme_minimal()
)

Our result as table:

Tezos France' monthly delegator churn has stayed well below 10% for most of the past months and is at only 1,39% in June 2020. This churn rate is much lower than the network-wide average, indicating delegators are more loyal to Tezos France over long time periods.

Another way of visualizing our results as graph is to render them as time series using ggplot2 and reshape2:

library(reshape2)
library(ggplot2)

df_raw <- data.frame(`Date`    	= as.Date(names(total)),
                 	`Total`   	= total,
                 	`New`     	= monthlynew)
df <- reshape2::melt(df_raw, id.vars = c("Date"))

ggplot(data = df, aes(x = Date, y = value))+
  geom_bar(aes(fill = variable), 
position = "dodge", 
stat="identity")+
	ylab("Number of Accounts")+
	xlab("Date")+
	ggtitle("Tezos France Delegator Statistics")+
	theme(legend.title=element_blank())

The resulting plot reflects that delegator growth for Tezos France slowed down during recent months while overall network delegation skyrocketed as we detailed in our State of Tezos Staking study. This suggests Tezos France was not as successful in attracting new delegators as other bakers were.


Summary

We hope that the above example has inspired you to try our R package in your own research. The Blockwatch Data API offers an easily accessible view into all aspects of Tezos and other popular blockchains. We deliver an unprecedented depth of information, fully searchable history, aggregate metrics, as well as end-of-day statistics.

If you have questions or would like to see more coding examples and other datasets, don’t hesitate to write us at info@blockwatch.cc.