Wolfram Language and R
Abstract
This webinar will explore how we can use the Wolfram Language and R together in an integrated workflow. The focus will be on RLink, a built-in Wolfram Mathematica functionality that allows R code to be executed from the native Wolfram Mathematica environment and data to be passed seamlessly between the two languages. As well as showcasing the basic high-level functions of RLink, the webinar will explore real-life example workflows which combine both Wolfram Language and R functionality. The webinar should be of particular interest to users of either language at any level of expertise, as RLink allows them to rely on their knowledge of their preferred language for common operations while still taking advantage of the other. Users experienced in both languages will benefit from using RLink to call both languages from the same environment.
Plan for the session
We will start with a side-by-side comparison between the Wolfram Language and R. This will lead to a discussion of why you might want to use the two in combination, and of how things like serialization and data conversion between the two actually happen. Then we'll move on to setting up the environment for using both languages from the same place, usually a Mathematica notebook. Finally, we'll move to some example projects, code snippets, use cases, and so on.
So, roughly speaking, the first half will be more discussion-based and the second half of the session more example-driven.
Comparing Wolfram Language and R
Out[]=
| Wolfram Language | R |
Focus | General computation | Data science |
Paradigm | Multi-paradigm | Multi-paradigm |
Standard library | Extensive | Minimal |
External resources | Standard | Extensive |
Computation | Symbolic and numerical | Numerical |
License | Proprietary | GPLv2/v3 |
Key advantages | Speed, Knowledgebase, LLM integration | Free, Existing codebases |
Both WL and R are general-purpose programming languages, although R tends to be used mainly for statistics and data science more broadly, and the bulk of the standard library and external libraries available for R reflect that. WL has a much more extensive standard library than R, but in terms of external resources like user-made libraries and codebases, R has the advantage. The key advantages of WL are its integrated knowledgebase, its optimization and speed, and, more recently, seamless integration of LLMs into the Mathematica notebook environment. On R's side, it is free, and there is plenty of existing code and other resources that the community has contributed.
Combining Wolfram Language and R
Using Wolfram Engine from R
One simple way to combine the two languages is to use Wolfram Engine, which has a command line interface that can be used with R's system() function (or any other environment with access to a terminal emulator). To evaluate a piece of Wolfram Language code from R, you can pass it to wolframscript with the -c flag, and to evaluate a specific Wolfram Language file, for example a script that generates something, you can run wolframscript with the -f flag, as shown below.
system("wolframscript -c 2+2")
system("wolframscript -f myfile.wl")
There are some obvious limitations with that approach; it makes it difficult to pass data between the two languages, to see interactive outputs, and to build programs iteratively. It is mainly useful if you already have a WL script that does something specific, and you want to execute that script from R’s environment.
RLink
REvaluate/External language cell
Before we jump into some example workflows, let's briefly look at how the integration of R and WL works under the hood. It all happens via RLink, which is a Wolfram System application that uses J/Link and the rJava/JRI Java libraries to link to R functionality. It allows the user to pass data between the Wolfram Language and R and to execute R code from within the Wolfram Language. Technically, this happens through the already existing integration between Java and R, so there is a considerable software stack built up to allow communication between WL and R.
If you already have R installed on your computer, you should be able to use RLink without any further setup necessary. You can test by evaluating an external language cell and specifying R as the external language.
In[]:=
print("Hello R")
Out[]=
{Hello R}
If you prefer, you can manually configure the R runtime that Wolfram Mathematica uses, which is useful in cases where you’d want to use a different version than your default one. The process of doing that is a bit more involved, and since most users will probably prefer to use their existing R installations, I am not going to cover the manual setup in great detail, but I will leave a link to the documentation in the notebook which you will have access to.
The R external language cell is the simplest way to use R from Wolfram Mathematica: it simply evaluates any arbitrary string of valid R code. You can use it just like an R terminal. You can include multiline code too, as long as you wrap it in curly brackets.
In[]:=
{
simpleVar <- c(10,20,50)
simpleVarSq <- simpleVar^2
simpleVarSq
}
Out[]=
{100.,400.,2500.}
Evaluating R code with an external language cell set to R is equivalent to using REvaluate in standard Wolfram Language input. An important thing to note is that objects defined in your R workspace persist for the session, and you can reuse them from anywhere.
In[]:=
REvaluate["simpleVarSq"]
Out[]=
{100.,400.,2500.}
One advantage of using REvaluate over the external language cell is that you can store the output of your R code in a WL variable.
In[]:=
SimpleVarSq=REvaluate["simpleVarSq"]
Out[]=
{100.,400.,2500.}
The result is then stored in memory and can be used without any reference to R itself.
In[]:=
SimpleVarSqrt=Sqrt[SimpleVarSq]
Out[]=
{10.,20.,50.}
One example where you might want to use an external language cell as opposed to REvaluate directly is when you need to escape characters.
In[]:=
Sys.setenv(PATH="C:/Program Files/R/R-4.3.2/bin/x64")
Out[]=
{True}
The equivalent of the above in REvaluate syntax is:
In[]:=
REvaluate["Sys.setenv(PATH=\"C:/Program Files/R/R-4.3.2/bin/x64\")"]
Out[]=
{True}
If you are just copying and pasting something into REvaluate, Mathematica will offer to escape the characters upon pasting, but if you are writing the code by hand it can get tedious very quickly. Aside from REvaluate and the R external language cell, there are two other high-level functions of RLink that you will find yourself using in most workflows: RSet, which assigns a Wolfram Language expression to a variable in the R workspace, and RFunction, which wraps a piece of R code as a function that can be called directly from the Wolfram Language.
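As a minimal sketch of how these fit together (the variable names here are made up for illustration): push WL data into the R workspace with RSet, wrap a piece of R code with RFunction, and pull results back with REvaluate.
In[]:=
RSet["wlData", {1., 2., 3.}];              (* copy a WL list into the R workspace as wlData *)
doubleIt = RFunction["function(x) x * 2"]; (* wrap R code as a function callable from WL *)
doubleIt[REvaluate["wlData"]]              (* apply it to the data we just sent over; gives {2., 4., 6.} *)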
Example workflows
Example workflow 1: Making use of a specific R function/library for your WL data
R, like Wolfram Mathematica, has a very vibrant and helpful online community. Sometimes, when you are exploring ways to do a specific task, you will find some R code already written up by the community. Normally, if you are comfortable with the Wolfram Language, you can always just rewrite the functionality itself in WL (if it doesn’t already exist as a built-in function!). But a quicker solution in many cases might be to just store the R function that you have found in a WL variable, and use it with your WL data.
As a quick example, consider this function I found online which takes in a list of binomial data and generates an overlaid plot of how the underlying proportion of success changes as each data point is added in, based on Bayesian probability calculations.
First, install the packages if you don’t already have them.
In[]:=
RInstallPackage["ggridges"]
Out[]=
Success
In[]:=
RInstallPackage["cpp11","Repositories"->{"https://cran.uni-muenster.de"},"DefaultRepository"->None,"Reinstall"->True]
Out[]=
Success
In[]:=
RInstallPackage["tidyverse"]
Out[]=
Failure
Then import the libraries you’ll need.
In[]:=
{
library(stats)
library(ggplot2)
library(ggridges)
}
Out[]=
{ggridges,ggplot2,stats,graphics,grDevices,utils,datasets,methods,base}
Then simply copy and paste the R code you need, pass it to RFunction, and store the result in a variable.
In[]:=
PlotBinomial=RFunction["function(data = c(), prior_prop = c(1, 1), n_draws = 10000) {
  library(tidyverse)
  data <- as.logical(data)
  data_indices <- round(seq(0, length(data), length.out = min(length(data) + 1, 20)))
  proportion_success <- c(0, seq(0, 1, length.out = 100), 1)
  dens_curves <- map_dfr(data_indices, function(i) {
    value <- ifelse(i == 0, \"Prior\", ifelse(data[i], \"Success\", \"Failure\"))
    label <- paste0(\"n=\", i)
    probability <- dbeta(proportion_success,
                         prior_prop[1] + sum(data[seq_len(i)]),
                         prior_prop[2] + sum(!data[seq_len(i)]))
    probability <- probability / max(probability)
    data_frame(value, label, proportion_success, probability)
  })
  # Turning label and value into factors with the right ordering for the plot
  dens_curves$label <- fct_rev(factor(dens_curves$label, levels = paste0(\"n=\", data_indices)))
  dens_curves$value <- factor(dens_curves$value, levels = c(\"Prior\", \"Success\", \"Failure\"))
  p <- ggplot(dens_curves, aes(x = proportion_success, y = label, height = probability, fill = value)) +
    ggridges::geom_density_ridges(stat = \"identity\", color = \"white\", alpha = 0.8, panel_scaling = TRUE, size = 1) +
    scale_y_discrete(\"\", expand = c(0.01, 0)) +
    scale_x_continuous(\"Underlying proportion of success\") +
    scale_fill_manual(values = hcl(120 * 2:0 + 15, 100, 65), name = \"\", drop = FALSE,
                      labels = c(\"Prior \", \"Success \", \"Failure \")) +
    ggtitle(paste0(\"Binomial model - Data: \", sum(data), \" successes, \", sum(!data), \" failures\")) +
    theme_light() +
    theme(legend.position = \"top\")
  ggsave(\"C:/Users/dimitark/projects/wl-r/bayesian/plot.png\", plot = p, dpi = 300, width = 2000, height = 2000, units = \"px\")
}"];
Finally, wrap this up in a WL function that applies the R function to an argument and then imports the resulting ggplot2 plot from disk.
In[]:=
GetBinomialPlot[x_]:=(PlotBinomial[x];Import["C:\\Users\\dimitark\\projects\\wl-r\\bayesian\\plot.png"])
Now you can pass any binomial WL data to your function, and you will get an R ggplot directly in Mathematica.
You can do this with pretty much any R code you find online, even if it involves passing data types that are not directly supported by RLink. In this example, we did not get the resulting ggplot2 plot directly via RLink, because it is a custom object that is not (automatically) supported, but you can export it to a .png, and then load that .png directly into Mathematica.
Example workflow 2: Making use of a specific WL function for your R data
{
library(tsdl)
Sys.setenv(PATH="C:/Program Files/R/R-4.3.2/bin/x64")
roberts_1992 <- tsdl[170]
years <- seq(from = start(roberts_1992[[1]])[1], to = end(roberts_1992[[1]])[1], by = 1)
for_export <- cbind(years, unlist(roberts_1992))
};
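As a minimal sketch of the idea behind this workflow (the variable name robertsSeries is made up, and ListLinePlot stands in for whichever WL function you actually want to apply), pull the prepared matrix into WL and work on it there:
In[]:=
robertsSeries = REvaluate["for_export"]; (* the years/values matrix assembled in R above *)
ListLinePlot[robertsSeries]              (* apply a WL function of your choice to the R-prepared data *)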
Example workflow 3: Prepare data in R and use WL for analysis
A common reason why you might want to use RLink to pass data from R to WL is speed. While R is a good language in many respects, speed has never been one of its strong points. This is especially true if you are working with any form of big data, where the difference in speed between R and WL becomes very obvious. Let's take a simple text analytics example to illustrate a common workflow and to compare speeds. Use the gutenbergr R library to grab a text from Project Gutenberg and store it in an R variable; ID 86 is Mark Twain's A Connecticut Yankee in King Arthur's Court.
In[]:=
{
Sys.setenv(PATH="C:/Program Files/R/R-4.3.2/bin/x64")
library(gutenbergr)
book <- gutenberg_download(86)
};
Get the book’s text and store it in a WL variable.
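One way to do that (bookText is a made-up variable name; book$text is the column created by gutenberg_download above) is to pull the text column over with REvaluate and join it into a single string:
In[]:=
bookText = StringRiffle[REvaluate["book$text"], "\n"]; (* join the downloaded lines into one string *)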
With that, let's perform something that requires a reasonable amount of computation. WordCloud needs to tokenize the string that contains the whole novel, remove stopwords, count frequencies, and produce a graphic.
To check the time it takes WL to produce the WordCloud, you can use AbsoluteTiming.
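A minimal sketch of that step, assuming the bookText string from above:
In[]:=
AbsoluteTiming[WordCloud[bookText]] (* the first element of the result is the elapsed time in seconds *)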
In[]:=
{
library(wordcloud)
png("C:/Users/dimitark/projects/wl-r/WordCloud/twain_word_cloud.png", width = 1600, height = 1600, res = 300)
time_to_plot <- system.time(wordcloud(book$text))
dev.off()
time_to_plot[3]
}
Obviously this use case is situational. If you are only dealing with small data frames and non-intensive computation, you might want to take the speed tradeoff and just stay in R. While we are discussing speed of computation, however, it is worth mentioning a few general points about optimizing RLink for speed. Generally speaking, the main operation that incurs a significant overhead is passing data between WL and R. The bigger the data, the more time it takes to be serialized from one language to the other. However, if you have data in one workspace and you do operations on it in that workspace alone, the overhead is minimal. Producing the word cloud in R was a good example of this, since the novel was already in the R workspace and we did not need to move it from WL; doing so would have been even more time-consuming. If speed is a major concern, a general rule of thumb is to minimize the data transfer between the two languages as much as possible.
Example workflow 4: Supplement R data with Wolfram Knowledgebase information
Get the file
Start by importing our file into Mathematica: a CSV file of publicly available data on the coffee exports of different countries from 1990 to 2018. The file will be provided with the webinar's notebook so you can go through the steps yourself.
Let’s see what we are dealing with. Take the first 10 rows and display them as a table.
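A minimal sketch of these two steps (the file name and the coffeeData variable are placeholders for whatever you use locally):
In[]:=
coffeeData = Import["coffee_exports.csv"]; (* read the CSV as a list of rows *)
TableForm[coffeeData[[1 ;; 10]]]           (* preview the first 10 rows as a table *)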
Some things could be fixed: the heading of the first column is wrong, and we will eventually want an average or a total for each country's exports, but on the whole it's something we can work with. What we will do is construct a model that will predict the coffee exports of a country. To do that, we want some factors that we think are likely to predict exports. To keep things simple, let's get a few of the most relevant ones only: the climate types that each country has, as well as its population. For this first part of our workflow, we will be using mainly Mathematica because of its extensive knowledgebase, but we will be jumping to R for the occasional convenient function.
Grab data from the Wolfram Knowledgebase
It's reasonable to assume that the higher the population of a country, the more it will be able to produce in total, and the more it will be able to export. But the dataset we have doesn't include the population of each country, and searching online and adding it manually would be very time-consuming and prone to error. Luckily, the Wolfram Knowledgebase has extensive information about many entities, including countries, so we can use that to create a new column in our dataset that contains the population. Looking at the table above, some country names have extra strings in parentheses that are not needed, so some cleaning up is wise. This is very easy to do with R's powerful gsub() function, which allows regular-expression string matching.
Now apply it to the first column of our dataset (and start from the second row since the first one is a header which is incorrect for now anyway).
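One way this can look with RLink (the exact pattern to strip is an assumption based on the description above; cleanName and countryNames are made-up names):
In[]:=
cleanName = RFunction["function(x) trimws(gsub(\"\\\\(.*\\\\)\", \"\", x))"]; (* drop anything in parentheses and trim whitespace *)
countryNames = cleanName[coffeeData[[2 ;;, 1]]] (* gsub is vectorized, so the whole column is cleaned in one call *)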
Now that we have a list of nicely formatted names, we can create a pure function that will take an argument, try to interpret it as a country, and get the population of that country.
Add a meaningful header to the populations list.
Now map the list back to the original dataset as an appended column.
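Sketching these three steps (getPopulation, populations, and dataWithPopulations are illustrative names; querying the Knowledgebase via Interpreter and the "Population" property is one way to do it):
In[]:=
getPopulation = QuantityMagnitude[Interpreter["Country"][#]["Population"]] &;              (* interpret a name as a country Entity and look up its population *)
populations = getPopulation /@ countryNames;                                               (* one value per cleaned country name *)
dataWithPopulations = MapThread[Append, {coffeeData, Prepend[populations, "Population"]}]; (* add the header and append as a new column *)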
Check that it looks okay: take the first 10 rows of the first and last columns.
Now that we already have the country names stored in a variable, we can repeat a similar process with the climate-type data for each country and add it back to the dataset, after which we'll be ready to work with all of that in R. Create a pure function that will get a list of all the climate types of the country that is passed to it.
The way we’d want to structure our data is to create a dummy variable for each climate type, and code a country with 1 if it has that climate type, or 0 if it doesn’t. To achieve that, we need to have a heading for each unique climate type. Get all unique climate types in our data.
These climates are Entities in the Wolfram Language that contain a lot of information. For our use case, we just need their names.
Create a function that will check whether a climate is contained in the list of a specific country’s climates.
And apply that function to each country.
This gives us the list of all the climates for each country. Use the climate name as the variable name for each climate.
And add that to the original dataset (with populations).
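A sketch of the dummy-variable coding and the final append (countryClimateNames, the per-country lists of climate names, and uniqueClimateNames are illustrative names for the results of the previous steps):
In[]:=
hasClimate[climates_, climate_] := If[MemberQ[climates, climate], 1, 0];                                (* 1 if the country has that climate type, 0 otherwise *)
climateColumns = Table[Prepend[hasClimate[#, c] & /@ countryClimateNames, c], {c, uniqueClimateNames}]; (* one 0/1 column per unique climate, headed by its name *)
dataWithPopulationsAndClimates = MapThread[Join, {dataWithPopulations, Transpose[climateColumns]}];     (* append the dummy columns to the dataset *)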
Before we pass it to R, do one final check that it looks as expected.
Looks good: each climate has its own column, and each country that has a given climate is coded as 1. The data is in a format that is ready to be analyzed in R. Passing a WL expression, such as our dataWithPopulationsAndClimates variable, into an R data frame is not as straightforward through RLink. It is possible by defining a custom data type in RLink that corresponds to the structure of the data, but there is a much easier way: export the data in a format that both pieces of software recognize easily. For our purposes, this can be a CSV file.
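For example (the file name is a placeholder):
In[]:=
Export["coffee_data_full.csv", dataWithPopulationsAndClimates]; (* write the WL table out as CSV so R can read it back *)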
Main data analysis in R
Now, we can import the CSV file into our R environment using RSet without leaving Mathematica.
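One way this can look (assuming the placeholder file name from the export step; csv_path is a made-up R variable):
In[]:=
RSet["csv_path", "coffee_data_full.csv"];       (* hand the file location to R as a string variable *)
REvaluate["coffee_data <- read.csv(csv_path)"]; (* read it into an R data frame named coffee_data *)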
Unlike with some WL expressions, R knows how to deal with CSV files perfectly well, which means our data is immediately available for analysis. From now on, we can just use REvaluate or R external language cells for any R-side analysis we do. Let’s do our final data clean-ups and preparations in R.
In[]:=
{
coffee_data$average_exports <- rowMeans(coffee_data[, 2 : 30])
colnames(coffee_data)[1] <-"country"
};
Wolfram Mathematica can understand R data frames and display them accordingly. A common operation which you may find yourself using is displaying R data in the form of a table, which is possible by applying TableForm directly to the R data frame object.
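Following that approach (a sketch; the exact rendering depends on your data):
In[]:=
TableForm[REvaluate["head(coffee_data, 10)"]] (* display the first 10 rows of the R data frame as a WL table *)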
With all that done, construct the formula for the linear model and run it, all on R’s side with REvaluate.
In[]:=
{
library(stats)
dummy_variables <- colnames(coffee_data)[32 : 52]
dummy_string <- paste(dummy_variables, collapse = " + ")
formula <- as.formula(paste("average_exports ~ Population + ", dummy_string))
lm_model <- lm(formula, data = coffee_data)
};
With that done, and the linear model stored in the variable “lm_model” in R’s workspace, we could pull it directly into Mathematica, but it is a fairly complicated RObject that is more easily navigated by extracting only the relevant bits from the R side. Let’s get some descriptors of the model.
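For example, standard R accessors can be evaluated and only their results brought across:
In[]:=
REvaluate["coef(lm_model)"]              (* the fitted coefficients *)
REvaluate["summary(lm_model)$r.squared"] (* the model's R-squared *)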
Finally, let’s create a nice plot of our linear model in R and pass it back to WL. Install and load ggplot2 if you have not already done so, and create your plot with normal ggplot2 syntax.
In[]:=
{
library(ggplot2)
max_exports <- coffee_data[which.max(coffee_data$average_exports), ]
max_population_row <- coffee_data[which.max(coffee_data$Population), ]
lineOfBestFit <- ggplot(coffee_data, aes(x = Population, y = average_exports)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Linear Regression: Coffee Exports vs Population",
x = "Population",
y = "Average Coffee Exports") +
theme_minimal() +
scale_x_continuous(labels = scales::comma) +
geom_text(data = rbind(max_exports, max_population_row), aes(label=country), vjust = -0.5, hjust = 1)
};
Now lineOfBestFit is a custom ggplot2 object. Like before, just save it to a .png and load it up in Mathematica.
In[]:=
ggsave("line_of_best_fit.png", plot = lineOfBestFit, width = 10, height = 6, units = "in", dpi = 300);
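Back in WL, the saved file can then be brought into the notebook (here resolving the relative path that ggsave used against R's working directory):
In[]:=
Import[FileNameJoin[{First[REvaluate["getwd()"]], "line_of_best_fit.png"}]] (* load the ggplot2 output saved above *)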
With that, we can see the plot generated by ggplot2 in our Mathematica environment with minimal effort. The general thing to keep in mind here is that while RLink offers top-level serialization for the most common data types, plus an extension mechanism for defining your own custom types, sometimes you may be better served by finding an encoding that both pieces of software understand and using that to transfer data, rather than doing it with RLink. Even then, RLink still has value in allowing you to do all of that from within your Mathematica notebook.
Example workflow 5: Scrape web data with R and construct an interactive web app with WL
Since we are staying in R for the web-scraping part of the example, we can write out our R code in one big external cell, much like we would if we were using RStudio or a similar IDE.
In[]:=
{
library(rvest)
library(dplyr)
library(RSelenium)
library(stats)
}
In[]:=
{
library(RSelenium)
setwd("/Users/dimitark/projects/Webinars/WLR/Combining Wolfram Language and R/Example5")
# Initialize Selenium and get the webpage
rD <- rsDriver(browser="firefox", port=4549L, verbose=F)
remDr <- rD[["client"]]
remDr$navigate("https://www.atptour.com/en/stats/leaderboard?boardType=serve&timeFrame=52week&surface=all&versusRank=all&formerNo1=false")
Sys.sleep(5)
load_more_button <- remDr$findElement(using = "css selector", "a.atp_button:nth-child(5)")
load_more_button$clickElement()
Sys.sleep(5)
serves_html <- remDr$getPageSource()[[1]]
remDr$navigate("https://www.atptour.com/en/stats/leaderboard?boardType=return&timeFrame=52week&surface=all&versusRank=all&formerNo1=false")
Sys.sleep(5)
load_more_button <- remDr$findElement(using = "css selector", "a.atp_button:nth-child(5)")
load_more_button$clickElement()
Sys.sleep(5)
returns_html <- remDr$getPageSource()[[1]]
remDr$navigate("https://www.atptour.com/en/stats/leaderboard?boardType=pressure&timeFrame=52week&surface=all&versusRank=all&formerNo1=false")
Sys.sleep(5)
load_more_button <- remDr$findElement(using = "css selector", "a.atp_button:nth-child(5)")
load_more_button$clickElement()
Sys.sleep(5)
pressure_html <- remDr$getPageSource()[[1]]
remDr$close()
rD[["server"]]$stop()
# system("taskkill /im java.exe /f", intern=FALSE, ignore.stdout=FALSE)
# Save to local files
writeLines(serves_html, "serve_stats.html")
writeLines(returns_html, "returns_stats.html")
writeLines(pressure_html, "pressure_stats.html")
# Process the serve stats page
serves_html <- paste(readLines("serve_stats.html"), collapse = "\n")
serves_page <- read_html(serves_html)
serves_data <- serves_page %>%
html_nodes("table") %>%
html_table()
headshot_urls <- serves_page %>%
html_nodes(".player-image") %>%
html_attr("src")
serves_table <- serves_data[[1]]
serves_table$Image <- na.omit(headshot_urls)
# Process the returns stats page
returns_html <- paste(readLines("returns_stats.html"), collapse = "\n")
returns_page <- read_html(returns_html)
returns_data <- returns_page %>%
html_nodes("table") %>%
html_table()
headshot_urls <- returns_page %>%
html_nodes(".player-image") %>%
html_attr("src")
returns_table <- returns_data[[1]]
returns_table$Image <- na.omit(headshot_urls)
# Process the pressure stats page
pressure_html <- paste(readLines("pressure_stats.html"), collapse = "\n")
pressure_page <- read_html(pressure_html)
pressure_data <- pressure_page %>%
html_nodes("table") %>%
html_table()
headshot_urls <- pressure_page %>%
html_nodes(".player-image") %>%
html_attr("src")
pressure_table <- pressure_data[[1]]
pressure_table$Image <- na.omit(headshot_urls)
full_dataframe <- left_join(serves_table[, -1], returns_table[, -1]) %>%
left_join(., pressure_table[, -1])
write.csv(full_dataframe, "tennis_data.csv", row.names = FALSE)
write.csv(serves_table[,-1], "serve_data.csv", row.names = FALSE)
write.csv(returns_table[,-1], "returns_data.csv", row.names = FALSE)
write.csv(pressure_table[,-1], "pressure_data.csv", row.names = FALSE)
}
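From here the scraped data can be picked up on the WL side. As a minimal sketch of the final step (not the full app built in the webinar; the cloud object name is a placeholder, and the CSV is assumed to be reachable from the notebook's working directory), the combined table written by the R script can be imported as a Dataset and published to the Wolfram Cloud, where it can be browsed interactively and shared:
In[]:=
tennisData = Import["tennis_data.csv", "Dataset", "HeaderLines" -> 1]; (* the combined table written by write.csv above *)
CloudDeploy[tennisData, "atp-stats-explorer", Permissions -> "Public"] (* publish it as a shareable, browsable cloud object *)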
Conclusion
Hopefully this gave you an idea of how you can combine your Wolfram Language and R workflows in a single Mathematica notebook. Once you have RLink set up, the main high-level functions are RSet, RFunction, and REvaluate. With them, you can use your R runtime just as you would from RStudio or any other IDE. Passing data back and forth is seamless for simple objects; more complicated objects may require defining a custom type or encoding the data in a common file format.