Install the R package.

# install.packages("remotes")
remotes::install_github("news-r/gensimr")

The install the python dependencies:

  • gensim
  • scikit-learn
  • pyLDAvis

Make sure you have a C compiler before installing the dependencies, to use the optimized word2vec routines (70x speedup compared to plain NumPy implementation).

gensimr::install_dependencies()

Ideally one should use a virtual environment and pass it to install_dependencies, only create the environment once.

# replace with path of your choice
my_env <- "./env"

# run this (works on unix)
args <- paste("-m venv", my_env)
system2("python3", args) # create environment
reticulate::use_virtualenv(my_env) # force reticulate to use env
gensimr::install_dependencies(my_env) # install gensim & scikit-learn in environment

In future sessions, simply specify the environment before calling gensimr.

my_env <- "./env"
reticulate::use_virtualenv(my_env)