Note that there is no universal way to determine the best number of topics (num_topics) to fit to a set of documents; see this post.


As stated in Table 2 of this paper, this corpus essentially has two classes of documents: the first five are about human-computer interaction and the other four are about graphs. A procedure for choosing the best number of topics for this corpus should therefore return 2.


# Load the example corpus shipped with gensimr
data("corpus", package = "gensimr")

# Preprocess the raw documents into tokenised texts
texts <- prepare_documents(corpus)
#> → Preprocessing 9 documents
#> ← 9 documents after perprocessing
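# Map each unique token to an integer id, then convert the texts to bag-of-words
dictionary <- corpora_dictionary(texts)
corpus_bow <- doc2bow(dictionary, texts)

# Fit a TF-IDF model on the bag-of-words corpus and apply it to that corpus
tfidf <- model_tfidf(corpus_bow, id2word = dictionary)
corpus_tfidf <- wrap(tfidf, corpus_bow)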