Transforms bag-of-words counts into a topic space of lower dimensionality. LDA (Latent Dirichlet Allocation) is a probabilistic extension of LSA (also called multinomial PCA), so LDA's topics can be interpreted as probability distributions over words. As with LSA, these distributions are inferred automatically from a training corpus, and documents are in turn interpreted as a (soft) mixture of these topics.
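A minimal sketch of what "topics as word distributions" looks like in practice, assuming the workflow from the Examples section below and that the wrapped gensim model's print_topics() method is exposed on the returned object (an assumption, since the model is a wrapped Python object):

```r
library(gensimr)

# build a serialized corpus as in the Examples section below
docs <- prepare_documents(corpus)
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)
corpus_mm <- serialize_mmcorpus(corpora, auto_delete = FALSE)

lda <- model_lda(corpus_mm, id2word = dictionary, num_topics = 2L)

# each topic is a probability distribution over words, e.g.
# '0.120*"graph" + 0.080*"trees" + ...' (illustrative, not actual output)
lda$print_topics()
```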

model_lda(corpus, ...)

# S3 method for mm_file
model_lda(corpus, ...)

# S3 method for mm
model_lda(corpus, ...)

# S3 method for python.builtin.list
model_lda(corpus, ...)

# S3 method for wrapped
model_lda(corpus, ...)

# S3 method for gensim.interfaces.TransformedCorpus
model_lda(corpus, ...)

# S3 method for python.builtin.tuple
model_lda(corpus, ...)

load_lda(file)

model_ldamc(corpus, ...)

# S3 method for mm_file
model_ldamc(corpus, ...)

# S3 method for mm
model_ldamc(corpus, ...)

# S3 method for python.builtin.list
model_ldamc(corpus, ...)

load_ldamc(file)

Arguments

corpus

Corpus as returned by serialize_mmcorpus.

...

Any other options, from the official documentation of model_lda or the official documentation of model_ldamc.

file

Path to a saved model.
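A sketch of how the `...` argument and `load_lda` fit together, under two assumptions: extra options in `...` are forwarded to the underlying gensim model (so gensim parameters such as `passes` and `alpha` can be set here), and the wrapped model exposes gensim's save() method via reticulate. Neither is guaranteed by this page; check the linked official documentation.

```r
# forward gensim options through `...` (assumption: these are passed to
# gensim's LdaModel constructor)
lda <- model_lda(
  corpus_mm,           # corpus from serialize_mmcorpus()
  id2word = dictionary,
  num_topics = 2L,
  passes = 10L,        # gensim option: passes over the corpus during training
  alpha = "auto"       # gensim option: learn an asymmetric document-topic prior
)

# persist and reload; save() is the underlying gensim method (assumed to be
# reachable on the wrapped Python object)
lda$save("lda.model")
lda2 <- load_lda("lda.model")
```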

Details

A target dimensionality (num_topics) of 200–500 topics is recommended as a "golden standard" (see https://dl.acm.org/citation.cfm?id=1458105).

Functions

  • model_lda - Single-core implementation.

  • model_ldamc - Multi-core implementation.
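A hedged sketch of the multi-core variant, assuming the same corpus objects as in the Examples section and that `workers` (gensim's LdaMulticore parameter) is forwarded through `...`:

```r
# multi-core training; `workers` is gensim's LdaMulticore option
# (assumption: it is passed through `...` to the underlying model)
lda_mc <- model_ldamc(
  corpus_mm,
  id2word = dictionary,
  num_topics = 2L,
  workers = 2L   # number of worker processes, typically cores - 1
)
```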

Examples

docs <- prepare_documents(corpus)
#> Preprocessing 9 documents
#> 9 documents after preprocessing
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)
corpus_mm <- serialize_mmcorpus(corpora, auto_delete = FALSE)

# fit model
lda <- model_lda(corpus_mm, id2word = dictionary, num_topics = 2L)

lda_topics <- lda$get_document_topics(corpora)
get_docs_topics(lda_topics)
#> # A tibble: 9 x 4
#>   dimension_1_x dimension_1_y dimension_2_x dimension_2_y
#>           <dbl>         <dbl>         <dbl>         <dbl>
#> 1             0         0.167             1         0.833
#> 2             0         0.134             1         0.866
#> 3             0         0.123             1         0.877
#> 4             0             0.114         1         0.886
#> 5             0         0.228             1         0.772
#> 6             0         0.305             1         0.695
#> 7             0         0.505             1         0.495
#> 8             0         0.788             1         0.212
#> 9             0         0.853             1         0.147