model_lda.Rd
Transformation from bag-of-words counts into a topic space of lower dimensionality. LDA (also called multinomial PCA) is a probabilistic extension of LSA, so LDA’s topics can be interpreted as probability distributions over words. As with LSA, these distributions are inferred automatically from a training corpus, and each document is in turn interpreted as a (soft) mixture of these topics.
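To make the two halves of this description concrete, here is a minimal base-R sketch (no gensim involved) of how topics and documents relate in LDA. The vocabulary, the topic-word matrix phi and the document-topic matrix theta are hypothetical numbers chosen for illustration. In a fitted model both are inferred from the training corpus rather than written by hand.

```r
# Hypothetical vocabulary and probabilities, for illustration only.
vocab <- c("graph", "trees", "minors", "user", "system", "interface")

# Topic-word matrix (phi): each row is one topic, i.e. a probability
# distribution over the vocabulary (each row sums to 1).
phi <- rbind(
  topic_1 = c(0.30, 0.30, 0.30, 0.04, 0.03, 0.03),
  topic_2 = c(0.02, 0.02, 0.02, 0.30, 0.34, 0.30)
)
colnames(phi) <- vocab

# Document-topic matrix (theta): each row is one document expressed as a
# soft mixture of the topics (each row sums to 1).
theta <- rbind(
  doc_a = c(0.9, 0.1),
  doc_b = c(0.2, 0.8)
)
colnames(theta) <- rownames(phi)

# Sanity checks: both matrices are row-stochastic.
stopifnot(isTRUE(all.equal(unname(rowSums(phi)), c(1, 1))))
stopifnot(isTRUE(all.equal(unname(rowSums(theta)), c(1, 1))))

# Word distribution implied for each document:
# p(w | d) = sum over topics k of theta[d, k] * phi[k, w]
p_word_given_doc <- theta %*% phi
print(round(p_word_given_doc, 3))
```

Each row of p_word_given_doc is again a distribution over the vocabulary, which is what "soft mixture" means here: doc_a draws most of its words from topic_1 and doc_b most of its words from topic_2.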
model_lda(corpus, ...)

# S3 method for mm_file
model_lda(corpus, ...)

# S3 method for mm
model_lda(corpus, ...)

# S3 method for python.builtin.list
model_lda(corpus, ...)

# S3 method for wrapped
model_lda(corpus, ...)

# S3 method for gensim.interfaces.TransformedCorpus
model_lda(corpus, ...)

# S3 method for python.builtin.tuple
model_lda(corpus, ...)

load_lda(file)

model_ldamc(corpus, ...)

# S3 method for mm_file
model_ldamc(corpus, ...)

# S3 method for mm
model_ldamc(corpus, ...)

# S3 method for python.builtin.list
model_ldamc(corpus, ...)

load_ldamc(file)
Argument | Description |
---|---|
corpus | Model as returned by |
... | Any other options, from the official documentation of |
file | Path to a saved model. |
A target dimensionality (num_topics) of 200–500 is recommended as a “golden standard” (https://dl.acm.org/citation.cfm?id=1458105).
model_lda - Single-core implementation.
model_ldamc - Multi-core implementation.
#> → Preprocessing 9 documents
#> ← 9 documents after perprocessing
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)
corpus_mm <- serialize_mmcorpus(corpora, auto_delete = FALSE)

# fit model
lda <- model_lda(corpus_mm, id2word = dictionary, num_topics = 2L)
lda_topics <- lda$get_document_topics(corpora)
get_docs_topics(lda_topics)
#> # A tibble: 9 x 4
#>   dimension_1_x dimension_1_y dimension_2_x dimension_2_y
#>           <dbl>         <dbl>         <dbl>         <dbl>
#> 1             0         0.167             1         0.833
#> 2             0         0.134             1         0.866
#> 3             0         0.123             1         0.877
#> 4             0         0.114             1         0.886
#> 5             0         0.228             1         0.772
#> 6             0         0.305             1         0.695
#> 7             0         0.505             1         0.495
#> 8             0         0.788             1         0.212
#> 9             0         0.853             1         0.147
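One way to read the printed tibble is sketched below in base R. It assumes, from the values shown rather than from the get_docs_topics() documentation, that each *_x column holds a gensim topic id and the matching *_y column holds that topic's probability for the document. The sketch copies the printed weights and picks the dominant topic of each document.

```r
# Weights copied from the printed get_docs_topics() output.
# Assumption: *_x columns are topic ids, *_y columns are topic probabilities.
doc_topics <- data.frame(
  dimension_1_x = rep(0, 9),
  dimension_1_y = c(0.167, 0.134, 0.123, 0.114, 0.228, 0.305, 0.505, 0.788, 0.853),
  dimension_2_x = rep(1, 9),
  dimension_2_y = c(0.833, 0.866, 0.877, 0.886, 0.772, 0.695, 0.495, 0.212, 0.147)
)

# Each document's topic weights should form a probability distribution
# (up to the 3-decimal rounding of the printed output).
weights <- as.matrix(doc_topics[, c("dimension_1_y", "dimension_2_y")])
stopifnot(all(abs(rowSums(weights) - 1) < 1e-3))

# Dominant topic per document: the topic id whose weight is largest.
topic_ids <- c(doc_topics$dimension_1_x[1], doc_topics$dimension_2_x[1])
dominant_topic <- topic_ids[max.col(weights, ties.method = "first")]
print(dominant_topic)
```

With these numbers, documents 1 to 6 lean towards topic 1 and documents 7 to 9 towards topic 0, with document 7 (0.505 vs 0.495) close to an even split.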