model_lsi.Rd
Transform a corpus into a latent n-dimensional space via Latent Semantic Indexing (LSI).
model_lsi(corpus, distributed = FALSE, ...)

# S3 method for wrapped
model_lsi(corpus, distributed = FALSE, ...)

# S3 method for list
model_lsi(corpus, distributed = FALSE, ...)

# S3 method for python.builtin.list
model_lsi(corpus, distributed = FALSE, ...)

# S3 method for python.builtin.tuple
model_lsi(corpus, distributed = FALSE, ...)

load_lsi(file)
Argument | Description |
---|---|
corpus | Corpus as returned by |
distributed | If |
... | Any other options, from the official documentation. |
file | Path to a saved model. |
A target dimensionality (num_topics) of 200–500 is recommended as a “golden standard” (https://dl.acm.org/citation.cfm?id=1458105).
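To make the role of num_topics concrete, here is a minimal base R sketch of the idea behind LSI: a truncated singular value decomposition of a term-document matrix that keeps only the num_topics largest singular values. It is an illustration under stated assumptions, not how model_lsi or the underlying Python library computes the model. The matrix counts and the names tdm, term_topics and doc_coords are made up for this example.

```r
# Toy term-document matrix (rows = terms, columns = documents).
# The counts are invented purely for illustration.
tdm <- matrix(
  c(2, 1, 0, 0,
    1, 2, 0, 0,
    0, 0, 1, 2,
    0, 0, 2, 1,
    1, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("system", "user", "graph", "trees", "survey"),
    paste0("doc", 1:4)
  )
)

# Target dimensionality; tiny here because the toy data is tiny.
num_topics <- 2L

# Truncated SVD: request only the first num_topics singular vectors.
decomp <- svd(tdm, nu = num_topics, nv = num_topics)

# Term loadings on each latent dimension (terms x topics).
term_topics <- decomp$u
rownames(term_topics) <- rownames(tdm)

# svd() returns all singular values, sorted in decreasing order;
# keep the leading num_topics of them.
singular_values <- decomp$d[seq_len(num_topics)]

# Document coordinates in the latent space (documents x topics).
doc_coords <- decomp$v %*% diag(singular_values, nrow = num_topics)
rownames(doc_coords) <- colnames(tdm)

print(round(term_topics, 3))
print(round(doc_coords, 3))
```

Each row of doc_coords is a document expressed in num_topics dimensions instead of one dimension per term. Raising num_topics keeps more singular values and so preserves more of the original matrix, which is why realistic corpora call for a few hundred dimensions rather than two.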
#> → Preprocessing 9 documents
#> ← 9 documents after perprocessing
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)

# fit model
lsi <- model_lsi(corpora, id2word = dictionary, num_topics = 2L)
lsi$print_topics()
#> [(0, u'0.644*"system" + 0.404*"user" + 0.301*"eps" + 0.265*"response" + 0.265*"time" + 0.240*"computer" + 0.221*"human" + 0.206*"survey" + 0.198*"interface" + 0.036*"graph"'),
#>  (1, u'0.623*"graph" + 0.490*"trees" + 0.451*"minors" + 0.274*"survey" + -0.167*"system" + -0.141*"eps" + -0.113*"human" + 0.107*"response" + 0.107*"time" + -0.072*"interface"')]