Transform a corpus into a latent n-dimensional space via Latent Semantic Indexing (LSI).

model_lsi(corpus, distributed = FALSE, ...)

# S3 method for wrapped
model_lsi(corpus, distributed = FALSE, ...)

# S3 method for list
model_lsi(corpus, distributed = FALSE, ...)

# S3 method for python.builtin.list
model_lsi(corpus, distributed = FALSE, ...)

# S3 method for python.builtin.tuple
model_lsi(corpus, distributed = FALSE, ...)

load_lsi(file)

Arguments

corpus

A corpus as returned by wrap(). A tf-idf or bag-of-words representation is recommended as input for LSI.
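As an illustration of the recommended tf-idf input, the following base-R sketch applies one common tf-idf variant (raw counts scaled by log2(N/df), then each document normalized to unit length) to a toy count matrix. The matrix and the tfidf() helper are hypothetical and are not part of the gensimr API; the exact weighting gensim applies may differ in details.

```r
# Hypothetical toy term-document count matrix: rows = terms, columns = documents.
counts <- matrix(
  c(2, 0, 1,
    0, 1, 1,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("graph", "trees", "system"), c("d1", "d2", "d3"))
)

# Hypothetical helper: one common tf-idf variant, not gensim's exact implementation.
tfidf <- function(counts) {
  n_docs <- ncol(counts)
  df <- rowSums(counts > 0)        # number of documents containing each term
  idf <- log2(n_docs / df)         # log2 inverse document frequency
  w <- counts * idf                # scale each term row by its idf (recycled down columns)
  norms <- sqrt(colSums(w^2))      # L2 norm of each document column
  norms[norms == 0] <- 1           # leave all-zero documents unchanged
  sweep(w, 2, norms, "/")          # unit-length document columns
}

weights <- tfidf(counts)
print(round(weights, 3))
```

Note that "system" occurs in every document, so its idf, and therefore its weight, is zero: terms that appear everywhere carry no discriminating information.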

distributed

If TRUE, distributed mode (parallel execution across several machines) is used.

...

Any other options accepted by gensim's LsiModel (for example id2word and num_topics, as used in the examples); see the official gensim documentation.

file

Path to a saved model.

Details

A target dimensionality (num_topics) of 200–500 is recommended as a “golden standard” (https://dl.acm.org/citation.cfm?id=1458105).
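To make the role of num_topics concrete, here is a minimal base-R sketch of what LSI computes, assuming the classic formulation as a truncated singular value decomposition of a term-document matrix. The matrix X, the helper lsi_sketch() and its k argument are hypothetical illustrations, not part of gensimr; k plays the role of num_topics. gensim computes the decomposition with its own incremental, approximate algorithm, so its numbers (and the signs of the topic vectors, which an SVD leaves arbitrary) may differ from an exact svd() call.

```r
# Hypothetical toy term-document count matrix: rows = terms, columns = documents.
X <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    0, 1, 1, 0,
    0, 0, 0, 1,
    0, 0, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("t", 1:5), paste0("d", 1:4))
)

# Hypothetical helper: rank-k LSI via an exact truncated SVD,
# approximating X by U_k diag(sigma) t(V_k).
lsi_sketch <- function(X, k) {
  s <- svd(X, nu = k, nv = k)
  sigma <- s$d[seq_len(k)]                   # top-k singular values
  list(
    terms = s$u,                             # term loadings of the k latent topics
    sigma = sigma,
    docs  = s$v %*% diag(sigma, nrow = k)    # training documents in k-dim space
  )
}

fit <- lsi_sketch(X, k = 2)
dim(fit$docs)   # 4 2: four documents, two latent dimensions

# Folding in a document: project its term vector onto the topic directions.
q <- c(t1 = 1, t2 = 0, t3 = 1, t4 = 0, t5 = 0)
crossprod(fit$terms, q)   # 2 x 1 matrix of topic weights
```

Projecting the training documents the same way, crossprod(fit$terms, X), reproduces fit$docs (transposed), which is why a fitted LSI model can map unseen bag-of-words vectors into the same latent space.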

Examples

library(gensimr)
data(corpus, package = "gensimr")

docs <- prepare_documents(corpus)
#> Preprocessing 9 documents
#> 9 documents after perprocessing
dictionary <- corpora_dictionary(docs)
corpora <- doc2bow(dictionary, docs)

# fit model
lsi <- model_lsi(corpora, id2word = dictionary, num_topics = 2L)
lsi$print_topics()
#> [(0, u'0.644*"system" + 0.404*"user" + 0.301*"eps" + 0.265*"response" + 0.265*"time" + 0.240*"computer" + 0.221*"human" + 0.206*"survey" + 0.198*"interface" + 0.036*"graph"'), (1, u'0.623*"graph" + 0.490*"trees" + 0.451*"minors" + 0.274*"survey" + -0.167*"system" + -0.141*"eps" + -0.113*"human" + 0.107*"response" + 0.107*"time" + -0.072*"interface"')]