Reduce vector space dimensionality with random projections (RP). This is a very efficient (both memory- and CPU-friendly) way to approximate tf-idf distances between documents by throwing in a little randomness.

model_rp(corpus, ...)

# S3 method for wrapped
model_rp(corpus, ...)

# S3 method for gensim.interfaces.TransformedCorpus
model_rp(corpus, ...)

load_rp(file)

Arguments

corpus

Corpus as returned by wrap. A tf-idf/bag-of-words transformation is recommended for random projections.

...

Any other options for the underlying model, from the official gensim documentation.

file

Path to a saved model.

Details

A target dimensionality (num_topics) of 200–500 is recommended as a “golden standard” (https://dl.acm.org/citation.cfm?id=1458105).
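
To give a feel for why a few hundred dimensions can be enough, here is a minimal base-R sketch of the general random projection idea. It is not this package's API and not necessarily how the underlying library builds its projection. The corpus size, vocabulary size, toy weights and the +1/-1 projection matrix are all assumptions made for illustration; k plays the role of num_topics.

```r
# Illustrative sketch only: base R, no gensim involved.
set.seed(42)

n_docs  <- 20    # hypothetical number of documents
n_terms <- 1000  # hypothetical vocabulary size
k       <- 300   # target dimensionality (plays the role of num_topics)

# Toy dense document-term matrix standing in for tf-idf weights.
x <- matrix(runif(n_docs * n_terms), nrow = n_docs)

# Random projection matrix with +1/-1 entries, scaled by 1/sqrt(k)
# so that squared Euclidean distances are preserved in expectation.
r <- matrix(sample(c(-1, 1), n_terms * k, replace = TRUE),
            nrow = n_terms) / sqrt(k)

# Project every document from n_terms down to k dimensions.
projected <- x %*% r

# Compare pairwise distances before and after projection.
d_full <- as.vector(dist(x))
d_proj <- as.vector(dist(projected))
ratio  <- d_proj / d_full

# Ratios close to 1 mean distances were approximately preserved.
print(summary(ratio))
```

Lowering k in this sketch should make the ratios spread further from 1, which is the trade-off behind choosing the target dimensionality.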