similarity.Rd
Splits the index into several smaller sub-indexes (“shards”),
which are disk-based. If your entire index fits in memory
(~one million documents per 1GB of RAM), you can also use
similarity_matrix. It is simpler but does not scale as well:
it keeps the entire index in RAM and does no sharding. It also
does not support adding new documents to the index dynamically.
similarity(corpus, ...)

# S3 method for gensim.corpora.mmcorpus.MmCorpus
similarity(corpus, num_features, ...)

# S3 method for mm_file
similarity(corpus, num_features, ...)

# S3 method for python.builtin.tuple
similarity(corpus, num_features, ...)
corpus        A corpus.

...           Any other parameters to pass to the Python function; see the official gensim documentation.

num_features  Size of the dictionary, i.e. the number of features (terms) it contains.
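A minimal sketch of building a sharded index from a bag-of-words corpus. The preprocessing helpers shown here (preprocess, corpora_dictionary, doc2bow) and the input my_documents are assumptions based on a typical gensimr pipeline; check the package documentation for the exact function names in your version.

library(gensimr)

# Hypothetical pipeline; helper names are assumptions.
docs <- preprocess(my_documents)        # tokenise a character vector of documents
dictionary <- corpora_dictionary(docs)  # map tokens to integer ids
bow <- doc2bow(dictionary, docs)        # bag-of-words representation of the corpus

# Build the sharded, disk-based similarity index.
# num_features must match the size of the dictionary.
index <- similarity(bow, num_features = length(dictionary))

If the corpus fits comfortably in memory, similarity_matrix can be used in its place, trading scalability for simplicity as described above.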