Splits the index into several smaller sub-indexes (“shards”), which are disk-based. If your entire index fits in memory (roughly one million documents per 1GB of RAM), you can also use similarity_matrix. It is simpler but does not scale as well: it keeps the entire index in RAM and does no sharding. It also does not support adding new documents to the index dynamically.

similarity(corpus, ...)

# S3 method for gensim.corpora.mmcorpus.MmCorpus
similarity(corpus, num_features,
  ...)

# S3 method for mm_file
similarity(corpus, num_features, ...)

# S3 method for python.builtin.tuple
similarity(corpus, num_features, ...)

Arguments

corpus

A corpus.

...

Any other parameters to pass to the Python function; see the official documentation.

num_features

Size of the dictionary, i.e. reticulate::py_len(dictionary).
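
Examples

A minimal sketch of building both index types. The preprocessing helpers (preprocess, corpora_dictionary, doc2bow, serialize_mmcorpus) and the bundled corpus dataset are assumed from the gensimr README and may differ across versions; only similarity and similarity_matrix are documented on this page.

library(gensimr)

data("corpus", package = "gensimr")        # assumed example dataset: a character vector of documents
docs <- preprocess(corpus)                 # tokenise and clean (assumed helper)
dictionary <- corpora_dictionary(docs)     # token -> id mapping (assumed helper)
bow <- doc2bow(dictionary, docs)           # bag-of-words corpus (assumed helper)
mm <- serialize_mmcorpus(bow)              # serialise to a Matrix Market file (assumed helper)

# disk-based, sharded index: scales beyond available RAM
index <- similarity(mm, num_features = reticulate::py_len(dictionary))

# in-memory alternative: simpler, but the whole index must fit in RAM
# and documents cannot be added to it later
index_mem <- similarity_matrix(bow)

The sharded index is the safer default when the corpus may grow or may not fit in memory; similarity_matrix trades that flexibility for a simpler, fully in-memory structure.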