docsim / UkrVectōrēs – an NLU-powered tool for knowledge discovery, classification, diagnostics and prediction. Entities similarity tool.

I would like to present one of my pet projects: docsim / UkrVectōrēs, an NLU-powered tool for knowledge discovery, classification, diagnostics and prediction. At its core, it is an entity similarity tool.

docsim / UkrVectōrēs is open source and available on GitHub: https://github.com/malakhovks/docsim.

Caution/Disclaimer

The project and its documentation are in active development! For technical clarifications and questions, contact us by email at malakhovks@nas.gov.ua or via Issues.

Features

You can think of docsim / UkrVectōrēs as a kind of “cognitive-semantic calculator”. The online toolkit covers the following elements of distributional analysis:

  • calculate semantic similarity between pairs of words;
  • find words semantically closest to the query word;
  • apply simple algebraic operations to word vectors (addition, subtraction, finding the average vector for a group of words and the distances to this average);
  • draw semantic maps of relations between input words (it is useful to explore clusters and oppositions, or to test your hypotheses about them);
  • get the raw vectors (arrays of real values) and their visualizations for words in the chosen model;
  • download default models;
  • use other freely distributed predictive models of distributional semantics by adjusting the configuration file.
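
To make the operations in the list above more concrete, here is a minimal sketch in plain Python of what similarity, nearest-neighbour search, averaging and vector algebra amount to. It does not use the docsim / UkrVectōrēs API: the three-dimensional vectors are invented toy values, and the names `VECTORS`, `cosine`, `most_similar` and `average` are hypothetical helpers introduced only for illustration. Real models hold vectors with hundreds of dimensions learned from a corpus.

```python
import math

# Toy word vectors, invented for illustration only (not taken from any real model).
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query_vec, exclude=(), topn=3):
    """Rank vocabulary words by cosine similarity to query_vec, highest first."""
    scored = [
        (word, cosine(query_vec, vec))
        for word, vec in VECTORS.items()
        if word not in exclude
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def average(words):
    """Component-wise mean of the vectors for a group of words."""
    vecs = [VECTORS[w] for w in words]
    return [sum(component) / len(vecs) for component in zip(*vecs)]


# Similarity between a pair of words.
pair_similarity = cosine(VECTORS["king"], VECTORS["queen"])

# Words closest to a query word (the query itself is excluded).
neighbours = most_similar(VECTORS["king"], exclude=("king",))

# Vector algebra: king - man + woman, then look up the nearest remaining word.
analogy = [
    k - m + w
    for k, m, w in zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])
]
analogy_hits = most_similar(analogy, exclude=("king", "man", "woman"), topn=1)

# Average vector of a group and the similarity of each member to that average.
centroid = average(["king", "queen"])
to_centroid = {w: cosine(VECTORS[w], centroid) for w in ("king", "queen")}

print(pair_similarity, neighbours, analogy_hits, to_centroid)
```

The query words are excluded from the analogy lookup because the input words themselves usually sit closest to the resulting vector; word-vector toolkits commonly apply the same filter.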