In addition to implementing the LSA computations described in the introduction, Gensim contains a general library for topic modeling, document indexing and similarity retrieval for large corpora of texts, as well as the Porter stemming algorithm
Immediately after he does that, the system parses the documents, applies the stemming algorithm
and the TFIDF formulas and ultimately extracts the most important five key concepts from each file: C11-C15, C21-C25.
Evaluation of a Stemming Algorithm
for Portuguese Language.
Julie, (1968) Julie Beth Lovins, Development of a Stemming Algorithm
, Mechanical translation and computational linguistics, 1968
33] experimental results show that the suggested hybrid method of statistical and light stemmers is the most suitable stemming algorithm
for Arabic language and gives generalization accuracy of about 98%.
Subsequently, the keywords are obtained by employing the stemming algorithm
that filters the affixes present in the crawled documents.
Porter, best known as the author of the Porter Stemming Algorithm
, worked in information retrieval research at Cambridge University for over 9 years.
We refer to the group of words that result in a common root after applying a specific stemming algorithm
as a conflation or equivalence class.
For example, a stemming algorithm
reduces the words "fishing", "fished", "fish", and "fisher" to the root word, "fish".
Popular stemming algorithm
, Porter's stemming algorithm
has been used .