NLP Tools
The following is a list of freely available tools and resources from the projects I worked on at Lancaster University.
1- CFIE-FRSE
Corporate Financial Information Environment (CFIE)-Final Report Structure Extractor (FRSE) is a desktop application that detects the structure of UK annual reports and extracts their contents at the section level.
2- OSMAN Readability Metric
An open-source Java tool for measuring the readability of Arabic text. It calculates readability for Arabic text with or without diacritics (Tashkeel), but works better when diacritics are present; a method is provided for adding diacritics to plain Arabic text.
3- NLP & ML Visualization Code and Tutorial
A step-by-step tutorial for text analysts who want an easy start with basic and common techniques in NLP, text analysis, machine learning, topic modelling and corpus linguistics. The tutorial is part of the "Visualise My Corpus" UCREL and DSG Seminar and Tutorial, as well as the "Data Visualisation Workshop for Critical Computational Discourse" at the Data Science Institute at Lancaster University, UK.
4- Java word cloud with Log Likelihood
A Java tool that creates word clouds from log-likelihood scores and word frequencies (or any other weight values). The log-likelihood of each word is calculated between two large input corpora, which can be in any language. The tool is language independent and was tested on Arabic and English.
5- Machine Learning Java code
Java code that trains classifiers for chairman's statements and for governance and remuneration sections from 1,000 annual financial reports.
6- Gene Ontology Semantic Tagger (GOST)
Our code allowed us to generate a USAS tagger dictionary file in which each entry in the OBO ontology is tagged with the GO IDs that appear on its path. Taking the "mucosal immune response" OBO entry shown in Figure 1, we can see that there are two paths from the child node to the "biological process" root.
7- Welsh Summary Creator (ACC)
A simple tool for extracting and summarizing free Welsh text. It lets users paste, drag and drop, or upload text files, and choose the size of the summary.
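The log-likelihood weighting mentioned for the Java word cloud tool can be illustrated with a minimal sketch. It assumes the standard two-corpus log-likelihood (G2) formula commonly used in corpus linguistics; the listing above does not say which exact variant the tool implements. The class name `LogLikelihood`, the method `score`, and the example counts are hypothetical and are not taken from the tool itself.

```java
// Minimal sketch, not the tool's actual code: log-likelihood (G2) of one word
// compared across two corpora, assuming the standard two-corpus formula.
class LogLikelihood {

    /**
     * @param a     frequency of the word in corpus 1
     * @param b     frequency of the word in corpus 2
     * @param size1 total number of words in corpus 1
     * @param size2 total number of words in corpus 2
     * @return the log-likelihood score; 0 means identical relative frequency
     */
    static double score(long a, long b, long size1, long size2) {
        if (size1 <= 0 || size2 <= 0 || a < 0 || b < 0) {
            throw new IllegalArgumentException("corpus sizes must be positive, counts non-negative");
        }
        double total = (double) size1 + size2;
        // Expected frequencies if the word were equally likely in both corpora.
        double expected1 = size1 * (a + b) / total;
        double expected2 = size2 * (a + b) / total;
        return 2.0 * (term(a, expected1) + term(b, expected2));
    }

    // observed * ln(observed / expected), with the 0 * ln(0) case taken as 0.
    private static double term(long observed, double expected) {
        return observed == 0 ? 0.0 : observed * Math.log(observed / expected);
    }

    public static void main(String[] args) {
        // Hypothetical counts: a word seen 120 times in one 100,000-word corpus
        // and 30 times in another of the same size.
        System.out.println(score(120, 30, 100_000, 100_000));
    }
}
```

A higher score means the word's frequency differs more between the two corpora, so it could be drawn larger in the cloud. The score itself does not say which corpus over-uses the word; comparing the relative frequencies `a / size1` and `b / size2` gives the direction.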
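The path-based tagging described for the Gene Ontology Semantic Tagger can be sketched in a similar way. This is an illustrative assumption about how the IDs on an entry's paths might be collected, not the project's code. The ontology is modelled as a hypothetical map from each term ID to its parent IDs, the IDs `GO:child`, `GO:parentA`, `GO:parentB` and `GO:root` are placeholders rather than real GO identifiers, and the USAS dictionary file format is not reproduced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch, not the GOST code: collect the IDs found on every path
// from a term up to the ontology root. Assumes the parent links are acyclic.
class GoPathTagger {

    // Returns every path from `term` to a root (a term with no parents).
    static List<List<String>> pathsToRoot(String term, Map<String, List<String>> parents) {
        List<List<String>> result = new ArrayList<>();
        walk(term, parents, new ArrayDeque<>(), result);
        return result;
    }

    private static void walk(String term, Map<String, List<String>> parents,
                             Deque<String> current, List<List<String>> out) {
        current.addLast(term);
        List<String> termParents = parents.getOrDefault(term, List.of());
        if (termParents.isEmpty()) {
            out.add(new ArrayList<>(current)); // reached a root: record the path
        } else {
            for (String parent : termParents) {
                walk(parent, parents, current, out);
            }
        }
        current.removeLast(); // backtrack before trying the next branch
    }

    // Distinct IDs across all paths, in first-seen order.
    static Set<String> tagsFor(String term, Map<String, List<String>> parents) {
        Set<String> tags = new LinkedHashSet<>();
        for (List<String> path : pathsToRoot(term, parents)) {
            tags.addAll(path);
        }
        return tags;
    }

    public static void main(String[] args) {
        // Placeholder ontology: one child term with two routes to the root.
        Map<String, List<String>> parents = Map.of(
                "GO:child", List.of("GO:parentA", "GO:parentB"),
                "GO:parentA", List.of("GO:root"),
                "GO:parentB", List.of("GO:root"));
        System.out.println(pathsToRoot("GO:child", parents));
        System.out.println(tagsFor("GO:child", parents));
    }
}
```

In this sketch a term with two routes to the root, like the "mucosal immune response" example, yields two paths, and the tag set for its dictionary entry is the union of the IDs on both.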