SemAnn: Web-based semantic annotation tool for PDF documents

SemAnn lets you semantically annotate text in PDFs using RDF triples. These annotations are then used to recommend similar PDF documents that the reader might find relevant.


SemAnn is an open-source, web-based semantic annotation tool for PDF files, with a special focus on academic publications. It lets users collaboratively annotate text, making the knowledge contained in those PDF files accessible as RDF graphs for further querying. The tool can be used with arbitrary ontologies as annotation vocabularies. Users can enter annotations at various levels of expressivity, from simple typed annotations (e.g. annotations typed as DBpedia resources or ontology classes) to descriptions of relationships between annotations themselves (e.g. describing the citation context of an annotation). Because the tool tracks the hierarchy of annotations, their structural context is also available for querying. This enables reasoners to answer questions such as “find papers where the problem statement addresses dynamic programming languages”. SemAnn can therefore view annotations in the context of scientific discourse, such as the motivation or the problem statement of a paper, although it is not limited to these categories. By recommending similar papers, SemAnn provides an immediate benefit in return for the effort of annotating. The justification of each recommendation includes information about matches by structural context. The code is available on GitHub.
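
To make the idea of querying by structural context more concrete, here is a minimal sketch in plain Python. It is not SemAnn's actual data model or query engine: the predicate names (`ex:partOf`, `ex:inPaper`), the class names, the annotation identifiers, and the helper functions are all hypothetical, and a real deployment would more likely store the triples in an RDF store and query them with SPARQL. The sketch only illustrates how a tracked annotation hierarchy lets a query such as "papers where the problem statement addresses dynamic programming languages" be answered.

```python
# Illustrative only: the vocabulary below is an assumption, not SemAnn's schema.
TYPE = "rdf:type"
PART_OF = "ex:partOf"    # hypothetical: annotation nested inside another annotation
IN_PAPER = "ex:inPaper"  # hypothetical: annotation belongs to a paper

# Annotations expressed as (subject, predicate, object) triples.
triples = {
    ("ann1", TYPE, "ex:ProblemStatement"),
    ("ann1", IN_PAPER, "paperA"),
    ("ann2", TYPE, "dbpedia:Dynamic_programming_language"),
    ("ann2", PART_OF, "ann1"),
    ("ann2", IN_PAPER, "paperA"),
    ("ann3", TYPE, "ex:Motivation"),
    ("ann3", IN_PAPER, "paperB"),
    ("ann4", TYPE, "dbpedia:Dynamic_programming_language"),
    ("ann4", PART_OF, "ann3"),
    ("ann4", IN_PAPER, "paperB"),
}


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ancestors(annotation):
    """Yield every annotation enclosing the given one, following ex:partOf upward."""
    seen = set()
    stack = list(objects(annotation, PART_OF))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        yield current
        stack.extend(objects(current, PART_OF))


def papers_where(context_type, topic_type):
    """Find papers with a topic_type annotation nested inside a context_type one."""
    found = set()
    for s, p, o in triples:
        if p == TYPE and o == topic_type:
            for enclosing in ancestors(s):
                if context_type in objects(enclosing, TYPE):
                    found |= objects(s, IN_PAPER)
    return found


result = papers_where("ex:ProblemStatement", "dbpedia:Dynamic_programming_language")
print(sorted(result))  # ['paperA']
```

In this toy data, both papers mention dynamic programming languages, but only `paperA` does so inside its problem statement, so only `paperA` is returned. The same kind of match could also serve as the justification shown alongside a recommendation.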

Current Team