Welcome to IEPY’s documentation!

IEPY is an open source tool for Information Extraction focused on Relation Extraction.

To give an example of Relation Extraction, if we are trying to find a birth date in:

“John von Neumann (December 28, 1903 – February 8, 1957) was a Hungarian and American pure and applied mathematician, physicist, inventor and polymath.”

then IEPY’s task is to identify “John von Neumann” and “December 28, 1903” as the subject and object entities of the “was born in” relation.
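To make the shape of that output concrete, here is a minimal sketch in plain Python. It does not use IEPY's API: the `RelationEvidence` type, the `BIRTH_PATTERN` regular expression, and the `extract_birth_date` helper are hypothetical names invented for this illustration, and the hand-written pattern only stands in for the learned or rule-based extractors that IEPY provides.

```python
import re
from typing import NamedTuple, Optional


class RelationEvidence(NamedTuple):
    """A candidate fact: two entity mentions linked by a relation name."""
    subject: str
    relation: str
    obj: str


# Toy pattern, assumed for this illustration only: a capitalized name
# (allowing particles such as "von") followed by an opening parenthesis
# and a "Month day, year" date.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
BIRTH_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: (?:von|van|de|[A-Z][a-z]+))+)"
    r" \((?P<date>(?:" + MONTHS + r") \d{1,2}, \d{4})"
)


def extract_birth_date(text: str) -> Optional[RelationEvidence]:
    """Return the first (person, "was born in", date) triple, or None."""
    match = BIRTH_PATTERN.search(text)
    if match is None:
        return None
    return RelationEvidence(
        match.group("person"), "was born in", match.group("date")
    )


sentence = (
    "John von Neumann (December 28, 1903 – February 8, 1957) was a "
    "Hungarian and American pure and applied mathematician, physicist, "
    "inventor and polymath."
)
evidence = extract_birth_date(sentence)
print(evidence)
```

A single regular expression like this breaks on almost any variation in phrasing, which is the gap that a dedicated relation extraction tool is meant to fill.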

IEPY is aimed at:

users needing to perform Information Extraction on a large dataset.
scientists wanting to experiment with new IE algorithms.

You can follow the development of this project and report issues at http://github.com/machinalis/iepy, or join the project's mailing list.

Features

A corpus annotation tool with a web-based UI

An active learning relation extraction tool pre-configured with convenient defaults.

A rule-based relation extraction tool for cases where the documents are semi-structured or high precision is required.

A web-based user interface that:

Allows non-expert users to control some aspects of IEPY.

Allows decentralization of human input.

A shallow entity ontology with coreference resolution via Stanford CoreNLP

An easily hackable active learning core, ideal for scientists wanting to experiment with new algorithms.


Authors

Rafael Carrascosa <rcarrascosa@machinalis.com> (rafacarrascosa at github)

Javier Mansilla <jmansilla@machinalis.com> (jmansilla at github)

Gonzalo García Berrotarán <ggarcia@machinalis.com> (j0hn at github)

Franco M. Luque <francolq@famaf.unc.edu.ar> (francolq at github)

Daniel Moisset <dmoisset@machinalis.com> (dmoisset at github)

Changelog

0.9.6

Fixed some dependency declarations to provide support for Python 3.5
Bug fix for active learning predictions
Added support for German preprocess (thanks @sweh)

0.9.5

Bug fix on TokenizerSentencerRunner (thanks ezesalta)
Fix on installation dependencies
Tokenization options can be handled from instance settings file
GitHub bugs fixed:

0.9.4

Added multicore preprocess
Added support for Stanford 3.5.2 preprocess models

0.9.3

Added grammatical parsing to the preprocess flow of documents
Added support for Spanish preprocess
Restricted each iepy-instance to a single language
Gazetteer support
Labeling UI improvements
Performance and memory usage improvements
Model simplifications (labels, metadata)
Storage & view of predictions

0.9.2

Add ability to use custom features (http://iepy.rtfd.org/en/latest/how_to_hack.html#implementing-your-own-features)
Add ability to use rules as features (http://iepy.rtfd.org/en/latest/how_to_hack.html#using-rules-as-features)
Add rules verifier (http://iepy.rtfd.org/en/latest/rules_tutorial.html#verifying-your-rules)
Fixed compatibility bugs with Firefox [thanks dchaplinsky for the bug report]
Skip instead of crashing when a document could not be loaded via csv importer [thanks dchaplinsky for the report and suggestion]
Performance improvement on rules runner
Change instance files schema: an instance is now a Python package, with renamed settings.
Add lemmatization to the pre-process (http://iepy.rtfd.org/en/latest/preprocess.html#lemmatization)
Fix critical bug on loading rules
Fix critical bug on ranking questions on the active learning extraction runner

0.9.1

Add entity kind on the modal dialog
Change arrows display to be more understandable
Join skip and don’t know label options
Replace the options dropdown with radio buttons
Show help for shortcuts and change the order of the options
Rich document view (available without having to label the document for some relation)
Instance upgrader