Welcome to IEPY’s documentation!
IEPY is an open source tool for Information Extraction focused on Relation Extraction.
To give an example of Relation Extraction, if we are trying to find a birth date in:
“John von Neumann (December 28, 1903 – February 8, 1957) was a Hungarian and American pure and applied mathematician, physicist, inventor and polymath.”
then IEPY’s task is to identify “John von Neumann” and “December 28, 1903” as the subject and object entities of the “was born in” relation.
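To make the subject/relation/object idea concrete, here is a minimal sketch that pulls the birth-date relation out of the sentence above with a single hand-written pattern. It does not use IEPY's API: the names `RelationEvidence`, `BIRTH_PATTERN`, and `extract_birth_date` are hypothetical, and the pattern assumes the "Name (Month Day, Year – ...)" layout of the example sentence.

```python
import re
from typing import NamedTuple, Optional


class RelationEvidence(NamedTuple):
    """Hypothetical container for one extracted relation (not an IEPY class)."""
    subject_entity: str  # e.g. a person
    relation: str        # name of the relation
    object_entity: str   # e.g. a date


# Assumption: the sentence starts with the person's name, followed by an
# opening parenthesis and a date written as "Month Day, Year".
BIRTH_PATTERN = re.compile(
    r"^(?P<person>[^(]+?)\s*\((?P<date>[A-Z][a-z]+ \d{1,2}, \d{4})\b"
)


def extract_birth_date(sentence: str) -> Optional[RelationEvidence]:
    """Return the 'was born in' relation found in the sentence, or None."""
    match = BIRTH_PATTERN.match(sentence)
    if match is None:
        return None
    return RelationEvidence(
        subject_entity=match.group("person"),
        relation="was born in",
        object_entity=match.group("date"),
    )


sentence = (
    "John von Neumann (December 28, 1903 – February 8, 1957) was a Hungarian "
    "and American pure and applied mathematician, physicist, inventor and polymath."
)
print(extract_birth_date(sentence))
```

A single pattern like this only covers one sentence layout, which is why a tool for this task combines entity recognition with learned or rule-based extractors rather than relying on one regular expression.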
IEPY is aimed at:
- users needing to perform Information Extraction on a large dataset.
- scientists wanting to experiment with new IE algorithms.
You can follow the development of this project and report issues at http://github.com/machinalis/iepy, or join the project's mailing list.
Features:
- A corpus annotation tool with a web-based UI.
- An active learning relation extraction tool pre-configured with convenient defaults.
- A rule-based relation extraction tool for cases where the documents are semi-structured or high precision is required.
- A web-based user interface that:
  - Allows non-expert users to control some aspects of IEPY.
  - Allows decentralization of human input.
- A shallow entity ontology with coreference resolution via Stanford CoreNLP.
- An easily hackable active learning core, ideal for scientists wanting to experiment with new algorithms.
Contents:
- IEPY installation
- From 0 to IEPY
- Running the active learning core
- Running the rule based core
- About the Pre-Process
- Gazettes resolution
- Creating a reference corpus
- How to Hack
- Language support
Changelog:
- Fixed some dependency declarations to support Python 3.5
- Fixed a bug in active learning predictions
- Added support for German preprocess (thanks @sweh)
- Fixed a bug in TokenizerSentencerRunner (thanks ezesalta)
- Fixed installation dependencies
- Tokenization options can be set from the instance settings file
- Added multicore preprocess
- Added support for Stanford 3.5.2 preprocess models
- Added grammatical parsing to the preprocess flow of documents
- Added support for Spanish preprocess
- Restricted each iepy-instance to a single language
- Gazetteer support
- Labeling UI improvements
- Performance and memory usage improvements
- Model simplifications (labels, metadata)
- Storage & view of predictions
- Add ability to use custom features (http://iepy.rtfd.org/en/latest/how_to_hack.html#implementing-your-own-features)
- Add ability to use rules as features (http://iepy.rtfd.org/en/latest/how_to_hack.html#using-rules-as-features)
- Add rules verifier (http://iepy.rtfd.org/en/latest/rules_tutorial.html#verifying-your-rules)
- Fixed Firefox compatibility bugs [thanks dchaplinsky for the bug report]
- Skip instead of crashing when a document cannot be loaded via the CSV importer [thanks dchaplinsky for the report and suggestion]
- Performance improvement in the rules runner
- Change instance files schema: it is now a Python package, and the settings were renamed
- Add lemmatization to the pre-process (http://iepy.rtfd.org/en/latest/preprocess.html#lemmatization)
- Fix critical bug on loading rules
- Fix critical bug on ranking questions on the active learning extraction runner
- Add entity kind on the modal dialog
- Change arrows display to be more understandable
- Merge the “skip” and “don’t know” label options
- Replace the options dropdown with radio buttons
- Show help for shortcuts and change the order of the options
- Rich document view (without needing to be labeling the document for a relation)
- Instance upgrader