Running the active learning core¶
The active learning core works by trying to predict relations from information provided by the user. This means you’ll have to label some of the examples, and based on those the core will infer the rest. The core will also ask you to label the most informative examples (those that best help it figure out the other cases).
To start using it you’ll need to define a relation, run the core, label some evidence and re-run the core loop. You can label more evidence and re-run the core as many times as you like to improve performance.
Creating a relation¶
Running the core¶
After creating a relation, you can start the core to look for instances of that relation.
You can run the core in two modes: high precision or high recall. Precision and recall can be traded for one another up to a certain point, i.e. it is possible to give up some recall to get better precision and vice versa.
To better visualize this trade-off, let’s see an example: a precision of 99% means that 1 of every 100 predicted relations will be wrong and the rest will be correct. A recall of 30% means that only 30 out of 100 existing relations will be detected by the algorithm and the rest will be wrongly discarded as “no relation present”.
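The two figures above can be checked with a few lines of arithmetic. This is a minimal sketch and not part of IEPY; the function names and the counts are made up to match the example in the text.

```python
def precision(true_positives, false_positives):
    """Fraction of predicted relations that are actually correct."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def recall(true_positives, false_negatives):
    """Fraction of existing relations that the classifier found."""
    existing = true_positives + false_negatives
    return true_positives / existing if existing else 0.0


# Hypothetical counts matching the figures in the text.
# 100 predicted relations, 1 of them wrong: precision of 99%.
print(precision(true_positives=99, false_positives=1))
# 100 existing relations, only 30 detected: recall of 30%.
print(recall(true_positives=30, false_negatives=70))
```

Note that the two numbers are computed over different sets: precision looks only at what the classifier predicted, while recall looks at everything that really exists in the data.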
Run the active learning core by doing:
python bin/iepy_runner.py <relation_name> <output>
Add --tune-for=high-recall before the relation name to switch between modes. The default is high precision.
This will run until it needs you to label some evidence. At this point, go to the web interface that you started in the previous step and label some evidence there.
When you consider that you have labeled enough, continue the execution by typing run at the prompt that the iepy runner presented you.
That will cycle again and repeat the process.
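To give an intuition for how a label-and-rerun cycle can choose what to ask you next, here is a sketch of uncertainty sampling, a common active learning strategy. It is an illustration only: this page does not say which selection strategy IEPY uses, and the function name, evidence ids and scores are all hypothetical.

```python
def pick_most_uncertain(scores, batch_size=3):
    """Return the evidence ids whose score is closest to 0.5.

    `scores` maps an evidence id to the classifier's estimated
    probability that the relation is present. Scores near 0.5 are
    the cases the classifier is least sure about, so a human label
    there is likely to be the most informative.
    """
    ranked = sorted(scores, key=lambda ev_id: abs(scores[ev_id] - 0.5))
    return ranked[:batch_size]


# Hypothetical scores for five unlabeled evidences.
scores = {"ev1": 0.97, "ev2": 0.52, "ev3": 0.08, "ev4": 0.45, "ev5": 0.70}
to_label = pick_most_uncertain(scores, batch_size=2)
print(to_label)
```

Each cycle would then retrain on the enlarged set of labels and pick a new batch.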
When you are done labeling, go to the active learning core in the command line and ask it to STOP. It’ll save a CSV file with the automatic classifications for all evidence in the database.
Also, note that you can only predict a relation for a text that has been inserted into the database. Each row of the CSV output holds the primary key of the database object that represents the evidence classified as “relation present” or “relation not present”. An evidence object is rich in information: it contains the entities and the circumstances surrounding the prediction, which is too complex to put in a single CSV file.
In order to access the entities and other details you’ll need to write a script that talks to the database (see iepy/data/models.py).
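As a starting point for such a script, the sketch below only reads the CSV and collects the primary keys. The column layout (primary key first, label second, no header row) and the label text are assumptions, so check them against your actual output file. Looking up each key through the models in iepy/data/models.py is left out because it depends on your IEPY instance.

```python
import csv
import io


def read_predictions(csv_file):
    """Map each evidence primary key to its raw label string.

    Assumes one row per evidence, no header row: the first column is
    the primary key and the second is the predicted label. Adjust
    this to the real layout of your file.
    """
    predictions = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        predictions[int(row[0])] = row[1].strip()
    return predictions


# Hypothetical file contents standing in for the runner's output.
sample = io.StringIO("12,True\n13,False\n27,True\n")
predictions = read_predictions(sample)
present = [pk for pk, label in predictions.items() if label == "True"]
print(present)
```

With a real file you would pass `open("your_output.csv", newline="")` instead of the in-memory sample.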
If you want to modify the internal behavior, you can change the settings file. In your instance folder you’ll find a file called extractor_config.json. It holds all the configuration for the internal classifier, such as:
This sets the classifier algorithm to be used, you can choose from:
Features to be used in the classifier, you can use a subset of:
Features can be used as sparse features by adding them to the sparse_features section, or as dense features by adding them to the dense_features section.
The features in the sparse section will go through a stage of linear dimensionality reduction, and the dense features will, by default, be used with a non-linear classifier.
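Moving a feature from one section to the other can be scripted. The sketch below assumes that sparse_features and dense_features are top-level lists in extractor_config.json; the helper name and the feature names are placeholders, not real IEPY features.

```python
import json


def move_feature(config, name, to_dense):
    """Move `name` between the sparse_features and dense_features lists."""
    source, target = "sparse_features", "dense_features"
    if not to_dense:
        source, target = target, source
    if name in config.get(source, []):
        config[source].remove(name)
    if name not in config.setdefault(target, []):
        config[target].append(name)
    return config


# Hypothetical config contents; real feature names come from your own file.
config = json.loads("""
{
  "sparse_features": ["feature_a", "feature_b"],
  "dense_features": ["feature_c"]
}
""")
move_feature(config, "feature_b", to_dense=True)
print(json.dumps(config, indent=2))
```

To apply this to a real instance you would load the file with `json.load`, call the helper, and write the result back with `json.dump`, keeping a backup of the original.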
Viewing predictions on the web user interface¶
If you prefer to review the predictions using the web interface, it is possible to run the active learning core in a way that stores the results in the database, where they are accessible through the web.
To do so, you’ll have to run the core like this:
python bin/iepy_runner.py --db-store <relation_name>
We do not have a specialized interface to review predictions, but you can still view them by using the interface for creating a reference corpus.
There, the predictions will show up as labels from a new judge whose name is iepy-run plus a date.
Saving predictor for later use¶
Since training could be a slow process, you might want to save your trained predictor and re-use it several times without the need to train again.
You can save it by doing:
python bin/iepy_runner.py --store-extractor=myextractor.pickle <relation_name> <output>
And re-use it like this:
python bin/iepy_runner.py --trained-extractor=myextractor.pickle <relation_name> <output>
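If you drive the runner from a script, a small helper can assemble the command lines shown on this page. This is a sketch: it only builds the argument list from the flags documented above, the helper name and the relation and file names are hypothetical, and whether the flags can be freely combined is an assumption you should verify against your IEPY version.

```python
import shlex


def build_runner_command(relation, output=None, tune_for=None,
                         store_extractor=None, trained_extractor=None,
                         db_store=False):
    """Build the argument list for bin/iepy_runner.py (options go first)."""
    cmd = ["python", "bin/iepy_runner.py"]
    if tune_for:
        cmd.append(f"--tune-for={tune_for}")
    if db_store:
        cmd.append("--db-store")
    if store_extractor:
        cmd.append(f"--store-extractor={store_extractor}")
    if trained_extractor:
        cmd.append(f"--trained-extractor={trained_extractor}")
    cmd.append(relation)
    if output:
        cmd.append(output)
    return cmd


# Hypothetical relation and file names.
train = build_runner_command("my_relation", "predictions.csv",
                             store_extractor="myextractor.pickle")
reuse = build_runner_command("my_relation", "predictions.csv",
                             trained_extractor="myextractor.pickle")
print(" ".join(shlex.quote(part) for part in train))
print(" ".join(shlex.quote(part) for part in reuse))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Keep in mind that the runner stops at an interactive prompt, so it needs a terminal attached.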