DICE
Research People Publications News Contact

Research

Classification of documents with missing data

April 19, 2021

In many practical applications in the financial and legal domains, thousands of documents need to be annotated with one or more labels drawn from a label set that may contain tens or even thousands of labels. Besides being large, these label sets are frequently updated, which makes it very impractical to keep the labels of every document correct by hand. Therefore, one would like to train document classifiers that assign labels automatically. Training such classifiers with machine learning methods is challenging, not only because of the number of different labels and their volatility, but also because of their highly imbalanced distribution: some labels are assigned to many documents, while others appear only rarely. As a result, it is very difficult to obtain training data that adequately cover all classes. Our research focuses on text classification with few- and zero-shot learning capability, i.e., the ability to handle rare classes with only a handful of training examples and unseen classes with none at all.
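As an illustration of the zero-shot setting only (not a description of the specific methods used in this research), the sketch below shows one generic way a multi-label classifier can assign labels it has never seen during training: compare each document with a short textual description of every label and keep all labels whose similarity clears a threshold. The bag-of-words vectors, the `threshold` value of 0.2, the function names and the label descriptions are all hypothetical choices for this example; practical systems typically replace raw word counts with learned text encoders.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    # Bag-of-words term counts (a stand-in for a learned text encoder).
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_labels(document: str, label_descriptions: dict[str, str],
                     threshold: float = 0.2) -> list[str]:
    # Score each label by how similar its description is to the document.
    # No labelled training examples are needed, so a new label can be
    # supported by adding a description instead of retraining a model.
    doc_vec = vectorize(document)
    scores = {label: cosine(doc_vec, vectorize(desc))
              for label, desc in label_descriptions.items()}
    # Multi-label output: every label at or above the threshold, best first.
    selected = [label for label, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda label: scores[label], reverse=True)


# Hypothetical label descriptions, for illustration only.
labels = {
    "data_protection": "personal data protection privacy processing consent",
    "state_aid": "state aid subsidies competition member states",
    "banking_supervision": "bank capital requirements supervision credit institutions",
}
doc = ("The regulation governs the processing of personal data "
       "and requires consent for data transfers.")

print(zero_shot_labels(doc, labels))  # ['data_protection']

# A previously unseen label is handled simply by describing it.
labels["data_transfers"] = "cross border data transfers third countries"
print(zero_shot_labels(doc, labels))  # ['data_protection', 'data_transfers']
```

Representing labels by their descriptions, rather than by output units learned from labelled examples, is what lets such a classifier cope with frequently updated label sets: adding a label requires no new training data. Lowering `threshold` assigns more labels per document (higher recall, lower precision).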



Document Intelligence Centre of Excellence
dice@iit.demokritos.gr
Patr. Gregoriou E & Neapoleos Str 27,
15341 Agia Paraskevi, Athens, Greece

