VN Direct is one of the most reputable securities companies in Vietnam, with the vision of becoming the first choice of every investor. However, its biggest problem at present is that data entry is done entirely by hand: the process is complicated and productivity is low. The BlueOC team worked with VN Direct to build an AI engine to solve this problem.
Because makers and checkers have to use and switch between many tools at once, the process is long and input productivity suffers. Training new members to use these complicated tools takes a lot of time and effort, so VN Direct needs new tools that improve data entry productivity by 60–70% in order to complete the inventory and handle all released documents within a day.
The main barrier for the new tool was getting AI engines to recognize and process the mixed data:
We built an input tool that helps standardize the data. Our OCR solution combines smart preprocessing with current state-of-the-art models to convert almost all PDF documents to text, which is then fed into the model for data extraction.
We extracted data from a set of stock issuance documents. Because no existing state-of-the-art model meets the needs of this problem, we built a custom model. It applies issue type classification (for example ESOP or current shareholder) and named entity recognition to detect key fields in the text, such as the securities code and the denomination of the securities, and then maps them together. The test results are encouraging: accuracy is over 85%, and manual input effort is reduced by more than 80%.
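To make the shape of this extraction step concrete, here is a minimal sketch of the flow: classify the issue type, detect the key fields, and map them into one record. It is not the custom model itself. Keyword matching and regular expressions stand in for the trained classifier and the named entity recognizer, and the keyword lists, patterns, and names such as `IssuanceRecord` and `extract_record` are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical keyword lists standing in for the trained issue type classifier.
ISSUE_TYPE_KEYWORDS = {
    "ESOP": ("esop", "employee stock ownership"),
    "CURRENT_SHAREHOLDER": ("current shareholder", "existing shareholder"),
}

# Hypothetical patterns standing in for the named entity recognizer.
FIELD_PATTERNS = {
    "securities_code": re.compile(
        r"securities code\s*:\s*([A-Z0-9]{3,})", re.IGNORECASE
    ),
    "denomination": re.compile(
        r"(?:denomination|par value)\s*:\s*(\d(?:[\d.,]*\d)?)", re.IGNORECASE
    ),
}


@dataclass
class IssuanceRecord:
    """One structured row produced from the text of a single document."""
    issue_type: Optional[str]
    fields: Dict[str, str]


def classify_issue_type(text: str) -> Optional[str]:
    """Return the first issue type whose keywords appear in the text."""
    lowered = text.lower()
    for label, keywords in ISSUE_TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


def detect_fields(text: str) -> Dict[str, str]:
    """Collect the first match of every field pattern found in the text."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


def extract_record(text: str) -> IssuanceRecord:
    """Map the predicted issue type and the detected fields into one record."""
    return IssuanceRecord(classify_issue_type(text), detect_fields(text))
```

Calling `extract_record` on the OCR text of a document returns a single record that a maker can review instead of typing every field by hand. Fields the patterns do not find are simply absent from `fields`, so they can be flagged for manual entry.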
We divide our model into 3 heads:
Our OCR solution can handle all of the pain points above. With smart preprocessing and current state-of-the-art models, almost all PDF documents can be converted to text and fed into the model for data extraction.
For example, we can OCR the PDF with:
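A minimal sketch of such a pipeline follows. The source does not name the OCR engine or the preprocessing steps, so this sketch assumes the open-source `pdf2image` and `pytesseract` packages (with Poppler, Tesseract, and the Vietnamese language data installed). The fixed binarization threshold and the function names are illustrative assumptions, not the production setup.

```python
import re
import unicodedata


def render_pdf_pages(pdf_path, dpi=300):
    """Render each PDF page to a PIL image (assumes pdf2image and Poppler)."""
    from pdf2image import convert_from_path  # optional dependency, imported lazily
    return convert_from_path(pdf_path, dpi=dpi)


def preprocess(image):
    """Illustrative preprocessing: grayscale, then a fixed-threshold binarization."""
    gray = image.convert("L")
    return gray.point(lambda pixel: 255 if pixel > 160 else 0)


def recognize(image, lang="vie+eng"):
    """Run Tesseract on one page image (assumes pytesseract and language data)."""
    import pytesseract  # optional dependency, imported lazily
    return pytesseract.image_to_string(image, lang=lang)


def clean_text(raw):
    """Normalize Unicode and collapse the stray whitespace OCR tends to produce."""
    text = unicodedata.normalize("NFC", raw)
    text = re.sub(r"[ \t]+", " ", text)      # runs of spaces or tabs -> one space
    text = re.sub(r" ?\n ?", "\n", text)     # drop spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line in a row
    return text.strip()


def ocr_pdf(pdf_path, render=render_pdf_pages, prep=preprocess, read=recognize):
    """OCR every page of a PDF and join the cleaned page texts."""
    pages = [clean_text(read(prep(image))) for image in render(pdf_path)]
    return "\n\n".join(pages)
```

The rendering, preprocessing, and recognition steps are passed in as parameters so that any of them can be swapped for a different engine or a stronger preprocessing routine without changing `ocr_pdf`. The returned text is what would be fed into the extraction model.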