We like this for PII discovery.
https://github.com/redhuntlabs/Octopii

Working
Octopii uses Tesseract's Optical Character Recognition (OCR) and Keras' Convolutional Neural Network (CNN) models to detect various forms of personally identifiable information that may be leaked in a publicly facing location. This is done in the following steps:
The accuracy of the scan can be determined via the confidence scores in the output. If all the mentioned conditions are met, a score of 100.0 is returned. To train the model, data can also be fed into the model_generator.py script, and the newly improved h5 file can be used.

Usage
Example
owais@artemis ~ $ python3 octopii.py pii_list
Not a valid image format: pii_list/aadhaar/aadhaar-8.gif
[
    {
        "asset_type": "Bank",
        "confidence": 100.0,
        "file_name": "passbook",
        "extension": "jpeg",
        "path": "pii_list/bank/passbook.jpeg"
    },
    {
        "asset_type": "Photo",
        "confidence": 99.98,
        "file_name": "IMG-20200331-WA0037",
        "extension": "jpg",
        "path": "pii_list/photos/IMG-20200331-WA0037.jpg"
    },
    {
        "asset_type": "PAN",
        "confidence": 100.0,
        "file_name": "pan-7",
        "extension": "jpg",
        "path": "pii_list/pan/pan-7.jpg"
    },
    {
        "asset_type": "Aadhaar",
        "confidence": 97.31,
        "file_name": "aadhaar-14",
        "extension": "jpg",
        "path": "pii_list/aadhaar/aadhaar-14.jpg"
    }
]
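Since the scan prints a JSON list with a confidence score per file, the results are easy to post-process. The following is a minimal sketch, not part of Octopii itself: the helper names load_findings and filter_findings and the 99.0 threshold are our own assumptions, and it assumes the output has the same shape as the example above (optional warning lines, then a JSON array).

```python
import json

# Sample text copied from the example run above (warning line, then JSON).
SAMPLE_OUTPUT = """Not a valid image format: pii_list/aadhaar/aadhaar-8.gif
[
  {"asset_type": "Bank", "confidence": 100.0, "file_name": "passbook",
   "extension": "jpeg", "path": "pii_list/bank/passbook.jpeg"},
  {"asset_type": "Photo", "confidence": 99.98, "file_name": "IMG-20200331-WA0037",
   "extension": "jpg", "path": "pii_list/photos/IMG-20200331-WA0037.jpg"},
  {"asset_type": "PAN", "confidence": 100.0, "file_name": "pan-7",
   "extension": "jpg", "path": "pii_list/pan/pan-7.jpg"},
  {"asset_type": "Aadhaar", "confidence": 97.31, "file_name": "aadhaar-14",
   "extension": "jpg", "path": "pii_list/aadhaar/aadhaar-14.jpg"}
]"""


def load_findings(raw_output):
    """Skip any leading warning lines and parse the JSON array that follows.

    Hypothetical helper: assumes the array starts on a line beginning with '['.
    """
    lines = raw_output.splitlines()
    for index, line in enumerate(lines):
        if line.lstrip().startswith("["):
            return json.loads("\n".join(lines[index:]))
    return []  # no JSON array found in the output


def filter_findings(findings, min_confidence=99.0):
    """Keep only findings whose confidence is at or above the threshold."""
    return [f for f in findings if f["confidence"] >= min_confidence]


if __name__ == "__main__":
    for finding in filter_findings(load_findings(SAMPLE_OUTPUT)):
        print(finding["asset_type"], finding["confidence"], finding["path"])
```

In practice the text passed to load_findings would come from capturing the scan's standard output; the right threshold depends on how many false positives you are willing to review by hand.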