Customer: Haut Comité Français pour la Résilience Nationale (HCFRN)
Programme: C2IA
Supply Chain: HCFRN > CS Group SPACE
Context
DOCIA: Operational tool for information processing and retrieval. The aim is not to change how data is archived, but to make effective use of the organizational and technical means already deployed or being deployed.
CS Group's responsibilities for the prototype of a document retrieval and advanced indexing platform are as follows:
- Need analysis
- Design & development

The features are as follows:
- Full text, metadata and temporal search with highlights
- 5 views:
  - Details: text blocks matching the search, with the main metadata
  - Table: the results in list format
  - Directory: the results in the document tree structure
  - Statistics: pie chart of the documents found by type, average size, etc.
  - Map: the locations found in documents, on a background map
- Shopping cart:
  - Import/export/permalinks
- Suggested documents
- Upload, add, update, delete files
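
As an illustration of how the search features listed above could map onto ElasticSearch (named in the technology table), the following is a minimal sketch that builds a search request body combining full-text matching, a metadata filter, a date range, highlights, and a per-type aggregation of the kind the Statistics view would need. The field names `content`, `doc_type`, `created` and `size_bytes`, and the helper `build_search_body`, are assumptions made for this example and do not come from the DOCIA platform.

```python
from datetime import date


def build_search_body(text, doc_type=None, start=None, end=None):
    """Build an Elasticsearch search body. All field names are hypothetical."""
    filters = []
    if doc_type is not None:
        # Metadata search: exact match on a keyword field.
        filters.append({"term": {"doc_type": doc_type}})
    if start is not None or end is not None:
        # Temporal search: inclusive bounds on a date field.
        bounds = {}
        if start is not None:
            bounds["gte"] = start.isoformat()
        if end is not None:
            bounds["lte"] = end.isoformat()
        filters.append({"range": {"created": bounds}})
    return {
        "query": {
            "bool": {
                # Full-text search on the extracted document text.
                "must": [{"match": {"content": text}}],
                "filter": filters,
            }
        },
        # Ask for highlighted fragments of the matching text.
        "highlight": {"fields": {"content": {}}},
        # Document count and average size per type, as a statistics view might show.
        "aggs": {
            "by_type": {
                "terms": {"field": "doc_type"},
                "aggs": {"avg_size": {"avg": {"field": "size_bytes"}}},
            }
        },
    }


body = build_search_body(
    "flood response",
    doc_type="pdf",
    start=date(2020, 1, 1),
    end=date(2020, 12, 31),
)
```

The resulting dictionary would be sent as the body of a search request; how the prototype actually structures its queries and index mappings is not described in this sheet.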
Project implementation
The project objectives are as follows:
- Enable search across a large volume of heterogeneous and unorganized documents
- Intelligent use of data, linking, cross-referencing
- Monitoring of local documents, websites, RSS feeds
- Applications: Operational Mapping, Surveillance, Decision Support
The processes for carrying out the project are:
Technical characteristics
The solution key points are as follows:
- Advanced indexing:
  - Identification of duplicates
  - Text extraction optimization:
    - Header / footer
    - Image pre-processing
  - Metadata extraction
- Scalable and extensible
- Minimal use of resources
- Log management
- Indexing metrics by file type
- Index sharing and export
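
Two of the key points above, identification of duplicates and indexing metrics by file type, can be sketched together. The example below is a minimal sketch that assumes duplicates are exact byte-for-byte copies, detected with a SHA-256 digest; the sheet does not say which technique the platform uses, and near-duplicates would need a different approach. The function names and the metric fields are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePath


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact file content."""
    return hashlib.sha256(data).hexdigest()


def index_batch(files):
    """Split (path, bytes) pairs into unique files and exact duplicates.

    Returns (unique_paths, duplicates, metrics), where duplicates holds
    (duplicate_path, first_seen_path) pairs and metrics is keyed by
    lower-cased file extension.
    """
    seen = {}  # digest -> first path seen with that content
    unique, duplicates = [], []
    metrics = defaultdict(lambda: {"indexed": 0, "duplicates": 0, "bytes": 0})
    for path, data in files:
        ext = PurePath(path).suffix.lower() or "(none)"
        digest = content_digest(data)
        if digest in seen:
            # Same bytes already indexed: record the duplicate and skip it.
            duplicates.append((path, seen[digest]))
            metrics[ext]["duplicates"] += 1
            continue
        seen[digest] = path
        unique.append(path)
        metrics[ext]["indexed"] += 1
        metrics[ext]["bytes"] += len(data)
    return unique, duplicates, dict(metrics)


sample = [
    ("a/report.pdf", b"alpha"),
    ("b/copy.PDF", b"alpha"),
    ("notes.txt", b"beta"),
]
unique, duplicates, metrics = index_batch(sample)
```

Here `b/copy.PDF` is reported as a duplicate of `a/report.pdf`, and the per-extension counters give a simple form of indexing metrics by file type.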

The main technologies used in this project are:
| Domain | Technology(ies) |
| Operating System(s) | Linux, HTML 5 Client |
| Programming language(s) | HTML |
| Interoperability (protocols, format, APIs) | XMPP, WMS, WMTS, TMS, FTP, POSIX, Ms SharePoint, PDF, SSO, OpenSearch, Geo/Time |
| Production software (IDE, DEVOPS etc.) | Docker, Swagger, Git |
| Main COTS library(ies) | ElasticSearch, PyTorch, Spark |