
Overview
APEX_DATA_PARSER provides APIs to help us to:
- Analyze a file to understand its format and structure.
- Identify the field names and datatypes of fields contained in the file.
- Use a SELECT statement to pull data from a file without having to develop complex parsing logic.
APEX_DATA_PARSER supports the following file formats:
- CSV
- JSON
- Excel (XLSX)
- XML
Our Options Before APEX_DATA_PARSER
- Use SQL*Loader to load data into a staging table and then kick off a PL/SQL process to perform transformations.
- Use UTL_FILE within PL/SQL to read data from files on the file system.
- Use ‘External Table’ functionality within PL/SQL to read data from files as if they were a table.
- Build a custom program using a language like Java to read data from the file, parse it and insert it into a table using JDBC.
All of these methods come with challenges, which include:
- None of them support parsing and loading JSON, Excel or XML files.
- None of these methods easily handle the structure of your file changing.
- All of these methods require that you copy the file to a specific server that the database is able to access.
- SQL*Loader and External Tables require that you manually build a definition file to define the structure of the file you are loading.
- UTL_FILE and External Tables require that you configure database directories to define the location of the external file on the file system.
- UTL_FILE requires that you manually build the parsing logic to split records into fields etc.
APEX_DATA_PARSER
The File
In order or for APEX_DATA_PARSER to act on your file, you first have to get it into a BLOB. You might now be thinking if I have to get the file into a BLOB then this really isn’t solving the problem of having to move the file to a location where the database is aware of it. Enter APEX_WEB_SERVICE. You could use APEX_WEB_SERVICE to first get the file BLOB from somewhere like AWS S3 and then consume the response using APEX_DATA_PARSER e.g.
One of the most interesting features for me is the ability to analyze a file and generate a JSON document containing everything you need to know about the file.
To make things even easier, there is a companion API to APEX_DATA_PARSER.DISCOVER called APEX_DATA_PARSER.GET_COLUMNS.
Now that we know what our file looks like we would like to parse and process the data within the file using SQL. This can be done using APEX_DATA_PARSER.PARSE.
At every turn, APEX_DATA_PARSER is making it easier for you to deal with information about the file and the file itself using SQL.
Considerations
Many More Parameters
There are many more parameters to the APIs I have described above. I encourage you to read the documentation to learn about these.
It Supports a Flat File Structure
- Although it supports JSON and XML file types, APEX_DATA_PARSER can only handle data in a flat structure.
- You cannot, therefore, parse out data from nested objects in JSON or XML.
- During discovery of JSON files, APEX_DATA_PARSER will try and determine the main array and parse the top level of that array only.
Basic DATE Type Identification
- It seems to rely on clues in the formatting of date fields to identify the format string.
- For example:
- DD/MM/YYYY is not identified as a DATE while DD.MM.YYYY is.
- MM/DD/YYYY is identified as a DATE.
- It seems to me if it tested a number of different date format strings against records in the file, it could improve its success rate in identifying DATE fields.
Performance
APEX_DATA_PARSER relies on underlying code to handle the actual data parsing. For example, if you are on 11g of the database it uses APEX_JSON to parse JSON. If you are on 18c of the database it will use the native JSON parser. The native JSON parser in 18c is orders of magnitude faster than APEX_JSON. In view of this the parsing speed can vary dramatically depending on the version of the DB you are on.
This is actually very clever. By encapsulating file parsing logic within APEX_DATA_PARSER you will automatically gain performance improvements when you upgrade from 11g to 18c.
Use Case
Prior to APEX_DATA_PARSER this would have been nearly impossible. At the very least APEX_DATA_PARSER cut the development time in half.
Conclusion
There are many more examples where APEX_DATA_PARSER can save you time and allow you to provide real business benefit. For example, you can save users from manual data entry by allowing them to upload data from an Excel spreadsheet. APEX_DATA_PARSER even handles fetching data from a specific tab of a spreadsheet.
Here is the link to the APEX 19.1 documentation for APEX_DATA_PARSER
Author
Jon Dixon, Co-Founder JMJ Cloud