JMJ CLOUD
  • Home
  • Projects
  • Blog
  • About Us
  • Contact Us

Our Blog

APEX Data Parser (Data Load Made Easy)

8/15/2019

0 Comments

 
APEX_DATA_PARSER
The APEX_DATA_PARSER PL/SQL package was introduced in APEX 19.1. It comes from a long line of useful APIs produced by the APEX development team. APEX_DATA_PARSER solves a number of challenges with loading file based data that I have faced throughout my Oracle career of 25 years. In this post I will explain what problems it solves and how you can use it to make importing data from files much easier.

Overview

Almost all projects whether they are custom APEX applications or full-blown ERP implementations require that you import data into a database table from a file. As Oracle developers our peak productivity zone begins when we can handle data using SQL. The quicker we can start handling data using SQL, the quicker development can be completed.
 
APEX_DATA_PARSER provides APIs to help us to:
  • Analyze a file to understand its format and structure.
  • Identify the field names and datatypes of fields contained in the file.
  • Use a SELECT statement to pull data from a file without having to develop complex parsing logic.
 
APEX_DATA_PARSER supports the following file formats:
  • CSV
  • JSON
  • Excel (XLSX)
  • XML

​Our Options Before APEX_DATA_PARSER

Previously, we have had to employ one of the following methods to get external file data into a database table to start handling the data using SQL:
  • Use SQL*Loader to load data into a staging table and then kick off a PL/SQL process to perform transformations.
  • Use UTL_FILE within PL/SQL to read data from files on the file system.
  • Use ‘External Table’ functionality within PL/SQL to read data from files as if they were a table.
  • Build a custom program using a language like Java to read data from the file, parse it and insert it into a table using JDBC.
 
All of these methods come with challenges, which include:
  • None of them support parsing and loading JSON, Excel or XML files.
  • None of these methods easily handle the structure of your file changing.
  • All of these methods require that you copy the file to a specific server that the database is able to access.
  • SQL*Loader and External Tables require that you manually build a definition file to define the structure of the file you are loading.
  • UTL_FILE and External Tables require that you configure database directories to define the location of the external file on the file system.
  • UTL_FILE requires that you manually build the parsing logic to split records into fields etc.

APEX_DATA_PARSER

APEX_DATA_PARSER handles all of the challenges I described above (and more). It contains a series of APIs that allow you to discover the structure of a file and then parse the data in the file directly from a SQL statement.

The File
In order or for APEX_DATA_PARSER to act on your file, you first have to get it into a BLOB. You might now be thinking if I have to get the file into a BLOB then this really isn’t solving the problem of having to move the file to a location where the database is aware of it. Enter APEX_WEB_SERVICE. You could use APEX_WEB_SERVICE to first get the file BLOB from somewhere like AWS S3 and then consume the response using APEX_DATA_PARSER e.g.

    
​Discovery
One of the most interesting features for me is the ability to analyze a file and generate a JSON document containing everything you need to know about the file. 

    
APEX_DATA_PARSER.DISCOVER returns a JSON document with details of the file encoding, delimiters, and the names and data types for each of the fields in the file. This information is extremely useful. For example, you can use it to write code to handle changes to the numbers of columns in files that are uploaded. Today your user uploads a file with 10 columns and tomorrow they add a column. Being able to inspect the format of the file allows you to dynamically handle that 11th column. This is something you could never do with other Oracle loader tools.
 
To make things even easier, there is a companion API to APEX_DATA_PARSER.DISCOVER called APEX_DATA_PARSER.GET_COLUMNS.

    
​APEX_DATA_PARSER.GET_COLUMNS produces a PL/SQL array of the columns from the JSON profile. 
APEX_DATA_PARESER.GET_COLUMNS
​Parsing
Now that we know what our file looks like we would like to parse and process the data within the file using SQL. This can be done using APEX_DATA_PARSER.PARSE.

    
​APEX_DATA_PARSER.PARSE returns a PL/SQL array with 301 columns, the line_number from the file and up to 300 columns of data from the file.
APEX_DATA_PARSER.PARSE
​Although the column names are generic (col001, col002 etc.) we know exactly what is in each column because we have the profile generated earlier.
 
At every turn, APEX_DATA_PARSER is making it easier for you to deal with information about the file and the file itself using SQL.

Considerations

​There a few considerations that are worth noting.

Many More Parameters
There are many more parameters to the APIs I have described above. I encourage you to read the documentation to learn about these.
 
It Supports a Flat File Structure
  • Although it supports JSON and XML file types, APEX_DATA_PARSER can only handle data in a flat structure.
  • You cannot, therefore, parse out data from nested objects in JSON or XML.
  • During discovery of JSON files, APEX_DATA_PARSER will try and determine the main array and parse the top level of that array only.
 
Basic DATE Type Identification
  • It seems to rely on clues in the formatting of date fields to identify the format string.
  • For example:
    • DD/MM/YYYY is not identified as a DATE while DD.MM.YYYY is.
    • MM/DD/YYYY is identified as a DATE.
  • It seems to me if it tested a number of different date format strings against records in the file, it could improve its success rate in identifying DATE fields.
 
Performance
APEX_DATA_PARSER relies on underlying code to handle the actual data parsing. For example, if you are on 11g of the database it uses APEX_JSON to parse JSON. If you are on 18c of the database it will use the native JSON parser. The native JSON parser in 18c is orders of magnitude faster than APEX_JSON. In view of this the parsing speed can vary dramatically depending on the version of the DB you are on.
 
This is actually very clever. By encapsulating file parsing logic within APEX_DATA_PARSER you will automatically gain performance improvements when you upgrade from 11g to 18c.

Use Case

​We have already used APEX_DATA_PARSER for loading payroll data from external payroll providers into Oracle EBS. These payroll files have a varying number of columns depending on the different pay elements in the payroll file (e.g. gross pay, sick pay, holiday, vacation, etc.). Being able to analyze the profile produced by APEX_DATA_PARSER at run time allowed us to pick up columns added to the file dynamically. As long as these columns matched Oracle payment elements, we could then load them right into Oracle.
 
Prior to APEX_DATA_PARSER this would have been nearly impossible. At the very least APEX_DATA_PARSER cut the development time in half.

Conclusion

For on-premise and cloud-based ERP projects, a significant portion of the technical budget is spent converting data (e.g. items, customers, suppliers, orders, invoices etc.) from a legacy system to a new Oracle ERP. APEX_DATA_PARSER can significantly reduce the cost and complexity of these conversions by removing the complexity of getting you to the stage where you can work with the file data in SQL.
 
There are many more examples where APEX_DATA_PARSER can save you time and allow you to provide real business benefit. For example, you can save users from manual data entry by allowing them to upload data from an Excel spreadsheet. APEX_DATA_PARSER even handles fetching data from a specific tab of a spreadsheet.
 
Here is the link to the APEX 19.1 documentation for APEX_DATA_PARSER

Author

Jon Dixon, Co-Founder JMJ Cloud

0 Comments

Your comment will be posted after it is approved.


Leave a Reply.

    RSS Feed

    Popular Posts

    - APEX Dog Food
    - Cloud ERP & APEX Mashup
    - Modernizing EBS with APEX
    - Easy APEX_WEB_SERVICE
    - Running APEX in RDS
    - ORDS What & Why?

    Categories

    All
    APEX
    AWS
    Fusion Cloud ERP
    INTEGRATION
    MS GRAPH
    OCI
    ORDS
    PaaS
    RAD
    REST
    SOAP

    Archives

    October 2021
    February 2021
    January 2021
    October 2020
    September 2020
    June 2020
    May 2020
    April 2020
    February 2020
    January 2020
    October 2019
    September 2019
    August 2019
    July 2019
    June 2019
    April 2019
    March 2019
    February 2019
    January 2019
    December 2018
    November 2018
    October 2018
    September 2018
    August 2018
    July 2018
    June 2018
    May 2018
    April 2018
    March 2018
    February 2018
    January 2018
    September 2017
    August 2017
    July 2017
    June 2017
    February 2017
    January 2017
    December 2016
    November 2016
    October 2016
    September 2016
    August 2016
    July 2016

Company

About
Contact
Blog
  • Home
  • Projects
  • Blog
  • About Us
  • Contact Us