From Raw Data to Automated Insights
Author: Jorge Anicama | 3 min read | August 25, 2020
Every machine learning (ML) expert knows that an ML data pipeline is only as good as the data sets it can access. Data integration techniques are therefore critical for:
- Connecting business users and data that is difficult to find
- Relating facts between intricate data sources with well-selected and curated datasets
- Providing large amounts of data
Data integration implies an automated process, which makes security another important component, especially when the solution must communicate with a remote application (in this case, Oracle Cloud) to access a protected resource.
OAuth is a security standard that applications can use to provide client applications with “secure delegated access.” OAuth works over HTTPS and authorizes devices, APIs, servers, and applications with access tokens instead of credentials.
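As a rough illustration of exchanging credentials for an access token, the sketch below builds a standard OAuth 2.0 client-credentials request using only the Python standard library. The function name `build_token_request`, the `token_url` parameter, and the example values are hypothetical; the real token endpoint and any required scope come from the documentation of the service you are calling.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scope=None):
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    # The client authenticates with HTTP Basic: base64("client_id:client_secret").
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")

    # The grant type goes in a form-encoded body; scope is optional.
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")

    return urllib.request.Request(
        url=token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_token_request(
    "https://example.invalid/oauth2/token", "my-client-id", "my-client-secret"
)
```

Sending the request with `urllib.request.urlopen(request)` should return a JSON document whose `access_token` field is then passed as a `Bearer` token on later calls, so the client secret itself never accompanies the data requests.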
After coupling our data integration process with a secure way to communicate with the remote application, the next main step is to process the data obtained from the resource. This step involves an on-the-fly transformation and the final disposition of the output. The destination of the processed data is a database; here we chose Oracle Autonomous Data Warehouse (OADW).
Having the data in OADW opens up many possibilities for analysis, from typical ad hoc (descriptive) analytical reporting to advanced (predictive) analytics using ML algorithms.
The Business Case:
Oracle Identity Cloud Service (IDCS), among many other security-related functions, keeps a log of:
- Successful/unsuccessful login attempts
- IP address
- Browser
- Time
- User id
- etc
The caveat is that this audit information is retained only for a maximum of 90 days, so any longer-term analysis requires copying it elsewhere before it expires.
The Solution:
Fortunately, Oracle IDCS offers a rich set of APIs, and one in particular, “Audit Events REST APIs,” will help us work around this 90-day limit.
Main steps:
1. Create an application in Oracle IDCS and get its Client ID and Client Secret
2. Verify that the credentials work by using Postman
3. Develop an automated process to periodically access the remote resource:
- Apply the credentials to obtain the access token that allows us to access the remote resource
- Get the resource (applying an on-the-fly transformation if needed) and store the result in the destination (OADW)
- Wrap all the steps above in a Python script for automation
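The fetch-and-transform part of step 3 could look roughly like the sketch below. It assumes the audit events arrive in pages shaped like a SCIM list response (a `Resources` array plus a `totalResults` count) and that each event carries attributes named `eventId`, `timestamp`, `actorName`, `clientIp`, and `ssoBrowser`. Those names, and the `fetch_page` callable that would wrap the authenticated HTTP GET, are assumptions for illustration and should be checked against a real response.

```python
from typing import Callable, Iterator

# Assumed attribute names; verify them against an actual audit event.
COLUMNS = ("eventId", "timestamp", "actorName", "clientIp", "ssoBrowser")


def iter_events(fetch_page: Callable[[int, int], dict],
                page_size: int = 100) -> Iterator[dict]:
    """Yield every event by walking the paged response until it is exhausted."""
    start = 1  # assumed 1-based start index, as in SCIM paging
    while True:
        page = fetch_page(start, page_size)
        resources = page.get("Resources", [])
        yield from resources
        start += len(resources)
        # Stop on an empty page or once we have passed the reported total.
        if not resources or start > page.get("totalResults", 0):
            break


def to_row(event: dict) -> tuple:
    """On-the-fly transformation: flatten one event into a fixed-order tuple."""
    return tuple(event.get(column) for column in COLUMNS)


def extract_rows(fetch_page: Callable[[int, int], dict],
                 page_size: int = 100) -> list:
    """Collect all events as rows ready for a bulk insert."""
    return [to_row(event) for event in iter_events(fetch_page, page_size)]
```

The resulting tuples can be handed to the `executemany` method of a DB-API cursor connected to OADW, and the whole script can then be scheduled to run well inside the 90-day retention window.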
Oracle Analytics Cloud (OAC) Reporting and Analysis:
Now, all the data is ready for further analysis. You can simply create a project and include the attributes for pattern analysis.
One example is a trend analysis of the number of successful and unsuccessful login events by date or by hour of the day. Another type of analysis is to find out whether selected attributes show any sort of similarity (clustering). Your analysis can then continue on each individual cluster to detect other characteristics specific to that cluster.
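In OAC this trend analysis is built visually, but the underlying aggregation is simple enough to sketch in Python. The example below assumes the input holds login events only, as `(event_id, timestamp)` pairs, that the event identifier ends in `.success` or `.failure`, and that timestamps are ISO 8601 strings in UTC. All three are assumptions about the data layout, not confirmed properties of the audit log, and the sample event identifiers are made up.

```python
from collections import Counter
from datetime import datetime


def logins_by_hour(events):
    """Count login events per (hour of day, outcome)."""
    counts = Counter()
    for event_id, timestamp in events:
        outcome = event_id.rsplit(".", 1)[-1]
        if outcome not in ("success", "failure"):
            continue  # ignore events whose outcome cannot be determined
        # fromisoformat() on older Python versions does not accept a trailing "Z".
        moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        counts[(moment.hour, outcome)] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    ("login.success", "2020-08-25T09:15:00Z"),
    ("login.failure", "2020-08-25T09:40:00Z"),
    ("login.success", "2020-08-25T14:05:00Z"),
]
hourly = logins_by_hour(sample)
```

Grouping by `moment.date()` instead of `moment.hour` gives the per-day variant of the same trend.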
Want to learn more about building automated processes in OAC? Watch my webinar, “Oracle Analytics Cloud: From Raw Data to Automated Insights.”