A day doesn’t go by without some article on machine learning and artificial intelligence talking about how neural networks can be used to deliver deeper insights from company data. Many of the examples for neural networks revolve around the processing of pictures, images and sound files. But neural networks have been around for a long time and have been used to solve business problems for many decades. One of the challenges with using neural network models has been with how they can be used in production systems. One such solution is to run R code in the database. For an Oracle Database, the Advanced Analytics Option is needed, and its component called Oracle R Enterprise (ORE). This article will show you have to build an R neural network model, store it in Oracle Database, and then call/use this neural network model using SQL.
Prerequisites
To use Oracle R Enterprise you will need the following:
- Oracle 11.2.0.3 (or greater) Database Enterprise Edition
- Oracle R Enterprise 1.4 (or greater)
Oracle R Enterprise 1.4 (or greater) has a specific implementation of the R neural network algorithm. This algorithm is called using the ore.neural function.
If you are using an earlier version of Oracle R Enterprise, you can use any neural network algorithm from your preferred R package. To use this you will need to ensure that the R package is installed on the client machine in addition to the Oracle R Enterprise Database Server installation. It is important to have the same versions of the neural network R package on both the client and database server; otherwise, you may get some errors or inconsistency in behavior.
If Oracle R Enterprise is new to you, make sure to check out the web page for the product and the documentation for the product.
The Process – two ways of doing this
Oracle R Enterprise allows data scientists and database developers to build and use machine learning models using the R language. The common part for both of these types of users is the Oracle Database, where the data resides, as illustrated in Figure 1. The Oracle Database will also be a repository for R models and R objects, which allows the data scientist to work in a data secure environment.
The data scientist using the R language can work within their R environment using the R language without having to learn SQL or any other database tool. They can seamlessly work with data stored in the Oracle Database as if it is an R object. They can also create new R objects and R data sets and have these stored inside the Oracle Database. One such object is an R script. An R script contains the R code to be performed; for example, create a neural network model or an R script that uses this model to score or label a new data set. Once an R object is stored in the Oracle Database it now becomes visible using SQL. This is illustrated in Figure 1, and once an R script or object is visible using SQL then any application can use SQL to call and run R scripts and machine learning models. The running of R scripts in the Oracle Database is called Embedded R Execution.
Figure 1 : Using Oracle R Enterprise from R or from SQL
The following examples will illustrate the creation of ORE scripts to create a new neural network, and how this model can be called, using a separate R script, using SQL. The dataset being used is a Wine data set that is commonly used and is available on the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/wine). With this data set the target attribute (the one we want to predict) is the Alcohol level of the wine.
The ore.scriptCreate ORE function is used to create and store an R script in Oracle Database. The following two scripts contain the code to generate and save a neural network in an ORE data store in the database. The second script is used to call the neural network model and score new data. This code is run by the data scientist in their R environment.
ore.scriptCreate('DEMO_BUILD_NN_MODEL', function(dat, ds_name) {
nn_mod <- ore.neural(alcohol ~., data=dat)
ore.save(mod, name=ds_name, overwrite=TRUE)
}, overwrite = TRUE)
This script creates a function called DEMO_BUILD_NN_MODEL. This function accepts two parameters. The first (dat) is the data set to be processed. The second parameter (ds_name) is the name of the ORE data store where the model generated will be saved. The model is called nn_mod in the above example.
It should be noted that the neural network model does not physically exist at this point. All that has been done so far is the creation of a script, and this script is stored in the database. The next section of the article will show to run this script, generate and store the neural network model in the database.
The following R script can be used to score or label new data.
ore.scriptCreate('DEMO_NN_MODEL_APPLY', 'function(dat, ds_name) {
ore.load(ds_name)
pre <- predict(nn_mod, newdata=dat, supplemental.cols="alcohol")
res <- cbind(dat, PRED=pre)
res <- res[,c("alcohol", "PRED")]
}, overwrite = TRUE)
This script (DEMO_NN_MODEL_APPLY) takes two parameters. The first parameter (dat) contains the data set to label. The second parameter (ds_name) is the name of the data store that contains the neural network model. When the data store is loaded, the neural network model becomes available. The neural network model is then used to predict the alcohol level for each of the wines in data set (dat). The predicted values are then combined with the original data set using the cbind command. The script finally selects the actual alcohol level variable and the newly created PRED variable (containing the predicted value) and returns a data set containing these two variables from the function.
In this section I showed how you can take some R code for creating and using neural networks in R, wrap them up, and store this code in an Oracle Database.
Calling R Neural Network Model using SQL
Now that the data scientist has finished working on their machine learning R code, it is time for the SQL Developer to take over and use this code. Using SQL code to call the R code shown in the previous section of this article allows any application to run the R code.
Oracle R Enterprise has a number of functions available to allow you to run the R scripts stored in the database. These functions are described below.
|
Unfortunately there isn’t time to go through all of these, but for simplicity let’s look at the rqTableEval function.
The first thing that we need to do is create and store the neural network model in the database. This is achieved by running the first script (DEMO_BUILD_NN_MODEL). This script does not return anything and accepts two parameters.
select *
from table(rqTableEval(cursor(select * from WINE_DATASET_BUILD),
cursor(select 1 as "ore.connect", 'DEMO_NN_DS' as "ds_name" from dual),
'XML',
'DEMO_BUILD_NN_MODEL') );
The select statement on the WINE_DATASET_BUILT defines the data set to use and the first parameter to the R script. The name of the data store for the model is defined as DEMO_NN_DS, and this is passed as the second parameter. The ‘XML’ line defines the output format for the select statement. The R script does not return anything, so an empty XML object will be displayed. The final line of the select statement is the name of the R script.
Now that the neural network model has been created and stored in the database, the next step is using this model to score or label new data. This can be done using the rqTableEval function to call the DEMO_NN_MODEL_APPLY script. This script is called in a very similar way to the previous example, but in this case the R script returns an R data frame. This can contain zero, one or many records. But to allow the returning of the R data frame, the format of the data frame needs to be defined. In the previous section the returned data frame from the script was defined as containing two variables. The format of these needs to be defined in the second to last line of the following rqTableEval function call.
select *
from table(rqTableEval(cursor(select * from WINE_DATASET_APPLY),
cursor(select 1 as "ore.connect", 'DEMO_LM_DS' as "ds_name" from dual),
'select 1 as "alcohol", 1 as "PRED" from dual',
'DEMO_NN_MODEL_APPLY') );
This select statement will return the results contained in the R data frame, defined in that last line of the DEMO_NN_MODEL_APPLY script. It will contain the original Alcohol value for the wine along with the predicted value.
Summary
In this article an example was given of how you can use the R neural network algorithm to build a machine learning model and then use this model to label new data. The R code needed to create and use the model was defined in separate R scripts. When these R scripts were stored in Oracle Database they became callable using SQL. Now we can use SQL to call R code to perform machine learning on data, and return the results to SQL. The full functionality of the R language can be exposed via SQL using a similar approach. This also includes using the graphing and charting feature of the R language. Check out my blog at www.oralytics.com for more examples.
Start the discussion at forums.toadworld.com