
Wednesday, May 21, 2014

Oracle SQL Loader by example

===========================================
General
===========================================
Oracle SQL*Loader is pretty straightforward, and the related reference material is well documented.
Reference:
ORA-FAQ
Oracle Reference for SQL-Loader
The only tricky part is loading data into fields that are not populated from the input csv file.


===========================================
Example A - Load first field from a sequence
===========================================
In this example, the range_id field is the first field in table DATA_RANGE_INPUT.

The range_id field is populated by a sequence.
Note the leading "," in the csv file.
It is a placeholder for the sequence field, which is not populated by input data from the csv file.

Files:
data_input.csv
data_load.cmd
data_load.ctl

data_input.csv
range_id,code_a,code_b,from_data,to_data,status_ind,status
,44,36,0012,9999,Y,New
,44,36,0010,9999,Y,New
,44,36,1000,4567,Y,New


data_load.cmd
sqlldr user/password@connection_string control=data_load.ctl

data_load.ctl
OPTIONS(SKIP=1)
LOAD DATA
INFILE 'data_input.csv' 
BADFILE 'data_input.err' 

APPEND INTO TABLE DATA_RANGE_INPUT 
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
 range_id "DATA_INPUT_SEQ_01.NEXTVAL",
 code_a, 
 code_b,
 from_data, 
 to_data, 
 status_ind,
 status,
 creation_date  SYSDATE,
 last_modification_date SYSDATE,
 skip_this_field  FILLER,
 update_user CONSTANT "DATALOAD")

OPTIONS(SKIP=1) - skip header line
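
The control file assumes that the sequence DATA_INPUT_SEQ_01 already exists in the schema. If it does not, a minimal definition could look like the sketch below (the start value and increment are just assumptions):

CREATE SEQUENCE DATA_INPUT_SEQ_01
  START WITH 1
  INCREMENT BY 1;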


===========================================
Example B - Load Huge file
===========================================
In this example, the load processes 1,000,000 rows or more.

OPTIONS (ROWS=5000000, BINDSIZE = 400000000, READSIZE=400000000, ERRORS=20000000)
LOAD DATA
APPEND
INTO TABLE SQLLOAD_COUNTERS_HIST
fields terminated by ","
TRAILING NULLCOLS
(
ts_last_modified,
node_id,
static_id,
dyn1,
dyn2,
counter_sum,
counter_delta,
counter_name 
)

ROWS -- Number of rows in conventional path bind array or between direct path data saves
BINDSIZE -- Size of conventional path bind array in bytes  
            (Default 256000)
READSIZE -- Size of read buffer                  
            (Default 1048576)
ERRORS -- Number of errors to allow            
          (Default 50)
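
The same parameters can also be passed on the sqlldr command line instead of the OPTIONS clause; the control file and log file names below are only placeholders:

sqlldr user/password@connection_string control=huge_load.ctl log=huge_load.log rows=5000000 bindsize=400000000 readsize=400000000 errors=20000000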

===========================================
Example C - Load DATE using TO_DATE and SUBSTR functions
===========================================
Issue A.

In this example, the first field has milliseconds in the date format.
In the database, this field is a DATE, without milliseconds.

Solution
Use String and Conversion functions:
 "TO_DATE(SUBSTR(:ts_last_modified,1,19),'YYYY-MM-DD hh24:mi:ss')" 

Issue B.

Inside the data there are numerous delimiter lines, in the format '------------------------', that should be ignored.

Solution
Add an OPTIONS clause, where runtime parameters can be specified inside the control file.
Inside the OPTIONS clause, add ERRORS=1000000.
By setting the ERRORS parameter to a very high value, the rejected delimiter lines do not abort the load with the error "MAXIMUM ERROR COUNT EXCEEDED".


Data looks like:
2014-11-01 00:12:08.716,104032100000000,298,0,34,0,codeA
2014-11-01 00:12:08.716,104032000000001,1,405,13868,7,codeB

2014-11-01 00:12:08.716,104032000000001,1,298,24076,6,codeC
----------------------------------
2014-11-01 00:27:08.717,104032100000000,298,0,34,0,codeA
2014-11-01 00:27:08.717,104032000000001,1,405,13871,3,codeB

2014-11-01 00:27:08.717,104032000000001,1,298,24082,6,codeC
----------------------------------

control_file.ctl
OPTIONS (ERRORS=1000000)
LOAD DATA
INFILE 'counter_for_load.txt'
BADFILE 'counter_for_load.bad'
DISCARDFILE 'counter_for_load.dis'
APPEND 
INTO TABLE MY_COUNTERS_TABLE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(ts_last_modified "TO_DATE(SUBSTR(:ts_last_modified,1,19),'YYYY-MM-DD hh24:mi:ss')",
 static_id INTEGER EXTERNAL(20),
 dyn1 INTEGER EXTERNAL(22),
 dyn2 INTEGER EXTERNAL(22),
 counter_sum INTEGER EXTERNAL(20),
 counter_delta INTEGER EXTERNAL(20),
 counter_name FILLER,
 dyn3 "0",
 dyn4 "0",
 dyn5 "0",
 is_processed "0"
 )

runtime command:
sqlldr userA/passA@orainst control=control_file.ctl log=my_sqlldr.log
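
After the load completes, a quick sanity check on the target table can confirm the row count and the loaded date range, for example:

SELECT COUNT(*), MIN(ts_last_modified), MAX(ts_last_modified)
FROM MY_COUNTERS_TABLE;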

===========================================
Example D - Runtime error SQL*Loader-704:
===========================================
When running sqlldr, there is the following error:
SQL*Loader-704: Internal error: ulconnect: OCIServerAttach [0]
ORA-12154: TNS:could not resolve the connect identifier specified

The TNS ORA-12154 error suggests that the connect identifier is not known.

But when testing that very same connection with sqlplus, there is no error at all, and the connection is OK.
What is going on??!!

Example:
sqlplus userA/passA@orainst

SQL*Plus: Release 9.2.0.1.0 - Production on Mon Dec 1 14:10:39 2014
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.5.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options


SQL>

>sqlldr userid = userA/passA@orainst control=control_file.ctl

SQL*Loader: Release 10.2.0.1.0 - Production on Mon Dec 1 14:10:16 2014

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

SQL*Loader-704: Internal error: ulconnect: OCIServerAttach [0]
ORA-12154: TNS:could not resolve the connect identifier specified

Solution:
The issue is with the Oracle Client installations and the Path environment variable in Windows.
Apparently, that specific computer had three Oracle Client installations:
Oracle 9, Oracle 10, and Oracle 11.

Oracle 9 tnsnames.ora had the correct service descriptor.
Oracle 10 and Oracle 11 tnsnames.ora files had no services listed at all.

sqlplus was able to connect using the Oracle 9 Client tnsnames.ora.
The sqlldr issue was resolved only after adding the correct entry to the Oracle 10 Client tnsnames.ora file.
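
For reference, the entry added to the Oracle 10 Client tnsnames.ora looked roughly like the sketch below; the host, port, and service name are placeholders, not the actual values:

orainst =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orainst)
    )
  )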

This is a bit surprising, since when checking the Path environment variable, the Oracle 9 Client is listed first.

set | find "Path"
Path=C:\oracle\ora92\bin;
D:\oracle\product\10.2.0\client_1\bin;
C:\Oracle32\product\11.2.0\client_1\bin;
C:\WINDOWS\system32;
C:\WINDOWS;
C:\WINDOWS\System32\Wbem;
etc...

But when running the sqlldr command without parameters, we get the sqlldr "about" and "help" page.
The sqlldr version is Oracle 10! Most likely the Oracle 9 Client installation simply does not include sqlldr.exe, so Windows falls through to the next directory on the Path.
Now it is clear why sqlldr is "looking" into the Oracle 10 Client tnsnames.ora file.

>sqlldr
SQL*Loader: Release 10.2.0.1.0 - Production on Mon Dec 1 19:34:19 2014
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Usage: SQLLDR keyword=value [,keyword=value,...]
Valid Keywords:

    userid -- ORACLE username/password
   control -- control file name
       log -- log file name
       bad -- bad file name
      data -- data file name
   discard -- discard file name
discardmax -- number of discards to allow          (Default all)
      skip -- number of logical records to skip    (Default 0)
      load -- number of logical records to load    (Default all)
    errors -- number of errors to allow            (Default 50)
      rows -- number of rows in conventional path bind array or between direct path data saves
               (Default: Conventional path 64, Direct path all)
  bindsize -- size of conventional path bind array in bytes  (Default 256000)
    silent -- suppress messages during run (header,feedback,errors,discards,partitions)
    direct -- use direct path                      (Default FALSE)
   parfile -- parameter file: name of file that contains parameter specifications
  parallel -- do parallel load                     (Default FALSE)
      file -- file to allocate extents from
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions  (Default FALSE)
skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable  (Default FALSE)
commit_discontinued -- commit loaded rows when load is discontinued  (Default FALSE)
  readsize -- size of read buffer                  (Default 1048576)
external_table -- use external table for load; NOT_USED, GENERATE_ONLY, EXECUTE  (Default NOT_USED)
columnarrayrows -- number of rows for direct path column array  (Default 5000)
streamsize -- size of direct path stream buffer in bytes  (Default 256000)
multithreading -- use multithreading in direct path
 resumable -- enable or disable resumable for current session  (Default FALSE)
resumable_name -- text string to help identify resumable statement
resumable_timeout -- wait time (in seconds) for RESUMABLE  (Default 7200)
date_cache -- size (in entries) of date conversion cache  (Default 1000)

PLEASE NOTE: Command-line parameters may be specified either by
position or by keywords.  An example of the former case is 'sqlldr
scott/tiger foo'; an example of the latter is 'sqlldr control=foo
userid=scott/tiger'.  One may specify parameters by position before
but not after parameters specified by keywords.  For example,
'sqlldr scott/tiger control=foo logfile=log' is allowed, but
'sqlldr scott/tiger control=foo log' is not, even though the
position of the parameter 'log' is correct.
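
As a side note, on Windows versions that include the where command, it can list every sqlplus.exe and sqlldr.exe found on the Path, in search order, which shows directly which client installation wins:

>where sqlplus
>where sqlldr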