Read Excel File > Redefine Format > Reformat > Output File
We read each row as one record with the Read Excel File component. Then, using a Redefine Format, we slice the record and assign them in fields. Lastly, using a Reformat, we convert each field to its expected datatype.
The above design is simple, given that we're extracting strings and decimals only. Dates formatted as "YYYYMMDD" will be processed without issues. But when dates take the format of "MM/DD/YYYY" (i.e. "1/1/2010"), things get more complicated.
Supposed we have the following table in the excel file:
EMP_NO | DEPT_NO | FROM_DATE | TO_DATE |
---|---|---|---|
100001 | 10 | 1/1/2010 | 11/2/2014 |
100002 | 10 | 1/1/2010 | 11/2/2014 |
100003 | 10 | 1/1/2010 | 11/2/2014 |
100004 | 20 | 2/5/2010 | 12/4/2014 |
100005 | 20 | 2/5/2010 | 12/4/2014 |
Using the above design, when we get to the Reformat component, we need to use string_split() to parse the month, the day, and the year. Now, we can manipulate the data to make it look like "MM/DD/YYYY".
Transform function will be similar as below:
out :: reformat(in) =
begin
let string("|")[3] from_date_vector = string_split(in.FROM_DATE,"/");
let string("|") from_date_month = from_date_vector[0];
let string("|") from_date_day = from_date_vector[1];
let string("|") from_date_year = from_date_vector[2];
let string("|")[3] to_date_vector = string_split(in.TO_DATE,"/");
let string("|") to_date_month = to_date_vector[0];
let string("|") to_date_day = to_date_vector[1];
let string("|") to_date_year = to_date_vector[2];
out.* :: in.*;
out.FROM_DATE :: string_lpad(from_date_month, 2, "0") + "/" + string_lpad(from_date_day, 2, "0") + "/" + from_date_year;
out.TO_DATE :: string_lpad(to_date_month, 2, "0") + "/" + string_lpad(to_date_day, 2, "0") + "/" + to_date_year;
end;
begin
let string("|")[3] from_date_vector = string_split(in.FROM_DATE,"/");
let string("|") from_date_month = from_date_vector[0];
let string("|") from_date_day = from_date_vector[1];
let string("|") from_date_year = from_date_vector[2];
let string("|")[3] to_date_vector = string_split(in.TO_DATE,"/");
let string("|") to_date_month = to_date_vector[0];
let string("|") to_date_day = to_date_vector[1];
let string("|") to_date_year = to_date_vector[2];
out.* :: in.*;
out.FROM_DATE :: string_lpad(from_date_month, 2, "0") + "/" + string_lpad(from_date_day, 2, "0") + "/" + from_date_year;
out.TO_DATE :: string_lpad(to_date_month, 2, "0") + "/" + string_lpad(to_date_day, 2, "0") + "/" + to_date_year;
end;
The problem is solved. Unfortunately, there is another problem. What if, for any kind of reason, the date changes its format to "MM.DD.YYYY"? There's no other way but to manually edit the code.
Now, here's a better solution.
Create an excel file using the given table above. Name it as emp_dept_info.xlsx.
Create a new graph then connect the components in the following order:
Read Excel File > Reformat > Output File
Supply the xls parameter with the excel file in the Read Excel File component. Configure other parameters as needed.
In the Ports tab, click Generate. You should get something like this:
record
string("\t") sheet;
string("\t") EMP_NO;
string("\t") DEPT_NO;
string("\t") FROM_DATE;
string("\t") TO_DATE;
string("\0") rest_of_line;
end
string("\t") sheet;
string("\t") EMP_NO;
string("\t") DEPT_NO;
string("\t") FROM_DATE;
string("\t") TO_DATE;
string("\0") rest_of_line;
end
Optional:
You can also use the command-line to generate this record format, simply use:
excel-gendml.ksh -single emp_dept_info.xlsx -sheet-name "Sheet1" -sheet-header-row 1 -record-delimiter "\0"
or
excel-gendml.ksh -single emp_dept_info.xlsx -sheet-index 1 -sheet-header-row 1 -record-delimiter "\0"
or
excel-gendml.ksh -single emp_dept_info.xlsx -sheet-index 1 -sheet-header-row 1 -record-delimiter "\0"
Caution: If you pull the date as is, Ab Initio will give you a decimal representation of the date (as shown below). And this is not what we want.
EMP_NO | DEPT_NO | FROM_DATE | TO_DATE | newline |
---|---|---|---|---|
100001 | 10 | 40179 | 41945 | \n |
100002 | 10 | 40179 | 41945 | \n |
100003 | 10 | 40179 | 41945 | \n |
100004 | 20 | 40214 | 41977 | \n |
100005 | 20 | 40214 | 41977 | \n |
So the next step is to edit the embedded layout and change the format of those dates to decimal. Like this:
record
string("\t") sheet;
string("\t") EMP_NO;
string("\t") DEPT_NO;
decimal("\t") FROM_DATE;
decimal("\t") TO_DATE;
string("\0") rest_of_line;
end
string("\t") sheet;
string("\t") EMP_NO;
string("\t") DEPT_NO;
decimal("\t") FROM_DATE;
decimal("\t") TO_DATE;
string("\0") rest_of_line;
end
Use the following transform function in the Reformat component:
include "~$AB_COMPONENTS/Interchange/Excel/excel_to_dml_date.xfr";
out :: reformat(in) =
begin
out.* :: in.*;
out.FROM_DATE :: excel_to_dml_date(in.FROM_DATE);
out.TO_DATE :: excel_to_dml_date(in.TO_DATE);
end;
out :: reformat(in) =
begin
out.* :: in.*;
out.FROM_DATE :: excel_to_dml_date(in.FROM_DATE);
out.TO_DATE :: excel_to_dml_date(in.TO_DATE);
end;
Finally, use the following record format for your Output File.
record
decimal("|") EMP_NO;
decimal("|") DEPT_NO;
date("MM/DD/YYYY")("|") FROM_DATE;
date("MM/DD/YYYY")("|") TO_DATE;
string(1) newline = "\n";
end
decimal("|") EMP_NO;
decimal("|") DEPT_NO;
date("MM/DD/YYYY")("|") FROM_DATE;
date("MM/DD/YYYY")("|") TO_DATE;
string(1) newline = "\n";
end
Save and run the graph.
Resulting records must be similar to this:
EMP_NO | DEPT_NO | FROM_DATE | TO_DATE | newline |
---|---|---|---|---|
100001 | 10 | 01/01/2010 | 11/02/2014 | \n |
100002 | 10 | 01/01/2010 | 11/02/2014 | \n |
100003 | 10 | 01/01/2010 | 11/02/2014 | \n |
100004 | 20 | 02/05/2010 | 12/04/2014 | \n |
100005 | 20 | 02/05/2010 | 12/04/2014 | \n |
Comments are appreciated.
Thank you it really worked.
ReplyDeleteThanks made it easy
ReplyDeleteThank you so much. Please keep up the good work.
ReplyDelete