
Ver 0.01

master
Asitav Sen 4 years ago
parent
commit
4e5054664a
  1. BIN
      Account/Account Advanced Find View CN.xls
  2. BIN
      Account/Account Advanced Find View CZ.xls
  3. BIN
      Account/Account Advanced Find View DE.xls
  4. BIN
      Account/Account Advanced Find View ES.xls
  5. BIN
      Account/Account Advanced Find View FI.xls
  6. BIN
      Account/Account Advanced Find View IT.xls
  7. BIN
      Account/Account Advanced Find View NL.xls
  8. BIN
      Account/Account Advanced Find View NO.xls
  9. BIN
      Account/Account Advanced Find View PL.xls
  10. BIN
      Account/CodeList/CodeList_Account.xlsx
  11. BIN
      Account/CodeList/CodeList_Account_Addresses.xlsx
  12. BIN
      Account/CodeList/CodeList_Account_Contact_Persons.xlsx
  13. BIN
      Account/CodeList/CodeList_Account_Identification.xlsx
  14. BIN
      Account/CodeList/CodeList_Account_International_Version.xlsx
  15. BIN
      Account/CodeList/CodeList_Account_Sales_Data.xlsx
  16. BIN
      Account/CodeList/CodeList_Account_Tax_Numbers.xlsx
  17. BIN
      Account/CodeList/CodeList_Account_Team.xlsx
  18. BIN
      Account/CodeList/CodeList_Account_Visiting_Hours.xlsx
  19. BIN
      Account/CodeList/CodeList_Account_Visits_Details.xlsx
  20. BIN
      Account/Line of Business translation -TEMPLATE-V1.xlsx
  21. 365
      Accounts.Rmd
  22. 18
      Contact.csv
  23. 402
      Contacts.Rmd
  24. 13
      DataTransformationCRH.Rproj
  25. BIN
      Opportunity Mapping 20210714.xlsx
  26. BIN
      Opportunity mapping AX Building Sites 20210924.xlsx
  27. BIN
      Project_oppt/CodeList/CodeList_Contact_Party_Information.xlsx
  28. BIN
      Project_oppt/CodeList/CodeList_Opportunity.xlsx
  29. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Account_Team_Party_Information_Deprecated_.xlsx
  30. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Competitor_Party_Information.xlsx
  31. BIN
      Project_oppt/CodeList/CodeList_Opportunity_External_Party_Information.xlsx
  32. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Header_Revenue_Plan.xlsx
  33. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Item_Party_Information.xlsx
  34. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Notes.xlsx
  35. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Other_Party_Information_Deprecated_.xlsx
  36. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Party_Information.xlsx
  37. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Preceding_and_Follow_Up_Documents.xlsx
  38. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Product.xlsx
  39. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Product_Notes.xlsx
  40. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Product_Quantity_Plan.xlsx
  41. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Product_Revenue_Plan.xlsx
  42. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Prospect_Contact_Party_Information.xlsx
  43. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Revenue_Splits.xlsx
  44. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Sales_Employee_Party_Information_Deprecated_.xlsx
  45. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Sales_Partner_Party_Information_Deprecated_.xlsx
  46. BIN
      Project_oppt/CodeList/CodeList_Opportunity_Sales_Team_Party_Information.xlsx
  47. BIN
      Project_oppt/CodeList/~$CodeList_Opportunity.xlsx
  48. BIN
      Project_oppt/CodeList_Opportunity_Sales_Team_Party_Information.xlsx
  49. BIN
      Project_oppt/Mapping Rules Opportunities AX to C4.xlsx
  50. BIN
      Project_oppt/Project Advanced Find View 2021-12-22 09_47_35Z.xml
  51. BIN
      Project_oppt/Project Advanced Find View 2021-12-22 09_50_38Z NL.xml
  52. BIN
      Project_oppt/Project Advanced Find View 2021-12-22 09_51_49Z PL.xml
  53. BIN
      Project_oppt/Project Advanced Find View CZ.xls
  54. BIN
      Project_oppt/Project Advanced Find View DE.xls
  55. BIN
      Project_oppt/Project Advanced Find View ES.xls
  56. BIN
      Project_oppt/Project Advanced Find View IT.xls
  57. BIN
      Project_oppt/Project Advanced Find View NL.xls
  58. BIN
      Project_oppt/Project Advanced Find View PL.xls
  59. 373
      Projects.Rmd
  60. 398
      Report.Rmd
  61. 313
      Report.html
  62. BIN
      Service Request Mapping 20210715.xlsx
  63. 374
      Support.Rmd
  64. BIN
      Technical Support/CodeList/CodeList_ServiceRequestSkillsCollectionCollection.xlsx
  65. BIN
      Technical Support/CodeList/CodeList_Service_Request.xlsx
  66. BIN
      Technical Support/CodeList/CodeList_Service_Request_BTD_Reference.xlsx
  67. BIN
      Technical Support/CodeList/CodeList_Service_Request_Item.xlsx
  68. BIN
      Technical Support/CodeList/CodeList_Service_Request_Item_Notes.xlsx
  69. BIN
      Technical Support/CodeList/CodeList_Service_Request_Location_Address.xlsx
  70. BIN
      Technical Support/CodeList/CodeList_Service_Request_Notes.xlsx
  71. BIN
      Technical Support/CodeList/CodeList_Service_Request_Other_Party.xlsx
  72. BIN
      Technical Support/CodeList/CodeList_Service_Request_Party.xlsx
  73. BIN
      Technical Support/CodeList/CodeList_Service_Request_Solution_Proposal.xlsx
  74. BIN
      Technical Support/CodeList/~$CodeList_Service_Request.xlsx
  75. BIN
      Technical Support/Product Classification_TecSup Origin_Type of TecSup.xlsx
  76. BIN
      Technical Support/Technical Support Advanced Find View CN.xls
  77. BIN
      Technical Support/Technical Support Advanced Find View CZ.xls
  78. BIN
      Technical Support/Technical Support Advanced Find View DE.xls
  79. BIN
      Technical Support/Technical Support Advanced Find View ES.xls
  80. BIN
      Technical Support/Technical Support Advanced Find View IT.xls
  81. BIN
      Technical Support/Technical Support Advanced Find View NL.xls
  82. BIN
      Technical Support/Technical Support Advanced Find View NO.xls
  83. BIN
      Technical Support/Technical Support Advanced Find View PL.xls
  84. 3
      Tranlation from MS CRM/CrmTranslations.xml
  85. BIN
      Tranlation from MS CRM/CrmTranslations_OrbCSSCoreExtended_1_0_20151214.zip
  86. 3
      Tranlation from MS CRM/CrmTranslations_OrbCSSCoreExtended_1_0_20151214/CrmTranslations.xml
  87. 1
      Tranlation from MS CRM/CrmTranslations_OrbCSSCoreExtended_1_0_20151214/[Content_Types].xml
  88. 1
      Tranlation from MS CRM/[Content_Types].xml
  89. BIN
      accounts/CodeList/CodeList_Account.xlsx
  90. BIN
      accounts/CodeList/CodeList_Account_Addresses.xlsx
  91. BIN
      accounts/CodeList/CodeList_Account_Contact_Persons.xlsx
  92. BIN
      accounts/CodeList/CodeList_Account_Identification.xlsx
  93. BIN
      accounts/CodeList/CodeList_Account_International_Version.xlsx
  94. BIN
      accounts/CodeList/CodeList_Account_Sales_Data.xlsx
  95. BIN
      accounts/CodeList/CodeList_Account_Tax_Numbers.xlsx
  96. BIN
      accounts/CodeList/CodeList_Account_Team.xlsx
  97. BIN
      accounts/CodeList/CodeList_Account_Visiting_Hours.xlsx
  98. BIN
      accounts/CodeList/CodeList_Account_Visits_Details.xlsx
  99. BIN
      accounts/Detailled Field Mapping Account 20210701.xlsx
  100. BIN
      accounts/Line of Business translation -TEMPLATE-V1.xlsx

BIN
Account/Account Advanced Find View CN.xls

Binary file not shown.

BIN
Account/Account Advanced Find View CZ.xls

Binary file not shown.

BIN
Account/Account Advanced Find View DE.xls

Binary file not shown.

BIN
Account/Account Advanced Find View ES.xls

Binary file not shown.

BIN
Account/Account Advanced Find View FI.xls

Binary file not shown.

BIN
Account/Account Advanced Find View IT.xls

Binary file not shown.

BIN
Account/Account Advanced Find View NL.xls

Binary file not shown.

BIN
Account/Account Advanced Find View NO.xls

Binary file not shown.

BIN
Account/Account Advanced Find View PL.xls

Binary file not shown.

BIN
Account/CodeList/CodeList_Account.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Addresses.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Contact_Persons.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Identification.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_International_Version.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Sales_Data.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Tax_Numbers.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Team.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Visiting_Hours.xlsx

Binary file not shown.

BIN
Account/CodeList/CodeList_Account_Visits_Details.xlsx

Binary file not shown.

BIN
Account/Line of Business translation -TEMPLATE-V1.xlsx

Binary file not shown.

365
Accounts.Rmd

@ -0,0 +1,365 @@
---
title: "Accounts"
author: "Scary Scarecrow"
date: "1/10/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(dplyr)
library(lubridate)
library(DT)
library(tidyr)
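# Creates one empty data frame per template sheet, using the template's column names
mutlstxlrdr<-function(){
for( i in seq_along(snames)){
colnames<-unique(saptemplate[saptemplate$`Sheet Name`==snames[i],]$Header)
df<-read.table(text = "", col.names = colnames) # text = "" yields a zero-row data frame
assign(snames[i], df, envir = .GlobalEnv) # assign outside the function so the result persists
}
}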
```
## Data transformation workflow
The following is the proposed preliminary workflow for the data transformation project.
>All files of a segment (contacts, accounts, etc.) should be inside the relevant folder. Each segment folder should have one subfolder for all code list files. All legacy data (one file per country, named after the country) should be inside the raw-data folder. A further file with the field definitions, including the name of the matching column in the legacy file, should also be there.
>*Make sure that there are no hidden files inside the directory.*
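
The layout described above can be checked before any data is read. The following is a minimal sketch, not part of the original workflow: `check_segment_dir()` is a hypothetical helper, and the names it looks for (`CodeList`, `raw-data`, `template.xlsx`) are taken from the paths used in the chunks of this document.

```r
# Hypothetical helper: reports missing folders, a missing template and
# stray hidden or lock files for one segment folder (e.g. "./accounts").
check_segment_dir <- function(segment_dir) {
  expected_dirs <- file.path(segment_dir, c("CodeList", "raw-data"))
  template_file <- file.path(segment_dir, "template.xlsx")
  all_files <- list.files(segment_dir, recursive = TRUE, all.files = TRUE)
  # Hidden files start with "."; Excel lock files start with "~$"
  stray <- all_files[grepl("^(\\.|~\\$)", basename(all_files))]
  list(
    missing_dirs = expected_dirs[!dir.exists(expected_dirs)],
    template_found = file.exists(template_file),
    stray_files = stray
  )
}

# Usage on a throwaway directory that has only a CodeList folder
demo_dir <- file.path(tempdir(), "accounts_demo")
dir.create(file.path(demo_dir, "CodeList"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "CodeList", "~$CodeList_Account.xlsx")))
result <- check_segment_dir(demo_dir)
print(result)
```

Returning a list rather than stopping on the first problem lets all issues of a folder be reviewed in one pass.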
### Code Lists
```{r Create List of Files, echo=TRUE, message=FALSE, warning=FALSE}
# A separate code list directory could be avoided, but organizing the files would be harder. This can be explored further if we want to transform all the data in one go rather than by function (contacts, accounts, etc.).
filenames <- list.files("./accounts/CodeList", pattern="\\.xlsx$", full.names = TRUE) # pattern is a regular expression, not a glob
# File paths
print(filenames)
```
Manually check that the list above includes all the code list files.
If it does, read the files.
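Part of the manual check can be automated. `list.files()` treats `pattern` as a regular expression, so an anchored expression is more precise than a glob-style string, and Excel lock files (names starting with `~$`, two of which appear in this commit) can be filtered out. A minimal sketch with illustrative file names:

```r
# Illustrative names, modelled on the code list folder
candidates <- c(
  "CodeList_Account.xlsx",
  "CodeList_Account_Team.xlsx",
  "~$CodeList_Account.xlsx",  # lock file left behind by an open workbook
  "notes.xlsx.bak"
)
# Anchored regular expression: the name must end in ".xlsx"
is_xlsx <- grepl("\\.xlsx$", candidates)
# Lock files start with "~$"
is_lock <- grepl("^~\\$", candidates)
codelist_names <- candidates[is_xlsx & !is_lock]
print(codelist_names)
```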
```{r codelistreader, echo=TRUE, message=FALSE, warning=FALSE}
sheet_names<-lapply(filenames, excel_sheets) # Creates a list of the sheet names
codelist_files<-NULL
for(i in seq_along(filenames)){
a<-lapply(excel_sheets(filenames[[i]]), read_excel, path = filenames[[i]], col_types = "text") # Reads the sheets of the excel files
names(a)<-c(sheet_names[[i]]) # Renames them according to the sheet names extracted above
codelist_files<-c(codelist_files,a)
}
# Names of the files imported
names(codelist_files)
#codelist_files<-unique(codelist_files)
codelist_files$Customer_type_I
```
### Templates
Let us now extract the data. Below we read the legacy data related to `Accounts` from the raw-data folder.
```{r readlegacyfilepath, echo=TRUE, message=FALSE, warning=FALSE}
oldfilepath<-list.files("./accounts/raw-data", pattern="\\.xlsx?$", full.names = TRUE) # Change the path, check pattern (regular expression matching .xls and .xlsx)
print(oldfilepath)
```
Manually check that the list matches the actual files.
```{r readlegacyfiles, echo=TRUE}
old_files<-NULL
#read_excel(path = oldfilepath[[i]], sheet = 1)
for(i in seq_along(oldfilepath)){
old_files[[i]]<-read_excel(path = oldfilepath[[i]], sheet = 1)
}
names(old_files)<-gsub("./accounts/raw-data/","",oldfilepath, fixed = TRUE) # Change path; fixed = TRUE treats the prefix as literal text, not a regex
old_files
```
*Some errors were noticed in the legacy file: columns with similar or identical names exist.*
```{r readSAPtemplate, echo=TRUE, message=FALSE, warning=FALSE}
saptemplate<-read_excel("./accounts/template.xlsx", sheet = "Field_Definitions")
# First few rows of the imported data
head(saptemplate)
```
*Please note that the format of the tables (sheets) has been changed slightly. Earlier, the corresponding sheet name was given in a row before the actual table. Now every row carries the corresponding sheet name. This was done manually to make data extraction easier.*
## Status column is not defined
## There could be an issue in line of business
```{r createmptySAPfiles, echo=TRUE, message=FALSE, warning=FALSE}
#orilo<-"en_US.UTF-8"
#Sys.setlocale(locale="en_US.UTF-8")
strt<-Sys.time()
snames <- unique(saptemplate$`Sheet Name`)
for (h in seq_along(old_files)) {
# Copy original data
old.copy <- old_files[[h]]
print(paste0(names(old_files[h])," imported"))
err.summ<-data.frame(Country=NULL, Name=NULL, Expected=NULL, Actual=NULL) #Error Cal
# Creates data frame for each sheet in snames
for (i in seq_along(snames)) {
print(paste0("Processing ..",snames[i]))
# Select the column names from the field description sheet
print("Creating template")
sel.template.desc <-
saptemplate[saptemplate$`Sheet Name` == snames[i], ]
print("Creating column names")
sel.template.desc.colnames <- sel.template.desc$Header
# Create a list by adding values from corresponding legacy data
temp <- NULL
print("adding values to template ")
for (j in seq_along(sel.template.desc.colnames)) {
temp[j] <-ifelse(sel.template.desc$oldkey[j]=="NA" | is.na(sel.template.desc$oldkey[j]),
NA,as.vector(old.copy[, sel.template.desc$oldkey[j]])
)
}
# Rename the columns according to field description
print("renaming template ")
names(temp) <- sel.template.desc.colnames
# Create data frame from the list
df <- as.data.frame(temp)
print("Converted to data frame")
# Error summary file
Expected<-nrow(df)
#Select essential rows
print("Identifying essential rows")
sel.template.desc |>
filter(Mandatory == "Yes") |>
pull(Header) -> essential.columns
error.mandatory <- NULL
error.df<-data.frame(Country=NULL, Name=NULL, Rows=NULL, Expected=NULL)
# Operate on essential columns including creation of error file
for (k in seq_along(essential.columns)) { # In case there are any default values (of mandatory) they need to be added here
if(essential.columns[k]=="International_Version"){
print("Found International Version. Adding 0.")
df$International_Version<-"0"
}
print("Creating and writing data with missing mandatory values")
assign(
paste0(
"error_mandatory_",
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
essential.columns[k]
),
df[is.na(df[, essential.columns[k]]), ]
)
# TO be saved in error files
if(nrow(df[is.na(df[, essential.columns[k]]), ])>0){
write.csv(
df[is.na(df[, essential.columns[k]]), ],
paste0(
"./accounts/errors/mandatory/", #Change path
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
essential.columns[k],
"_error_mandatory.csv"
), row.names = F, na=""
)
}
# Error summary file
Country<-substr(names(old_files[h]), 2, 3)
Name<-snames[i]
err.type<-paste0("Missing ",essential.columns[k])
err.count<-nrow(df[is.na(df[, essential.columns[k]]), ])
print("Removing rows with empty essential columns")
df <- df[!is.na(df[, essential.columns[k]]), ]
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
print("Identifying columns associated with codelists")
# List of columns that have a codelist
codelistcols <- sel.template.desc |>
filter(!is.na(`CodeList File Path`)) |> pull(Header)
for (k in seq_along(codelistcols)) {
print(paste0("Identifying errors ",codelistcols[k]))
def.rows <-
which(!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA))
def.n<- df[def.rows, 1]
def.rows.val <-
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]]
def <- data.frame(def.rows, def.n,def.rows.val)
if(nrow(def)>0){
assign(paste0(
"error_codematch_",
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
codelistcols[k]
),
def) # TO be saved
write.csv(
def,
paste0(
"./accounts/errors/codelist/", #Change path
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
codelistcols[k],
"_error_codematch_.csv"
), row.names = F, na=""
)
}
err.type<-paste0("Codelist Mismatch ", codelistcols[k]) #Error cal
err.count<-nrow(def) #Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
print(paste0("Removing errors ",codelistcols[k]))
# Removes any mismatch
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]] <-
NA
# Matches each column with the corresponding code list and returns the value
df[, codelistcols[k]] <-
pull(codelist_files[codelistcols[k]][[1]], 2)[match(pull(df, codelistcols[k]),
pull(codelist_files[codelistcols[k]][[1]], Description))]
}
max.length <- as.numeric(sel.template.desc$`Max Length`)
dtype <- sel.template.desc$`Data Type`
rowval <- NULL
ival <- NULL
rval <- NULL
lenght.issue.df <- NULL
# Changing the data class
for (k in 1:ncol(df)) {
if (dtype[k] == "String") {
df[, k] <- as.character(pull(df, k))
}
if (dtype[k] == "Boolean") {
df[, k] <- as.logical(pull(df, k))
}
if (dtype[k] == "DateTime") {
df[, k] <- lubridate::ymd_hms(pull(df, k))
}
if (dtype[k] == "Time") {
df[, k] <- lubridate::hms(pull(df, k))
} # This list will increase and also change based on input date and time formats
}
print("Rectifying streetname")
# Street and House Number
if (any(colnames(df) == "Street")) {
# Separates street name and house number. extract() returns a new data
# frame, so its result is assigned back to df; rows whose Street contains
# no digit get NA in both new columns.
df <- df |>
select(-any_of("House_Number")) |>
tidyr::extract("Street",
c("Streetname", "HouseNumber"),
"(\\D+)(\\d.*)") |>
rename(Street = Streetname, House_Number = HouseNumber) |>
select(all_of(sel.template.desc.colnames))
}
# Length Rectification
colclasses <- lapply(df, class)
print("Rectifying Length")
for (k in seq_len(ncol(df))) {
# identical() stays a single TRUE/FALSE even for multi-class columns such as date-times
if (identical(colclasses[[k]], "character")) {
print("found character column ")
rowval <- pull(df, 1)
ival <- ifelse(nchar(pull(df, k))== 0 | is.na(nchar(pull(df, k))),1,nchar(pull(df, k)))
rval <- max.length[k]
# rectifying data length
df[, k] <-
ifelse(nchar(pull(df, k)) > max.length[k],
substring(pull(df, k), 1, max.length[k]),
pull(df, k))
# Record lengths only for character columns, so that stale values from a
# previous column are not appended again
lenght.issue.df <-
rbind(lenght.issue.df, data.frame(rowval, ival, rval))
err.type<- paste0("Length error ", colnames(df)[k]) # Error cal
err.count<- sum(ival>rval, na.rm = TRUE) # Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
}
lenght.issue.df <- dplyr::filter(lenght.issue.df,ival>rval)
if(nrow(lenght.issue.df)>0){
write.csv(lenght.issue.df,
paste0(
"./accounts/errors/length/", # Change path
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_length_error.csv"
), row.names = F, na="")
}
assign(snames[i], df)
write.csv(df,paste0("./accounts/output/", substr(names(old_files[h]), 2, 3), "_", snames[i],".csv"), row.names = F, na="") #Change path
if(nrow(error.df)>0){
write.csv(error.df, paste0("./accounts/summary/",substr(names(old_files[h]), 2, 3), "_", snames[i],"_error",".csv"), row.names = F, na="") # Error write; Change path
}
err.summ<-rbind(err.summ,data.frame(Country=Country, Name=Name, Expected=Expected, Actual=nrow(df))) #Error Cal
}
write.csv(err.summ,
paste0("./accounts/summary/" ,substr(names(old_files[h]), 2, 3), "_", snames[i],"_sumerror",".csv"), row.names = F, na="") # Error Write; Change path
}
end<-Sys.time()
end-strt
```
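The code list step in the chunk above flags values that are neither `NA` nor a known `Description`, then translates the remaining descriptions into codes with `match()`. The following minimal sketch shows that lookup in isolation. The code list contents and the column name `Code` are made up for illustration; only `Description` comes from the chunk.

```r
# Illustrative code list; the values and the "Code" column name are assumptions
codelist <- data.frame(
  Code = c("0001", "0002", "0003"),
  Description = c("Mr.", "Ms.", "Dr."),
  stringsAsFactors = FALSE
)
legacy <- c("Ms.", "Dr.", "Prof.", NA, "Mr.")

# Rows whose value is neither NA nor a known description are mismatches
mismatch_rows <- which(!legacy %in% c(codelist$Description, NA))
# match() returns NA for values it cannot find, so mismatches become NA codes
codes <- codelist$Code[match(legacy, codelist$Description)]
print(mismatch_rows)
print(codes)
```

Because `match()` already yields `NA` for unknown values, the mismatch rows only need to be computed for reporting, not for the translation itself.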
*The code failed because the Department column appears several times in the data, and on import R renamed the duplicates (to `Department..xx`).*
*Manually verify that these are the required templates.*
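Repeated column names such as the Department case can be detected before the transformation runs. This is a minimal base R sketch with a made-up header. How duplicates are renamed on import depends on the reader's name-repair settings, so the explicit `make.unique()` call here is only one possible convention.

```r
# Illustrative header with a repeated "Department" column
legacy_header <- c("Contact_ID", "Department", "Phone", "Department", "Department")
# Names that occur more than once
dup_names <- unique(legacy_header[duplicated(legacy_header)])
# make.unique() appends .1, .2, ... to repeats, giving predictable names
fixed_header <- make.unique(legacy_header)
print(dup_names)
print(fixed_header)
```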

18
Contact.csv

@ -0,0 +1,18 @@
"External_Key","Contact_ID","Status","Title","Academic_Title","Additional_Academic_Title","Prefix","First_Name","Last_Name","Additional_Last_Name","Initials","Middle_Name","Gender","Marital_Status","Language","Nick_Name","Date_of_Birth","Birth_Name","Contact_Permission","Profession","Perception_Of_Company","Account_External_Key","Account_ID","Building","Floor","Room","Job_Title","Function","Department","Department_From_Business_Card","VIP_Contact","Phone","Mobile","Fax","EMail","EMail_Invalid","Best_Reached_By","CountryRegion","Street","City","Postal_Code","State","Contact_Owner_External_Key","Contact_Owner_ID","Former_CRM_reference","House_Number","State_Text_Updatable"
"98320","F2371","2","0002",NA,"0004","0001",NA,"qefb",NA,"D",NA,NA,NA,NA,NA,NA,NA,"1",NA,"01","nnfknwei","njljenf","1",NA,NA,NA,"0001","0001",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"98322","F2373",NA,NA,NA,"0001","0003","Jojqfn","uqheq","asdvjn",NA,NA,NA,NA,"DE",NA,NA,NA,"3",NA,"03",NA,NA,"3",NA,NA,NA,"0003","0003",NA,NA,NA,NA,NA,NA,NA,"INT",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"98324","F2375",NA,NA,"0003",NA,"0005","jenwv","kuhanbbw","ajvn",NA,"qjebofb",NA,NA,"ES",NA,NA,NA,NA,NA,NA,NA,NA,"5",NA,NA,"wevne","0005","0005",NA,NA,NA,NA,NA,NA,NA,"TEL",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"98327","F2378",NA,"0002",NA,NA,"0008","wjvnjwnef","wjnweg",NA,"I","sjdvnw",NA,NA,"NL",NA,NA,NA,NA,NA,"02",NA,NA,"8",NA,NA,"aeb","0008","0008",NA,"C",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"BGL",NA,NA,NA,NA,NA
"98329","F2380",NA,NA,"0005","0003","0010",NA,"ejavneq","jsdnw","J","wienw",NA,NA,"ZH",NA,NA,NA,NA,NA,"01",NA,"wejgnkjlqe","10",NA,NA,NA,"0010","0010",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"FRA",NA,NA,NA,NA,NA
"98332","F2383",NA,NA,NA,NA,"0013",NA,"jviwef",NA,"NU","wjbwv",NA,NA,NA,NA,NA,NA,"2",NA,NA,"wejnfwjg","weignwgw","13",NA,NA,"ertbgewb","0013","0013",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"GHO",NA,NA,NA,NA,NA
"98333","F2384",NA,NA,NA,NA,"0014","qwejfnv","jnbwon","wsebhjuw","IE","wjgniwg",NA,NA,NA,NA,NA,NA,"3",NA,NA,NA,NA,"14",NA,NA,NA,"0014","0014",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"HEL",NA,NA,NA,NA,NA
"98336","F2387",NA,NA,"0001","0003","0017","qejfjv","wjbnjnw","wejbwe","J","wehbwef",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"weojgqwegn","17",NA,NA,NA,"0017","0017",NA,NA,NA,NA,NA,NA,NA,"FAX",NA,NA,NA,NA,"KAB",NA,NA,NA,NA,NA
"98337","F2388",NA,NA,NA,"0001","0018",NA,"svnjwne",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"18",NA,NA,"wfbefb","0018","0018",NA,NA,NA,NA,NA,NA,NA,"INT",NA,NA,NA,NA,"KAN",NA,NA,NA,NA,NA
"98338","F2389","2",NA,NA,NA,"0019","qsvjbj","ijwegno","hwegbjwe","J","wejbiwq",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"19",NA,NA,NA,"0019","0019",NA,NA,NA,NA,NA,NA,NA,"LET",NA,NA,NA,NA,"KAP",NA,NA,NA,NA,NA
"98340","F2391",NA,NA,"0005",NA,"0021","kavjbjleq","dnbw","wejbwe",NA,"wejnw",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"21",NA,NA,NA,"0021","0021",NA,NA,NA,NA,NA,NA,NA,"VIS",NA,NA,NA,NA,"KHO",NA,NA,NA,NA,NA
"98341","F2392","2",NA,"0006",NA,NA,NA,"sjenw","wejfbiwef","JJ",NA,NA,NA,NA,NA,NA,NA,"1",NA,NA,NA,NA,"22",NA,NA,NA,"0022","0022",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"KNR",NA,NA,NA,NA,NA
"98343","F2394",NA,"0001",NA,"0003","0024","asjvnef","sefnjwe",NA,"JEI","wejnet",NA,NA,NA,NA,NA,NA,"3",NA,NA,NA,"ergnerg","24",NA,NA,NA,"0024","0024",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"98344","F2395","2",NA,NA,"0001","0025",NA,"wejbwee","wejhbwef",NA,"wjgb",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"25",NA,NA,NA,"0025",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"98346","F2397",NA,NA,"0003",NA,NA,NA,"jevwbi","wejbubvw",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"27",NA,NA,NA,"0027",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"98349","F2400",NA,"0001",NA,"0003",NA,NA,"asvbwe","wefjnbwe",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"98350","F2401",NA,NA,NA,"0001",NA,NA,"jasbv",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA

402
Contacts.Rmd

@ -0,0 +1,402 @@
---
title: "Contacts"
author: "Scary Scarecrow"
date: "12/27/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(dplyr)
library(lubridate)
library(DT)
library(tidyr)
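# Creates one empty data frame per template sheet, using the template's column names
mutlstxlrdr<-function(){
for( i in seq_along(snames)){
colnames<-unique(saptemplate[saptemplate$`Sheet Name`==snames[i],]$Header)
df<-read.table(text = "", col.names = colnames) # text = "" yields a zero-row data frame
assign(snames[i], df, envir = .GlobalEnv) # assign outside the function so the result persists
}
}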
```
## Data transformation workflow
The following is the proposed preliminary workflow for the data transformation project.
>All files of a segment (contacts, accounts, etc.) should be inside the relevant folder. Each segment folder should have one subfolder for all code list files. All legacy data (one file per country, named after the country) should be inside the raw-data folder. A further file with the field definitions, including the name of the matching column in the legacy file, should also be there.
>*Make sure that there are no hidden files inside the directory.*
### Code Lists
```{r Create List of Files, echo=TRUE, message=FALSE, warning=FALSE}
# A separate code list directory could be avoided, but organizing the files would be harder. This can be explored further if we want to transform all the data in one go rather than by function (contacts, accounts, etc.).
filenames <- list.files("./contacts/CodeList", pattern="\\.xlsx$", full.names = TRUE) # pattern is a regular expression, not a glob
# File paths
print(filenames)
```
Manually check that the list above includes all the code list files.
If it does, read the files.
```{r codelistreader, echo=TRUE, message=FALSE, warning=FALSE}
sheet_names<-lapply(filenames, excel_sheets) # Creates a list of the sheet names
codelist_files<-NULL
for(i in seq_along(filenames)){
a<-lapply(excel_sheets(filenames[[i]]), read_excel, path = filenames[[i]], col_types = "text") # Reads the sheets of the excel files
names(a)<-c(sheet_names[[i]]) # Renames them according to the sheet names extracted above
codelist_files<-c(codelist_files,a)
}
# Names of the files imported
names(codelist_files)
#codelist_files<-unique(codelist_files)
codelist_files$Academic_Title
```
### Templates
Let us now extract the data. Below we are reading only one file having all data related to `Contacts` from the legacy system.
```{r readlegacyfilepath, echo=TRUE, message=FALSE, warning=FALSE}
oldfilepath <- list.files("./contacts/raw-data/", pattern="\\.xlsx$", full.names = TRUE) # pattern is a regular expression, not a glob
print(oldfilepath)
```
Manually check that the list matches the actual files.
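The later chunks derive the country code with `substr(names(old_files[h]), 2, 3)`, which depends on exactly how the path prefix was stripped from each name. A less position-sensitive alternative is sketched below. It assumes, as the workflow note suggests, that each legacy file name starts with a two-letter country code; the paths are illustrative.

```r
# Illustrative paths; real ones come from list.files(..., full.names = TRUE)
paths <- c(
  "./contacts/raw-data//DE_contacts.xlsx",
  "./contacts/raw-data/NL_contacts.xlsx"
)
# basename() drops the directory part regardless of extra slashes
file_names <- basename(paths)
# Assumption: the first two characters of the file name are the country code
country <- toupper(substr(file_names, 1, 2))
# Files that do not start with two letters need a manual look
bad <- file_names[!grepl("^[A-Za-z]{2}", file_names)]
print(country)
print(bad)
```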
```{r readlegacyfiles, echo=TRUE}
old_files<-NULL
#read_excel(path = oldfilepath[[i]], sheet = 1)
for(i in seq_along(oldfilepath)){
old_files[[i]]<-read_excel(path = oldfilepath[[i]], sheet = 1)
}
names(old_files)<-gsub("./contacts/raw-data/","",oldfilepath, fixed = TRUE) # fixed = TRUE treats the prefix as literal text, not a regex
```
*Some errors were noticed in the legacy file: columns with similar or identical names exist.*
```{r readSAPtemplate, echo=TRUE, message=FALSE, warning=FALSE}
saptemplate<-read_excel("./contacts/template.xlsx", sheet = "Field_Definitions")
# First few rows of the imported data
head(saptemplate)
```
*Please note that the format of the tables (sheets) has been changed slightly. Earlier, the corresponding sheet name was given in a row before the actual table. Now every row carries the corresponding sheet name. This was done manually to make data extraction easier.*
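The chunk below removes rows with missing mandatory values one column at a time. As a point of comparison, here is a minimal sketch of a one-pass variant using `complete.cases()`. This is a different technique from the loop in the chunk, and the records and the choice of mandatory fields are made up for illustration.

```r
# Made-up records; Contact_ID and Last_Name stand in for mandatory fields
contacts <- data.frame(
  Contact_ID = c("F2371", "F2373", NA, "F2378"),
  Last_Name = c("qefb", NA, "kuhanbbw", "wjnweg"),
  Phone = c(NA, "123", NA, NA),
  stringsAsFactors = FALSE
)
mandatory <- c("Contact_ID", "Last_Name")

# TRUE where every mandatory field is present
ok <- complete.cases(contacts[, mandatory, drop = FALSE])
valid <- contacts[ok, ]
rejected <- contacts[!ok, ]
# Missing values per mandatory column, counted on the full input
missing_counts <- colSums(is.na(contacts[, mandatory, drop = FALSE]))
print(rejected)
print(missing_counts)
```

Counting on the full input gives per-column totals that do not depend on the order in which the columns are checked, unlike sequential filtering.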
```{r createmptySAPfiles, echo=TRUE, message=FALSE, warning=FALSE}
#orilo<-"en_US.UTF-8"
#Sys.setlocale(locale="en_US.UTF-8")
strt<-Sys.time()
snames <- unique(saptemplate$`Sheet Name`)
for (h in seq_along(old_files)) {
# Copy original data
old.copy <- old_files[[h]]
print(paste0(names(old_files[h])," imported"))
err.summ<-data.frame(Country=NULL, Name=NULL, Expected=NULL, Actual=NULL) #Error Cal
# Creates data frame for each sheet in snames
for (i in seq_along(snames)) {
print(paste0("Processing ..",snames[i]))
# Select the column names from the field description sheet
print("Creating template")
sel.template.desc <-
saptemplate[saptemplate$`Sheet Name` == snames[i], ]
print("Creating column names")
sel.template.desc.colnames <- sel.template.desc$Header
# Create a list by adding values from corresponding legacy data
temp <- NULL
print("adding values to template ")
for (j in seq_along(sel.template.desc.colnames)) {
temp[j] <-ifelse(sel.template.desc$oldkey[j]=="NA" | is.na(sel.template.desc$oldkey[j]),
NA,as.vector(old.copy[, sel.template.desc$oldkey[j]])
)
}
# Rename the columns according to field description
print("renaming template ")
names(temp) <- sel.template.desc.colnames
# Create data frame from the list
df <- as.data.frame(temp)
print("Converted to data frame")
# Error summary file
Expected<-nrow(df)
#Select essential rows
print("Identifying essential rows")
sel.template.desc |>
filter(Mandatory == "Yes") |>
pull(Header) -> essential.columns
error.mandatory <- NULL
error.df<-data.frame(Country=NULL, Name=NULL, Rows=NULL, Expected=NULL)
# Operate on essential columns including creation of error file
for (k in seq_along(essential.columns)) {
if(essential.columns[k]=="International_Version"){
print("Found International Version. Adding 0.")
#stop()
df$International_Version<-"0"
}
print("Creating and writing data with missing mandatory values")
assign(
paste0(
"error_mandatory_",
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
essential.columns[k]
),
df[is.na(df[, essential.columns[k]]), ]
)
# TO be saved in error files
if(nrow(df[is.na(df[, essential.columns[k]]), ])>0){
write.csv(
df[is.na(df[, essential.columns[k]]), ],
paste0(
"./contacts/errors/mandatory/",
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
essential.columns[k],
"_error_mandatory.csv"
), row.names = F, na=""
)
}
# Error summary file
Country<-substr(names(old_files[h]), 2, 3)
Name<-snames[i]
err.type<-paste0("Missing ",essential.columns[k])
err.count<-nrow(df[is.na(df[, essential.columns[k]]), ])
print("Removing rows with empty essential columns")
df <- df[!is.na(df[, essential.columns[k]]), ]
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
print("Identifying columns associated with codelists")
# List of columns that have a codelist
codelistcols <- sel.template.desc |>
filter(!is.na(`CodeList File Path`)) |> pull(Header)
for (k in seq_along(codelistcols)) {
if(codelistcols[k]=="International_Version"){
print("Found International Version. Adding 0.")
df$International_Version<-"0"
}
print(paste0("Identifying errors ",codelistcols[k]))
def.rows <-
which(!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA))
def.n<- df[def.rows, 1]
def.rows.val <-
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]]
def <- data.frame(def.rows, def.n,def.rows.val)
if(nrow(def)>0){
assign(paste0(
"error_codematch_",
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
codelistcols[k]
),
def) # TO be saved
write.csv(
def,
paste0(
"./contacts/errors/codelist/",
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_",
codelistcols[k],
"_error_codematch_.csv"
), row.names = F, na=""
)
}
err.type<-paste0("Codelist Mismatch ", codelistcols[k]) #Error cal
err.count<-nrow(def) #Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
print(paste0("Removing errors ",codelistcols[k]))
# Removes any mismatch
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]] <-
NA
# Matches each column with the corresponding code list and returns the value
df[, codelistcols[k]] <-
as.character(pull(codelist_files[codelistcols[k]][[1]], 2)[match(pull(df, codelistcols[k]),
pull(codelist_files[codelistcols[k]][[1]], Description))])
}
max.length <- as.numeric(sel.template.desc$`Max Length`)
dtype <- sel.template.desc$`Data Type`
rowval <- NULL
ival <- NULL
rval <- NULL
lenght.issue.df <- NULL
# Changing the data class
for (k in 1:ncol(df)) {
if (dtype[k] == "String") {
df[, k] <- as.character(pull(df, k))
}
if (dtype[k] == "Boolean") {
df[, k] <- as.logical(pull(df, k))
}
if (dtype[k] == "DateTime") {
df[, k] <- lubridate::ymd_hms(pull(df, k))
}
if (dtype[k] == "Time") {
df[, k] <- lubridate::hms(pull(df, k))
} # This list will increase and also change based on input date and time formats
}
print("Rectifying streetname")
# Street and House Number
if (any(colnames(df) == "Street")) {
# extract() returns a new data frame, so its result has to be assigned;
# Street is replaced by the two new columns
df <- extract(df,
"Street",
c("Streetname", "HouseNumber"),
"(\\D+)(\\d.*)")
df <- df |>
select(-any_of("House_Number")) |>
rename(Street = Streetname, House_Number = HouseNumber) |>
select(all_of(sel.template.desc.colnames))
}
# Length Rectification
colclasses <- lapply(df, class)
print("Rectifying Length")
for (k in 1:ncol(df)) {
if (colclasses[[k]] == "character") {
print("found character column ")
rowval <- pull(df, 1)
ival <- ifelse(nchar(pull(df, k))== 0 | is.na(nchar(pull(df, k))),1,nchar(pull(df, k)))
rval <- max.length[k]
# rectifying data length
df[, k] <-
ifelse(nchar(pull(df, k)) > max.length[k],
substring(pull(df, k), 1, max.length[k]),
pull(df, k))
# Record length issues inside the character branch, so that values left over
# from an earlier column are not counted again for non-character columns
lenght.issue.df <-
rbind(lenght.issue.df, data.frame(rowval, ival, rval))
err.type<- paste0("Length error ", colnames(df)[k]) # Error cal
err.count<- sum(ival>rval, na.rm = T) # Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
}
lenght.issue.df <- dplyr::filter(lenght.issue.df,ival>rval)
if(nrow(lenght.issue.df)>0){
write.csv(lenght.issue.df,
paste0(
"./contacts/errors/length/",
substr(names(old_files[h]), 2, 3),
"_",
snames[i],
"_length_error.csv"
), row.names = F, na="")
}
assign(snames[i], df)
write.csv(df,paste0("./contacts/output/", substr(names(old_files[h]), 2, 3), "_", snames[i],".csv"), row.names = F, na="")
if(nrow(error.df)>0){
write.csv(error.df, paste0("./contacts/summary/",substr(names(old_files[h]), 2, 3), "_", snames[i],"_error",".csv"), row.names = F, na="") # Error write
}
err.summ<-rbind(err.summ,data.frame(Country=Country, Name=Name, Expected=Expected, Actual=nrow(df))) #Error Cal
}
write.csv(err.summ,
paste0("./contacts/summary/" ,substr(names(old_files[h]), 2, 3), "_", snames[i],"_sumerror",".csv"), row.names = F, na="") # Error Write
}
end<-Sys.time()
end-strt
```
*The code failed because the Department column appears several times in the data, and while importing, R renamed the duplicates to `Department..xx`.*
*Manually verify if these are the required templates*
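The street-splitting step in the chunk above can be tried in isolation. The following is a minimal base-R sketch, not the workflow's own code: `split_street()` is a hypothetical helper, the addresses are made up, and, unlike the `(\\D+)(\\d.*)` pattern used above, it trims the space before the house number and keeps the street name when no number is found.

```r
# Hypothetical helper, for illustration only
split_street <- function(street) {
  # lazy \D+? takes the street name, \d.* takes everything from the first digit
  pattern <- "^(\\D+?)\\s*(\\d.*)$"
  has_number <- !is.na(street) & grepl(pattern, street, perl = TRUE)
  name <- ifelse(has_number, sub(pattern, "\\1", street, perl = TRUE), street)
  number <- ifelse(has_number, sub(pattern, "\\2", street, perl = TRUE), NA_character_)
  data.frame(Street = name, House_Number = number, stringsAsFactors = FALSE)
}

# Made-up addresses: with a number, without a number, and missing
result <- split_street(c("Hauptstrasse 12a", "Am Markt", NA))
print(result)
```

Keeping the original street when no digit is present is a design choice of this sketch; whether such rows should instead go to an error file depends on the target template.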

13
DataTransformationCRH.Rproj

@ -0,0 +1,13 @@
Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: Sweave
LaTeX: pdfLaTeX

BIN
Opportunity Mapping 20210714.xlsx

Binary file not shown.

BIN
Opportunity mapping AX Building Sites 20210924.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Contact_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Account_Team_Party_Information_Deprecated_.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Competitor_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_External_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Header_Revenue_Plan.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Item_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Notes.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Other_Party_Information_Deprecated_.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Preceding_and_Follow_Up_Documents.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Product.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Product_Notes.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Product_Quantity_Plan.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Product_Revenue_Plan.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Prospect_Contact_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Revenue_Splits.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Sales_Employee_Party_Information_Deprecated_.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Sales_Partner_Party_Information_Deprecated_.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/CodeList_Opportunity_Sales_Team_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList/~$CodeList_Opportunity.xlsx

Binary file not shown.

BIN
Project_oppt/CodeList_Opportunity_Sales_Team_Party_Information.xlsx

Binary file not shown.

BIN
Project_oppt/Mapping Rules Opportunities AX to C4.xlsx

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View 2021-12-22 09_47_35Z.xml

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View 2021-12-22 09_50_38Z NL.xml

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View 2021-12-22 09_51_49Z PL.xml

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View CZ.xls

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View DE.xls

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View ES.xls

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View IT.xls

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View NL.xls

Binary file not shown.

BIN
Project_oppt/Project Advanced Find View PL.xls

Binary file not shown.

373
Projects.Rmd

@ -0,0 +1,373 @@
---
title: "Projects"
author: "Scary Scarecrow"
date: "1/12/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(dplyr)
library(lubridate)
library(DT)
library(tidyr)
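# Creates an empty data frame for every template sheet
# (relies on `saptemplate` and `snames` existing in the global environment)
mutlstxlrdr<-function(){
for( i in seq_along(snames)){
colnames<-unique(saptemplate[saptemplate$`Sheet Name`==snames[i],]$Header)
df<-read.table(text = "", col.names = colnames)
assign(snames[i], df, envir = .GlobalEnv)
}
}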
```
## Data transformation workflow
The following is the proposed preliminary workflow for the data transformation project.
>All files of a segment (contacts, accounts, etc.) should be inside the relevant folder. Each segment folder should contain one folder for all codelist files. All legacy data (one file per country, named after the country) should be inside the raw-data folder. A further file holding the field definitions, including the name of the matching column in the legacy file, should also be present.
>*Make sure that there are no hidden files inside the directory.*
### Code Lists
```{r Create List of Files, echo=TRUE, message=FALSE, warning=FALSE}
filenames <- list.files("./projects/CodeList", pattern="*.xlsx", full.names = T) # A separate directory for code lists could be avoided, but organizing the files may then be difficult. This can be explored further if we want to transform all the data in one go, i.e. not by function (contacts, accounts etc.).
# File paths
print(filenames)
```
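`list.files()` treats `pattern` as a regular expression rather than a shell glob, so `"*.xlsx"` is not anchored to the end of the name and its `.` matches any character. A minimal sketch of a stricter filter follows; the file names are made up for illustration. `glob2rx()` from the `utils` package converts a glob into an anchored regular expression, and the `~$` prefix check drops Excel lock files.

```r
# Hypothetical file names, for illustration only
files <- c("CodeList_Opportunity.xlsx",
           "~$CodeList_Opportunity.xlsx",
           "CodeList_notes.xlsx.bak")

# glob2rx() turns a shell-style glob into an anchored regular expression
rx <- glob2rx("*.xlsx")

# Keep names that end in .xlsx, then drop Excel lock files (prefix "~$")
matched <- files[grepl(rx, files)]
codelists <- matched[!startsWith(matched, "~$")]
print(codelists)
```

The resulting `rx` could be passed as `pattern = rx` to `list.files()`; the lock-file filter would then be applied to the returned paths via `basename()`.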
Check manually that the above list includes all the codelist files.
If it is correct, read the files.
```{r codelistreader, echo=TRUE, message=FALSE, warning=FALSE}
sheet_names<-lapply(filenames, excel_sheets) # Creates a list of the sheet names
codelist_files<-NULL
for(i in seq_along(filenames)){
a<-lapply(excel_sheets(filenames[[i]]), read_excel, path = filenames[[i]], col_types = "text") # Reads the sheets of the excel files
names(a)<-c(sheet_names[[i]]) # Renames them according to the sheet names extracted above
codelist_files<-c(codelist_files,a)
}
# Names of the files imported
names(codelist_files)
#codelist_files<-unique(codelist_files)
codelist_files$Academic_Title
```
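The code lists read above are used later in the workflow to translate legacy descriptions into codes. A minimal sketch of that lookup with `match()`, assuming a toy code list: the `Description` column name comes from the workflow, while the `Code` column name and all values are made up.

```r
# Toy code list; real ones are read from the CodeList folder
codelist <- data.frame(
  Description = c("Doctor", "Professor"),
  Code = c("0001", "0002"),
  stringsAsFactors = FALSE
)

# Hypothetical legacy values, including a mismatch and a missing value
legacy_values <- c("Professor", "Dr.", NA, "Doctor")

# Values that are neither in the code list nor missing are reported as mismatches
known <- legacy_values %in% c(codelist$Description, NA)
mismatches <- legacy_values[!known]

# match() gives the row of each description; unmatched values become NA
codes <- codelist$Code[match(legacy_values, codelist$Description)]
print(mismatches)
print(codes)
```

Treating `NA` as known mirrors the workflow, where missing values are handled by the mandatory-column check rather than the code list check.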
### Templates
Let us now extract the data. Below we read the files, one per country, that hold all data related to `Projects` from the legacy system.
```{r readlegacyfilepath, echo=TRUE, message=FALSE, warning=FALSE}
oldfilepath<-list.files("./projects/raw-data", pattern="*.xls", full.names = T) # Change the path, check pattern
print(oldfilepath)
```
Manually check that the list matches the actual files.
```{r readlegacyfiles, echo=TRUE}
old_files<-NULL
#read_excel(path = oldfilepath[[i]], sheet = 1)
for(i in seq_along(oldfilepath)){
old_files[[i]]<-read_excel(path = oldfilepath[[i]], sheet = 1)
}
names(old_files)<-gsub("./projects/raw-data/","",oldfilepath)
```
*Some errors were noticed in the legacy file: columns with similar or identical names exist.*
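Duplicate column names can be detected before any mapping is attempted. A minimal sketch, assuming a made-up header vector; in practice the names would have to be read from the raw header row, since import functions may already have renamed duplicates.

```r
# Hypothetical legacy header row with one repeated name
legacy_headers <- c("Name", "Department", "City", "Department")

# Names that occur more than once
duplicated_names <- unique(legacy_headers[duplicated(legacy_headers)])

# One way to make the names unique in a predictable manner
unique_headers <- make.unique(legacy_headers, sep = "_")
print(duplicated_names)
print(unique_headers)
```

Renaming with a fixed scheme makes it possible to refer to the repeated columns explicitly in the field definitions.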
```{r readSAPtemplate, echo=TRUE, message=FALSE, warning=FALSE}
saptemplate<-read_excel("./projects/template.xlsx", sheet = "Field_Definitions")
# First few rows of the imported data
head(saptemplate)
```
*Please note that the format of the template sheet has been changed slightly. Previously, the sheet name appeared in a separate row above each table; now every row carries its sheet name. This was done manually to make data extraction easier.*
```{r createmptySAPfiles, echo=TRUE, message=FALSE, warning=FALSE}
#orilo<-"en_US.UTF-8"
#Sys.setlocale(locale="en_US.UTF-8")
strt<-Sys.time()
snames <- unique(saptemplate$`Sheet Name`)
for (h in seq_along(old_files)) {
# Copy original data
old.copy <- old_files[[h]]
print(paste0(names(old_files[h])," imported"))
err.summ<-data.frame(Country=NULL, Name=NULL, Expected=NULL, Actual=NULL) #Error Cal
# Creates data frame for each sheet in snames
for (i in seq_along(snames)) {
print(paste0("Processing ..",snames[i]))
# Select the column names from the field description sheet
print("Creating template")
sel.template.desc <-
saptemplate[saptemplate$`Sheet Name` == snames[i], ]
print("Creating column names")
sel.template.desc.colnames <- sel.template.desc$Header
# Create a list by adding values from corresponding legacy data
temp <- NULL
print("adding values to template ")
for (j in seq_along(sel.template.desc.colnames)) {
temp[j] <-ifelse(sel.template.desc$oldkey[j]=="NA" | is.na(sel.template.desc$oldkey[j]),
NA,as.vector(old.copy[, sel.template.desc$oldkey[j]])
)
}
# Rename the columns according to field description
print("renaming template ")
names(temp) <- sel.template.desc.colnames
# Create data frame from the list
df <- as.data.frame(temp)
print("Converted to data frame")
# Error summary file
Expected<-nrow(df)
#Select essential rows
print("Identifying essential rows")
sel.template.desc |>
filter(Mandatory == "Yes") |>
pull(Header) -> essential.columns
error.mandatory <- NULL
error.df<-data.frame(Country=NULL, Name=NULL, Rows=NULL, Expected=NULL)
# Operate on essential columns including creation of error file
for (k in seq_along(essential.columns)) {
if(essential.columns[k]=="Currency"){
print("Found Currency. Adding CHF.")
#stop()
df$Currency<-"CHF"
}
print("Creating and writing data with missing mandatory values")
assign(
paste0(
"error_mandatory_",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
essential.columns[k]
),
df[is.na(df[, essential.columns[k]]), ]
)
# TO be saved in error files
if(nrow(df[is.na(df[, essential.columns[k]]), ])>0){
write.csv(
df[is.na(df[, essential.columns[k]]), ],
paste0(
"./projects/errors/mandatory/",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
essential.columns[k],
"_error_mandatory.csv"
), row.names = F, na=""
)
}
# Error summary file
Country<-substr(names(old_files[h]), 1, 2)
Name<-snames[i]
err.type<-paste0("Missing ",essential.columns[k])
err.count<-nrow(df[is.na(df[, essential.columns[k]]), ])
print("Removing rows with empty essential columns")
df <- df[!is.na(df[, essential.columns[k]]), ]
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
print("Identifying columns associated with codelists")
# List of columns that have a codelist
codelistcols <- sel.template.desc |>
filter(!is.na(`CodeList File Path`)) |> pull(Header)
for (k in seq_along(codelistcols)) {
if(codelistcols[k]=="Currency"){
print("Found Currency. Adding CHF.")
df$Currency<-"CHF"
}
print(paste0("Identifying errors ",codelistcols[k]))
def.rows <-
which(!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA))
def.n<- df[def.rows, 1]
def.rows.val <-
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]]
def <- data.frame(def.rows, def.n,def.rows.val)
if(nrow(def)>0){
assign(paste0(
"error_codematch_",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
codelistcols[k]
),
def) # TO be saved
write.csv(
def,
paste0(
"./projects/errors/codelist/",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
codelistcols[k],
"_error_codematch_.csv"
), row.names = F, na=""
)
}
err.type<-paste0("Codelist Mismatch ", codelistcols[k]) #Error cal
err.count<-nrow(def) #Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
print(paste0("Removing errors ",codelistcols[k]))
# Removes any mismatch
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]] <-
NA
# Matches each column with the corresponding code list and returns the value
df[, codelistcols[k]] <-
pull(codelist_files[codelistcols[k]][[1]], 2)[match(pull(df, codelistcols[k]),
pull(codelist_files[codelistcols[k]][[1]], Description))]
}
max.length <- as.numeric(sel.template.desc$`Max Length`)
dtype <- sel.template.desc$`Data Type`
rowval <- NULL
ival <- NULL
rval <- NULL
lenght.issue.df <- NULL
# Changing the data class
for (k in 1:ncol(df)) {
if (dtype[k] == "String") {
df[, k] <- as.character(pull(df, k))
}
if (dtype[k] == "Boolean") {
df[, k] <- as.logical(pull(df, k))
}
if (dtype[k] == "DateTime") {
df[, k] <- lubridate::ymd_hms(pull(df, k))
}
if (dtype[k] == "Time") {
df[, k] <- lubridate::hms(pull(df, k))
} # This list will increase and also change based on input date and time formats
}
print("Rectifying streetname")
# Street and House Number
if (any(colnames(df) == "Street")) {
# extract() returns a new data frame, so its result has to be assigned;
# Street is replaced by the two new columns
df <- extract(df,
"Street",
c("Streetname", "HouseNumber"),
"(\\D+)(\\d.*)")
df <- df |>
select(-any_of("House_Number")) |>
rename(Street = Streetname, House_Number = HouseNumber) |>
select(all_of(sel.template.desc.colnames))
}
# Length Rectification
colclasses <- lapply(df, class)
print("Rectifying Length")
for (k in 1:ncol(df)) {
if (colclasses[[k]] == "character") {
print("found character column ")
rowval <- pull(df, 1)
ival <- ifelse(nchar(pull(df, k))== 0 | is.na(nchar(pull(df, k))),1,nchar(pull(df, k)))
rval <- max.length[k]
# rectifying data length
df[, k] <-
ifelse(nchar(pull(df, k)) > max.length[k],
substring(pull(df, k), 1, max.length[k]),
pull(df, k))
# Record length issues inside the character branch, so that values left over
# from an earlier column are not counted again for non-character columns
lenght.issue.df <-
rbind(lenght.issue.df, data.frame(rowval, ival, rval))
err.type<- paste0("Length error ", colnames(df)[k]) # Error cal
err.count<- sum(ival>rval, na.rm = T) # Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
}
lenght.issue.df <- dplyr::filter(lenght.issue.df,ival>rval)
if(nrow(lenght.issue.df)>0){
write.csv(lenght.issue.df,
paste0(
"./projects/errors/length/",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_length_error.csv"
), row.names = F, na="")
}
assign(snames[i], df)
write.csv(df,paste0("./projects/output/", substr(names(old_files[h]), 1, 2), "_", snames[i],".csv"), row.names = F, na="")
if(nrow(error.df)>0){
write.csv(error.df, paste0("./projects/summary/",substr(names(old_files[h]), 1, 2), "_", snames[i],"_error",".csv"), row.names = F, na="") # Error write
}
err.summ<-rbind(err.summ,data.frame(Country=Country, Name=Name, Expected=Expected, Actual=nrow(df))) #Error Cal
}
write.csv(err.summ,
paste0("./projects/summary/" ,substr(names(old_files[h]), 1, 2), "_", snames[i],"_sumerror",".csv"), row.names = F, na="") # Error Write
}
end<-Sys.time()
end-strt
```
*The code failed because the Department column appears several times in the data, and while importing, R renamed the duplicates to `Department..xx`.*
*Manually verify if these are the required templates*
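The length rectification step in the chunk above boils down to truncating character values to the template's maximum length and counting how many were affected. A minimal sketch under that reading; `truncate_to_max()` is a hypothetical helper and the values are made up.

```r
# Hypothetical helper, for illustration only
truncate_to_max <- function(values, max_length) {
  values <- as.character(values)
  # missing values are never counted as too long
  too_long <- !is.na(values) & nchar(values) > max_length
  list(
    values = ifelse(too_long, substr(values, 1, max_length), values),
    n_truncated = sum(too_long)
  )
}

# Made-up column with one value longer than the assumed limit of 4
checked <- truncate_to_max(c("abcdef", "abc", NA), max_length = 4)
print(checked$values)
print(checked$n_truncated)
```

Returning the count together with the values keeps the error summary and the cleaned data in step with each other.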

398
Report.Rmd

@ -0,0 +1,398 @@
---
title: "Report"
author: "Data Science Team, LaNubia"
date: "1/11/2022"
output:
html_document:
theme: lumen
highlight: tango
self_contained: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, error=TRUE, message=FALSE, warning=FALSE)
library(readxl)
library(DT)
library(tidyr)
library(dplyr)
rxl<- function(path,...){
tryCatch(read_excel(path,...), error= function(c){
c$message<-"No Data"
print("No Data")
stop(c)
})
}
ltodf<- function(path,...){
tryCatch(rbind.data.frame(path,...), error= function(c){
c$message<-"No Data"
print("No Data")
stop(c)
})
}
```
## Status Report
### Input Available
```{r echo=FALSE, message=FALSE, warning=FALSE}
contactinputpath<-list.files("./contacts/raw-data", pattern="*.xlsx", full.names = T)
accountinputpath<-list.files("./accounts/raw-data", pattern="*.xls", full.names = T)
projectinputpath<-list.files("./projects/raw-data", pattern="*.xls", full.names = T)
supportinputpath<-list.files("./support/raw-data", pattern="*.xls", full.names = T)
conta<-lapply(contactinputpath, read_excel)
names(conta)<-gsub("./contacts/raw-data/","",contactinputpath)
c<-lapply(conta, nrow)
Input_data<-"Contact"
#Country<-gsub(".xlsx","",names(conta))
Observations<-c
temp<-data.frame(Input_data,Observations) |>
pivot_longer(cols = (-1), names_to = "Country", values_to = "Observations") |>
mutate(Country=gsub(".xlsx","",Country))
input.summary<-temp
acco<-lapply(accountinputpath, read_excel)
names(acco)<-gsub("./accounts/raw-data/","",accountinputpath)
a<-lapply(acco, nrow)
Input_data<-"Accounts"
#Country<-gsub(".xlsx","",names(conta))
Observations<-a
temp<-data.frame(Input_data,Observations) |>
pivot_longer(cols = (-1), names_to = "Country", values_to = "Observations") |>
mutate(Country=gsub(".xls","",Country))
input.summary<-rbind(input.summary,temp)
proja<-lapply(projectinputpath, read_excel)
names(proja)<-gsub("./projects/raw-data/","",projectinputpath)
p<-lapply(proja, nrow)
Input_data<-"Projects"
#Country<-gsub(".xlsx","",names(conta))
Observations<-p
temp<-data.frame(Input_data,Observations) |>
pivot_longer(cols = (-1), names_to = "Country", values_to = "Observations") |>
mutate(Country=gsub(".xls","",Country))
input.summary<-rbind(input.summary,temp)
suppo<-lapply(supportinputpath, read_excel)
names(suppo)<-gsub("./support/raw-data/","",supportinputpath)
s<-lapply(suppo, nrow)
Input_data<-"Support"
#Country<-gsub(".xlsx","",names(conta))
Observations<-s
temp<-data.frame(Input_data,Observations) |>
pivot_longer(cols = (-1), names_to = "Country", values_to = "Observations") |>
mutate(Country=gsub(".xls","",Country))
input.summary<-rbind(input.summary,temp)
datatable(input.summary, extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
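The chunk above repeats the same counting steps for each segment. A minimal sketch of how they might be factored into one function, assuming the files have already been read into a named list of data frames; `summarise_inputs()` and the demo data are hypothetical and use base R only.

```r
# Hypothetical helper: one row per input file, with its number of observations
summarise_inputs <- function(files, label) {
  if (length(files) == 0) {
    return(data.frame(Input_data = character(0), Country = character(0),
                      Observations = integer(0), stringsAsFactors = FALSE))
  }
  data.frame(
    Input_data = label,
    # strip a trailing .xls or .xlsx to get the country name
    Country = sub("\\.xlsx?$", "", names(files)),
    Observations = unname(vapply(files, nrow, integer(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up inputs standing in for the result of lapply(paths, read_excel)
demo <- list("DE.xlsx" = data.frame(x = 1:3), "NL.xls" = data.frame(x = 1:5))
demo_summary <- summarise_inputs(demo, "Contact")
print(demo_summary)
```

The per-segment results could then be combined with `rbind()` as in the chunk above, and a segment with no input files contributes an empty frame rather than an error.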
Simplified view
```{r echo=FALSE}
input.summary |>
pivot_wider(names_from = Country, values_from = Observations) |> datatable(extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
### Contacts
#### Template
SAP templates available:
```{r echo=FALSE}
datatable(data.frame(Templates=unique(rxl("./contacts/template.xlsx", sheet = "Field_Definitions")[,1])), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
#### Summary of Errors
```{r echo=FALSE, message=FALSE, warning=FALSE}
sumerrfilepath<-list.files("./contacts/summary", pattern="*sumerror.csv", full.names = T)
errfilepath<-list.files("./contacts/summary", pattern="*_error.csv", full.names = T)
sumerrfiles<-lapply(sumerrfilepath, read.csv)
datatable(do.call(ltodf, sumerrfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
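The summary tables in this report are built by reading every matching CSV file and stacking the results. A minimal sketch of that step with an explicit check for the case where no files exist yet; `combine_csv()` and the demo files are hypothetical.

```r
# Hypothetical helper; not part of the original report
combine_csv <- function(dir, pattern) {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(paths) == 0) {
    return(data.frame())  # nothing to report yet
  }
  do.call(rbind, lapply(paths, read.csv, stringsAsFactors = FALSE))
}

# Usage with two throwaway files in a temporary directory
demo_dir <- file.path(tempdir(), "summary_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Country = "DE", Expected = 10, Actual = 8),
          file.path(demo_dir, "DE_Contact_sumerror.csv"), row.names = FALSE)
write.csv(data.frame(Country = "NL", Expected = 5, Actual = 5),
          file.path(demo_dir, "NL_Contact_sumerror.csv"), row.names = FALSE)
combined <- combine_csv(demo_dir, "sumerror\\.csv$")
print(combined)
```

Returning an empty data frame when nothing matches lets the report render a blank table for a segment that has not been processed.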
#### Error by template
```{r echo=FALSE, message=FALSE, warning=FALSE}
errfiles<-lapply(errfilepath, read.csv)
datatable(do.call(ltodf, errfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
### Accounts
#### Template
SAP templates available:
```{r echo=FALSE}
datatable(data.frame(Templates=unique(rxl("./accounts/template.xlsx", sheet = "Field_Definitions")[,1])), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
#### Summary of Errors
```{r echo=FALSE, message=FALSE, warning=FALSE}
sumerrfilepath<-list.files("./accounts/summary", pattern="*sumerror.csv", full.names = T)
errfilepath<-list.files("./accounts/summary", pattern="*_error.csv", full.names = T)
sumerrfiles<-lapply(sumerrfilepath, read.csv)
datatable(do.call(ltodf, sumerrfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
#### Error by template
```{r echo=FALSE, message=FALSE, warning=FALSE}
errfiles<-lapply(errfilepath, read.csv)
datatable(do.call(ltodf, errfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
### Projects
#### Template
SAP templates available:
```{r echo=FALSE}
datatable(data.frame(Templates=unique(rxl("./projects/template.xlsx", sheet = "Field_Definitions")[,1])), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
#### Summary of Errors
```{r echo=FALSE, message=FALSE, warning=FALSE}
sumerrfilepath<-list.files("./projects/summary", pattern="*sumerror.csv", full.names = T)
errfilepath<-list.files("./projects/summary", pattern="*_error.csv", full.names = T)
sumerrfiles<-lapply(sumerrfilepath, read.csv)
datatable(do.call(ltodf, sumerrfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
#### Error by template
```{r echo=FALSE, message=FALSE, warning=FALSE}
errfiles<-lapply(errfilepath, read.csv)
datatable(do.call(ltodf, errfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
### Support
#### Template
SAP templates available:
```{r echo=FALSE}
datatable(data.frame(Templates=unique(rxl("./support/template.xlsx", sheet = "Field_Definitions")[,1])), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
#### Summary of Errors
```{r echo=FALSE, message=FALSE, warning=FALSE}
sumerrfilepath<-list.files("./support/summary", pattern="*sumerror.csv", full.names = T)
errfilepath<-list.files("./support/summary", pattern="*_error.csv", full.names = T)
sumerrfiles<-lapply(sumerrfilepath, read.csv)
datatable(do.call(ltodf, sumerrfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```
#### Error by template
```{r echo=FALSE, message=FALSE, warning=FALSE}
errfiles<-lapply(errfilepath, read.csv)
datatable(do.call(ltodf, errfiles), extensions = "Buttons",
options = list(paging = TRUE,
scrollX=TRUE,
searching = TRUE,
ordering = TRUE,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
pageLength=5,
lengthMenu=c(3,5,10) ))
```

313
Report.html

File diff suppressed because one or more lines are too long

BIN
Service Request Mapping 20210715.xlsx

Binary file not shown.

374
Support.Rmd

@ -0,0 +1,374 @@
---
title: "Support"
author: "Scary Scarecrow"
date: "1/12/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(dplyr)
library(lubridate)
library(DT)
library(tidyr)
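# Creates an empty data frame for every template sheet
# (relies on `saptemplate` and `snames` existing in the global environment)
mutlstxlrdr<-function(){
for( i in seq_along(snames)){
colnames<-unique(saptemplate[saptemplate$`Sheet Name`==snames[i],]$Header)
df<-read.table(text = "", col.names = colnames)
assign(snames[i], df, envir = .GlobalEnv)
}
}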
```
## Data transformation workflow
The following is the proposed preliminary workflow for the data transformation project.
>All files of a segment (support, accounts, etc.) should be inside the relevant folder. Each segment folder should contain one folder for all codelist files. All legacy data (one file per country, named after the country) should be inside the raw-data folder. A further file holding the field definitions, including the name of the matching column in the legacy file, should also be present.
>*Make sure that there are no hidden files inside the directory.*
### Code Lists
```{r Create List of Files, echo=TRUE, message=FALSE, warning=FALSE}
filenames <- list.files("./support/CodeList", pattern="*.xlsx", full.names = T) # A separate directory for code lists could be avoided, but organizing the files may then be difficult. This can be explored further if we want to transform all the data in one go, i.e. not by function (support, accounts etc.).
# File paths
print(filenames)
```
Check manually that the above list includes all the codelist files.
If it is correct, read the files.
```{r codelistreader, echo=TRUE, message=FALSE, warning=FALSE}
sheet_names<-lapply(filenames, excel_sheets) # Creates a list of the sheet names
codelist_files<-NULL
for(i in seq_along(filenames)){
a<-lapply(excel_sheets(filenames[[i]]), read_excel, path = filenames[[i]], col_types = "text") # Reads the sheets of the excel files
names(a)<-c(sheet_names[[i]]) # Renames them according to the sheet names extracted above
codelist_files<-c(codelist_files,a)
}
# Names of the files imported
names(codelist_files)
#codelist_files<-unique(codelist_files)
codelist_files$Academic_Title
```
### Templates
Let us now extract the data. Below we read the files, one per country, that hold all data related to `support` from the legacy system.
```{r readlegacyfilepath, echo=TRUE, message=FALSE, warning=FALSE}
oldfilepath<-list.files("./support/raw-data", pattern="*.xls", full.names = T) # Change the path, check pattern
print(oldfilepath)
```
Manually check that the list matches the actual files.
```{r readlegacyfiles, echo=TRUE}
old_files<-NULL
#read_excel(path = oldfilepath[[i]], sheet = 1)
for(i in seq_along(oldfilepath)){
old_files[[i]]<-read_excel(path = oldfilepath[[i]], sheet = 1)
}
names(old_files)<-gsub("./support/raw-data/","",oldfilepath)
```
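Later chunks derive a country code from the first two characters of each file name. A minimal sketch of that convention, assuming made-up paths whose file names start with a two-letter country code; `basename()` is used here instead of `gsub()` so that the directory part is removed without relying on a regular expression.

```r
# Hypothetical paths, for illustration only
paths <- c("./support/raw-data/DE.xls", "./support/raw-data/NL_support.xlsx")

# basename() strips the directory part of each path
file_names <- basename(paths)

# The first two characters are taken as the country code
country_codes <- substr(file_names, 1, 2)
print(country_codes)
```

This only works when every file name begins with the country code; a file named differently would silently produce a wrong code.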
*Some errors were noticed in the legacy file: columns with similar or identical names exist.*
```{r readSAPtemplate, echo=TRUE, message=FALSE, warning=FALSE}
saptemplate<-read_excel("./support/template.xlsx", sheet = "Field_Definitions")
# First few rows of the imported data
head(saptemplate)
```
*Please note that the format of the template sheet has been changed slightly. Previously, the sheet name appeared in a separate row above each table; now every row carries its sheet name. This was done manually to make data extraction easier.*
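The manual step described above could probably be automated by carrying each sheet name down to the rows that follow it. A minimal base-R sketch, assuming the older layout had the sheet name on a row of its own followed by the field rows; `fill_down()`, the sheet names and the headers are all made up.

```r
# Hypothetical helper: carry the last non-missing value forward
fill_down <- function(x) {
  filled <- !is.na(x)
  # index 1 is the leading NA, used for positions before the first value
  c(NA, x[filled])[cumsum(filled) + 1]
}

# Toy template in the assumed older layout
template <- data.frame(
  `Sheet Name` = c("Contact", NA, NA, "Contact_Notes", NA),
  Header = c(NA, "ContactID", "First_Name", NA, "Text"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Fill the sheet name down, then drop the rows that only held a sheet name
template$`Sheet Name` <- fill_down(template$`Sheet Name`)
template <- template[!is.na(template$Header), ]
print(template)
```

If the real template differs from this assumed layout, the filter on `Header` would need to be adapted accordingly.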
```{r createmptySAPfiles, echo=TRUE, message=FALSE, warning=FALSE}
#orilo<-"en_US.UTF-8"
#Sys.setlocale(locale="en_US.UTF-8")
strt<-Sys.time()
snames <- unique(saptemplate$`Sheet Name`)
for (h in seq_along(old_files)) {
# Copy original data
old.copy <- old_files[[h]]
print(paste0(names(old_files[h])," imported"))
err.summ<-data.frame(Country=NULL, Name=NULL, Expected=NULL, Actual=NULL) #Error Cal
# Creates data frame for each sheet in snames
for (i in seq_along(snames)) {
print(paste0("Processing ..",snames[i]))
# Select the column names from the field description sheet
print("Creating template")
sel.template.desc <-
saptemplate[saptemplate$`Sheet Name` == snames[i], ]
print("Creating column names")
sel.template.desc.colnames <- sel.template.desc$Header
# Create a list by adding values from corresponding legacy data
temp <- NULL
print("adding values to template ")
for (j in seq_along(sel.template.desc.colnames)) {
temp[j] <-ifelse(sel.template.desc$oldkey[j]=="NA" | is.na(sel.template.desc$oldkey[j]),
NA,as.vector(old.copy[, sel.template.desc$oldkey[j]])
)
}
# Rename the columns according to field description
print("renaming template ")
names(temp) <- sel.template.desc.colnames
# Create data frame from the list
df <- as.data.frame(temp)
print("Converted to data frame")
# Error summary file
Expected<-nrow(df)
#Select essential rows
print("Identifying essential rows")
sel.template.desc |>
filter(Mandatory == "Yes") |>
pull(Header) -> essential.columns
error.mandatory <- NULL
error.df<-data.frame(Country=NULL, Name=NULL, Rows=NULL, Expected=NULL)
# Operate on essential columns including creation of error file
for (k in seq_along(essential.columns)) {
if(essential.columns[k]=="International_Version"){
print("Found International Version. Adding 0.")
#stop()
df$International_Version<-"0"
}
print("Creating and writing data with missing mandatory values")
assign(
paste0(
"error_mandatory_",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
essential.columns[k]
),
df[is.na(df[, essential.columns[k]]), ]
)
# TO be saved in error files
if(nrow(df[is.na(df[, essential.columns[k]]), ])>0){
write.csv(
df[is.na(df[, essential.columns[k]]), ],
paste0(
"./support/errors/mandatory/",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
essential.columns[k],
"_error_mandatory.csv"
), row.names = F, na=""
)
}
# Error summary file
Country<-substr(names(old_files[h]), 1, 2)
Name<-snames[i]
err.type<-paste0("Missing ",essential.columns[k])
err.count<-nrow(df[is.na(df[, essential.columns[k]]), ])
print("Removing rows with empty essential columns")
df <- df[!is.na(df[, essential.columns[k]]), ]
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
print("Identifying columns associated with codelists")
# List of columns that have a codelist
codelistcols <- sel.template.desc |>
filter(!is.na(`CodeList File Path`)) |> pull(Header)
for (k in seq_along(codelistcols)) {
if(codelistcols[k]=="International_Version"){
print("Found International Version. Adding 0.")
df$International_Version<-"0"
}
print(paste0("Identifying errors ",codelistcols[k]))
def.rows <-
which(!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA))
def.n<- df[def.rows, 1]
def.rows.val <-
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]]
def <- data.frame(def.rows, def.n,def.rows.val)
if(nrow(def)>0){
assign(paste0(
"error_codematch_",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
codelistcols[k]
),
def) # TO be saved
write.csv(
def,
paste0(
"./support/errors/codelist/",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_",
codelistcols[k],
"_error_codematch_.csv"
), row.names = F, na=""
)
}
err.type<-paste0("Codelist Mismatch ", codelistcols[k]) #Error cal
err.count<-nrow(def) #Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
print(paste0("Removing errors ",codelistcols[k]))
# Removes any mismatch
df[!df[, codelistcols[k]] %in% c(pull(codelist_files[codelistcols[k]][[1]], Description), NA), codelistcols[k]] <-
NA
# Matches each column with the corresponding code list and returns the value
df[, codelistcols[k]] <-
pull(codelist_files[codelistcols[k]][[1]], 2)[match(pull(df, codelistcols[k]),
pull(codelist_files[codelistcols[k]][[1]], Description))]
}
max.length <- as.numeric(sel.template.desc$`Max Length`)
dtype <- sel.template.desc$`Data Type`
rowval <- NULL
ival <- NULL
rval <- NULL
lenght.issue.df <- NULL
# Changing the data class
for (k in 1:ncol(df)) {
if (dtype[k] == "String") {
df[, k] <- as.character(pull(df, k))
}
if (dtype[k] == "Boolean") {
df[, k] <- as.logical(pull(df, k))
}
if (dtype[k] == "DateTime") {
df[, k] <- lubridate::ymd_hms(pull(df, k))
}
if (dtype[k] == "Time") {
df[, k] <- lubridate::hms(pull(df, k))
} # This list will increase and also change based on input date and time formats
}
print("Rectifying streetname")
# Street and House Number
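if (any(colnames(df) == "Street")) {
# Split "Street" into a street name and a house number and keep the result.
# Rows that do not match the pattern get NA in both new columns.
df <- tidyr::extract(df,
"Street",
c("Streetname", "HouseNumber"),
"(\\D+)(\\d.*)",
remove = FALSE)
df <- df |>
select(-any_of(c("Street", "House_Number"))) |>
rename(Street = Streetname, House_Number = HouseNumber) |>
select(all_of(sel.template.desc.colnames))
}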
# Length Rectification
colclasses <- lapply(df, class)
print("Rectifying Length")
for (k in 1:ncol(df)) {
if (colclasses[[k]] == "character") {
print("found character column ")
rowval <- pull(df, 1)
ival <- ifelse(nchar(pull(df, k))== 0 | is.na(nchar(pull(df, k))),1,nchar(pull(df, k)))
rval <- max.length[k]
# rectifying data length
df[, k] <-
ifelse(nchar(pull(df, k)) > max.length[k],
substring(pull(df, k), 1, max.length[k]),
pull(df, k))
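} else {
# Non-character columns have no length to check; clear values left over
# from the previous character column so they are not counted twice
rowval <- ival <- rval <- NULL
}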
lenght.issue.df <-
rbind(lenght.issue.df, data.frame(rowval, ival, rval))
err.type<- paste0("Length error ", colnames(df)[k]) # Error cal
err.count<- sum(ival>rval, na.rm = T) # Error cal
if(err.count>0){
error.df<-rbind(error.df,data.frame(Country=Country, Name=Name, err.type=err.type, err.count=err.count)) #Error cal
}
}
lenght.issue.df <- dplyr::filter(lenght.issue.df,ival>rval)
if(nrow(lenght.issue.df)>0){
write.csv(lenght.issue.df,
paste0(
"./support/errors/length/",
substr(names(old_files[h]), 1, 2),
"_",
snames[i],
"_length_error.csv"
), row.names = F, na="")
}
assign(snames[i], df)
write.csv(df,paste0("./support/output/", substr(names(old_files[h]), 1, 2), "_", snames[i],".csv"), row.names = F, na="")
if(nrow(error.df)>0){
write.csv(error.df, paste0("./support/summary/",substr(names(old_files[h]), 1, 2), "_", snames[i],"_error",".csv"), row.names = F, na="") # Error write
}
err.summ<-rbind(err.summ,data.frame(Country=Country, Name=Name, Expected=Expected, Actual=nrow(df))) #Error Cal
}
write.csv(err.summ,
paste0("./support/summary/" ,substr(names(old_files[h]), 1, 2), "_", snames[i],"_sumerror",".csv"), row.names = F, na="") # Error Write
}
end<-Sys.time()
end-strt
```
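
The codelist step in the chunk above is easier to follow in isolation. The sketch below uses a made-up codelist and made-up legacy values (all names and values are hypothetical) to show how values missing from the codelist are flagged and how `match()` turns descriptions into codes.

```r
# Hypothetical codelist: description -> code
codelist <- data.frame(
  Description = c("Open", "In Process", "Completed"),
  Code        = c("1", "2", "3"),
  stringsAsFactors = FALSE
)
legacy_values <- c("Open", "Completed", "Unknown", NA)

# Values that are neither in the codelist nor missing are reported as errors
mismatch_rows <- which(!legacy_values %in% c(codelist$Description, NA))

# match() returns NA for anything not found, so mismatches become NA codes
codes <- codelist$Code[match(legacy_values, codelist$Description)]

print(mismatch_rows)
print(codes)
```

Because `match()` already yields `NA` for unknown descriptions, the mismatch rows only need to be collected for reporting; the lookup itself does not fail on them.
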
*The code failed because the `Department` column appears several times in the data, and on import R renamed the duplicates (to `Department..xx`).*
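
A failure of this kind can be caught before the main loop runs by comparing the legacy keys named in the template with the columns actually present after import. This is a sketch under assumptions: `oldkeys` and `legacy_cols` are hypothetical stand-ins for `saptemplate$oldkey` and the imported column names, and the renamed column names shown are only illustrative.

```r
# Hypothetical template keys and imported legacy column names
oldkeys <- c("Name", "Department", NA, "City")
legacy_cols <- c("Name", "Department...4", "Department...9", "City")

# Keys the template asks for, ignoring empty entries and the literal "NA"
wanted <- oldkeys[!is.na(oldkeys) & oldkeys != "NA"]

# Keys that the legacy data does not provide under that exact name
missing_keys <- setdiff(wanted, legacy_cols)

if (length(missing_keys) > 0) {
  message("Template keys not found in legacy data: ",
          paste(missing_keys, collapse = ", "))
}
```
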
*Manually verify that these are the required templates.*
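
Part of this verification can be automated by comparing the columns of each generated table with the headers listed in the template. The sketch below uses made-up data; `template_headers` and `output` are hypothetical stand-ins for the `Header` values of one sheet and the data frame written for it.

```r
# Hypothetical template headers and generated output
template_headers <- c("ID", "Name", "Street", "House_Number")
output <- data.frame(
  ID           = 1:2,
  Name         = c("a", "b"),
  Street       = c("x", "y"),
  House_Number = c("1", "2"),
  stringsAsFactors = FALSE
)

# Same columns in the same order?
same_columns <- identical(names(output), template_headers)

# Which columns differ, if any
missing_cols <- setdiff(template_headers, names(output))
extra_cols   <- setdiff(names(output), template_headers)

print(same_columns)
```

A check like this covers only the column names; the content of each table still needs a manual review.
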

BIN
Technical Support/CodeList/CodeList_ServiceRequestSkillsCollectionCollection.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_BTD_Reference.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_Item.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_Item_Notes.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_Location_Address.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_Notes.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_Other_Party.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_Party.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/CodeList_Service_Request_Solution_Proposal.xlsx

Binary file not shown.

BIN
Technical Support/CodeList/~$CodeList_Service_Request.xlsx

Binary file not shown.

BIN
Technical Support/Product Classification_TecSup Origin_Type of TecSup.xlsx

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View CN.xls

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View CZ.xls

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View DE.xls

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View ES.xls

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View IT.xls

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View NL.xls

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View NO.xls

Binary file not shown.

BIN
Technical Support/Technical Support Advanced Find View PL.xls

Binary file not shown.

3
Tranlation from MS CRM/CrmTranslations.xml

File diff suppressed because one or more lines are too long

BIN
Tranlation from MS CRM/CrmTranslations_OrbCSSCoreExtended_1_0_20151214.zip

Binary file not shown.

3
Tranlation from MS CRM/CrmTranslations_OrbCSSCoreExtended_1_0_20151214/CrmTranslations.xml

File diff suppressed because one or more lines are too long

1
Tranlation from MS CRM/CrmTranslations_OrbCSSCoreExtended_1_0_20151214/[Content_Types].xml

@ -0,0 +1 @@
<?xml version="1.0" encoding="utf-8"?><Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="xml" ContentType="application/octet-stream" /></Types>

1
Tranlation from MS CRM/[Content_Types].xml

@ -0,0 +1 @@
<?xml version="1.0" encoding="utf-8"?><Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="xml" ContentType="application/octet-stream" /></Types>

BIN
accounts/CodeList/CodeList_Account.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Addresses.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Contact_Persons.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Identification.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_International_Version.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Sales_Data.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Tax_Numbers.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Team.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Visiting_Hours.xlsx

Binary file not shown.

BIN
accounts/CodeList/CodeList_Account_Visits_Details.xlsx

Binary file not shown.

BIN
accounts/Detailled Field Mapping Account 20210701.xlsx

Binary file not shown.

BIN
accounts/Line of Business translation -TEMPLATE-V1.xlsx

Binary file not shown.

Some files were not shown because too many files changed in this diff
