Data Cleansing Instructions

Guidelines for County participation in data cleansing activities – there is a checklist at the end of this page.

1. Receive Communication

To receive emails about data quality and cleansing activities, select the option to become a Data Quality Analyst (DQA) when you create your profile on the portal; a how-to is available on the portal home page. The DQA is expected to take part in de-duplication (de-dup) processes and other data-centric activities. Each County should designate at least one DQA to receive data quality emails and take part in related discussions. See current DQAs here.

 

2. What Cleansing is Available?

The types of data cleansing can be broken into three groups:

  1. Fully Automated: These require no County intervention and run periodically.
     
  2. Manually-Directed (County validated): These require County review and approval before the changes are automatically made in CWS/CMS.
     
  3. Manual: These require County staff to perform the cleansing (e.g. merging) directly in CWS/CMS, working from a provided report.

 

3. Cleansing Details

Descriptions of the data cleansing processes available.

3.1 Fully Automated:

 

Some processes require your County to opt in. Sign-up is needed only once per process; after that, your County will continue to participate in all future runs of that fully automated cleanup process. Currently these are:

 

Client Merge

Information    

 

Substitute Care Provider Merge

Information

 

SCR8750 Closed Adoption Cases

Information

 

SCR8752 Service Provider Address Merge                        

Information

 

SCR8844 Educ Enroll. Closure

Information


Sign up for CWS/CMS DQ Cleansing Processes here
 

3.2 County Validation:

These Manually-Directed processes perform the cleansing automatically, but only after the County has reviewed the report for that run and flagged it to proceed. All processes are considered high priority unless otherwise stated. The current processes are:

Placement Home Merge

Information 

No sign-up needed. Report sent semi-annually

Service Provider Merge

Information

No sign-up needed. Report sent semi-annually

Attorney Merge

Information

No sign-up needed. Report sent semi-annually

SCR8653 Education Provider Merge

Information

 

LIS & CWS FFACRFH & FFACH Merge                   

Information

      

SCR8794 Service Provider Deduplication

Information 

 

Client Email Correction for CARES

Information

Unknown Client Name Standardization

Information

Client Prefix / Suffix Cleanse and Standardization

Information

SCP Email Correction for CARES

Information
 

SCR8912 Substitute Care Provider Deduplication

Information 

 

SCR8936 Multi-Notebook Email Correction for CARES

Information

SCR8776 SSN Cleanse for CARES

Information

SCR8935 Reporter Badge Number Cleanse for CARES

Information

SCR8955 Other Adult Name Update

Information

SCR8840 Placement Home Name Stnd (low priority / optional)

Information

Sign up for CWS/CMS DQ Cleansing Processes here

To see which automated and manually-directed cleansing processes each County has opted to participate in, go to Links > Auto-cleanup Counties.

To see County participation in the Placement Home / Service Provider / Attorney semi-annual merge processes, go to Links > Semi-annual Counties.

 

3.3 Manual Cleansing

Due to the complexity of data in CWS/CMS, some cleanup must be performed manually within the CWS/CMS application. Although this is the least preferred approach, it is sometimes necessary to ensure accurate results. The manual cleansing currently promoted as high priority is:

 

Client Merge Exceptions                   

Merges that could not be completed by the automated process are listed on an exception report. Work-around actions for these exceptions are described in the document "Merge Client Exceptions_Best_Plan.pdf" (Links > Documents > Client).

 

Fuzzy Reports: Data Quality has produced special Excel reports that list potential duplicates using fuzzy matching criteria (an “almost the same” matching technique). These reports are produced periodically and allow Counties to perform manual merges in CWS/CMS. When new reports become available, a communication is sent out with a link to the sign-up page; the reports may also be requested here. Currently the reports are for:

 

Client

Fields in the report

Placement Homes

Fields in the report

The Substitute Care Provider fuzzy report has been replaced by SCR8912 SCP Deduplication. See section 3.2 to sign up.
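To illustrate what "almost the same" matching means, here is a minimal sketch that compares every pair of names with Python's standard difflib.SequenceMatcher and flags pairs whose similarity ratio meets a threshold. This is a generic illustration only, not the criteria Data Quality uses for its reports: the 0.85 threshold, the normalization step (lowercasing and collapsing spaces), the sample names, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical threshold; the actual fuzzy report criteria are not published here.
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after lowercasing and collapsing whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def potential_duplicates(names: list[str], threshold: float = THRESHOLD) -> list[tuple[str, str, float]]:
    """Compare every pair of names once and keep pairs at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= threshold:
                pairs.append((names[i], names[j], round(score, 2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample data, not real CWS/CMS records.
    sample = ["Jon Smith", "John Smith", "Maria Lopez", "Mario Lopez", "Ann Lee"]
    for a, b, score in potential_duplicates(sample):
        print(f"{a!r} ~ {b!r} (score {score})")
```

With this sample, the sketch flags "Jon Smith" / "John Smith" and "Maria Lopez" / "Mario Lopez" and leaves "Ann Lee" unmatched. As with the fuzzy reports, each flagged pair is only a candidate; a person still needs to confirm it before merging in CWS/CMS.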

 

 

4. What Other Information is Available?

 

Data Quality Issues

Add new issues or review existing ones at (Links > Issues List)

Data Quality Workgroup                                       

Collaborative group interested in data quality at the County level. Meets monthly (4th Thursday) to discuss strategies related to data quality issues and SCRs. To request attendance, contact your County SSC. To view previous topics, visit (Links > Documents > Workgroup)

Data Quality Workshop

Hands-on safe-space workshop to learn and collaborate on data quality tasks and ideas. Typically held after CTW has completed. Details are sent out to SPOCs and DQAs around six weeks before scheduled sessions. To view previous material, visit (Links > Documents > Workshop)

 

5. Helpful Links

List of the automated & manually-directed cleansing processes (Links > Auto-cleanup Frequency)

List of the Counties signed up for fully automated cleansing (Links > Auto-cleanup Counties)

Schedule of the semi-annual merge processes (Links > Documents > Merges)

Information on the Merges (Links > Documents > Merges)

List of County DQAs (Links > County Data Quality Analyst Contacts)

New / Merged / Duplicate metrics (Links > Documents > Reporting > Cleansing Progress)

Request Fuzzy Reports here

 

6. Checklist

  

Nbr | Section | Step | Frequency
1 | 1 | County has at least one DQA | Sign-up once.
2 | 3.1 | Signed up for fully automated Client Merge | Sign-up once. Process runs every two months.
3 | 3.1 | Signed up for fully automated SCP Merge | Sign-up once. Process runs every two months.
4 | 3.1 | Signed up for fully automated SCR8750 Closed Adoption Cases | Sign-up once. Process runs quarterly.
5 | 3.1 | Signed up for fully automated SCR8752 Service Provider Address Merge | Sign-up once. Process runs weekly.
6 | 3.1 | Signed up for fully automated SCR8844 Educ Enroll. Closure | Sign-up once. Process runs quarterly.
7 | 3.2 | Updated ‘proceed’ flag for Placement Home Merge | Semi-annual
8 | 3.2 | Updated ‘proceed’ flag for Service Provider Merge | Semi-annual
9 | 3.2 | Updated ‘proceed’ flag for Attorney Merge | Semi-annual
10 | 3.2 | Signed up for and taken part in SCR8753 Education Provider Merge | Sign-up once. Process runs weekly.
11 | 3.2 | Signed up for and taken part in LIS & CWS FFACRFH & FFACH Merge | Sign-up once. Process runs on demand.
12 | 3.2 | Signed up for and taken part in SCR8794 Service Provider Deduplication | Sign-up once. Process runs weekly.
13 | 3.2 | Signed up for and taken part in Client Email Correction for CARES | Sign-up once. Process runs bi-monthly.
14 | 3.2 | Signed up for and taken part in Unknown Client Name Standardization | Sign-up once. Process runs bi-monthly.
15 | 3.2 | Signed up for and taken part in Client Prefix / Suffix Cleanse and Standardization | Sign-up once. Process runs bi-monthly.
16 | 3.2 | Signed up for and taken part in SCP Email Correction for CARES | Sign-up once. Process runs bi-monthly.
17 | 3.2 | Signed up for and taken part in SCR8912 SCP Deduplication | Sign-up once. Process runs weekly.
18 | 3.2 | Signed up for and taken part in SCR8936 Multi-Notebook Email Correction for CARES | Sign-up once. Process runs bi-monthly.
19 | 3.2 | Signed up for and taken part in SCR8776 SSN Cleanse for CARES | Sign-up once. Process runs bi-monthly.
20 | 3.2 | Signed up for and taken part in SCR8935 Reporter Badge Number Cleanse for CARES | Sign-up once. Process runs bi-monthly.
21 | 3.2 | Signed up for and taken part in SCR8955 Other Adult Name Update | Sign-up once. Process runs bi-monthly.
22 | 3.2 | Signed up for and taken part in SCR8840 Placement Home Name Stnd (low priority / optional) | Sign-up once. Process runs on demand.
23 | 3.3 | (Optional) Worked on Client exceptions after automatic merge | Bi-monthly
24 | 3.3 | (Optional) Requested Fuzzy Reports and performed manual Client Merge | Sign-up each time. Process runs on demand.
25 | 3.3 | (Optional) Requested Fuzzy Reports and performed manual Placement Home Merge | Sign-up each time. Process runs on demand.
 

 

 

 
       

Download Excel version here

 

 

7. Frequently Asked Questions (FAQ)

How do I get the Data Quality Reports (for cleansing / fuzzy)?

  • Accessible at https://safe.cdt.ca.gov/ using your existing County account, named in the form "osi-cms-countyname-user" (e.g. osi-cms-sacramento-user)
  • For County account provisioning, contact the IBM Boulder helpdesk at 800-428-8268 and state that the request is for the Data Quality reports on SAFE
  • Consult your County IT department for access

 

Where will I find the semi-annual reports?

  • On your County server's mirror of the CWS/CMS Distributed File Server
  • Historically referred to as the "V" drive

 

What if I have questions?

  • Contact your County's assigned SSC