Data Cleansing Instructions
Guidelines for County participation in data cleansing activities. A checklist is provided in section 6 of this page.
1. Receive Communication
To receive emails regarding data quality and cleansing activities, select the option to become a data quality analyst (DQA) when you create a profile on the portal; there is a how-to on the portal home page. The DQA is expected to be involved in de-duplication processes and other data-centric activities. Each County should designate at least one DQA to be included on data quality related emails and discussions. See current DQAs here.
2. What Cleansing is Available?
The types of data cleansing can be broken into three groups. These are:
- Fully Automated: These require no County intervention and will occur periodically
- Manually-Directed (County validated): These require a County review and approval prior to the changes being automatically made in CWS/CMS
- Manual: These require County staff to go into CWS/CMS and perform manual cleansing (e.g. merging) using a provided report listing
3. Cleansing Details
Descriptions of the data cleansing processes available.
3.1 Fully Automated:
Some processes require your County to opt in. Sign-up is needed only once for each process; after that, your County continues to participate in all future runs of that fully automated process. Currently these are:
Client Merge | | |
Substitute Care Provider Merge | | |
SCR8750 Closed Adoption Cases | | |
SCR8752 Service Provider Address Merge | | |
SCR8844 Educ Enroll. Closure | Information | |
Sign up for CWS/CMS DQ Cleansing Processes here
3.2 County Validation:
These Manually-Directed processes perform cleansing automatically, but only once a report has been reviewed and flagged to proceed for each run. All processes are considered high priority unless otherwise stated. The current processes are:
Placement Home Merge | No sign-up needed. Report sent semi-annually | |
Service Provider Merge | No sign-up needed. Report sent semi-annually | |
Attorney Merge | No sign-up needed. Report sent semi-annually | |
SCR8653 Education Provider Merge | | |
LIS & CWS FFACRFH & FFACH Merge | | |
SCR8794 Service Provider Deduplication | | |
Client Email Correction for CARES | Information | |
Unknown Client Name Standardization | Information | |
Client Prefix / Suffix Cleanse and Standardization | Information | |
SCP Email Correction for CARES | Information | |
SCR8912 Substitute Care Provider Deduplication | | |
SCR8936 Multi-Notebook Email Correction for CARES | Information | |
SCR8776 SSN Cleanse for CARES | Information | |
SCR8935 Reporter Badge Number Cleanse for CARES | Information | |
SCR8955 Other Adult Name Update | Information | |
SCR8840 Placement Home Name Stnd (low priority / optional) | Information | |
Sign up for CWS/CMS DQ Cleansing Processes here
To see a list of the automated & manually-directed cleansing processes that Counties have opted to participate in, go to Links > Auto-cleanup Counties
To see participation in the Placement Home / SCP / Attorney semi-annual merge processes, go to Links > Semi-annual Counties
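The review-then-proceed gate that section 3.2 describes can be pictured with a small sketch. This is only an illustration of the idea that a run changes nothing until the County has reviewed the report and flagged it to proceed; the class, field names and County names below are hypothetical and do not reflect how the reports or CWS/CMS actually store this flag.

```python
from dataclasses import dataclass


@dataclass
class MergeReport:
    """One report sent to a County for one run (hypothetical layout)."""
    county: str
    process: str
    reviewed: bool = False  # County has reviewed the report
    proceed: bool = False   # County has flagged the run to proceed


def ready_to_run(report: MergeReport) -> bool:
    # Cleansing proceeds only when the report is both reviewed and flagged.
    return report.reviewed and report.proceed


reports = [
    MergeReport("County A", "Placement Home Merge", reviewed=True, proceed=True),
    MergeReport("County B", "Placement Home Merge", reviewed=True, proceed=False),
    MergeReport("County C", "Attorney Merge"),  # not yet reviewed
]

# Only County A's run would go ahead in this example.
runnable = [r.county for r in reports if ready_to_run(r)]
print(runnable)
```

In this sketch the flag applies to a whole report for a single run, so it has to be set again for each new report.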
3.3 Manual Cleansing
Due to the complexity of data in CWS/CMS, some cleanup must be performed manually within the CWS/CMS application. Although this is the least favorable approach, it may still be necessary to ensure accurate results. Currently the manual cleansing that is being promoted (as high priority) is:
Client Merge Exceptions | Merges that were not possible in the automated process are sent to an exception report. Information on work-around actions for these can be found in the document "Merge Client Exceptions_Best_Plan.pdf" (Links > Documents > Client) |
Fuzzy Reports. Data Quality has produced special Excel reports that list potential duplicates using fuzzy matching criteria (an “almost the same” matching technique). These reports are produced periodically and allow Counties to perform manual merges in CWS/CMS. A communication is sent out when they become available with a link to the sign-up page to request the reports, or they may be requested here. Currently the reports are for:
Client | |
Placement Homes | |
Substitute Care Provider fuzzy report has been replaced by SCR8912 SCP Deduplication. See section 3.2 to sign up. | |
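As an illustration of what "almost the same" matching means, the sketch below scores pairs of names with Python's standard difflib module and lists pairs above a cut-off. It is not the matching criteria used to build the actual Fuzzy Reports, which this page does not specify; the record layout, the sample names and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio, ignoring case and extra spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def potential_duplicates(records, threshold=0.85):
    """Yield (id_a, id_b, score) for pairs whose names are nearly identical.

    records: iterable of (record_id, name) tuples (hypothetical layout).
    threshold: illustrative cut-off, not a value taken from the real reports.
    """
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            yield id_a, id_b, round(score, 2)


# Made-up sample data: the first two names differ by a single letter.
sample = [
    ("C001", "Jonathan Smith"),
    ("C002", "Jonathon Smith"),
    ("C003", "Maria Lopez"),
]
pairs = list(potential_duplicates(sample))
print(pairs)
```

A pair that appears in such a list is only a candidate; as with the Fuzzy Reports, a person still has to confirm it is a true duplicate before merging in CWS/CMS.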
4. What Other Information is Available?
Data Quality Issues | Add new issues or review existing ones at (Links > Issues List) |
Data Quality Workgroup | Collaborative group interested in data quality at the County level. Meets monthly (4th Thursday) to discuss strategies related to data quality issues and SCRs. To request attendance contact your County SSC. To view previous topics visit (Links > Documents > Workgroup) |
Data Quality Workshop | Hands-on safe-space workshop to learn and collaborate on data quality tasks and ideas. Typically held after CTW has completed. Details are sent out to SPOCs and DQAs around six weeks before scheduled sessions. To view previous material visit (Links > Documents > Workshop) |
5. Helpful Links
List of the automated & manually-directed cleansing processes (Links > Auto-cleanup Frequency)
List of the Counties signed up for fully automated cleansing (Links > Auto-cleanup Counties)
Schedule of the semi-annual merge processes (Links > Documents > Merges)
Information on the Merges (Links > Documents > Merges)
List of County DQAs (Links > County Data Quality Analyst Contacts)
New / Merged / Duplicate metrics (Links > Documents > Reporting > Cleansing Progress)
Request Fuzzy Reports here
6. Checklist
Nbr | Page Ref | Step | Frequency |
1 | 1 | County has at least 1 DQA | Sign-up once. |
2 | 3.1 | Signed up for fully automated Client merge | Sign-up once. Process runs every two months. |
3 | 3.1 | Signed up for fully automated SCP merge | Sign-up once. Process runs every two months. |
4 | 3.1 | Signed up for fully automated SCR8750 Closed Adoption Cases | Sign-up once. Process runs quarterly. |
5 | 3.1 | Signed up for fully automated SCR8752 Service Provider Address Merge | Sign-up once. Process runs weekly. |
6 | 3.1 | Signed-up for automated SCR8844 Educ Enroll. Closure | Sign-up once. Process runs quarterly. |
7 | 3.2 | Updated ‘proceed’ flag for Placement Home Merge | Semi-annual |
8 | 3.2 | Updated ‘proceed’ flag for Service Provider Merge | Semi-annual |
9 | 3.2 | Updated ‘proceed’ flag for Attorney Merge | Semi-annual |
10 | 3.2 | Signed up for and taken part in SCR8753 Education Provider Merge | Sign-up once. Process runs weekly |
11 | 3.2 | Signed up and taken part in LIS & CWS FFACRFH & FFACH Merge | Sign-up once. Process runs on-demand |
12 | 3.2 | Signed up and taken part in SCR8794 Service Provider Deduplication | Sign-up once. Process runs weekly |
13 | 3.2 | Signed up and taken part in Client Email Correction | Sign-up once. Process runs bi-monthly |
14 | 3.2 | Signed up and taken part in Unknown Client Name Standardization | Sign-up once. Process runs bi-monthly |
15 | 3.2 | Signed up and taken part in Client Prefix / Suffix Cleanse and Standardization | Sign-up once. Process runs bi-monthly |
16 | 3.2 | Signed up and taken part in SCP Email Correction | Sign-up once. Process runs bi-monthly |
17 | 3.2 | Signed up and taken part in SCR8912 SCP Deduplication | Sign-up once. Process runs weekly |
18 | 3.2 | Signed up and taken part in SCR8936 Multi-Notebook Email Correction for CARES | Sign-up once. Process runs bi-monthly |
19 | 3.2 | SCR8776 SSN Cleanse for CARES | Sign-up once. Process runs bi-monthly |
20 | 3.2 | SCR8935 Reporter Badge Number Cleanse for CARES | Sign-up once. Process runs bi-monthly |
21 | 3.2 | SCR8955 Other Adult Name Update | Sign-up once. Process runs bi-monthly |
22 | 3.2 | SCR8840 Placement Home Name Stnd (low priority/ optional) | Sign-up once. Process runs on-demand |
23 | 3.3 | (Optional) Worked on Client exceptions after automatic merge | Bi-Monthly |
24 | 3.3 | (Optional) Request Fuzzy Reports and perform manual Client Merge | Sign-up each time. Process runs on-demand |
25 | 3.3 | (Optional) Request Fuzzy Reports and perform manual Placement Home Merge | Sign-up each time. Process runs on-demand |
Download Excel version here
7. Frequently Asked Questions (FAQ)
How do I get the Data Quality Reports (for cleansing / fuzzy)?
- Accessible at: https://safe.cdt.ca.gov/ using your existing County account of "osi-cms-countyname-user" e.g. osi-cms-sacramento-user
- For County account provisioning, contact IBM Boulder helpdesk at 800-428-8268 stating that this is for the Data Quality reports on SAFE
- Consult your County IT department for access
Where will I find the semi-annual reports?
- Your County Server mirror of CWS/CMS Distributed File Server
- Historically referred to as the "V" drive
What if I have questions?
- Contact your County assigned SSC