On March 3, 2015, just as the Clinton email story was breaking, the IGI and Charter Supporter OpenText held a previously scheduled webinar, What You Can Learn from the U.S. Federal Government’s “Capstone” Strategy to Solve Your Email Governance Headaches. Coverage and commentary about Clinton underscored the importance of managing email better, the very issues the webinar was slated to discuss.
Participants had the opportunity to ask questions and have them answered during the webinar, but several great questions came in that could not be addressed during the live discussion.
Panelists Greg Clark (Director, Program Management-ECM, OpenText), Mark Mandel (Records Management Solutions Architect, OpenText), and Carol Brock (Certified Records Manager & VP, IG IQ Business Group) took the time to respond to those questions, providing practical insight for the audience.*
Q: What tool is the DOI using for auto-classification?
OpenText Auto-Classification is being used at DOI in conjunction with the Content Suite and Email Monitoring products.
Q: Can Ms. Brock provide more detail on what criteria are being used to auto-classify? For example, is it just sender/recipient criteria, or is there a content-based trigger for auto-classification?
OpenText Auto-Classification (AC) uses sample (exemplar) sets of documents that represent the functions and tasks performed by the organization (e.g., HR-related, 7 years; legal, 10 years; project-related, 10 years), rules and keywords to enhance identification, and triggers that fire as new content is consumed. Statistical sampling is also used in the AC process. For enterprise-wide use, sample documents are gathered for each task of each function to build out the model that the content is run against.
Q: How long does it take to train the system to auto-classify content?
It depends on the number of categories used, the exemplar sets used, and the determination of “how good is good enough.” Courts and auditors don’t expect perfection, but organizational goals (e.g., 80-85% accuracy) should be set as targets to achieve over the first 3-6 months. The system can evolve and make further improvements over time as new content comes into the system. Records that are not classified are placed in a default category and are still retained.
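The exemplar-driven approach described above, including the default bucket for unclassified content, can be sketched in a few lines. This is a minimal illustration, not OpenText's actual algorithm: the categories, the keyword-overlap scoring, and the 0.5 confidence threshold are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_profiles(exemplars):
    """exemplars: {category: [sample documents]} -> per-category word counts."""
    return {cat: Counter(t for doc in docs for t in tokenize(doc))
            for cat, docs in exemplars.items()}

def classify(text, profiles, threshold=0.5):
    """Score a document against each exemplar profile; route low-confidence
    content to a default bucket that is still retained."""
    tokens = tokenize(text)
    scores = {cat: sum(prof[t] for t in tokens) for cat, prof in profiles.items()}
    total = sum(scores.values())
    if total == 0:
        return "DEFAULT_RETAINED", 0.0
    best = max(scores, key=scores.get)
    confidence = scores[best] / total
    return (best, confidence) if confidence >= threshold else ("DEFAULT_RETAINED", confidence)

# Hypothetical exemplar sets for two retention buckets
exemplars = {
    "HR (7 years)": ["employee benefits enrollment form",
                     "annual performance review for an employee"],
    "Legal (10 years)": ["contract dispute referred to counsel for litigation",
                         "legal hold notice issued by counsel"],
}
profiles = build_profiles(exemplars)
print(classify("litigation hold notice from counsel", profiles))
```

In practice a product would use statistical models and much larger exemplar sets, but the shape is the same: content that scores below the organization's "good enough" target falls into the default category rather than being discarded.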
Q: What is the NARA Job Number for the DOI big bucket schedules? It would be nice to see the wording in the SF115.
The NARA job numbers are:
Legal: DAA-0048-2014-0001. The Mission schedule is still being refined prior to submission. Only the Administrative schedule has been approved by NARA so far.
Q: Can you briefly explain how you implement journaling without duplicating email?
In essence, journaled mail has never been distributed to users’ mailboxes (and distribution lists and groups have not yet been expanded), so it is already a single instance of the message. Deduplication typically occurs during mailbox collection, PST ingestion, and similar processes, where the mail has been distributed to users’ mailboxes. Journaling is the only legally defensible approach because it ensures that all email is captured and that users cannot delete email on their own.
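The distinction can be illustrated with a simple fingerprinting sketch: a journaled message is captured once, pre-delivery, so it never needs deduplication, while mailbox-collected copies do. This is a generic illustration, not OpenText's implementation; the normalized fields chosen for the fingerprint are assumptions for the example.

```python
import hashlib

def message_fingerprint(sender, recipients, subject, body):
    """Stable fingerprint for a message, built from normalized fields.
    (The particular fields used here are an assumption for the example.)"""
    canonical = "\x00".join([
        sender.lower(),
        ",".join(sorted(r.lower() for r in recipients)),
        subject.strip(),
        body,
    ])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def ingest(messages):
    """Keep one instance per fingerprint, as a deduplication pass would
    when the same message is collected from several users' mailboxes."""
    seen, unique = set(), []
    for msg in messages:
        fp = message_fingerprint(**msg)
        if fp not in seen:
            seen.add(fp)
            unique.append(msg)
    return unique
```

With journaling, the archive receives one copy per message from the mail server, so the `ingest` pass above finds nothing to collapse; with mailbox or PST collection, the same message arrives once per recipient and must be reduced to a single instance.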
Q: Did you consider the effectiveness of using userID functional descriptors (as opposed to content semantics) as perhaps a short-hand or interim, big-bucket strategy to link e-mails to schedules or policies based on functions?
As we mentioned, user/group associations depend on a solid directory and may lead to over-retention (e.g., a keep-everything approach). By using sample (exemplar) content, the system doesn’t rely solely on algorithms or a black box to determine retention policy. Exemplar content, statistical sampling, and quality assurance are highly defensible in proving how information has been managed (from ingestion to disposition).
That said, we have had customers deploy a big bucket strategy that allows end users to move business-relevant email into “managed classified” folders. In terms of automation, this could be used in conjunction with auto-classification, which would apply policy at a finer granularity beyond the big bucket. This approach may be acceptable in the private sector; however, in government, any approach that allows end users to make decisions about retention may be indefensible in a court case.
Q: Can you tell us about the cloud supplier and FISMA certification level? Does DOI have any FISMA High level information and how do you deal with it?
The cloud supplier is QTS (qtsdatacenters.com). For information on FISMA High level information, please contact John Montel (email@example.com).
Q: Are there best practices for dealing with attachments to email and obsolescence issues related to long term storage of attachments?
OpenText stores attachments separately from the native email. For eDiscovery and FOIA purposes, the system may be configured to link the attachments to the email, as in the original message.
Q: As companies are developing strategies to manage email, what are the basic business requirements that we should consider (e.g., apply retention; ensure that email is searchable for e-discovery purposes; metadata capture)? Also, do you have any thoughts on the "3 zone" approach (zone 1 - inbox/sent, 30/60/90 days; zone 2 - intermediate, 3 years; zone 3 - long-term, 7 or 10 years) as a way to segment emails?
When we sit down and talk strategy with customers, it comes down to a couple of different approaches. It really depends on how your organization views end users and how comfortable senior management and legal are with technology being part of the solution.
- The organization is looking to automate and remove end users from the classification of email.
Email is still the most prolific source of unstructured data, and it’s not going away. As organizations move to the cloud (O365 and Gmail), the simplest approach for capture is journaling. Automation captures 100% of inbound, outbound, and internal mail (a legal, unaltered copy) from the journal feed or routing rules on the mail server.
End-user productivity is not impacted since the process is transparent to users. Automation allows RM/IM professionals to broaden their scope, look beyond official records, and apply Information Governance principles holistically across all content (business-relevant content, records, and transitory material).
eDiscovery, legal hold, and export are available centrally on a single repository. Questions to consider: the legal team’s comfort level with an automated approach; acceptance of less than 100% accuracy (courts don’t expect it); and, if users are used to filing email, the change to user behavior. Note that without an additional level of classification, journaling can lead to over-retention; the unintended consequences of keeping too much (reputation loss, brand damage, sanctions, fines, etc.) can be avoided if policy is applied on capture.
- Organizations are looking to engage end users as part of the solution and to encourage best practices.
The 3 zone approach, or interactive classification, has historically been used in the majority of our accounts. It requires end users to be involved manually, filing email appropriately and following best practices. The three zones represent official records (5% of email), business-related mail (20%), and transitory email (75% of email, with little or no business value).
Traditionally this approach has proven successful at reducing email and email storage and keeping Exchange lean and mean. However, with end-user involvement you will see some inconsistency (because it relies on end users to follow instructions) and some impact on user productivity, though we’ve seen reductions in email of 70%+ in some accounts due to the rolling off of transitory (non-business) mail. RM users are required to do some “mopping up” and reclassifying of email in this approach, which is lower-value work. The three zone approach is therefore not recommended for government.
Also, in your strategy to manage email, remember that email creates business records, and records supporting the same business function should be grouped with the other records that the function creates (i.e., the case file concept). With the use of big bucket, functional retention schedule concepts, the entire case file then has only one retention period applied to it, for all of its content.
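The case-file concept, where one functional retention period governs every record in the file regardless of media, can be sketched as follows. The schedule values and function names here are hypothetical examples, not any agency's actual buckets.

```python
from datetime import date

# Hypothetical big bucket schedule: one retention period (in years) per function
SCHEDULE = {"HR": 7, "LEGAL": 10, "PROJECT": 10}

def case_file_disposition(function, closed_on, items):
    """Apply the function's single retention period to every record in the
    case file, regardless of media (email, PDF, scanned paper, ...)."""
    dispose_on = closed_on.replace(year=closed_on.year + SCHEDULE[function])
    return {item: dispose_on for item in items}

file_items = ["kickoff email", "signed contract.pdf", "meeting minutes (scan)"]
print(case_file_disposition("LEGAL", date(2015, 4, 1), file_items))
```

The point of the big bucket approach is visible in the output: the email, the PDF, and the scanned paper record all receive the same disposition date, because retention follows the business function, not the medium.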
Q: How seamless is the auto-classification and archiving to the end-user i.e. fully integrated into the outlook or notes email client?
Email is captured from the journal as an unaltered copy before it hits the user’s inbox. As a result, auto-classification is transparent to the end user. Any actions by the user to delete or file email do not affect the legal archive at all.
Q: Are the retention periods for email the same as for their paper equivalents, according to record type?
Records Management best practice is to have a “big bucket” streamlined records schedule that is media independent. Therefore records policy is applied consistently to all content, including paper, email, PDF, video, MS Office, and so on.
On April 27, 2015 at 11:00 AM ET, the IGI and OpenText will host a follow-up webinar, Moving Beyond “Capstone”—Leveraging Auto-Classification Technology to Address Email Governance, in which our panelists will build upon that discussion and delve deeper into the topic of auto-classification. During this follow-up webinar we will:
- Recap the Federal Government’s “Capstone” Approach.
- Explain auto-classification of data, how it works, and what technologies are available.
- Discuss how organizations can leverage advances in such technology to automate the classification of email based on content, as opposed to other automated approaches.
For more coverage of the Clinton email story, including links to commentary by Jason R. Baron (IGI Co-Chair and former Director of Litigation at NARA), read IGI’s blogs on the topic:
- BREAKING CLINTON EMAIL STORY UNDERSCORES NEED FOR BETTER IG FOR EMAILS
- “BOMBSHELL REVELATIONS” ABOUT HILLARY CLINTON’S USE OF PERSONAL EMAIL ACCOUNT. WATCH TODAY’S IGI WEBINAR ON DEMAND
- HILLARY & HISTORY: LESSONS TO IMPROVE OPEN GOVERNMENT UNDER THE RECORDS LAWS FROM CLINTON’S EMAIL RECORDS EPISODE
- HILLARY & HISTORY: IGI PARTICIPATES IN NATIONAL PRESS CLUB NEWSMAKERS EVENT
* Note that the answers provided here were provided solely by the panelists as described above and do not necessarily represent the views or positions of the Information Governance Initiative. We work to actively encourage a variety of viewpoints on IG and bring them to our community, and we are providing this information as part of that work.