Analyzing Office 365 OCR Data using Power BI

I’m so excited to see Optical Character Recognition (OCR) working natively now in Office 365! I got my start in government where we scanned a lot of documents. There was a lot of data locked away in those images but no easy way to mine it. OCR was explored as a possible solution, but it was still in its infancy and not very accurate or scalable.

Fast forward to today: OCR is simply part of our cloud infrastructure and, with the assistance of machine learning, very accurate. The dream of finally unlocking that data is here! Or is it? Read on to find out more.

By the way, we’ll be covering this and more in our session at The SharePoint Conference, May 21-23 in Las Vegas! Use discount code GATTE to get a discount. https://sharepointna.com

Intelligent Search and Unlocking Your Image Data

SharePoint’s ability to do native OCR processes was first shown at Ignite 2017. There, Naomi Moneypenny did a presentation on Personalized intelligent search across Microsoft 365, where you saw Office 365 OCR in action. https://techcommunity.microsoft.com/t5/Microsoft-Search-Blog/Find-what-you-want-discover-what-you-need-with-personalized/ba-p/109590

She uploaded an image of a receipt and was able to search for it, based on the contents of the receipt image. It was a seamless demo of how Office 365 can intelligently mine the data in image files.

Now, let’s see if we can access the OCR data across multiple files and do something with it.

In the following procedure, I’ll show you how to connect to Office 365 and get to the data that the OCR process returns. In the following post, I’ll show you how to process receipt data to get the total amount spent, solely from the OCR data.

Process Overview

There are three steps to the process to start mining your OCR data. First, you have to add image content that contains text to a SharePoint folder.

The process of getting OCR data

Finding OCR Data

The OCR process that runs against the files in a SharePoint document folder is called Media Services. All derived data is stored in columns that have Media Services in their names.

Unfortunately, I’ve discovered that this feature has not been implemented consistently across the Shared Documents folder, custom folders and OneDrive. The good news is that there’s a less obvious way to get to the data consistently across all three, using Properties. As shown below, you see the normal column names and where they appear. Only the ones in Properties appear consistently across all three. We are only going to cover the basic information, but the Properties collection has a lot more data to consume.

Audit of which media service fields are available where in Office 365

Adding Image Content to a SharePoint Document Folder

When you upload an image to a SharePoint document folder in Office 365, the OCR process kicks off automatically. I’ve had it take up to 15 minutes, but the OCR process will analyze the image for text and return the text in a field called MediaServiceOCR, if present, and always in Properties.vti_mediaserviceocr.

These columns contain any text that was recognized in the graphics file. The current structure of the returned data is a bit different than what is in the source image. Each instance of the discovered text is returned on a separate line, using a Line Feed character as a delimiter. For example, if you had a two-column table of Term and Meaning, it would return the data like this:

Term

Meaning

Term

Meaning

Original uploaded image
Data returned by media services

While it’s great you can get to the data, the current returned format makes it exceptionally complex to reconstitute the context of the data. Also, the more complex your layout, the more “interesting” your transformations may need to be. I’d strongly recommend this post (https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi/) and this post (https://blog.crossjoin.co.uk/2017/12/14/removing-punctuation-from-text-with-the-text-select-m-function-in-power-bi-power-query-excel-gettransform/) to give you the basics of text parsing in Power Query M.
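As a rough illustration of the reshaping involved, here is a minimal sketch in Python (not the Power Query M used in this post) that rebuilds a two-column Term and Meaning table from line-feed-delimited text. The sample string and the `pair_lines` helper are hypothetical, and the sketch assumes the OCR output alternates cleanly between the two columns, which real scans often won’t.

```python
def pair_lines(ocr_text: str, columns: int = 2) -> list[tuple[str, ...]]:
    """Group line-feed-delimited OCR text into rows of `columns` cells."""
    # Drop empty lines and stray whitespace (including "\r" from CRLF input).
    cells = [line.strip() for line in ocr_text.split("\n") if line.strip()]
    # Ignore a trailing partial row rather than guessing at missing cells.
    usable = len(cells) - len(cells) % columns
    return [tuple(cells[i:i + columns]) for i in range(0, usable, columns)]


# Hypothetical OCR output for a two-column table with a header row.
sample = "Term\nMeaning\nOCR\nOptical Character Recognition\nM\nPower Query formula language"

header, *rows = pair_lines(sample)
print(header)
for term, meaning in rows:
    print(f"{term}: {meaning}")
```

A missed or merged cell shifts every row after it, which is why more complex layouts need more defensive parsing than this.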

Accessing the OCR Data in Power BI

The OCR columns are item level columns. The normal tendency would be to connect to your SharePoint site using the Power BI SharePoint Folder connector. You’ll be disappointed to find that the Media Services columns aren’t there.

Instead, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the Media Services columns. Once in the dataset, you can use Power Query M to parse the data and start analyzing.

Demo

Let’s walk through how to access the data and manipulate it using Power BI. In this scenario, I have two receipts that have been uploaded in a document folder and I’m going to get the total spent on these receipts by analyzing the OCR data.
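To give a feel for the parsing logic outside of Power BI, here is a minimal Python sketch. It assumes a hypothetical receipt layout in which a line starting with “Total” carries the amount either on the same line or on the next one, since the OCR text puts each piece of discovered text on its own line. The `receipt_total` helper and both sample receipts are made up for illustration; real receipts will need more rules.

```python
import re
from decimal import Decimal
from typing import Optional

# A money amount such as 12.69, $1,024.99 or 1024.99 (two decimal places).
AMOUNT = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})*|\d+)\.(\d{2})\b")
# "Total" at the start of a line, so "Subtotal" is not matched.
TOTAL_LABEL = re.compile(r"^total\b", re.IGNORECASE)


def receipt_total(ocr_text: str) -> Optional[Decimal]:
    """Return the amount attached to the first 'Total' line, if any."""
    lines = [line.strip() for line in ocr_text.split("\n") if line.strip()]
    for i, line in enumerate(lines):
        if TOTAL_LABEL.match(line):
            # The amount may share the label's line or sit on the next line.
            for candidate in lines[i:i + 2]:
                found = AMOUNT.search(candidate)
                if found:
                    whole = found.group(1).replace(",", "")
                    return Decimal(f"{whole}.{found.group(2)}")
    return None


# Two hypothetical receipts as they might come back from OCR.
receipts = [
    "Corner Cafe\nCoffee\n3.50\nSandwich\n8.25\nSubtotal\n11.75\nTax\n0.94\nTotal\n12.69",
    "Office Supply Co\nPaper\n24.99\nTOTAL $1,024.99",
]

totals = [receipt_total(text) for text in receipts]
spent = sum((t for t in totals if t is not None), Decimal("0"))
print(f"Total spent: {spent}")
```

`Decimal` is used instead of `float` so that currency values add up exactly.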

What about OneDrive for Business?

Yes, it works there too! The Media Service property fields are here as well. In fact, you get more information in an additional column called MediaServicesLocation. Based on my usage, it seems to be specifically populated for image files. If the image contains EXIF data, the MediaServicesLocation will contain the Country, State/Province, and City information of where it was created. Within the Properties collection, you can actually get more detailed information about the photo, like the type of camera that took it and more.

To connect to OneDrive where this will work, you need your OneDrive URL. I normally right-click on the OneDrive base folder in File Explorer and select View Online, as shown below.

Select View Online to get to the OneDrive url needed for Power BI connection

Potential for GDPR Issues

One aspect to consider if you look to do this in a production manner in Europe is that you will likely encounter information that falls under GDPR regulation. Consider this your prompt to think about how this capability would fit into your overall GDPR strategy.

Want a copy of the Power BI model?

Fill out the form below and it will be emailed to you automatically, by virtue of the magic of Office Forms, Flow, and Exchange!

I hope you liked this post. If you have comments or questions, post them below in the comments.

New SharePoint Modern Power BI Theme

If you are creating Power BI content that you are embedding into a SharePoint Modern page, you know that the default Power BI theme colors don’t match SharePoint’s colors. Not to worry, we’ve got you covered. 

Attached to this post and also cross-posted to the https://Community.PowerBI.com Theme Gallery is the Tumble Road SharePoint Modern theme. This theme uses the core Modern experience colors in SharePoint, assuring your Power BI content will look great when embedded within SharePoint.

Download the zip file here.

Chaos Management and the Cubicle Hero

When asked, “What do you want to be when you grow up?” you may have replied, “Firefighter.” If you did, I’m sure you meant one of the awesome individuals who provide medical services, perform rescues and ride the fire trucks. While most of us never realized that dream, there are days at the office where you probably feel that “Firefighter” should be your job title.

Welcome to the wonderful world of the Cubicle Hero, where fighting fires is part of your job!

Perhaps you ask yourself at the end of each day, “How did I get here?” Many feel stuck in these roles without a way out and are puzzled as to how it happened. I talked about the True Cost of the Cubicle Hero in this previous article, so let’s look at how Cubicle Heroes form.

One reason Cubicle Heroes arise is a work environment that isn’t structured to respond well to chaos. If there are no processes for reacting to chaos in a controlled manner, the result is a crisis, which requires some brave person to step in and address it. This person is then caught in that role going forward, thus evolving into the Cubicle Hero. Chaos is ever present and needed for the organization to evolve and remain competitive. The organization is going to run out of Heroes unless a systemic way of reacting is created.

Internal efforts such as implementing a new HR system create short-term chaos and have a long-term impact on the organization. If your organization doesn’t have a formal process for transitioning projects to production, Cubicle Heroes usually form from the project’s team members who hold the detailed knowledge about the project’s deliverables. When a problem related to the project arises, a project team member solves the issue and then becomes the Hero going forward.

An ad hoc project transition process creates “human hard drives” out of the project team members, where they must store and retrieve organizational knowledge as needed. This restricts the ability of team members to grow their skills, as letting go of that knowledge results in a loss to the organization. A formal transition process ensures relevant information is captured so that it can be widely used within the organization, freeing the team members to move on.

External events such as a large client with a new, immediate need or a viral photo of a dress of indeterminate color are also chaos sources. Does your company treat these requests as fire drills or does it have a way to manage them?

The best companies have a deep respect for chaos and put practices in place to manage it and to learn from it. New products and services are sometimes rooted in chaos learnings. Successful chaos management becomes a source of positive change within an organization, as it provides opportunities for people to learn new skills and encounter new situations. As discussed in the earlier article, these new skills and experiences prepare these individuals to be the Explorers that we need.

If your company grows Cubicle Heroes, then the first step in the solution is to address the underlying cultural issues. Adding tools too soon will simply result in chaos at light speed. Addressing this issue is especially problematic in organizations where management has built their careers on their firefighting abilities. Cubicle Heroes tend to prosper in environments which lack visibility into cause and effect. One of my Project Management Office tool implementations came to a grinding halt when the sponsor, who was a master Cubicle Hero, realized the system would also show that he was the company’s biggest fire starter.

Your company’s reaction to chaos is a key process for maximizing your long-term competitiveness and productivity. One way to address chaos is to create processes for categories of chaos. Categories help keep the process manageable without having to address each specific and unique possibility.

One category should also be “other,” as the truly unexpected will happen. One example where this was successful is an organization that assigned a team member to work the “other” category, thereby sparing the rest of the team from being randomized by the unexpected.

I’ll write more on this topic in the weeks to come. For other articles, please visit my blog at http://www.tumbleroad.com/blog.

The True Cost of the Cubicle Hero

Heroes. Society loves them, honors them and exalts them. Corporate offices are filled with a new breed of hero, the Cubicle Hero. These are the people who go beyond the norm and figure it out. They burn the midnight oil and they get it done. They overcome the chaos and reach the goal. All hail the hero!

However, heroes tend to overstay their welcome. In the movie “The Dark Knight,” Harvey Dent intones, “You either die the hero or you live long enough to see yourself become the villain.” The Cubicle Hero’s individual victory is celebrated initially, but situations change and the need for the hero diminishes over time. Or so we hope.

Cubicle heroes can become process bottlenecks and productivity killers. Why? The organization’s reward structure doesn’t lead them to being mentors. The cubicle hero has great value to the organization but their way of working can’t scale and the lack of information sharing prevents the organization from truly benefiting from their victory. The hero then gets involved in every project that touches their area and becomes the bottleneck as the demand for their time is greater than what is available. Thus, the hero slowly becomes the villain, delaying projects.

Many years ago, I worked at a company where a core process of the company was dependent on a very skilled hero. He was a great employee and did his job earnestly. However, he also guarded his knowledge so that he was the only one who understood it completely. This became a serious company concern when he was involved in an accident, leaving him unable to work for several months. Several key projects were impacted.

Changing the perspective, expectations and language of what happens as part of these efforts can lead to a different outcome. We need to make it clear that we want and need Corporate Explorers rather than Cubicle Heroes. Leif Erickson, the Viking, may have been the first to reach North America on a heroic journey, but it was the explorer, Columbus, that opened up North America to the world.

Explorers and Heroes share many common traits. They can see the big picture. They can dig down into the details when needed. They put in the extra effort to get the job done. The real difference is in the aftermath. Explorers open new trails so that others may come behind them. Explorers become guides to help others make the same journey. Heroes, on the other hand, continue to hold onto their conquest.

Changing your company culture to encourage Explorers over Heroes creates a scalable culture of knowledge sharing. This organizational approach leads to greater productivity, higher quality collaboration and timelier project progress.

To summarize, I recommend reviewing the following in your organization.

  • Provide a clear path, open to as many people as possible, to the rewards for exceptional effort, in a way that others and ultimately the organization can leverage
  • Provide public recognition for knowledge sharing
  • Structure rewards, within the process, so we can move from the mentality of one-time hero creation to our true goal of constant productivity improvement
  • Provide the Explorer with opportunities to help facilitate and implement their achievement within the organization. This keeps the Explorer engaged and looking for additional ways to improve
  • Provide collaborative tools like Office 365 and Yammer to help facilitate and support the Explorer’s journey

If you are ready to address more productivity issues in your organization, talk to us or join our Community.

Avoiding Chaos at Light Speed

Managing multiple Project Server instances over the years has taught me that Project Management tools amplify your project process and project communications effectiveness. If your process and communication effectiveness is good, a tool will make this situation better. If there’s a communications issue or process breakdown, a tool will create “chaos at light speed”, amplifying the underlying problem.

The first step of any Project Server or Project Online implementation is to review your current communication and process framework. Using a question-centric approach like our Effective Simplicity™ approach can help guide you away from potential communication gaps and issues.

Projects are Conversations

Projects have been around for as long as humans have worked together to achieve common goals. They weren’t called projects at that time, as most interactions were face to face. The Project Management Institute (PMI) defines a project as a temporary endeavor undertaken to create a unique product, service or result. Feeding the tribe, fighting off invaders or building shelter for the family were all forms of projects.

As time passed, the efforts grew in size, as did the number of people involved. The need to capture the conversation between all parties became more critical. Project plans were born as a technique to keep track of the overall conversation. Thus, project plans represent the latest state of the conversation between everyone involved with the project.

The diagram below is a simplified representation of the ongoing conversations related to just one project.

Project Communications

Effective Conversations Require Commonalities

Projects represent a formal conversation that is a temporary reorganization of your work social network. There are some requirements for this conversation to happen effectively.

  • Common language
  • Defined information outcomes
  • Information cadence
  • Defined audience

Notice, there is no mention of common approach. Having a single approach for all projects is a bit like having the same logic for all software. It just doesn’t work as business needs vary. Rather, the touch points between projects should be common, allowing project plans to be customized for the given business problem but still able to share information across the portfolio.

Common Language

Common language implies that the terms being used have the same meaning across the organization. If Marketing, Customer Support and IT are using different terms for the same concept or the same terms for different concepts, a conversation breakdown is imminent. For example, a Go Live date may mean the date that the software is placed into production by IT. However, Customer Support views the Go Live as the first date on which they can start generating tickets. As these are different events, unnecessary confusion with external stakeholders is bound to occur when communicating Go Live plans.

Defined Information Outcomes

Defined information outcomes are the questions you need to answer with the data captured in the tool. The use of questions focuses your thought process on concrete examples that are easy to communicate, easy to define what is in scope and easy to judge value from the outcome.

For example, you’ve defined an Executive audience who have the following three questions.

  • What is the total project investment for this fiscal year for each of the CEO’s strategic initiatives?
  • Which quarter will key value propositions by CEO strategic initiative be realized?
  • What is the overall variance trend of our project investment from original plan?

For each of these questions, you can discuss the desired outcome with the target audience, allowing the definition of clear and concise data to be collected to answer the question. It’s also easier to track progress on the question rather than attempting to ascertain progress from a list of functional configuration steps. Questions also help drive clear implementation requirements. Once these requirements are gathered and tracked according to each question, you can make more intelligent adjustments to overall scope by excluding questions rather than blindly cutting tool scope.

To drill down, the first question will require the scheduling of project costs and assignment of project contribution to strategic initiatives. The scheduling of project costs, which can be done using Project’s cost resources, can be a significant training effort for PMs new to cost scheduling. Using the drivers in Project’s Portfolio Management functions can capture the strategic initiative contribution. Portfolio Management functionality requires good schedule and cost data in order to be effective. This may represent a larger implementation effort than you are able to take on initially. Ultimately, you can do both but now you have a clearer picture of the impact.

Information Cadence

If you’ve ever been in a conversation with your significant other and had your mind drift, you quickly discovered how a lack of timely information can lead to a serious issue. Information cadence within the organization is about ensuring that the right data is maintained and available on a regular interval. For some organizations, that means projects are updated weekly on Fridays. For others, a monthly cadence is more appropriate. Setting an information cadence expectation ensures that everyone is listening for the same information at the same time.

For example, one client has all updates made by Friday evening as standard reports are generated on Monday morning for the project review meetings that begin on Monday afternoon. The cadence ensures these meetings have the latest information.

Defined Audience

Within the project, the importance and significance of a member’s role changes over the life of the project as information needs change to achieve work.

Tools such as RACI attempt to model the project’s social interactions. Communication plans also attempt to do this, but from a different perspective. However, RACI, communication plans and other tools of that sort represent a one-time look at how the project members and stakeholders interact. These models fall down as soon as the project starts and reality takes over.

The question-centric approach maps the audience to the question, enabling you to easily monitor and manage the needs over time. If the question mix changes for a given audience, it is easy to gauge the impact and required work.
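One way to picture this mapping is as a small lookup table. The sketch below uses Python purely for illustration. The `Question` record, the `scope_for` helper and the capability names are hypothetical, and the requirements listed for the second question are invented for the example. Only the first question’s requirements come from the drill-down earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    audience: str
    requires: frozenset  # tool capabilities needed to answer the question


QUESTIONS = [
    Question(
        "What is the total project investment for this fiscal year "
        "for each of the CEO's strategic initiatives?",
        "Executive",
        frozenset({"cost scheduling", "strategic initiative contribution"}),
    ),
    Question(
        "What is the overall variance trend of our project investment "
        "from original plan?",
        "Executive",
        frozenset({"cost scheduling", "baselines"}),  # hypothetical
    ),
]


def scope_for(questions):
    """Union of the capabilities needed to answer the given questions."""
    scope = set()
    for question in questions:
        scope |= question.requires
    return scope


full_scope = scope_for(QUESTIONS)
reduced_scope = scope_for(QUESTIONS[1:])  # exclude the first question

# Capabilities that drop out of scope when that question is excluded.
print(sorted(full_scope - reduced_scope))
```

Because scope is derived from the questions, excluding a question shows exactly which work falls away and which is still needed by the questions that remain.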

Starting Well Prevents Later Issues

Starting your project management system design, using a question-centric approach, will help you avoid later issues by ensuring you are meeting the most important needs of your audience. Clear definition of audiences and questions facilitates clear communication of value propositions. The questions also enable clear prioritization and scope control. Tool configuration becomes a means of supporting a conversation rather than being a driver of conversations. Within this framework, the result will be a much leaner Project Server implementation.

Want to know more about the Effective Simplicity™ approach? Join our community to find out about upcoming events.

What’s In Your Notebook?

A recent question on note-taking tools in the Project Managers LinkedIn group sparked quite a response.  The discussion got me thinking.  Do we ever treat note-taking as a corporate capability and are there business advantages to doing so?

Collaboration is one of those overhead tasks that is not tracked, though everyone is expected to do it. According to McKinsey and IDC, the average worker spends nearly 60 percent of their week reading email, finding information and collaborating with co-workers and other parties. That’s 24 hours of your work week!

Any reduction in collaboration time will gain the organization more time for other high value-add activities like projects. The key is to improve the situation with the smallest and simplest change possible by following the principles of Effective Simplicity™.  Effective Simplicity™ is about focusing on doing a few core actions in your organization well; those that yield the biggest organizational benefit. It’s upon this operational base that competitive advantages and real changes are built.

Today’s hot discussions are all about Enterprise Social and Collaboration, with vendors falling all over themselves to sell you the next tool. However, there’s little energy spent on the original collaboration tool: note-taking. Why is that? Many organizations have mountains of data locked away in paper notebooks that people may refer to once and then discard. Project teams reinvent the wheel with new efforts as they have no awareness of previous discussions or access to notes on key topics.

Note-taking is one of the primary starting points of collaboration.  Let’s take a look at note taking as an opportunity to streamline data capture and information dissemination.

In this article we’ll focus on the information needs of a project team using tools you likely already have in-house. In particular, we’ll showcase the use of Microsoft technology, but the concepts may be applied to other software platforms as well. The intent is to show how minimal tweaking of a common process can result in significant savings in the effort needed to collect and distribute information in a productive manner.

Why Do We Take Notes?

People take notes generally for two major reasons: capture of needed actions and capture of information for reference.  Notes taken for reference are important because according to the study above, we spend six hours a week looking for information that helps us be more organized and make decisions. Notes captured for collecting and distributing information and tasks to other members of the team and other stakeholders are equally important because they help us prioritize, assign, baseline, and just plain old communicate using a concrete method. Given this central role in collaboration, could this be a nugget to be leveraged and turned into productivity gold?  Maybe you are also shocked that it receives so little attention?

What Do We Do with the Notes?

In an organization using cutting-edge technology, the notes we place in a central knowledgebase are just a search away. Sounds wonderful, doesn’t it? The reality for most organizations is that we simply aren’t there yet. If you are still buying cases of paper pads every month and you look cross-eyed at someone who brings a computer to a meeting, you are at the beginning of this journey to a new way of working.

Notes should be all about capturing data from various perspectives and allowing analysis to make them clear and accessible to the rest of the organization. The challenge with taking paper notes is that you incur two taxes. The first is a “Time Tax”, that is, how long it is before the notes become available. In many cases, the answer is never, because the information is locked away in a long-forgotten notepad under your desk. If you remember to communicate the contents, a “Transcription Tax” is incurred to get the requisite data into electronic format. Then, likely, it is emailed and only available in the inboxes of those who received the email. This is just slightly better than a notepad under your desk.

So What Can We Do About This?

Let the Organization Know What Information is Important to Capture

To clarify, let’s take the first step by considering three sections for getting the most out of note-taking.

What constitutes great notes?

  • Actions
  • Awareness
  • Other Reference

Actions include items such as to do items for yourself and others and information gathering for prioritizing.  These items are typically very immediate in nature and aren’t put in the formal project plan.

Awareness items are data elements gathered to assist with decision-making and action-taking. These items include dependencies for current work, decision recording and alternatives considered and rejected and upcoming events which may require interaction from the team.

Lastly, there are reference items. A recent study by McKinsey showed that an average of 19% of our work week is spent searching for relevant information. Reducing the effort for this task by 30 minutes per week gives back about 26 hours a year, roughly two-thirds of a person-week of capacity, for each employee impacted. As you know, finding information is a serious drain on productivity.
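To size that saving for your own organization, the arithmetic is simple. Here is a minimal Python sketch, assuming 52 working weeks a year and a 40-hour week. Both are assumptions for illustration, not figures from the study, and `annual_hours_saved` is a hypothetical helper.

```python
WEEKS_PER_YEAR = 52   # assumption: no allowance for holidays or leave
HOURS_PER_WEEK = 40   # assumption: standard full-time week


def annual_hours_saved(minutes_per_week: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Hours returned per employee per year for a weekly time saving."""
    return minutes_per_week * weeks / 60


saved = annual_hours_saved(30)
person_weeks = saved / HOURS_PER_WEEK
print(f"{saved:.0f} hours a year, or {person_weeks:.2f} person-weeks per employee")
```

Multiply the result by the number of employees affected to estimate the capacity returned to the organization.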

As an example from one particular project, regularly printed electronic copies of project-related invoices, trouble tickets and license agreements were placed into OneNote to provide one place to search for these items.

Pick a Note Taking Method That Fits the Organizational Culture

The second step is to decide what tools capture notes easily. If you look around the room in a meeting, what do you see? iPads? Notebook computers? Paper and pencil? Depending on the current cultural norm in your organization, you have choices.

I Love the Smell of Fresh Paper in the Morning

If your organization is paper-centric, you might consider looking at technologies like LiveScribe 3 or Capturx. Both allow you to write using a special pen and the notes can be converted over to OneNote or other repositories. This allows individuals to continue taking notes in the format with which they are familiar but also provides a way to capture the information electronically to distribute in a number of ways, reducing both Time and Transcription taxes.  Note, these products are still early in the product adoption lifecycle and prices reflect that reality.

Moving to a more consistent manner of capturing and disseminating notes will require a conversation with meeting leaders about how long you have to transcribe notes and where to put the electronic transcriptions. Setting an organizational expectation will encourage adherence to the community agreement.

Time also impacts the quality of notes. Longer time leads to data loss. One former client decided to reserve the last 10 minutes of a meeting to finalize the notes and transcribe them into email. The idea: capture information while it is still fresh in the mind of the note taker and get the notes locked down. This expectation helped stress the importance of capturing meeting notes to the organization and assisted people in adopting the habit.

Paper? We Don’t Need No Stinking Paper

There are also many electronic tools available for gathering notes. OneNote, a free alternative from Microsoft, and Evernote offer many data capture features that allow access to the data from any device. They both offer cloud-based storage, which keeps you from losing data in the event of a theft or corruption of your device. Lastly, they both support third-party applications, which enables them to integrate easily with other solutions. Our technical blog, http://aboutmsproject.com/, will have a series for Administrators on how to implement a OneNote and SharePoint based solution for sharing notes.

Electronic tools have an advantage in that once you are done typing, the notes are mostly done. However, the organizational expectation that notes will be taken and distributed still needs to be set. Otherwise, you will get inconsistent information which will increase information search time. My personal experience has shown that setting a few clear expectations gives adoption a much better chance than a Licensing document approach to implementation.

Here’s a real world example of meeting note guidelines.

  • Always take meeting notes.
  • Include the agenda and who attended
  • Post the notes within four hours of when they are taken.
  • Capture actions, awareness items and any reference information relevant to the project
  • Put the notes here: [Provide the location]

This list has been a great springboard for new behavior.  By communicating to everyone that notes were expected within the same day and that certain bits of information, like an agenda, attendees, actions, awareness items and reference items should be in them, the organization was able to adopt the new process.  Of course, time must be allocated for this activity.  Also, a location was provided to ensure there was no excuse for “not knowing.”

As a PM, the other technique I used was to refer to the meeting notes when we had status meetings. If you didn’t do your notes in a timely fashion, your peers would let you know this outcome wasn’t acceptable.

Adopt a Consistent Manner to Distribute Notes

The final decision is how to best distribute either the notes themselves or notification that notes were posted. Picking a narrow set of options makes it easier for people to develop the habit of both sending and expecting notes. The problem really comes down to this: using multiple delivery channels makes it hard for the reader to consistently find the information. If you find yourself having this sort of internal dialog, “Did I get that in email or is it in Yammer or did he put that on the Team Site,” you may have a dissemination choice problem. Reducing the options to an agreed-upon select few will reduce search time.

Small Changes Can Have Large Impacts

Who knew note-taking could affect so much? Try these recommendations with your team. Making three simple changes in your processes can help your team and organization get back valuable time.

If you need further examples to substantiate taking action or would like to speak to someone that has tried similar techniques, contact us via this Join our Community link and indicate your interest.  Tumble Road believes in knowledge sharing that helps the overall community.  We’ll be glad to help get you to that day when you can search the Corporate Knowledgebase for all of the information you need for your job.