Analyzing Office 365 Photo EXIF data using Power BI

In an earlier post, I wrote about how to mine the OCR data from Office 365 document folders using Power BI. During the research for that post I discovered that the photo EXIF data is also available for analysis from Office 365. I’ll show you how to get to it in this post.

What is EXIF data anyway?

Think of EXIF as special metadata for photos. With today’s cameras, it records the camera settings used for each shot as well as the location where it was taken, if the camera has GPS. You can read more about the format here: https://photographylife.com/what-is-exif-data  Many smartphones today automatically include EXIF data when you take photos. Office 365 automatically extracts this data from photos when they are uploaded to a document library.

Why do I care about this?

Imagine you work for a company that does inspections. Your inspectors already use their phones to take photos of issues. Wouldn’t it be great to show the photos by location, together with related location data, on a report page? This technique allows you to easily mine that EXIF data.

How do I get to it?

First, upload your images to a SharePoint document folder in Office 365 and the metadata extraction process will initiate automatically. I’ve had it take up to 15 minutes to process, so you may need to be patient. You can do this via the SharePoint mobile app if you are uploading photos from your phone.

Second, from Power BI Desktop, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the column that contains the EXIF data. Once in the dataset, you can use Power Query M to parse the data and start analyzing.

Demo

In this video, I’ll show you how to access the EXIF data and what you can do with the data.

Here’s the Power Query M Code

let
    Source = SharePoint.Tables("https://YourInstance/YourSite", [ApiVersion = 15]),
    #"8d19b1eb-42b8-4843-b721-fc1e8ef47688" = Source{[Id="88888888-8888-8888-8888-888888888888"]}[Items],
    #"Renamed Columns" = Table.RenameColumns(#"8d19b1eb-42b8-4843-b721-fc1e8ef47688",{{"ID", "ID.1"}}),
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each [MediaServiceAutoTags] <> null and [Created] >= #datetime(2019, 4, 19, 0, 0, 0)),
    #"Expanded FirstUniqueAncestorSecurableObject" = Table.ExpandRecordColumn(#"Filtered Rows", "FirstUniqueAncestorSecurableObject", {"Url"}, {"FirstUniqueAncestorSecurableObject.Url"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded FirstUniqueAncestorSecurableObject",{"FirstUniqueAncestorSecurableObject.Url", "File", "Properties"}),
    #"Expanded File" = Table.ExpandRecordColumn(#"Removed Other Columns", "File", {"Name", "ServerRelativeUrl", "TimeCreated", "TimeLastModified"}, {"File.Name", "File.ServerRelativeUrl", "File.TimeCreated", "File.TimeLastModified"}),
    #"Merged Columns" = Table.CombineColumns(#"Expanded File",{"FirstUniqueAncestorSecurableObject.Url", "File.ServerRelativeUrl"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Picture URL"),
    #"Expanded Properties" = Table.ExpandRecordColumn(#"Merged Columns", "Properties", {"vti_mediaservicemetadata"}, {"Properties.vti_mediaservicemetadata"}),
    #"Parsed JSON" = Table.TransformColumns(#"Expanded Properties",{{"Properties.vti_mediaservicemetadata", Json.Document}}),
    #"Expanded Properties.vti_mediaservicemetadata" = Table.ExpandRecordColumn(#"Parsed JSON", "Properties.vti_mediaservicemetadata", {"address", "location"}, {"Properties.vti_mediaservicemetadata.address", "Properties.vti_mediaservicemetadata.location"}),
    #"Expanded Properties.vti_mediaservicemetadata.location" = Table.ExpandRecordColumn(#"Expanded Properties.vti_mediaservicemetadata", "Properties.vti_mediaservicemetadata.location", {"altitude", "latitude", "longitude"}, {"Properties.vti_mediaservicemetadata.location.altitude", "Properties.vti_mediaservicemetadata.location.latitude", "Properties.vti_mediaservicemetadata.location.longitude"}),
    #"Expanded Properties.vti_mediaservicemetadata.address" = Table.ExpandRecordColumn(#"Expanded Properties.vti_mediaservicemetadata.location", "Properties.vti_mediaservicemetadata.address", {"City", "State", "Country", "EnglishAddress"}, {"Properties.vti_mediaservicemetadata.address.City", "Properties.vti_mediaservicemetadata.address.State", "Properties.vti_mediaservicemetadata.address.Country", "Properties.vti_mediaservicemetadata.address.EnglishAddress"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Properties.vti_mediaservicemetadata.address",{{"File.Name", type text}, {"Picture URL", type text}, {"File.TimeCreated", type datetime}, {"File.TimeLastModified", type datetime}, {"Properties.vti_mediaservicemetadata.address.City", type text}, {"Properties.vti_mediaservicemetadata.address.State", type text}, {"Properties.vti_mediaservicemetadata.address.Country", type text}, {"Properties.vti_mediaservicemetadata.address.EnglishAddress", type text}, {"Properties.vti_mediaservicemetadata.location.altitude", type number}, {"Properties.vti_mediaservicemetadata.location.latitude", type number}, {"Properties.vti_mediaservicemetadata.location.longitude", type number}})
in
    #"Changed Type"

Analyzing Office 365 OCR Data using Power BI

I’m so excited to see Optical Character Recognition (OCR) working natively now in Office 365! I got my start in government where we scanned a lot of documents. There was a lot of data locked away in those images but no easy way to mine it. OCR was explored as a possible solution, but it was still in its infancy and not very accurate or scalable.

Fast forward to today: OCR is simply part of our cloud infrastructure and, with the assistance of machine learning, very accurate. The dream of finally unlocking that data is here! Or is it? Read on to find out more.

By the way, we’ll be covering this and more in our session at The SharePoint Conference, May 21-23 in Las Vegas! Use discount code GATTE to get a discount. https://sharepointna.com

Intelligent Search and Unlocking Your Image Data

SharePoint’s ability to do native OCR processing was first shown at Ignite 2017. There, Naomi Moneypenny did a presentation on personalized intelligent search across Microsoft 365, where you could see Office 365 OCR in action. https://techcommunity.microsoft.com/t5/Microsoft-Search-Blog/Find-what-you-want-discover-what-you-need-with-personalized/ba-p/109590

She uploaded an image of a receipt and was able to search for it, based on the contents of the receipt image. It was a seamless demo of how Office 365 can intelligently mine the data in image files.

Now, let’s see if we can access the OCR data across multiple files and do something with it.

In the following procedure, I’ll show you how to connect to Office 365 and get to the data that the OCR process returns. In the demo that follows, I’ll show you how to process receipt data to get the total amount spent, solely from the OCR data.

Process Overview

There are three steps to start mining your OCR data. First, add image content that contains text to a SharePoint folder. Next, wait for the Media Services process to run against the uploads. Finally, connect to the resulting data from Power BI.

The process of getting OCR data

Finding OCR Data

The OCR process that runs against the files in a SharePoint document folder is called Media Services. All derived data is stored in columns whose names contain Media Services.

Unfortunately, I’ve discovered that this feature has not been implemented consistently across the Shared Documents folder, custom folders, and OneDrive. The good news is that there’s a less obvious way to get to the data consistently across all three: the Properties collection. As shown below, you can see the normal column names and where they appear. Only the ones in Properties appear consistently across all three. We are only going to cover the basic information, but the Properties collection has a lot more data to consume.

Audit of which media service fields are available where in Office 365

Adding Image Content to a SharePoint Document Folder

When you upload an image to a SharePoint document folder in Office 365, the OCR process kicks off automatically. I’ve had it take up to 15 minutes. The OCR process analyzes the image for text and returns that text in a field called MediaServiceOCR, if present, and always in Properties.vti_mediaserviceocr.

These columns contain any text that was recognized in the graphics file. The current structure of the returned data is a bit different from what is in the source image. Each instance of the discovered text is returned on a separate line, using a Line Feed character as a delimiter. For example, if you had a two-column table of Term and Meaning, it would return the data like this:

Term

Meaning

Term

Meaning

Original uploaded image
Data returned by media services

While it’s great that you can get to the data, the current returned format makes it exceptionally complex to reconstitute the context of the data. Also, the more complex your layout, the more “interesting” your transformations may need to be. I’d strongly recommend this post (https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi/) and this post (https://blog.crossjoin.co.uk/2017/12/14/removing-punctuation-from-text-with-the-text-select-m-function-in-power-bi-power-query-excel-gettransform/) to give you the basics of text parsing in Power Query M.
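If you want to experiment with that parsing, here’s a minimal sketch of the first step: splitting the returned OCR text into one row per discovered line. The query name OCR Items and the column name OCRText are hypothetical (see the connection sketch in the next section); only the Line Feed delimiter is dictated by the format described above.

let
    // Hypothetical starting point: a query named "OCR Items" whose OCRText
    // column holds the raw Properties.vti_mediaserviceocr value.
    Source = #"OCR Items",
    #"Non-null OCR" = Table.SelectRows(Source, each [OCRText] <> null),
    // Media Services returns each recognized run of text on its own line,
    // so split the value on the Line Feed character...
    #"Split Lines" = Table.AddColumn(#"Non-null OCR", "OCRLines", each Text.Split([OCRText], "#(lf)")),
    // ...and expand to one row per discovered line for further parsing.
    #"One Row Per Line" = Table.ExpandListColumn(#"Split Lines", "OCRLines")
in
    #"One Row Per Line"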

Accessing the OCR Data in Power BI

The OCR columns are item level columns. The normal tendency would be to connect to your SharePoint site using the Power BI SharePoint Folder connector. You’ll be disappointed to find that the Media Services columns aren’t there.

Instead, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the Media Services columns. Once in the dataset, you can use Power Query M to parse the data and start analyzing.
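As a minimal sketch of that connection (the site URL and list GUID are placeholders; substitute your own):

let
    // The SharePoint Online List connector (SharePoint.Tables) surfaces
    // item-level columns, including the Properties record; the SharePoint
    // Folder connector does not.
    Source = SharePoint.Tables("https://YourInstance/YourSite", [ApiVersion = 15]),
    Items = Source{[Id="88888888-8888-8888-8888-888888888888"]}[Items],
    // Pull the raw OCR text out of the Properties record.
    #"OCR Items" = Table.ExpandRecordColumn(Items, "Properties", {"vti_mediaserviceocr"}, {"OCRText"})
in
    #"OCR Items"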

Demo

Let’s walk through how to access the data and manipulate it using Power BI. In this scenario, two receipts have been uploaded to a document folder, and I’m going to get the total spent on those receipts by analyzing the OCR data.

What about OneDrive for Business?

Yes, it works there too! The Media Services property fields are there as well. In fact, you get more information in an additional column called MediaServicesLocation. Based on my usage, it seems to be populated specifically for image files. If the image contains EXIF data, MediaServicesLocation will contain the Country, State/Province, and City information of where it was created. Within the Properties collection, you can get even more detailed information about the photo, such as the type of camera that took it.

To connect to OneDrive where this will work, you need your OneDrive URL. I normally right-click on the OneDrive base folder in File Explorer and select View Online, as shown below.

Select View Online to get to the OneDrive url needed for Power BI connection

Potential for GDPR Issues

One aspect to consider, if you look to do this in a production manner in Europe, is that you will likely encounter information that falls under GDPR regulation. Consider this your prompt to think about how this capability would fit into your overall GDPR strategy.

Want a copy of the Power BI model?

Fill out the form below and it will be emailed to you automatically, by virtue of the magic of Office Forms, Flow, and Exchange!

I hope you liked this post. If you have comments or questions, post them below in the comments.

The Truth Shall Make You Miserable


When companies begin making their data more accessible via self-serve Power BI, they soon reach a critical break point in those efforts: the project dashboards tell them something that isn’t pleasant or doesn’t match the narrative being publicized.

The Reality in Your Project Dashboards

Performance indicators go red. The data shows the stellar progress that was planned isn’t happening. Operational demands for time are much higher in reality than assumed in planning. In short, it shows the harsh reality, as captured in the data.

This is a moment of truth for organizations. Are we going to embrace the transparency or will we attempt to control the narrative?

Data Quality Challenges

The first question is normally: is this data accurate? This is quite reasonable to ask, especially since, at the beginning, the data stream may not be as clean as it should be.

The approach to this answer can decide your success going forward. For some, questioning the data is a prelude to dismissing the use of the data. For others, it’s a starting point for improvement.

The data deniers will provide many reasons why “we can’t use the data.” They will complain that the data is inaccurate or incomplete. Therefore, they can’t trust their data to integrate its use into their daily work or to use it to make decisions.

These data deniers may have other hidden reasons for their position, such as protecting a political position or power base. Moving to a data-centric culture is a big change for many organizations, as you have to be open about your failures. No company is always above average in every endeavor.

Data deniers also fear how business intelligence might impact their careers. If the corporate culture is one where punishment is meted out when the numbers and updates aren’t desirable, data transparency likely won’t be welcome.

Change the Focus of How Data is Used to Succeed

The key to overcoming the data fear is to change the intent for its use, moving the focus from punishment to improvement.

Successful companies using data embrace two simple facts. One, the data is never perfect, and it doesn’t have to be to effect positive change. Two, they’ve defined the level of granularity needed for the data to be used successfully.

How Imprecise Data is Changing the World

We see this approach in our personal lives. For example, the Fitbit device is not 100% accurate or precise. Yet millions are changing their behavior and becoming more active because of the feedback it provides, based on relatively decent data. You may also be carrying a smartphone, which also tracks your steps. Between the two, you would have a generally good idea of how many steps you took today.

From a granularity standpoint, we generally aren’t worried about whether we took 4,103 steps or 4,107 steps today. We took roughly 4,100 steps. Hundreds is our minimum granularity. It could easily be at the thousands level, as long as that granularity meets your information needs.

Cost Benefit of a Minimum Level of Granularity

One area where we see this type of data accuracy dispute in the corporate world is cost data. It’s been ingrained in our psyche that we have to balance to the penny. Our default data granularity is set to the cent.

While that may improve accuracy and precision, it doesn’t make a material difference in the impact. For example, if your average project budget is $2M, a 5-cent variance is a percentage variance of 0.0000025%. I’ve seen organizations get wrapped up in balancing to the penny and waste an inordinate amount of time each week getting there.

Instead, let’s define a minimum granularity in the data such that a 1% variance is visible. For a $2M average, you would round at the $10,000 point. Doing so reduces the work spent attempting to make the data perfect. Any variances of that size are significant enough to warrant attention and are more likely to stand out.

Implementing self-service BI using products like Microsoft Power BI and Marquee™ Project Dashboards will enable your organization to gain great improvements, as long as it is willing to accept the assumptions above. The truth may make you miserable in the short term as you address underlying data and process challenges. In the long run, you and your company will be better served.

Please share your experiences in the comments below.

3 Ways to Rev Your Microsoft Project Online and Power BI Performance (Number 2 Is Our Favorite)

Are you tired of waiting for long refresh times with Power BI? Perhaps you are getting timeouts because you are retrieving too much data. There are three easy ways to avoid these issues.

WHY IS MY POWER BI DATA REFRESH SO SLOW?

Many Power BI models refresh slowly because they are not structured to take advantage of query folding.

Query folding is a process by which specific Power BI transformations are executed by the data source instead of Power BI. Allowing those actions to occur at the source means less data has to be sent to Power BI. Less data means faster data refresh times and smaller report models. Note that not all data sources support query folding, but OData for Project Online and SQL Server for Project Server do.

A sample of these foldable actions includes:

  • Column or row filtering
  • Group by and aggregation
  • Joins
  • Numeric calculations

For example, suppose you need to find out the amount of work scheduled by week for the quarter, and you are querying the time-phased data from Project Online. If you aren’t careful, you may be retrieving hundreds of thousands of records. Query folding will make a huge difference in speed and in the number of records retrieved. If Project Online filters the records before returning them, you may receive only a few thousand records instead.
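Here’s a minimal sketch of that scenario, with placeholder server and site names and the AssignmentTimephasedDataSet entity standing in for your time-phased source. Because the row filter is applied directly against the OData feed, it can be folded into the query that Project Online executes:

let
    // Pointing the feed at the entity set keeps the navigation foldable.
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData/AssignmentTimephasedDataSet"),
    // This row filter folds: it is translated into an OData $filter, so
    // Project Online returns only the quarter's rows.
    #"Quarter Only" = Table.SelectRows(Source, each [TimeByDay] >= #datetime(2019, 1, 1, 0, 0, 0) and [TimeByDay] < #datetime(2019, 4, 1, 0, 0, 0))
in
    #"Quarter Only"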

ISSUE #1: NOT REFERENCING THE DATA SOURCE PROPERLY

This issue occurs primarily with OData sources, such as Project Online, Office 365, and other web-based sources. Query folding breaks if you don’t properly reference the OData feed.

In order for query folding to work properly, the transformations that have folding support need to be the first things executed. Once a non-foldable transformation is added, no subsequent transformation will be folded. If you don’t reference the OData feed properly, the step that navigates to the feed isn’t foldable, blocking all subsequent transformations from being folded.
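To illustrate the ordering rule, here’s a hedged sketch contrasting the two patterns. Table.Buffer stands in for any non-foldable step, and the entity and field names are placeholders:

let
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData"),
    Assignments = Source{[Name="Assignments",Signature="table"]}[Data],

    // Anti-pattern: Table.Buffer is not foldable, so the filter after it
    // runs locally and every Assignments row is downloaded first.
    Buffered = Table.Buffer(Assignments),
    SlowFilter = Table.SelectRows(Buffered, each [AssignmentWork] > 0),

    // Better: apply the foldable filter directly to the feed navigation,
    // so it is sent to the server as an OData $filter.
    FastFilter = Table.SelectRows(Assignments, each [AssignmentWork] > 0)
in
    // SlowFilter is never evaluated here; it exists only for contrast.
    FastFilter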

ISSUE #2: DOING DYNAMIC DATE SELECTION IMPROPERLY

This issue causes more headaches with Project’s time-phased data than any other action. In many cases, you are looking to create a range of dates for Power BI to query. If you use the Between filter in the Power BI Query Editor, there’s no option for today’s date +/- a number of days. If you use many of the other Date/Time filters, like Is Latest, you still seem to pull back a lot of data.

Solving this particular issue requires getting familiar with the M query language so that you can understand how Power BI Desktop performs specific actions.
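For the today +/- N days case, a minimal sketch (with placeholder feed and entity names) is to compute the boundary date in M and use it in a plain comparison, which remains foldable:

let
    // Dynamic boundary: today minus 30 days, expressed as a datetime.
    RangeStart = DateTime.From(Date.AddDays(DateTime.Date(DateTime.LocalNow()), -30)),
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData/TaskTimephasedDataSet"),
    // A plain >= comparison against a constant folds to an OData $filter.
    #"Last 30 Days" = Table.SelectRows(Source, each [TimeByDay] >= RangeStart)
in
    #"Last 30 Days"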

For example, let’s look at the Is Latest date selection. By default, your dataset is unsorted, so Power BI creates a list of the values in the selected column and takes the Max value. While this is correct, it can result in a lot of records being retrieved just to perform a Max and return one record. The M code over a time-phased table looks like this:

= Table.SelectRows(#"Filtered Rows", let latest = List.Max(#"Filtered Rows"[TimeByDay]) in each [TimeByDay] = latest)

To get much better performance, make the following two changes. First, sort the dataset in descending order on the column you are using to decide latest; in this example, that’s TimeByDay. Sorting is a foldable action, so the data source will do the work.

Next, change List.Max to List.First. Since the latest date is the first record in the sorted dataset, a lot less data is required to get the answer. So, my statement is now:

= Table.SelectRows(#"Filtered Rows", let latest = List.First(#"Filtered Rows"[TimeByDay]) in each [TimeByDay] = latest)
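Putting both changes together, a minimal sketch looks like this (the step names are illustrative; note that List.First must read from the sorted step for the answer to be correct):

#"Sorted Rows" = Table.Sort(#"Filtered Rows", {{"TimeByDay", Order.Descending}}),
// The latest date is now the first value in the sorted column.
#"Is Latest" = Table.SelectRows(#"Sorted Rows", let latest = List.First(#"Sorted Rows"[TimeByDay]) in each [TimeByDay] = latest)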

In testing, the original approach required over 1 MB of data to be retrieved to answer the question. The new approach retrieves only 127 KB.

ISSUE #3: URLS ARE TOO LONG

This issue comes into play when working with Project Online and SharePoint-based data sources. Project, in particular, has a lot of long field names. SharePoint has a limit of 2,048 characters for a URL. If you are manually selecting a lot of field names, you can accidentally go past that limit.

What this means is that Power BI receives a URL that can’t be acted upon. The default Power BI behavior is then to simply retrieve the feed without modifiers. If this happens on a source with many rows, it can significantly impact performance.

In this case, break up your dataset into two or more queries that fit within the limitation and then merge them back together in Power BI.
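Here’s a minimal sketch of that split-and-merge approach, using the Projects entity and a handful of illustrative field names. Each query selects a subset of columns, keeping ProjectId in both as the join key:

let
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData"),
    Projects = Source{[Name="Projects",Signature="table"]}[Data],

    // Each selection keeps the generated URL short and retains ProjectId
    // as the join key. The field lists here are placeholders.
    FirstHalf = Table.SelectColumns(Projects, {"ProjectId", "ProjectName", "ProjectStartDate"}),
    SecondHalf = Table.SelectColumns(Projects, {"ProjectId", "ProjectFinishDate", "ProjectPercentCompleted"}),

    // Merge the halves back together on the shared key.
    Merged = Table.NestedJoin(FirstHalf, {"ProjectId"}, SecondHalf, {"ProjectId"}, "Rest", JoinKind.Inner),
    Recombined = Table.ExpandTableColumn(Merged, "Rest", {"ProjectFinishDate", "ProjectPercentCompleted"})
in
    Recombined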

WANT TO KNOW MORE?

Join us for a free Power BI performance clinic on March 22! This session will do a deep dive into these issues, as well as show you how to determine when folding is happening (and when it isn’t). Go here to register: http://academy.tumbleroad.com/courses/power-bi-performance-clinic

 


Microsoft Project Online Timesheet Reporting using Power BI Desktop

PowerBI Timesheet Dashboard

Visualizing Project Time

Project Online supports collaborative work management across your organization. It provides the ability to manage multiple project schedules and allows project team members to enter the time spent on assigned tasks via their timesheet. The timesheet information can then be used for many purposes.

Many Project Management Offices (PMOs) have dynamic information needs that require data interactivity to uncover the answers they seek. Power BI Desktop is the perfect tool to enable that data exploration process. In this article, we’ll focus on the PMO’s need to audit timesheet data, exploring who has submitted time and what types of time were submitted. Below is an example of an overall timesheet status dashboard, created in Power BI.

 

For those unfamiliar with Project, timesheet submission is generally a weekly process. The Project Manager publishes the latest project version. At the end of the week, project team members log how much time they spent on assigned tasks and submit their timesheet for approval. Project Managers then review and approve the timesheet submissions. Once this process is complete, status reports and other key data visualizations use this updated timesheet information to support organizational needs.

Figure 1 Weekly Timesheet Process

Success takes the whole village

Getting everyone to submit their timesheet in a timely manner is key to getting full value from time tracking. Thus, auditing timesheet submissions is an integral part of a successful time tracking implementation.

In a perfect world, everyone would submit their timesheet on time. The team member would get value from the data that supports their work. The organization would also get value from having data in a timely manner with minimal capture effort.

The village faces challenges

Many organizations face three time tracking challenges.

First, it’s difficult to get the entire organization to fill out their timesheets in a timely manner. Sometimes people aren’t in the office on Friday, so the timesheet doesn’t get submitted. Some will simply forget to submit their time. Others may be more passive-aggressive, viewing time tracking as a tax with little individual benefit.

Second, Project has no built-in functionality for easily auditing timesheet submission. If you want to use time tracking, you have to build your own business intelligence views to meet your analytic needs.

Lastly, reporting tools of years past focused on solving a particular business problem. They were not optimized for ad hoc querying to meet immediate and specific data needs. Below is an example of a dashboard for investigating where timesheets have gone missing.

PowerBI Timesheet Investigate Dashboard

 

This article will take you through the process of defining your business need, show you how to leverage a social design approach to design the solution, and take you through the steps to address the need in Power BI.

Designing for Business Intelligence

Business intelligence answers the questions that support specific organizational conversations. Experience in BI design indicates that mapping the social interactions first will increase the chances that the designed visualizations meet the business need.

I use Tumble Road’s Conversation-Centric Design (CCD) approach for designing Business Intelligence solutions. CCD helps you derive the technical design details by starting with the modeling of the targeted social interactions. This process will identify and define three key aspects: the audience, the conversations and the key questions. Once these aspects are defined, you can use these to define the data scenarios, individual data elements, common user terminology, and supporting processes.

Figure 2 Conversation-Centric Design Process

CCD can also be used as a scoping framework. For example, you can choose not to support a conversation and be able to track all aspects impacted by that decision. It also makes it easier to create a project Work Breakdown Structure based on the Audience:Conversation:Key Questions structure, allowing the rollup of task progress to provide the health of each aspect automatically.

Requirements and Design

For this exercise, I’m limiting the design to one primary audience, the Project Management Office, and three key questions for the PMO.

Requirements

User Scenario

The previous week’s timesheet results for projects and administrative time are reviewed in the Monday morning PMO team meeting. Claire, the director, leads the meeting, where she reviews the results with Stephanie and Ian. The data is reviewed by open timesheet period. The outcome of the meeting is to follow up with the individual team managers to complete the process. Claire also uses the data to generate PMO reports for the company vice presidents. Therefore, she wants the data to be as complete as possible when generating the reports.

The organization uses administrative time categories so all timesheet eligible resources are expected to turn in a timesheet weekly even when the person is not currently assigned to a project. The PMO is only concerned with total time charged to a project. Only one time category is generally received, Actual Work Billable, but due to an intern incident earlier in the year, all Actuals will be summed together to ensure all time is captured. The timesheet periods are defined as running from Sunday to Saturday each week.

Jason and Sophie need the timesheet data for their own uses with Finance and HR but are passive observers to the review process.


Figure 3 Requirements in CCD format

Conversation

The business intelligence visualization is needed to support the PMO weekly review of timesheet data at 9AM on Mondays.

Audience

The PMO consists of five people: one director and four analysts. Claire, Stephanie, and Ian are direct consumers of the data and are the primary audience of this exercise. Jason and Sophie are indirect consumers of the data. Claire, Stephanie, and Ian, therefore, will be the best choices for assessing whether the visualization meets their needs. Stephanie and Ian can also verify the data.

Key Questions

From the scenario above, we can derive the following key questions. NOTE: There are typically more than three but I find three is a great starting point.

  • Have all timesheet eligible resources submitted their timesheet?
  • Have all submitted timesheets been approved?
  • Which teams have outstanding timesheets?

Technical Design

Supporting Data

The supporting data will denote three states for timesheet eligible resources. NOTE: Project refers to people as resources. Therefore, resources and project team members are interchangeable for this article.

  • Have not created a timesheet
  • Submitted a timesheet, but timesheet is still in process
  • Timesheet was submitted and approved

The data set will be aggregated to the weekly level of granularity by resource and project. Only the last 60 days of data will be reviewed.

Data Sets

The primary data sets necessary to create the requisite visualizations are as follows.

Timesheet eligible resources are defined as work resources in Project that can log into the system, have an email address and are capable of working in the analyzed time frame.

  • Identify and retrieve timesheet eligible resources using the following conditions
    • Work resources – Resource Type is 2
    • Resource is active in the system – ResourceIsActive = 1
    • Who can log into the system – ResourceNTAccount is not null
    • Are capable of working in the analyzed time frame – Resource Base Capacity is greater than zero.
  • Identify and retrieve the submitted timesheets with associated project information for the last 60 days for timesheet periods that are open.
    • Timesheet Period Start Date is within the last 60 days.
  • Identify all timesheet eligible resources who did not create a timesheet
    • This data will need to be derived using the other two data sets.

Visualizations

In order to answer the key questions, a timesheet status field is needed to hold the state. The following visualizations are needed to show the core data.

  • Listing of each resource by timesheet period with the current timesheet submission status
  • Stacked bar chart by timesheet period with a count of timesheets in each timesheet status
    • Provides a visual indication of period level submission completeness.
  • Stacked bar chart by team with a count of timesheets in each timesheet status
    • Provides a visual indication of team level submission completeness.
  • Stacked bar chart by resource with a count of timesheets in each timesheet status
    • Provides a visual indication of resource level submission completeness.
  • Amount of time logged per project per timesheet period
  • Slicers will be required for timesheet period, team and timesheet status.

Data Elements for Visualization

This is the list of fields needed to generate the final visualizations. Other data will be needed in the process to generate the underlying data sets.

Project OData feed | Field | Definition | Common Terminology
Resources | ResourceDepartment | This is the department to which the resource is a member | Department
Resources | ResourceEmailAddress | Work resource email address | Email
TimesheetLine | PeriodEndDate | End date of timesheet period | End
TimesheetLine | ProjectID | Project ID for linking to other project information | Hidden from user
Resources | ResourceID | Resource ID for linking to other resource information | Hidden from user
TimesheetLine | TimesheetApproverResourceName | The person responsible for approving the person’s timesheet | Manager
Resources | ResourceName | Name of work resource | Name
Calculated | Index | Needed to ensure a unique key | People
TimesheetLine | TimesheetPeriodName | In this example, the period name follows this pattern: 2015-W01. Timesheet periods are 7 days long. | Period
TimesheetLine | TimesheetPeriodStatus | Open or Closed to timesheet submission | Period Status
TimesheetLine | PlannedWork | Amount of time originally planned for the work | Planned
TimesheetLine | ProjectName | Name of the project to which time is billed | Project
TimesheetLine | PeriodStartDate | Start date of timesheet period | Start
TimesheetLine | TimesheetStatus | The process status for a specific timesheet | Status
Resources | TeamName | This is the team assignment pool to which a resource may be assigned | Team
TimesheetLine | [ActualOvertimeWorkBillable] + [ActualOvertimeWorkNonBillable] + [ActualWorkBillable] + [ActualWorkNonBillable] | Amount of time entered against a project | Time
TimesheetLine | TimesheetPeriodName | In this example, the period name follows this pattern: 2015-W01. Timesheet periods are 7 days long. | Year

Supporting Processes

  • Weekly Time tracking process

Tools

  • Power BI Desktop
  • PowerBI.com

Building the Data

Data Feeds

There are four OData feeds necessary to build the timesheet audit report.

Resources

The resources feed provides the data that is visible via the Resource Center in Project Web App. This feed provides access to all resource custom fields.

ResourceTimephasedDataSet

This feed provides access to the resource capacity data at a day level of granularity. The Max Units field in the Resource impacts the capacity available by day. If Max Units is set in PWA, all days will have the same capacity. Project Pro provides the additional capability to create different Max Units for specific date ranges.

TimesheetPeriods

This feed provides access to the date ranges per week for which timesheets are expected.

TimesheetLines

This feed provides the summary data as to what time was logged by the project team member.

Transformations

In order to build out the dataset for the visualization, the following transformations are required.

  1. Identify and retrieve timesheet eligible resources using the following conditions
    1. Work resources – Resource Type is 2
    2. Who can log into the system – ResourceNTAccount is not null
    3. The result of this step is the potential list of resources who can submit a timesheet
    4. NOTE: Typically, you could use ResourceIsActive for steps 2 and 3. Be aware that the value changes if you turn on the new Resource Engagements feature. Feel free to change this in the code based on your individual configuration.
  2. Join the Resources with the Resource Capacity records to determine if a person worked for a specific week
    1. Are capable of working in the analyzed time frame – Resource Base Capacity is greater than zero.
    2. The result of this step is the actual list of resources expected to submit a timesheet for the selected timesheet periods
  3. Create a record for each timesheet eligible resource in step 2 for each timesheet period within the 60 day range. This is similar to a cross join in SQL.
    1. This is required as no timesheet record is created until the resource accesses the timesheet. Therefore, if a resource never clicks the timesheet link, there’s no way to directly see that the timesheet wasn’t created.
    2. The result of this step is the master list of all timesheets expected for the selected timesheet periods.
  4. Join the submitted timesheets with the master timesheet list from step 3.
    1. The result of this step is the status of all timesheet submissions and non-submissions.


Figure 4 Order in which to perform the data transformations

Power BI Transformation Code

Resources

This M code calls the Resources OData feed and filters the records for only those defined as timesheet eligible. The columns are then limited to only those necessary to support the end visualizations.

let
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData"),
    Resources_table = Source{[Name="Resources",Signature="table"]}[Data],
    #"Filter for Work Resources who are Users" = Table.SelectRows(Resources_table, each ([ResourceType] = 2) and ([ResourceIsActive] = 1) and ([ResourceNTAccount] <> null) and ([ResourceEmailAddress] <> null)),
    #"Get core fields" = Table.SelectColumns(#"Filter for Work Resources who are Users",{"ResourceId", "ResourceCreatedDate", "ResourceEmailAddress", "ResourceName", "TeamName", "ResourceDepartments"})
in
    #"Get core fields"

ResourceTimephasedDataSet

One of the requirements was to only expect a timesheet when the resource had available capacity. This requires a look at the data in a timephased manner.

This M code calls the ResourceTimephasedDataSet OData feed and performs the following transformations. Descriptive comments, denoted with //, precede the statements they describe. Also, each step has been renamed to provide clues as to its function, making the process somewhat self-documenting.

let
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData"),
    ResourceTimephasedDataSet_table = Source{[Name="ResourceTimephasedDataSet",Signature="table"]}[Data],

    // Timephased datasets are usually huge, so only get the last 60 days.
    #"Filter for Last 60 Days" = Table.SelectRows(ResourceTimephasedDataSet_table, each Date.IsInPreviousNDays([TimeByDay], 60)),

    // Since we only care about resources who could have worked, we want to
    // examine BaseCapacity for any instance greater than zero.
    // BaseCapacity is calculated using the resource's MaxUnits value.
    #"Filter for BaseCapacity > 0" = Table.SelectRows(#"Filter for Last 60 Days", each [BaseCapacity] > 0),

    // Need to group data by week, so insert a new column for week of year.
    #"Inserted Week of Year" = Table.AddColumn(#"Filter for BaseCapacity > 0", "WeekOfYear", each Date.WeekOfYear([TimeByDay]), type number),

    // Once grouped, join with the Resources feed from the previous query.
    #"Merge with Resources" = Table.NestedJoin(#"Inserted Week of Year",{"ResourceId"},Resources,{"ResourceId"},"NewColumn",JoinKind.Inner),

    // We need to expose the resource created date so that we can
    // eliminate any capacity prior to the date of resource creation.
    #"Expand to see Resource Fields" = Table.ExpandTableColumn(#"Merge with Resources", "NewColumn", {"ResourceCreatedDate"}, {"ResourceCreatedDate"}),

    // Create a new column that uses Value.Compare to flag the nature
    // of the relationship. A value of 1 means TimeByDay > ResourceCreatedDate.
    #"Flag Valid Capacity" = Table.AddColumn(#"Expand to see Resource Fields", "ValidCapacity", each Value.Compare([TimeByDay],[ResourceCreatedDate])),

    // Eliminate any rows which don't have a 1, which are all of the rows
    // that exist before the resource was created.
    #"Filter for Valid Capacity Only" = Table.SelectRows(#"Flag Valid Capacity", each ([ValidCapacity] = 1)),

    // Group the result set by week number.
    #"Group by ResourceID_WeekNumber" = Table.Group(#"Filter for Valid Capacity Only", {"ResourceId", "WeekOfYear"}, {{"Count", each Table.RowCount(_), type number}, {"Capacity", each List.Sum([BaseCapacity]), type number}, {"Startd", each List.Min([TimeByDay]), type datetime}})
in
    #"Group by ResourceID_WeekNumber"

Timesheet Lines

This M code retrieves the time logged against the individual projects.

let
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData"),
    TimesheetLines_table = Source{[Name="TimesheetLines",Signature="table"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(TimesheetLines_table,{"TimesheetLineId", "ActualOvertimeWorkBillable", "ActualOvertimeWorkNonBillable", "ActualWorkBillable", "ActualWorkNonBillable", "CreatedDate", "PeriodEndDate", "PeriodStartDate", "PlannedWork", "ProjectId", "ProjectName", "TimesheetApproverResourceId", "TimesheetApproverResourceName", "TimesheetName", "TimesheetOwner", "TimesheetOwnerId", "TimesheetPeriodId", "TimesheetPeriodName", "TimesheetPeriodStatus", "TimesheetPeriodStatusId", "TimesheetStatus", "TimesheetStatusId"}),

    // Only get the last 60 days of timesheet lines.
    #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each Date.IsInPreviousNDays([PeriodStartDate], 60))
in
    #"Filtered Rows"

Timesheet Periods (Renamed to Timesheet Analytics)

This becomes the primary dataset for user-facing visualizations. Everything is predicated on the timesheet period, so it makes sense to orchestrate the rest of the data around it.

let
    Source = OData.Feed("https://YOURSERVERNAME.sharepoint.com/sites/YOURPROJECTWEBAPP/_api/ProjectData"),
    TimesheetPeriods_table = Source{[Name="TimesheetPeriods",Signature="table"]}[Data],

    // Filter records for the last 60 days.
    #"Filter for last 60 days" = Table.SelectRows(TimesheetPeriods_table, each Date.IsInPreviousNDays([StartDate], 60)),

    // Cross join of timesheet periods and timesheet eligible resources
    // creates a master list of all expected timesheets.
    #"Added Resource for every Period" = Table.AddColumn(#"Filter for last 60 days", "AllResources", each #"Resources"),

    // Show all relevant resource fields.
    #"Expand to show Resource Fields" = Table.ExpandTableColumn(#"Added Resource for every Period", "AllResources", {"ResourceId", "ResourceCreatedDate", "ResourceEmailAddress", "ResourceName", "TeamName", "ResourceDepartments"}, {"ResourceId", "ResourceCreatedDate", "ResourceEmailAddress", "ResourceName", "TeamName", "ResourceDepartments"}),

    // Left join to match submitted timesheets with the
    // master list of expected timesheets.
    #"Merge with TimesheetLines" = Table.NestedJoin(#"Expand to show Resource Fields",{"PeriodId", "ResourceId"},TimesheetLines,{"TimesheetPeriodId", "TimesheetOwnerId"},"NewColumn",JoinKind.LeftOuter),

    // Show the timesheet status.
    #"Show Timesheet Status" = Table.ExpandTableColumn(#"Merge with TimesheetLines", "NewColumn", {"ActualOvertimeWorkBillable", "ActualOvertimeWorkNonBillable", "ActualWorkBillable", "ActualWorkNonBillable", "PlannedWork", "ProjectId", "ProjectName", "TimesheetApproverResourceId", "TimesheetApproverResourceName", "TimesheetStatus"}, {"ActualOvertimeWorkBillable", "ActualOvertimeWorkNonBillable", "ActualWorkBillable", "ActualWorkNonBillable", "PlannedWork", "ProjectId", "ProjectName", "TimesheetApproverResourceId", "TimesheetApproverResourceName", "TimesheetStatus"}),

    // If the timesheet status is null, there are no matching timesheet lines;
    // therefore, the timesheet must not have been created.
    #"Replace Timesheet Status null with Not Created" = Table.ReplaceValue(#"Show Timesheet Status",null,"Not Created",Replacer.ReplaceValue,{"TimesheetStatus"}),

    // Join with resources that were working during this time.
    #"Merge with ResourceTimephasedData" = Table.NestedJoin(#"Replace Timesheet Status null with Not Created",{"ResourceId"},ResourceTimephasedDataSet,{"ResourceId"},"NewColumn",JoinKind.Inner),

    // Per the requirement, combine all actuals columns into one.
    #"Total up all actuals" = Table.AddColumn(#"Merge with ResourceTimephasedData", "Time", each [ActualOvertimeWorkBillable]+[ActualOvertimeWorkNonBillable]+[ActualWorkBillable]+[ActualWorkNonBillable]),

    // For missing timesheets, replace nulls with zero.
    #"Replace Time nulls with zeros" = Table.ReplaceValue(#"Total up all actuals",null,0,Replacer.ReplaceValue,{"Time"}),

    // Rename columns to align with user terminology.
    #"Renamed Columns" = Table.RenameColumns(#"Replace Time nulls with zeros",{{"Description", "Period Status"}, {"EndDate", "End"}, {"PeriodName", "Period"}, {"StartDate", "Start"}, {"ResourceEmailAddress", "Email"}, {"ResourceName", "Name"}, {"TeamName", "Team"}, {"ResourceDepartments", "Department"}, {"PlannedWork", "Planned"}, {"ProjectName", "Project"}, {"TimesheetApproverResourceName", "Manager"}, {"TimesheetStatus", "Status"}}),

    // Narrow down the data set to only the fields required.
    #"Removed Other Columns" = Table.SelectColumns(#"Renamed Columns",{"Period Status", "End", "Period", "Start", "Email", "Name", "Team", "Department", "Planned", "Project", "Manager", "Status", "Time"}),

    // Change dates to only show the date, not the datetime.
    #"Changed dates to only dates" = Table.TransformColumnTypes(#"Removed Other Columns",{{"End", type date}, {"Start", type date}}),

    // Add a unique index that will provide additional functionality in Q&A.
    #"Added Index" = Table.AddIndexColumn(#"Changed dates to only dates", "Index", 1, 1),

    // Rename the index to use the People term.
    #"Renamed Index for Q and A support" = Table.RenameColumns(#"Added Index",{{"Index", "People"}})
in
    #"Renamed Index for Q and A support"

Want More? Let us teach you Power BI.


Chaos Management and the Cubicle Hero

When asked, “What do you want to be when you grow up?” you may have replied, “Firefighter.” If you did, I’m sure you meant one of those awesome individuals who provide medical services, perform rescues, and ride the fire trucks. While most of us never realized that dream, there are days at the office where you probably feel that “Firefighter” should be your job title.

Welcome to the wonderful world of the Cubicle Hero, where fighting fires is part of your job!

Perhaps you ask yourself at the end of each day, “How did I get here?” Many feel stuck in these roles without a way out and are puzzled as to how it happened. I talked about the True Cost of the Cubicle Hero in this previous article, so let’s look at how Cubicle Heroes form.

One reason Cubicle Heroes arise is a work environment that isn’t structured to respond well to chaos. If there are no processes for reacting to chaos in a controlled manner, the result is a crisis, which requires some brave person to step in and address it. That person is caught in the role going forward, thus evolving into the Cubicle Hero. Chaos is ever present and is needed for the organization to evolve and remain competitive. The organization is going to run out of Heroes unless a systemic way of reacting is created.

Internal efforts, such as implementing a new HR system, create short-term chaos and have long-term impact on the organization. If your organization doesn’t have a formal process for transitioning projects to production, Cubicle Heroes usually form from the project team members who hold the detailed knowledge about the project’s deliverables. A problem related to the project arises, a project team member solves the issue, and that person becomes the Hero going forward.

An ad hoc project transition process creates “human hard drives” out of the project team members, who must store and retrieve organizational knowledge as needed. This restricts the ability of team members to grow their skills, as letting go of that knowledge results in a loss to the organization. A formal transition process ensures relevant information is captured so that it can be widely used within the organization, freeing the team members to move on.

External events, such as a large client with a new, immediate need or a viral photo of a dress of indeterminate color, are also sources of chaos. Does your company treat these requests as fire drills, or does it have a way to manage them?

The best companies have a deep respect for chaos and put practices in place to manage it and to learn from it. New products and services are sometimes rooted in chaos learnings. Successful chaos management becomes a source of positive change within an organization, as it provides opportunities for people to learn new skills and encounter new situations. As discussed in the earlier article, these new skills and experiences prepare these individuals to be the Explorers that we need.

If your company grows Cubicle Heroes, the first step in the solution is to address the underlying cultural issues. Adding tools too soon will simply result in chaos at light speed. Addressing this issue is especially problematic in organizations where management has built their careers on their firefighting abilities. Cubicle Heroes tend to prosper in environments that lack visibility into cause and effect. One of my Project Management Office tool implementations came to a grinding halt when the sponsor, a master Cubicle Hero, realized the system would also show that he was the company’s biggest fire starter.

Your company’s reaction to chaos is a key process for maximizing your long-term competitiveness and productivity. One way to address chaos is to create processes for categories of chaos. Categories help keep the process manageable without having to address each specific and unique possibility.

One category should also be “other,” as the truly unexpected will happen. One example where this worked well is an organization that assigned a team member to work the “other” category, thereby sparing the rest of the team from being randomized by the unexpected.

I’ll write more on this topic in the weeks to come. For other articles, please visit my blog at http://www.tumbleroad.com/blog.

The True Cost of the Cubicle Hero

Heroes. Society loves them, honors them, and exalts them. Corporate offices are filled with a new breed of hero: the Cubicle Hero. These are the people who go beyond the norm and figure it out. They burn the midnight oil and they get it done. They overcome the chaos and reach the goal. All hail the hero!

However, heroes tend to overstay their welcome. In the movie “The Dark Knight,” the character Harvey Dent intones, “You either die a hero or you live long enough to see yourself become the villain.” The Cubicle Hero’s individual victory is celebrated initially, but situations change and the need for the hero diminishes over time. Or so we hope.

Cubicle Heroes can become process bottlenecks and productivity killers. Why? The organization’s reward structure doesn’t lead them to become mentors. The Cubicle Hero has great value to the organization, but their way of working can’t scale, and the lack of information sharing prevents the organization from truly benefiting from their victory. The hero then gets involved in every project that touches their area and becomes a bottleneck, as the demand for their time is greater than what is available. Thus the hero slowly becomes the villain, delaying projects.

Many years ago, I worked at a company where a core process of the company was dependent on a very skilled hero. He was a great employee and did his job earnestly. However, he also guarded his knowledge so that he was the only one who understood it completely. This became a serious company concern when he was involved in an accident, leaving him unable to work for several months. Several key projects were impacted.

Changing the perspective, expectations, and language of what happens as part of these efforts can lead to a different outcome. We need to make it clear that we want and need Corporate Explorers rather than Cubicle Heroes. Leif Erikson, the Viking, may have been the first to reach North America on a heroic journey, but it was the explorer Columbus who opened up North America to the world.

Explorers and Heroes share many common traits. They can see the big picture. They can dig down into the details when needed. They put in the extra effort to get the job done. The real difference is in the aftermath. Explorers open new trails so that others may come behind them. Explorers become guides to help others make the same journey. Heroes, on the other hand, continue to hold onto their conquest.

Changing your company culture to encourage Explorers over Heroes creates a scalable culture of knowledge sharing. This organizational approach leads to greater productivity, higher quality collaboration and timelier project progress.

To summarize, I recommend reviewing the following in your organization.

  • Provide a clear path for as many people as possible to earn the rewards for exceptional effort, in a way that others, and ultimately the organization, can leverage
  • Provide public recognition for knowledge sharing
  • Structure rewards within the process so we can move from a mentality of one-time hero creation to our true goal of constant productivity improvement
  • Provide the Explorer with opportunities to help facilitate and implement their achievement within the organization. This keeps the Explorer engaged and looking for additional ways to improve
  • Provide collaborative tools like Office 365 and Yammer to help facilitate and support the Explorer’s journey

If you are ready to address more productivity issues in your organization, talk to us or join our Community.

Project Tasks are Your Lowest Priority

Project tasks are the lowest priority work you have on any given day. Wait, what?

It’s true! Strategically, we know project work is the most important future investment for the company. Yet when you break down what you do every day, you’ll see that you are fitting project work in around the other work you have to do. It’s frustrating: you know you could be doing more, and someone clearly thought you had the time to get this work done.

If you don’t believe the premise, imagine the following scenario. You are staying late at the office to get some project work completed. Your manager’s manager sees that you are in the office, comes over, and asks you to do a task for tomorrow morning. If your answer is, “I’m sorry, but I can’t, because I really need to get this project work completed,” their response will determine the relative priority of project work in your environment. For some, rejecting the task would be a career-limiting move.

Perhaps, then, we are asking the wrong question when it comes to resource capacity management. Instead of asking whether a resource has free capacity to do the work, shouldn’t we be asking whether the resource has enough consolidated free time for project work? If they do not, what can we do to remedy the situation?

In my “Done in 40” webinar, we discussed recent research by time-tracking software companies that identified how the top 10% of productive employees work in an agile fashion. These employees typically work for 52 minutes and then take a 17-minute break away from the work. This is consistent with ultradian rhythm studies from the ’90s and ’00s, which showed that your focus naturally waxes and wanes on a 1.5-2 hour cycle. These work sprints can make you very productive and help reduce mistakes and rework.

I’ve personally tried the sprint approach and I can say it works well for me. I use a timer app on my Pebble watch to monitor my sprints. Fifty minutes is roughly the point where the mind starts wandering to “Did Joe ever respond to my email?” or “Is there coffee?” Three sprints enable the top three daily tasks to get done easily.

The catch is that you need 69 uninterrupted minutes (a 52-minute sprint plus the 17-minute break) to complete a personal sprint. This leads us back to the question of whether a resource has consolidated availability. Yes, they may have 3 hours available that day, but if it comes in 15-minute increments, that time isn’t usable.

When a client with project throughput issues engages my services, I find it’s usually not a project management issue. Many times, a lack of consolidated availability is preventing the project work from happening. If you are interrupted every 10 minutes, as most office workers in the United States are, it’s very hard to get work done. If you are having trouble getting projects through the pipe, perhaps it’s time to look beyond your projects to your operational work processes.

We spend the majority of our energy providing oversight and processes for projects, which are a minority of the work, instead of doing the same for operational work. McKinsey recently released a white paper showing that most operational spend goes to keeping the company running; new projects are a small portion of the overall effort. Yet we don’t monitor operational work holistically the way we do projects. Perhaps it’s time we start.

Project management processes are very helpful and needed. We’ve worked out how to anticipate and reduce risk and how to deliver the reward. We need to apply these approaches to how we manage all work. It’s the operational work that provides the overall context within which we do our project work. If improperly managed, it also constricts our ability to get our project work done. Operational work management improvements could yield the biggest benefit by enabling the consolidation of availability, yielding more usable time for project work.

If you are interested in finding out more about the specific techniques and how to use Microsoft Project to support this need, sign up here and get the recording link to the full “Done In 40” webinar.