Analyzing Office 365 Photo EXIF data using Power BI

In an earlier post, I wrote about how to mine the OCR data from Office 365 document folders using Power BI. During the research for that post I discovered that the photo EXIF data is also available for analysis from Office 365. I’ll show you how to get to it in this post.

What is EXIF data anyway?

Think of EXIF as special metadata for photos: with today’s cameras, it records the camera settings as well as the location where the photo was taken, if the camera has GPS. You can read more about the format here: https://photographylife.com/what-is-exif-data. Many smartphones today automatically include EXIF data when you take photos. Office 365 automatically extracts this data from photos when they are uploaded to a document library.

Why do I care about this?

Imagine you work for a company that does inspections. Your inspectors already use their phones to take photos of issues. Wouldn’t it be great to show the photos by location, together with related location data, on a report page? This technique lets you easily mine that EXIF data.

How do I get to it?

First, upload your images to a SharePoint document folder in Office 365; the Media Services process (the same one that handles OCR) will initiate automatically. I’ve had it take up to 15 minutes to process, so you may need to be patient. If you are uploading mobile photos, you can do this via the SharePoint mobile app.

Second, from Power BI Desktop, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the column that contains the EXIF data. Once in the dataset, you can use Power Query M to parse the data and start analyzing.

Demo

In this video, I’ll show you how to access the EXIF data and what you can do with the data.

Here’s the Power Query M Code

let
    // Connect to the SharePoint site; replace the URL with your own
    Source = SharePoint.Tables("https://YourInstance/YourSite", [ApiVersion = 15]),
    // Select the document library by its GUID
    #"8d19b1eb-42b8-4843-b721-fc1e8ef47688" = Source{[Id="88888888-8888-8888-8888-888888888888"]}[Items],
    #"Renamed Columns" = Table.RenameColumns(#"8d19b1eb-42b8-4843-b721-fc1e8ef47688",{{"ID", "ID.1"}}),
    // Keep rows that have media service data and were created on or after the cutoff date
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each [MediaServiceAutoTags] <> null and [Created] >= #datetime(2019, 4, 19, 0, 0, 0)),
    #"Expanded FirstUniqueAncestorSecurableObject" = Table.ExpandRecordColumn(#"Filtered Rows", "FirstUniqueAncestorSecurableObject", {"Url"}, {"FirstUniqueAncestorSecurableObject.Url"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded FirstUniqueAncestorSecurableObject",{"FirstUniqueAncestorSecurableObject.Url", "File", "Properties"}),
    #"Expanded File" = Table.ExpandRecordColumn(#"Removed Other Columns", "File", {"Name", "ServerRelativeUrl", "TimeCreated", "TimeLastModified"}, {"File.Name", "File.ServerRelativeUrl", "File.TimeCreated", "File.TimeLastModified"}),
    // Combine the site URL and the server-relative path into a full picture URL
    #"Merged Columns" = Table.CombineColumns(#"Expanded File",{"FirstUniqueAncestorSecurableObject.Url", "File.ServerRelativeUrl"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Picture URL"),
    // The EXIF metadata lives in the Properties record as a JSON string
    #"Expanded Properties" = Table.ExpandRecordColumn(#"Merged Columns", "Properties", {"vti_mediaservicemetadata"}, {"Properties.vti_mediaservicemetadata"}),
    #"Parsed JSON" = Table.TransformColumns(#"Expanded Properties",{{"Properties.vti_mediaservicemetadata", Json.Document}}),
    // Expand the address and location records into individual columns
    #"Expanded Properties.vti_mediaservicemetadata" = Table.ExpandRecordColumn(#"Parsed JSON", "Properties.vti_mediaservicemetadata", {"address", "location"}, {"Properties.vti_mediaservicemetadata.address", "Properties.vti_mediaservicemetadata.location"}),
    #"Expanded Properties.vti_mediaservicemetadata.location" = Table.ExpandRecordColumn(#"Expanded Properties.vti_mediaservicemetadata", "Properties.vti_mediaservicemetadata.location", {"altitude", "latitude", "longitude"}, {"Properties.vti_mediaservicemetadata.location.altitude", "Properties.vti_mediaservicemetadata.location.latitude", "Properties.vti_mediaservicemetadata.location.longitude"}),
    #"Expanded Properties.vti_mediaservicemetadata.address" = Table.ExpandRecordColumn(#"Expanded Properties.vti_mediaservicemetadata.location", "Properties.vti_mediaservicemetadata.address", {"City", "State", "Country", "EnglishAddress"}, {"Properties.vti_mediaservicemetadata.address.City", "Properties.vti_mediaservicemetadata.address.State", "Properties.vti_mediaservicemetadata.address.Country", "Properties.vti_mediaservicemetadata.address.EnglishAddress"}),
    // Set the column data types
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Properties.vti_mediaservicemetadata.address",{{"File.Name", type text}, {"Picture URL", type text}, {"File.TimeCreated", type datetime}, {"File.TimeLastModified", type datetime}, {"Properties.vti_mediaservicemetadata.address.City", type text}, {"Properties.vti_mediaservicemetadata.address.State", type text}, {"Properties.vti_mediaservicemetadata.address.Country", type text}, {"Properties.vti_mediaservicemetadata.address.EnglishAddress", type text}, {"Properties.vti_mediaservicemetadata.location.altitude", type number}, {"Properties.vti_mediaservicemetadata.location.latitude", type number}, {"Properties.vti_mediaservicemetadata.location.longitude", type number}})
in
    #"Changed Type"

Useless Solutions: Windows 3.1 Hot Dog Stand Theme – Power BI style

On occasion, I’ll post useless solutions like this one. They may not be directly useful in production, but they have educational value and can lead to useful work.

Oh goodness, what have you done?

This post originally started as a joke at dinner with Adam Saxton of Guy in a Cube and Jason Himmelstein of the BiFocal podcast. It was worth it to see the looks on their faces when I mentioned I was bringing the Hot Dog Stand theme to Power BI. And now I have.

What is “Hot Dog Stand?”

For those who have no idea what this is about, Windows 3.1 shipped with a theme called Hot Dog Stand. Coding Horror has a nice write-up with screenshots here. It was hideous, and no one knew why it was there. It became a joke to change someone’s theme to Hot Dog Stand if they left their workstation unlocked. Hot Dog Stand never made it into Windows 95 and faded, for the most part, into history.

An opportunity arose

As luck and work would have it, clients were asking for custom themes, and a deep dive into the Power BI theme format was necessary. Hence the idea to combine wacky fun with learning the ins and outs of theme JSON descriptors by recreating Hot Dog Stand.

Getting Started

I started where most people likely start: the Microsoft docs on Power BI themes here. It’s a helpful document but wow, there’s a lot of potential coding to do here. I needed to get rolling more quickly.

Cool, a theme generator

Next stop was the PowerBI.Tips theme generator. This tool is fairly easy to use to generate a quick theme file. It creates the JSON but has limited options for what can and can’t be changed. The results were OK, but I wasn’t feeling the serious ugly of Hot Dog Stand yet.

Even cooler, a GitHub repository!

After some web searches, I came across this GitHub repository of Power BI theme snippets. David Eldersveld put this repository together, probably for the same reasons that brought me here. I needed to do more customization, but I didn’t want to (and wasn’t able to) hand code all of the particulars.

The free VS Code made this pretty easy to assemble. You will likely want to use a JSON formatter as you do this. VS Code may have one built in, but in the interest of moving fast, I found this site that did the job well.


One tip: if you are going to merge many of the snippets into a theme file, ignore the first three and last two lines of every snippet. Otherwise, you’ll get errors importing it.
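
If it helps to picture why, here’s a minimal sketch of the overall shape of a theme file. The property names come from the basic report theme format in the Microsoft docs; the colors are my guess at a Hot Dog Stand palette, so treat them as placeholders.

{
    "name": "Hot Dog Stand",
    "dataColors": ["#FF0000", "#FFFF00", "#000000", "#FFFFFF"],
    "background": "#FFFFFF",
    "foreground": "#FF0000",
    "tableAccent": "#FF0000"
}

Each snippet in the repository is a complete theme file wrapped in a shell like this, so the wrapper lines have to go when you merge the snippet bodies into a single file.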

The Result

To reiterate, Hot Dog Stand started as a generator-created theme file that I edited in VS Code and augmented with snippets from GitHub. The result is this.

Isn’t it hideous?

If you would like a copy of the theme file, to do your own terrible things with, download it here. If you have questions or comments, please post them below. Check out our Twitter feed at @tumbleroad.

The easiest way to do small-multiples in Power BI!

Recently, there was a post on the Power BI blog, How to do multivariate reporting with Power BI, about what you may know as “small multiples.” Small multiples let you tell a big story via a chorus of little stories that would otherwise be obscured by the large number of plotted points. There are many uses of this technique in the data visualization literature. If you are unfamiliar with small multiples, please read this article as a primer.

In October 2016, Microsoft Research released the Infographics Designer custom visual for Power BI. Much of the attention was focused on how you can use graphics to create custom charts. However, buried in all of this was the ability to create small-multiples easily, using the Infographics Designer.

The Infographics Designer allows for four types of small-multiple visualizations.

  • Cards
  • Columns
  • Bars
  • Lines

In the video below, I’ll demonstrate how to use the Infographics Designer to create multiple types of small multiples.

If you have comments or questions, please post them below!

Analyzing Office 365 OCR Data using Power BI

I’m so excited to see Optical Character Recognition (OCR) working natively in Office 365! I got my start in government, where we scanned a lot of documents. There was a lot of data locked away in those images but no easy way to mine it. OCR was explored as a possible solution, but it was still in its infancy and not very accurate or scalable.

Fast forward to today and now OCR is simply part of our cloud infrastructure and with the assistance of Machine Learning, very accurate. The dream of finally unlocking that data is here! Or is it? Read on to find out more.

By the way, we’ll be covering this and more in our session at the SharePoint Conference, May 21-23 in Las Vegas! Use code GATTE for a discount. https://sharepointna.com

Intelligent Search and Unlocking Your Image Data

SharePoint’s ability to do native OCR processing was first shown at Ignite 2017. There, Naomi Moneypenny presented Personalized intelligent search across Microsoft 365, where you can see Office 365 OCR in action. https://techcommunity.microsoft.com/t5/Microsoft-Search-Blog/Find-what-you-want-discover-what-you-need-with-personalized/ba-p/109590

She uploaded an image of a receipt and was able to search for it, based on the contents of the receipt image. It was a seamless demo of how Office 365 can intelligently mine the data in image files.

Now, let’s see if we can access the OCR data across multiple files and do something with it.

In the following procedure, I’ll show you how to connect to Office 365 and get to the data that the OCR process returns. Then, in the demo, I’ll show you how to process receipt data to get the total amount spent, solely from the OCR data.

Process Overview

There are three steps to start mining your OCR data: add image content that contains text to a SharePoint folder, let the Media Services process extract the text, and then connect to the data from Power BI.

The process of getting OCR data

Finding OCR Data

The OCR process that runs against the files in a SharePoint document folder is called Media Services. All derived data is stored in columns with Media Services in their names.

Unfortunately, I’ve discovered that this feature has not been implemented consistently across the Shared Documents folder, custom folders, and OneDrive. The good news is that there’s a less obvious way to get to the data consistently across all three: using Properties. As shown below, you can see the normal column names and where they appear. Only the ones in Properties appear consistently across all three. We are only going to cover the basic information, but the Properties collection has a lot more data to consume.

Audit of which media service fields are available where in Office 365

Adding Image Content to a SharePoint Document Folder

When you upload an image to a SharePoint document folder in Office 365, the OCR process kicks off automatically. I’ve had it take up to 15 minutes, but the OCR process will analyze the image for text and return the text in a field called MediaServiceOCR, where present, and always in Properties.vti_mediaserviceocr.

These columns contain any text that was recognized in the graphics file. The current structure of the returned data is a bit different than what is in the source image: each instance of discovered text is returned on a separate line, using a line feed character as a delimiter. For example, if you had a two-column table of Term and Meaning, it would return the data like this:

Term
Meaning
Term
Meaning

Original uploaded image
Data returned by media services

While it’s great that you can get to the data, the current returned format makes it exceptionally complex to reconstitute the context of the data. Also, the more complex your layout, the more “interesting” your transformations may need to be. I’d strongly recommend this post (https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi/) and this post (https://blog.crossjoin.co.uk/2017/12/14/removing-punctuation-from-text-with-the-text-select-m-function-in-power-bi-power-query-excel-gettransform/) for the basics of text parsing in Power Query M.
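
As a starting point, here’s a minimal Power Query M sketch that splits a line-feed-delimited OCR string into a list of lines. The sample text is hypothetical.

let
    // Hypothetical OCR result: recognized lines are delimited by line feeds
    OcrText = "Term#(lf)Meaning#(lf)Term#(lf)Meaning",
    // Split on the line feed character to get one list item per recognized line
    Lines = Text.Split(OcrText, "#(lf)")
in
    Lines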

Accessing the OCR Data in Power BI

The OCR columns are item-level columns. The natural tendency is to connect to your SharePoint site using the Power BI SharePoint Folder connector, but you’ll be disappointed to find that the Media Services columns aren’t there.

Instead, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the Media Services columns. Once in the dataset, you can use Power Query M to parse the data and start analyzing.
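
As a rough sketch, the connection and expansion look something like this in Power Query M. The site URL and library GUID are placeholders you’d replace with your own values.

let
    // Connect to the SharePoint site via the SharePoint Online List connector
    Source = SharePoint.Tables("https://yourtenant.sharepoint.com/sites/YourSite", [ApiVersion = 15]),
    // Select the document library by its GUID (placeholder)
    Documents = Source{[Id = "00000000-0000-0000-0000-000000000000"]}[Items],
    // The OCR text lives inside the Properties record as vti_mediaserviceocr
    #"Expanded Properties" = Table.ExpandRecordColumn(Documents, "Properties", {"vti_mediaserviceocr"}, {"OCR Text"})
in
    #"Expanded Properties"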

Demo

Let’s walk through how to access the data and manipulate it using Power BI. In this scenario, I have two receipts that have been uploaded to a document folder, and I’m going to get the total spent on these receipts by analyzing the OCR data.

What about OneDrive for Business?

Yes, it works there too! The Media Service property fields are here as well. In fact, you get more information in an additional column called MediaServicesLocation. Based on my usage, it seems to be populated specifically for image files. If the image contains EXIF data, MediaServicesLocation will contain the Country, State/Province, and City where it was created. Within the Properties collection, you can get even more detailed information about the photo, like the type of camera that took it.

To connect to OneDrive, you need your OneDrive URL. I normally right-click the OneDrive base folder in File Explorer and select View Online, as shown below.

Select View Online to get to the OneDrive url needed for Power BI connection

Potential for GDPR Issues

One aspect to consider, if you look to do this in a production manner in Europe, is that you will likely encounter information that falls under GDPR regulation. Consider this your prompt to think about how this capability fits into your overall GDPR strategy.

Want a copy of the Power BI model?

Fill out the form below and it will be emailed to you automatically, by virtue of the magic of Office Forms, Flow, and Exchange!

I hope you liked this post. If you have comments or questions, post them below in the comments.

In Data We Trust? Part Two

Part 2 of this series explores the difficulty of determining whether your business intelligence content is using authentic data. To illustrate the point, let’s examine a recent Seattle Times article about the measles outbreak in Washington.

An Example

The article in question, “Are measles a risk at your kid’s school? Explore vaccination-exemption data with our new tool,” presents a story filled with data charts and tables and draws some conclusions about the situation. Many internal reports and dashboards do the same, presenting data and conclusions. Unlike internal reports, newspapers list the source and assumptions in small print at the end of the story. Knowing the data comes from an official source adds authenticity.

The following note is supposed to increase authenticity.

“Note: Schools with fewer than 10 students were excluded. Schools that hadn’t reported their vaccination data to the Department of Health were also excluded.

Source: Washington Department of Health (2017-18)”

But does it really? Documenting exclusions and noting sources is good practice. However, it’s not very prominent, and if you search for this data, you’ll likely find several links. There’s no link or contact information.

Data authenticity is crucial to making successful decisions. To establish it, some key data questions should be answered.

What data was used?

Many content creators don’t bother to document the source of their information. Many viewers would not have the same level of confidence in the new financial dashboard if they knew the data came from a manually manipulated spreadsheet instead of directly from the finance system. How would the reader know anyway? In many cases, they wouldn’t. The Seattle Times provided a hint, but more is needed.

When you buy items like wine, you know what you are buying because the label spells it out. A wine bottle is required to have a label with standard data elements to ensure we know what we are buying. For example, a US wine label must list the type of grape used to make the wine. Even red blends must list the varietals and percentages so that the customer is clear on what is in the bottle. Equivalent labeling would improve transparency about data authenticity.

Who owns the data we are consuming?

This is very important, especially if we spot incorrect or missing data: who do we contact? The Seattle Times lists the Washington Department of Health as the data owner. This is a good starting point but doesn’t completely fill the need. For internal reports, every data source should include an owning team name and a contact email. The data vintage example below also includes the site URLs and a contact email.

Data Vintage Example

How old is the data?

It’s one thing to know when the data was last pulled from the source, but that’s not the whole need. Data age can strongly influence whether it can be used to make a decision. In our Marquee™ products, we include a data freshness indicator that shows proportionally how much of the data has been updated recently. “Recently” becomes a business rule defining what constitutes fresh data. For some companies, an entity must have been updated within the last seven days to be considered fresh.

Data Freshness indicator for time dependent data.
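
To give a flavor of the calculation, here’s a minimal Power Query M sketch that computes the proportion of fresh rows. The table, column names, and the seven-day rule are all illustrative.

let
    // Hypothetical table; imagine this came from your source system
    Source = #table(
        {"Entity", "LastUpdated"},
        {{"Orders", #datetime(2019, 4, 25, 0, 0, 0)}, {"Invoices", #datetime(2019, 1, 1, 0, 0, 0)}}
    ),
    // Business rule (assumption): "fresh" means updated within the last seven days
    Cutoff = DateTime.LocalNow() - #duration(7, 0, 0, 0),
    FreshRowCount = Table.RowCount(Table.SelectRows(Source, each [LastUpdated] >= Cutoff)),
    // Proportion of the data considered fresh (e.g. 0.5 = half is fresh)
    FreshProportion = FreshRowCount / Table.RowCount(Source)
in
    FreshProportion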

How to address?

We took the liberty of creating a Power BI model that analyzes the same immunization data used in the Seattle Times story. We’ll use this model to illustrate a simple technique. The following steps were performed to enable a simple “data vintage” page.

Procedure

  • Create a Data Vintage page (you may need more than one, depending on how many datasets and sources you have)
  • Add a back button to the page. We put ours in the upper left corner
  • Add the following information to the page using a consistent format you’ve decided upon (a sketch of a backing table follows this list):
    • Name of the dataset
    • Where the data is sourced from and how often it’s refreshed
    • Which team owns the data
    • How to contact the data owner, if available
  • Create a Data Vintage bookmark for the data vintage page so that it can be navigated to via a button.
  • Go back to the report page that you created from this data
  • Add an Information button to the upper right corner of the page.
  • Select the button and navigate to the Visualization blade
  • Turn on Action
  • Set Type to Bookmark
  • Set the Bookmark to the Data Vintage bookmark you created earlier
  • Ctrl + Click the Information button to test
  • Ctrl + Click the Back button to test
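
If you’d rather back the Data Vintage page with model data instead of text boxes, a small static table works well. Here’s a minimal Power Query M sketch; every value is a placeholder.

let
    // All values are placeholders; substitute your own dataset details
    #"Data Vintage" = #table(
        {"Dataset", "Source", "Refresh Cadence", "Owning Team", "Contact"},
        {{"Immunization Exemptions", "Washington Department of Health (2017-18)", "Annual", "Data Team", "data@example.com"}}
    )
in
    #"Data Vintage"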

That’s it. Anytime a user or fellow Power BI author has a question about the underlying model data, the answer can be accessed very easily. You’ll also improve impressions of data authenticity by implementing this label consistently across all content.

A Working Example

We’ve created a different analysis of the Washington State immunization exemption data and added a data vintage page. You can try it out below. Click the i (Information) button in the upper right of the screen to display the data vintage.

In Part 3, we’ll examine the problem of data integrity and how you can be sure your data has implemented the proper business rules for your organization.

Have a question or comment? Feel free to post a comment below.

New SharePoint Modern Power BI Theme

If you are creating Power BI content that you are embedding into a SharePoint Modern page, you know that the default Power BI theme colors don’t match SharePoint’s colors. Not to worry, we’ve got you covered. 

Attached to this post, and also cross-posted to the https://Community.PowerBI.com Theme Gallery, is the Tumble Road SharePoint Modern theme. This theme uses the core Modern experience colors in SharePoint, ensuring your Power BI content will look great when embedded within SharePoint.

Download the zip file here.

TIP: How to make your Power Query M code self-documenting

What were you thinking?

Do you remember exactly what you were thinking three months ago when you wrote that rather involved Power Query M statement? Me neither. So why suffer when you can easily avoid the pain of code archaeology in Power BI? Piecing together a forgotten thought process around a particular problem can be a time-consuming and frustrating task.

We’ve all seen this

You’ve seen Power Query M code like the sample below. Chances are, you’ve created M code like it as well. You’ll be looking at this in a few months, thinking to yourself: removed what columns?!? And then I changed the type. Awesome. What type, and why did I do this?

Leverage these functions to hold your thoughts

I’ll take you through two features you can use to document your thought process as part of your Power Query M code in Power BI. Your future self will love you for taking a few minutes to leverage these features. It also makes maintenance a breeze, as it becomes much easier to figure out where your data is retrieved from.

Discipline is key…

Please note, this can feel tedious to do. It is quite easy to create many transformations in a short amount of time. But if you take a few minutes to document them while they’re fresh in your mind, you’ll be better positioned if there are issues later or if you need to add functionality. The default step names, like Renamed Columns 3, are not very helpful later.

How do I document my steps?

There are two techniques you can use to make your M code serve as documentation.

First, you can right-click a transformation step, select Rename, and name the step to document whatever you did. For example, Renamed Columns above could be renamed to Renamed custUserID to User ID. This makes it very clear what was done.

Second, if you need more room to fully document a transformation step, right-click the step and select Properties. The description field can be used for long-form documentation.
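
Here’s a tiny sketch of what the first technique looks like in the resulting M code. The source table and column name are made up for illustration.

let
    // Placeholder source; imagine this came from a SharePoint list
    Source = #table({"custUserID"}, {{"jdoe"}}),
    // The step name itself records the what and the why
    #"Renamed custUserID to User ID" = Table.RenameColumns(Source, {{"custUserID", "User ID"}})
in
    #"Renamed custUserID to User ID"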

Finished Product

We started with this list of transformations on a SharePoint list.

After taking the time to rename each step with what and why, it now looks like this.

I hope you find this post useful! If you have any questions, post them below in the comments.

UPDATE: This technique also works in Excel’s Power Query Editor.

How to sort your dataset columns alphabetically in Power BI

Power BI Accidental Tip

This tip came about by accident. I was working on a new class exercise for https://sharepointna.com when I came across an easy way to sort your dataset columns alphabetically.

While column search in the Query Editor makes this less necessary, it is very useful when working in DAX, where you are in the same data grid, scrolling left and right. Yes, you could use the field search or the field picker, but I prefer my dataset to be in a predictable order.

Caveat

This tip works right now with the March 2018 release of Power BI Desktop. There’s no guarantee it’ll work with future releases of Power BI Desktop.

OK, so how do I sort my dataset columns alphabetically?

  • From Power BI Desktop, click Edit Queries
  • On the Home tab in the Query Editor, click Choose Columns
  • Click the AZ button in the upper right and select Name
  • Click OK
  • Click Close and Apply.

Yes, that’s it. Now your dataset columns are sorted in alphabetical order. Normally I do this as my last transformation, after all others have been specified.
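
For the curious, the same result can be had directly in M with a single step. Here’s a minimal sketch using a placeholder table.

let
    // Placeholder table with deliberately unsorted column names
    Source = #table({"Zebra", "Apple", "Mango"}, {{1, 2, 3}}),
    // Reorder the columns using an alphabetically sorted list of the existing names
    #"Sorted Columns" = Table.ReorderColumns(Source, List.Sort(Table.ColumnNames(Source)))
in
    #"Sorted Columns"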

Creating Power BI Report Backgrounds the easy way!

Are you still putting boxes around your Power BI graphics?

Stop boxing your data! Boxes add unnecessary visual noise and can make it hard to see your graphics. We present a better way to add style to your data story in Power BI.

These techniques enable you to add a graphical background to your Power BI reports, using the power of Microsoft PowerPoint! Nothing to buy or install. Graphical backgrounds provide flexibility in styling while reducing the effort to achieve a great-looking Power BI report.

I’ll take you through two separate ways you can implement these backgrounds and point out pros and cons to each approach. I’ll also show you what to consider when you have to support both web and mobile reports.

I hope you enjoy! Please leave your comments below.

If you are in Washington, DC at the end of March, 2018, join us for our Power BI workshop at SharePoint Fest DC!

Integrating CA PPM Data Warehouse security with Power BI RLS

Background

Tumble Road has partnered with CA Technologies over the last year to make their reporting more Power BI friendly. At CA World ’17, we helped introduce the new OData connector for the CA PPM Data Warehouse to attendees.

The CA Need

The CA PPM Data Warehouse (DWH) has very rich security support at the user level and at the role level. However, when the OData connector was in beta, we noted a need to replicate this security functionality within Power BI. If Power BI could not consume the CA PPM DWH security model, it would be a non-starter for many companies.

The Power BI Solution

Row-level security (RLS) within Power BI was the answer to carrying the DWH security model forward. Power BI enables filtering of data based on the current Azure Active Directory (AAD) User ID. Since all Power BI users must have an AAD User ID, part of the solution was already at hand.

The CA PPM DWH has a security table that lists every entity a user can access. Power BI uses this information to secure the data.

However, CA’s solution does not use AAD, so a mapping exercise may be needed between the CA User Name and the Microsoft User ID. The video below shows a very simple case, where we are able to append the Office 365 domain name to the CA User Name to get the requisite AAD User ID.
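
For illustration, here’s a minimal Power Query M sketch of that append. The domain and column names are placeholders.

let
    // Hypothetical security table pulled from the CA PPM DWH OData feed
    Security = #table({"CAUserName"}, {{"jdoe"}}),
    // Assumption: the AAD User ID is the CA User Name plus the Office 365 domain
    #"Added AAD User ID" = Table.AddColumn(Security, "AADUserID", each [CAUserName] & "@yourcompany.com", type text)
in
    #"Added AAD User ID"

From there, an RLS role can filter the security table where AADUserID matches the value returned by DAX’s USERPRINCIPALNAME() function.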

Your particular situation may require a more extensive mapping exercise. Elaborate mappings can be done in Excel and imported into the model. This imported mapping data can then be joined with the Security table to implement the requisite filtering.

The Presentation

In the video below, I show you how to bring the Security table from the Data Warehouse into Power BI and use it as the basis for applying row-level security. This restricts the data to only what the person is authorized to see.