Converting RTF to Text in SQL Part 3


Yes, Part 3. This technique was published some time ago and as it gets further “field testing”, new conditions have come to light which require a revision of this technique. Assuming this is the last post on this topic, I need to figure out how to duplicate this logic in PowerBI.

I consider this to be version 15 or so of this query since my first version was written in mid-September of 2010. The changes here are as follows.

First, the DTD definitions have been expanded to include all possible HTML codes. This should prevent a report blowing up when someone decides to use an unexpected symbol somewhere.

Second, I keep seeing random <br> tags in the fields, which ironically is breaking the query. I’ve added a replace statement for that as well.

Lastly, I’ve expanded the field sizes to Max where necessary to accommodate truncation issues.

Sample code is below. I strongly recommend copying into NotePad first so that you don’t accidentally get curly quotes. I’ve also left it as straight text to make it easier to copy.

declare @Headxml nvarchar(3000)
declare @Footxml nvarchar(50)

set @Headxml = N'<?xml version="1.0"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd"
[<!ENTITY quot "&#34;">
<!ENTITY amp "&#38;">
<!ENTITY lt "&#60;">
<!ENTITY gt "&#62;">
<!ENTITY nbsp "&#160;">
<!ENTITY iexcl "&#161;">
<!ENTITY cent "&#162;">
<!ENTITY pound "&#163;">
<!ENTITY curren "&#164;">
<!ENTITY yen "&#165;">
<!ENTITY brvbar "&#166;">
<!ENTITY sect "&#167;">
<!ENTITY uml "&#168;">
<!ENTITY copy "&#169;">
<!ENTITY ordf "&#170;">
<!ENTITY laquo "&#171;">
<!ENTITY not "&#172;">
<!ENTITY shy "&#173;">
<!ENTITY reg "&#174;">
]><html><body>'

set @Footxml = N'</body></html>'

select   *
,ISNULL(LTRIM((CONVERT(xml,(@Headxml + replace([YourMulti-lineCustomField],'<br>','') + @Footxml),3).value(N'(/)','nvarchar(Max)'))),'')   AS [YourMulti-lineCustomFieldNewName]

,ISNULL(LTRIM((CONVERT(xml,(@Headxml + replace([YourMulti-lineCustomField2],'<br>','') + @Footxml),3).value(N'(/)','nvarchar(Max)'))),'')   AS [YourMulti-lineCustomFieldNewName2]

FROM  dbo.MSP_EpmProject_UserView

Is Your Data Looking for a Problem to Solve?

Data can be a wonderful thing. There are so many stories you can tell with the right data. These stories have the power to persuade and motivate positive changes in the organization. And so the marketing stories go on and on about the power of data.

The question is, do you have the right data? Can you tell compelling stories with your data? How do you know what is possible or needed? Let’s discuss some straightforward techniques to help you determine the answer.

Many times, I’ve walked into a client’s office and was greeted with large quantities of data. However, the client was still struggling to tell a compelling story from their data store. There was always doubt as to whether the reports they generated were useful or compelling.

I’ve also encountered clients who maintained additional data that they thought someone *might* need. Unfortunately, wishful thinking is not an effective business strategy and can waste scarce company resources.

Here are some signs that you might need to rethink your data strategy.

  • Do your reports require data definitions, report keys and long explanations for someone to understand what they are viewing?
  • Do you have a training session on “how to use the reports”?
  • Are you maintaining data whose purpose or use you are unsure of?

If you answered yes to any of these, it might be time for a little housecleaning.

The Effective Simplicity™ approach dictates that we maintain as little data as possible to meet the business need. Minimizing data reduces overall cost by reducing the overhead of managing the system. The trick is to determine what data is needed, which is sometimes easier said than done.

At Tumble Road, we use an approach that follows a pattern that is understandable and helps determine the context and the need for the right data.  The pattern is Conversation – Question – Supporting Data and Activities.

Conversation is about identifying the specific meeting / use case that the tool will ultimately support. A diagram of some standard Project Management conversations is illustrated below. The conversation will have a schedule, so you can determine how often the data needs to be updated. The conversation has standard participants, which is helpful if you need feedback on the data you will need to provide. It also helps later if you need to inform the report consumers of an upcoming change. Lastly, a list of 1-3 key questions is defined for the use case.

Project Communications

Key Questions are the information needs that need to be supported. The Key Question also determines the form in which the answer must be presented.

For example, if the conversation is the weekly Tuesday portfolio status meeting between IT Management and Finance, likely, you will need to answer questions similar to:

  • What have we spent?
  • What did we plan to spend so far?
  • What are we planning to spend in the near future?

Supporting Data and Activities are the exact data elements that allow the key question to be answered and the activities necessary to generate, collect and maintain said data. The data can help you determine other incidental data, specifically necessary for organizing and filtering the Supporting Data. The activities can help you spot process gaps that may be present that would prevent you from successfully addressing the question.

When examining the “What have we spent?” question above, Finance wants Project spend broken down by Project and Cost Type (Capital or Expense), with the totals summed by Fiscal Year and Fiscal Period for each IT Director.

From this short exercise, I already know the following is needed:

    • A lookup table for Cost Type with values of Capital, Expense.
    • A task custom field with Assignment Roll Down of the values enabled.
    • A lookup table for IT Directors, to maintain consistency.
    • A Project custom field for IT Director so that this can be assigned to each project.
    • To add this Project custom field to the correct Project Detail Page so that the PM can maintain this data.
    • Resource rates in the system so that costs can be automatically calculated.
    • To enter the Fiscal Periods in Project Online.
    • A project template which exposes the Cost Type column at the task level.
    • The cross-tab report layout
Director/Project/Cost Type   FY2014-M01   FY2014-M02   FY2014-M03   FY2014-M04   FY2014-M05
John Smith                   $100         $100         $100         $100         $100
  Project X                  $100         $100         $100         $100         $100
    Capital                  $50          $50          $50          $50          $50
    Expense                  $50          $50          $50          $50          $50

As you can see, you can generate quite a bit of actionable detail using this approach. There are several follow-up questions that can now be asked since you have specific, concrete examples from which to work. For example:

  • Do you need filtering by director?
  • Do you show only incomplete projects?
  • Is the IT Director data something that should be maintained by PMs?
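
The raw data behind the cross-tab could be sketched in SQL along these lines. This is a sketch only: [IT Director] and [Cost Type] are the hypothetical custom fields described above, AssignmentCost is the assumed cost column in the by-day view, and the pivot into the monthly columns would be done in Excel or SSRS. Verify all names against your own configuration and the RDB field reference.

```sql
-- Sum cost by Fiscal Year for each Director / Project / Cost Type.
-- [IT Director] and [Cost Type] are hypothetical custom field names;
-- add the fiscal period column from MSP_TimeByDay for the monthly split.
SELECT  p.[IT Director]
       ,p.ProjectName
       ,t.[Cost Type]
       ,tbd.FiscalYear
       ,SUM(abd.AssignmentCost) AS PeriodCost
FROM MSP_EpmProject_UserView AS p
INNER JOIN MSP_EpmTask_UserView AS t
    ON t.ProjectUID = p.ProjectUID
INNER JOIN MSP_EpmAssignment_UserView AS a
    ON a.TaskUID = t.TaskUID
   AND a.ProjectUID = t.ProjectUID
INNER JOIN MSP_EpmAssignmentByDay_UserView AS abd
    ON abd.AssignmentUID = a.AssignmentUID
INNER JOIN MSP_TimeByDay AS tbd
    ON tbd.TimeByDay = abd.TimeByDay
GROUP BY p.[IT Director], p.ProjectName, t.[Cost Type], tbd.FiscalYear
```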

This approach also aids in designing supporting processes. You already know this is a weekly review, so cost information has to be updated on a weekly basis by the PM. As the meeting happens on Tuesday, Monday is the deadline to get updates in and published.

A final advantage is that it is easy to track the progress of visible work by your users. For these engagements, the top-level summary task represents the conversation, each key question is a sub-task, and key activities, such as gathering required data and defining the maintenance process, are the third-level tasks. This makes it easy to determine the “health” of the conversation implementation.

Once you are maintaining the data essential to answering the questions, with clearly defined uses, you should see a reduction in overhead as well as evidence of easier training conversations. If you would like to learn more, please join our Community here.

Querying Multi-Value Custom Fields


Scenario

You have a report where the need is to show multiple values for a given custom field. For example, you have a multi-value Project custom field for Impacted Business Organizations.

You want to see your values as a comma delimited list so that this can be used in an Excel pivot table or SSRS tablix report. You might need something like:

Project       Impacted Business Orgs
Project XYZ   IT, HR, Operations

The Background

When a Project text custom field with an associated lookup table is made multi-value, a number of changes are made in the Reporting Database. First, the field is removed from the MSP_EpmProject_UserView, as that view only supports single-select Project text custom fields with associated lookup tables. Second, a new association view is created, which has the following naming convention: MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView

MSP prefixes all Microsoft views; CF stands for Custom Field and PRJ for the Project entity. This association view contains a record for each of the multiple custom field values selected, linking the Project record to the lookup table values in the MSPLT_YourMultiValueCustomFieldLookUpTableName_UserView view. LT, in this case, stands for lookup table, so there is an MSPLT view for each lookup table in the Reporting Database.

This mechanism was first documented in the Project Server 2007 Report Pack that I wrote and can be found here: http://msdn.microsoft.com/en-us/library/office/bb428828(v=office.12).aspx  The Portfolio report also provides another way to utilize the multi-value field.

The Query

This query uses the XML functionality to build the concatenated string, based on a technique documented on StackOverflow here.

Once I modified the STUFF statement for specific use for Project Server, I wrapped it with an outer SELECT to combine it with all of the data from MSP_EpmProject_UserView. Note, if you have multiple multi-value fields, you will have to duplicate this inner piece in the parentheses for each field. The places to replace with your own field names are highlighted.

SELECT MSP_EpmProject_UserView.*
             , MVList.[YourMultiValueCustomFieldNameValues]
FROM MSP_EpmProject_UserView
INNER JOIN
   (SELECT   MSP_EpmProject_UserView.ProjectUID 
            ,ISNULL(STUFF((SELECT ', '+ MemberValue 
    FROM [MSPLT_YourMultiValueCustomFieldLookUpTableName_UserView] 
    INNER JOIN [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView] 
    ON [MSPLT_YourMultiValueCustomFieldLookUpTableName_UserView].LookupMemberUID = 
    [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView].LookupMemberUID
    WHERE [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView].EntityUID = 
    MSP_EpmProject_UserView.ProjectUID 
    FOR XML PATH(''), TYPE
    ).value('.','varchar(max)')
    ,1,2,''),'') AS YourMultiValueCustomFieldNameValues
FROM    MSP_EpmProject_UserView 
GROUP BY ProjectUID ) MVList
ON MSP_EpmProject_UserView.ProjectUID = MVList.ProjectUID

The Output

The output will yield a comma delimited list of values in the last column of the dataset. If you need that comma delimited list sorted, add an ORDER BY MemberValue clause inside the inner SELECT, just before the FOR XML PATH('') statement.
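
As a sketch, the inner fragment with the sort applied would look like this (same placeholder view names as above; the reference to MSP_EpmProject_UserView.ProjectUID comes from the outer query):

```sql
-- Inner SELECT from the query above, with ORDER BY added so the
-- concatenated values come back alphabetically sorted.
SELECT ', ' + MemberValue
FROM [MSPLT_YourMultiValueCustomFieldLookUpTableName_UserView]
INNER JOIN [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView]
    ON [MSPLT_YourMultiValueCustomFieldLookUpTableName_UserView].LookupMemberUID =
       [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView].LookupMemberUID
WHERE [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView].EntityUID =
      MSP_EpmProject_UserView.ProjectUID
ORDER BY MemberValue
FOR XML PATH(''), TYPE
```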

Database Diagrams–Project Server Reporting Database


These high level entity relationship diagrams were first published in my Project Conference Hands On Lab deck. I’ve had a number of requests for this information, so here it is.

These diagrams are based on the 2010 RDB, but the 2013 RDB should not be materially different. For 2007 users, many of these same entities exist in the 2007 RDB as well. The UID fields are the keys used to join these entities together.

The name of the entity also contains the recommended table or view name to be used in the Reporting database. Any entity ending with _UserView will automatically include custom fields for that entity. Multi-value custom fields are not included and require special querying to retrieve (more on that later).

If you are tying different data entities together, you should also consider using the lowest level of detail for a given data element. For example, if you are querying Project – Task – Assignments – Resources, I would use AssignmentWork rather than TaskWork or ProjectWork as AssignmentWork will aggregate correctly in PivotTables. Otherwise, you will get a multiple of ProjectWork or TaskWork, depending on the number of records retrieved.
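
As a sketch of that advice, a Project – Task – Assignment – Resource query that will aggregate correctly in a PivotTable might look like this. The view names are the standard 2010 RDB names; verify the column names against the RDB Field Picker for your version.

```sql
-- AssignmentWork is the lowest level of detail, so summing it in a
-- PivotTable gives correct totals; TaskWork or ProjectWork would be
-- repeated once per assignment row and over-count.
SELECT  p.ProjectName
       ,t.TaskName
       ,r.ResourceName
       ,a.AssignmentWork
FROM MSP_EpmProject_UserView AS p
INNER JOIN MSP_EpmTask_UserView AS t
    ON t.ProjectUID = p.ProjectUID
INNER JOIN MSP_EpmAssignment_UserView AS a
    ON a.TaskUID = t.TaskUID
   AND a.ProjectUID = t.ProjectUID
INNER JOIN MSP_EpmResource_UserView AS r
    ON r.ResourceUID = a.ResourceUID
```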

Click on the graphic below to make it bigger.

Project Server Reporting Database Entity Relationship Diagram

PSERD1

Project Server Reporting Database Timesheet Entity Relationship Diagram

PSERD2

Happy querying!

Retrieve Fiscal Year Dates Dynamically


At some point, you will be asked for information by Fiscal Year. The Fiscal calendar was set up in Project Server, so it should be accessible for reporting, right? A fair number of companies have fiscal calendars which don’t exactly line up with the standard calendar, which can cause some challenges. You could hard-code the beginning and ending dates of the fiscal year, but that would create an annual report maintenance task. What’s a report writer to do?

If you’ve set up the Fiscal calendar within Project Server, it is easy to retrieve this data for the current Fiscal Year based on the current date. The Fiscal calendar information is stored in the MSP_TimeByDay table and can be queried to do all sorts of date based operations.

The example provided below can be used with Excel based reports as well as with SSRS reports.  The key is to do the lookup in two steps. The first part of the query retrieves the Fiscal Year beginning and ending date so that you can use this information in the WHERE clause of the following query to filter the dataset.

THE EXPLANATION

Let’s say I need to find out, in real time, how many hours of work were scheduled by month for the Fiscal Year. The real-time requirement pushes me to use SQL rather than OLAP for this data. Since monthly data on assigned work and cost requires aggregation of the time-phased assignment data, I need to query the Assignment By Day view (MSP_EpmAssignmentByDay_UserView) in the Reporting database. Production reports would require additional joins to Resource and Project views to complete the dataset, but I’ll limit the example to the Assignment By Day view for simplicity.

THE QUERY

The following query works in Excel where I find this data is most commonly requested.

The DECLARE statement declares the two variables where we will store the date results for the beginning and end of the current Fiscal Year.

The next SELECT statement uses a subquery to get the beginning and ending date of the current Fiscal Year. The subquery uses the current date to find the current Fiscal Year for use in the WHERE clause of the primary query.

You may be wondering what this clause is doing as you look at the code below: CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME). This clause is needed to address the time mismatch between the TimeByDay data and the current time.

All of the dates in the TimeByDay field in the MSP_TimeByDay table are stored as dates with midnight time. If you retrieve the current date and time, the numbers won’t match if you attempt to use it as a filter as the time portion will be different. So, you have to reset the time part of the current time back to midnight for this filtering to work.

  • GETDATE gets the current date and time.
  • Casting the result as a FLOAT represents the date and time as a decimal number, where the portion to the left of the decimal is the date and the time is stored to the right of the decimal.
  • The FLOOR statement sets the decimal portion of the number to its lowest value, which would represent midnight.
  • The outer CAST makes it a Datetime field data type again so that we can use it to filter on a Datetime field.

In the last query, you use the declared variables to find all records between the beginning and finishing date of the Fiscal Year. Note, this technique can also be used to get the current Fiscal Quarter, current Timesheet period, calendar month, etc. There are a fair number of data elements in the MSP_TimeByDay table that you can use for date related needs.

DECLARE @BEGINDT DATETIME, @FINISHDT DATETIME

SELECT @BEGINDT  = MIN(TimeByDay)
      ,@FINISHDT = MAX(TimeByDay)
FROM MSP_TimeByDay
WHERE FiscalYear =
      (SELECT FiscalYear
       FROM MSP_TimeByDay
       WHERE TimeByDay = CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME))

SELECT *
FROM MSP_EpmAssignmentByDay_UserView
WHERE MSP_EpmAssignmentByDay_UserView.TimeByDay BETWEEN @BEGINDT AND @FINISHDT
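
As an aside, on SQL Server 2008 and later the same midnight truncation can be written more directly by round-tripping through the date data type; the FLOAT version used above has the advantage of also working on SQL Server 2005.

```sql
-- Both expressions return the current date with the time reset to midnight.
SELECT CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME) AS MidnightViaFloat
      ,CAST(CAST(GETDATE() AS date) AS DATETIME)         AS MidnightViaDate
```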

WHERE DO I PUT THIS?

If you are using one of the default Excel report templates in the Project Server Business Intelligence Center, you would open the file in Excel client.

  • Go to the Data tab
  • Click Connections
  • Select the connection and click the Properties button
  • Go to the second tab and click into the SQL box
  • Select all (Ctrl+A) and delete the existing SQL
  • Copy the above query and paste into this box
  • Click OK to update and OK on any warning
  • Once updated, click Close to close the dialog

Once it returns the data, you would see the returned fields in the Excel field well next to the Pivot Table.


Looking for more Project Server information?

Visit the AboutMSProject.com Recommended Books page.


Converting RTF to Text in SQL Revisited

If you’ve ever attempted to include content from a Project multi-line text custom field or Issues multi-line fields in an Excel report, you will get a result that looks like this:

<div><p>Vendor is having issues with deliveries to the East Coast.</p></div>

At the 2012 Project Conference, I presented a SQL based solution that converts the RTF content to XML and then returns the text. This technique was also published here in the Project Server Forum.

In practice, the technique worked but had failures in some cases. I found early bugs in SharePoint where the leading or trailing <div> tag was missing. However, this did not explain all failures.

I got an email from Doug Welsby of Microsoft Canada explaining that the failures were due to text values like &nbsp; which are not standard XML. Basically, an inline Document Type Definition (DTD) is needed to convert a non-standard XML entity like &nbsp; to the XML-compliant &#160;. The full list of possible values to convert can be found here: HTML Codes

Technique Assumptions

This technique works on any multi-line Project custom fields or multi-line list columns in SharePoint. Therefore, the Issue Discussion field from SharePoint can also be cleaned with this technique.

The following example illustrates the SQL technique, though you may have to extend the DTD definitions to fit your own data. Use the table above to add additional ENTITY declarations. The technique also assumes that the multi-line field contains valid HTML. I’ve found no issues with this technique on Project Server 2010 systems patched to the June 2012 CU or later.

This technique will work with Project 2007, 2010 and 2013 databases as there are no real differences in how this type of data is stored across releases. This technique does work inside an Office Data Connection.

NOTE: One of the challenges I faced is that I could find no good examples of how to do an inline DTD in SQL. While the technique below works, I’m open to a more elegant way to construct the DTD. Please post any ideas or issues in the comments.

The Technique

The technique does the following:

  • Converts the RTF value to an XML document
  • Extracts the text value from the XML
  • Trims the blank space from the resulting value
  • Replaces any null values with a blank

First, two variables are declared to hold the XML Header and Footer information. The Header contains the inline DTD declaration for &nbsp; and could be augmented with more declarations. The Footer contains the tags needed to complete the document.

The SQL CONVERT function changes the concatenated string of Header, your multi-line Project custom field and Footer into an XML document. The 3 style option in the CONVERT command enables the inline DTD and leaves all internal spaces as-is. More details on the CONVERT command can be found here.

The .value method performs an XQuery and returns the string value to SQL. More details on this method can be found here. The method returns the value to an nvarchar(4000) field, but you can use nvarchar(max) if you think you may have a truncation issue.

LTRIM removes the leading spaces of the resulting string. You could also do an RTRIM to remove trailing spaces, but I didn’t find this to be an issue. ISNULL is used to return a blank when a null value is returned. If you don’t do this, you get (blank) in Excel rather than an empty cell. You can fix this in Excel, but I found it more efficient to simply fix it in the data source for every report that uses it.

Code Sample

Replace the YourMulti-lineCustomField text below with the name of your custom field. Since the original field is also in the query result set, I tend to name it YourMulti-lineCustomFieldClean so that I can distinguish the two versions.

The ISNULL clause should be repeated for each custom field you are cleaning.

    declare @Headxml nvarchar(350)
    declare @Footxml nvarchar(50)
    
    set @Headxml = N'<?xml version="1.0"?>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
    "http://www.w3.org/TR/html4/loose.dtd"
    [<!ENTITY  nbsp "&#xA0;"><!ENTITY quot "&#x22;">]><html><body>'
    
    set @Footxml = N'</body></html>'
    
    select   *      
            ,ISNULL(LTRIM((CONVERT(xml,(@Headxml+[YourMulti-lineCustomField]+@Footxml),3)
            .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName]
    
            ,ISNULL(LTRIM((CONVERT(xml,(@Headxml+[YourMulti-lineCustomField2]+@Footxml),3)
            .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName2]
    FROM  dbo.MSP_EpmProject_UserView

What’s the RDB? Part 1

If you are planning to write reports over your Project Server data, you will need to know about the Reporting Database or RDB for short.  

A Little History

In Project Server 2003, all of the data was typically in one database.  Since this database was optimized to support Project Server transactions, reporting on that data could be a bit challenging.  Also, there were potential performance concerns as a long running report could impact the performance of core operations.

In Project Server 2007 and continuing in Project Server 2010, a separate Reporting Database is maintained.  This separation provides a database schema that is friendlier to report writers and it allows administrators to move the RDB to a separate database server if performance becomes a concern.

Project Server 2013 recombined the four databases into one physical database using four different schemas. This change makes Project Server 2013 easier to manage in a cloud infrastructure. If you upgrade from 2010 to 2013 and use the name of the reporting database as your database name, all of your reports will still work with no change.

In Part 1, I’ll present an overview of the RDB and one key relationship within it.

Ok, So What’s In It?

The Reporting Database is used as a staging area for OLAP Database creation and to provide project server data in an easier format to query.  It contains data on the following major entities:

  • Project Decision
  • Project
  • Task
  • Task by Day
  • Assignment
  • Assignment by Day
  • Resource
  • Resource by Day
  • Timesheet and Administrative Time
  • Issues
  • Risks
  • Deliverables
  • Workflow
  • Other Supporting entities

It’s a bonanza of data for you to use.  Here’s a simplified relationship diagram of the Reporting database core tables.  I’ll use this as the basis for posts going forward and show you how to get some nice reports out of this data.

simplified ERD.

For your convenience, I’m hosting an online interactive copy of the RDB Field Picker that comes in the Project Server 2010 SDK.

The P-T-A-R Principle

One of the key reporting relationships in the Reporting Database is that between Project – Task – Assignment – Resource (PTAR).  This relationship constitutes the majority of Project Management centric reporting. 

Project

The Project entity contains the core information related to the Project plan.  There is one record for each project on Project Server.  This record includes:

  • Any Project level custom fields
  • Project rollup of work and costs
  • Project start and end dates
  • Project baselines

A complete Excel based field reference can be found in the Project Software Development Kit, found here: Download the Project 2010 SDK

The primary view for Project information in the RDB is MSP_EPMProject_UserView where ProjectUID is the primary identifier.  In the diagram above, all of the other tables that are linked to Project are related to that table by the ProjectUID value.  If you are writing queries which require Project information, your join should use the ProjectUID.

Task

The Task entity contains task level information for each task in every project.  The data is related to the task itself and not to a specific resource.  This record includes:

  • Start and End Dates
  • Baselines
  • Task Custom Fields
  • Work
  • Costs
  • Characteristics of the Task

Task characteristics are especially of interest when reporting over the data.  These tend to follow the naming convention of TaskIs… For example, if I wanted a Milestone only report for a given project, I would select all task records where projectuid is equal to the Project’s UID and TaskIsMilestone is equal to 1.  There are a number of these fields and I would suggest looking at the RDB Field Reference for more details.

The primary task view is MSP_EpmTask_UserView where TaskUID is the primary key.  Though TaskUID is a Globally Unique Identifier (GUID), when working with other related tables/views, I still do my joins using both ProjectUID and TaskUID.  The chances of duplicate GUID values are low but I don’t want to be that type of lucky with my project reporting.
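
Putting those two points together, a milestone-only report for a single project might be sketched as follows. The column names (TaskStartDate, TaskFinishDate) are the standard 2010 RDB names, and the project name is illustrative; verify both against the RDB Field Reference and your own data.

```sql
-- Milestone-only report for one project; TaskIsMilestone = 1 filters
-- the task records down to milestones.
SELECT  p.ProjectName
       ,t.TaskName
       ,t.TaskStartDate
       ,t.TaskFinishDate
FROM MSP_EpmProject_UserView AS p
INNER JOIN MSP_EpmTask_UserView AS t
    ON t.ProjectUID = p.ProjectUID
WHERE t.TaskIsMilestone = 1
  AND p.ProjectName = N'Project X'
ORDER BY t.TaskStartDate
```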

Assignment

The Assignment entity contains information specific for a given resource on a given task.  Therefore, there will be a record for each resource for each task.  The primary view is MSP_EpmAssignment_UserView where AssignmentUID is the primary identifier. 

The Assignment record contains primarily:

  • Dates
  • Costs
  • Work
  • Baselines
  • Custom Field values where rolldown was enabled.

As a consequence, the Assignments table is one of the largest in the RDB and in Project Server as a whole.  As you need this data to derive the relationship between resources and the tasks to which they are assigned, pay particular attention to query execution plans when creating new reports.

Resource

The Resource entity contains information related to a specific resource, be it a work, material, cost or budget resource.   There is one record for each resource created in Project Server.  The resource record contains:

  • Resource Custom Fields
  • Resource Characteristics
  • Login Account information
  • Organization information like Team, RBS
  • Cost rates
  • Timesheet Manager

Like Tasks, the Resource view follows a similar naming convention of ResourceIs… to denote Resource characteristics.  For example, if I only want to see active Resources, I would filter my records based on ResourceIsActive = 1.  Especially for resources, you should add filtering on these fields to ensure you do not include team, generic and inactive resources in your reporting.
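
A minimal sketch of that filtering is below. The characteristic flag names are the assumed 2010 RDB names; verify the exact flags (ResourceIsActive, ResourceIsGeneric, ResourceIsTeam) against the RDB Field Reference for your version.

```sql
-- Return only active, individual work resources, excluding generic
-- and team resources from the report.
SELECT  r.ResourceName
       ,r.ResourceType
FROM MSP_EpmResource_UserView AS r
WHERE r.ResourceIsActive  = 1
  AND r.ResourceIsGeneric = 0
  AND r.ResourceIsTeam    = 0
```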

That covers the core aspects of these entities.  In future posts, I’ll go into greater details as to how best to query against them as well as cover other relationships within the RDB.

I need information, now what do I do?

If you’ve ever been asked for information to support work you are doing, or perhaps to keep people informed of your efforts, you may have struggled with the process of defining and delivering the information needed.  If you identify with this issue, this post will cover three key questions you should answer to help set your requirements so that you can get to your information goal quickly.

What’s the Problem and Why Do I Care?

Our starting point is based on a quote from one of my favorite professors.  “What is the problem and why do I care?”  His point was that you have to be clear on the problem and what you are trying to address if you are going to be successful in formulating a solution.  Clearly understanding the problem will enable you to be effective in gathering the right information for your answer.  You should be able to state it clearly and easily. 

Some examples are:

  • What is the overall cost and schedule status of my project?
  • Are my people overbooked in the fourth quarter?
  • What factors are impacting my project’s delivery?

Otherwise, you are going to waste time pulling together data while looking for a question to answer. 

What are you doing with the information?

The second question relates to how the information will be used.  Essentially, the purpose will help drive how best to structure the outcome.  Many people fail to identify this aspect correctly, resulting in information that is not structured properly to meet the need.  I’ll cover each of these outcome types in detail in subsequent posts.

Most likely, you are doing one or more of the following to either draw a conclusion or to illustrate a point.  While these are broken out as distinct entities for illustration purposes, in many cases you will be using a combination of techniques.

In this example, we are attempting to understand why sales are so dismal in the North region for November.

Aggregation


When we look at sales data by region or we break down the number of hours entered against a project by each person, we are doing Aggregation.  We may also create synthetic groups to aggregate data based on some attribute of the underlying data.  In many cases, simple number charts are used to convey the data. 

In the example above, we are looking at the November Sales numbers for the Blue, Green and Red Sales teams by region.  The Red team seemed to do well in November, at least according to this view.

Comparison


Typically, if you have aggregated data, it is likely that you need to compare and rank the groups of data.  The Stack Rank is a very common scenario where you are ranking the data from best to worst, based on some criteria.  Number charts, bar charts and, to a lesser extent, pie charts are commonly used for comparisons.

In this example, the West region actually had the most November sales of any region.  The Red Team, leader on the previous view, actually sold the least amount in the West Region. 

Here’s why the problem statement is important.  Without having a clear definition of the problem, it isn’t apparent which answer is correct, as different conclusions can be drawn from the same data.

Composition


Another fairly common usage of information is to illustrate the composition of the data.  In this scenario, we may be attempting to determine which region has the most salespeople.  When used properly, a composition can be used to quickly convey relevant data.  We see that the East and North Regions are staffed with a smaller number of salespeople than the other regions.  This may give us a clue as to why sales are lagging in the North.

Trends


Another interesting information analysis you may choose to do is to understand how information changes over time.  This type of visualization allows you to understand the direction of progress, beyond the current state, enabling you to determine which items may be more worthy of your attention.  For example, projects that are late but are trending back to being on plan may be better off than a late project which is trending later. 

In this example, we see that sales in the North region are flat and actually beginning to decline.  In a real investigation, we would likely dig into this trend further since all other regions are growing.

Variation / Distribution


Another way to present the data is to visualize how it varies over a given period.  In many cases, the temptation is to only look at aggregated values or averages, but sometimes it’s the distribution of the data which tells a more compelling story.  Readers who have a statistical background will be very comfortable with this type of information, as distributions, variances and other such items are core to statistical investigation.

In this case, we see the majority of deals for November are small deals, with a second peak.  This view also provides a wider picture of what’s happening.  Do we have a training issue?  Have vendors decided to cut back on orders due to the economy?  Are there other factors at play?  Without this view, these questions may not have been asked.

Relationship


The last information type is to map out relationships.  If you are deriving information from people relationships, you might hear the term “social graph”, which is one way to construct, visualize and consume relationship data.  Relationship maps may uncover potential dependencies between items like people, which are not normally covered by work management and financial management tools.

In our example, one item jumps out of the data, in that the North region is covered out of one office.  As the other, growing regions are covered out of multiple offices, there may be collaborations on how to approach a customer that aren’t happening in the Chicago office.  These collaborations may be resulting in more small sales in the other offices.  Further research is warranted, but you should consider relationship mapping as part of your information arsenal.

What behavior do you want to occur, as a result?

One last aspect to consider is what story the information you gather should tell.  One way to determine the form of the story is to decide what behavior you want to occur as a result of the information you’ve gathered.  Targeting specific behavior helps you decide in which fashion the data will be presented.

For example, if the focal point of your information is to ensure certain upcoming tasks are completed on time, you may determine that a stack rank of incomplete tasks, ranked by days until Due Date, is the best way to present the data. 

In the example used in this post, your intent may be to get another salesperson hired in the Atlanta office for the Blue team to support the North Region.  You are able to illustrate a sales decline, some potential reasons for it and you would have to present where investment could improve the situation.

Three aspects that can be used to determine your information requirements were covered in this post.  We’ll dive deeper into these and other relevant information in future posts.