Our new President wanted to know where our students are coming from, geographically. We have reports that break down our New Entering Freshmen by geographic region, but they aren’t very compelling or easy to consume quickly. We thought it would be neat to map the information somehow.
I decided to use the R statistical package. It is free, has great graphical capabilities, and includes the powerful statistical tools needed to generate distributions for the heat maps. It's not that user-friendly, though, unless you are the type who loves to immerse themselves in code. If you are that type of person, it's a pretty fun software package to work with.
I found an example that I really just mimicked at:
Also, I found a library of ZIP codes with their associated latitude/longitude coordinates available for R at:
So, I pulled a list of ZIP codes from our new students' permanent addresses in ISRS (our student record system) over the past ten years and fed it into R. From those I was able to generate lat/long coordinates and display them over a map downloaded from Google:
You can quickly see that while we draw many students to our Winona campus locally, we also draw heavily from the Twin Cities area. The transfer students coming to our Rochester campus are mostly from that immediate area.
The code I used to generate these maps is at R_HeatMap_Script_20121005. There are a couple helpful comments embedded in that code, but don’t hesitate to contact me with any questions. This code will not work with the most recent version of R, because the graphical library it relies on (ggplot2) has been re-written. But, this script will run using R version 2.12.2 available at http://cran.r-project.org/bin/windows/base/old/2.12.2/.
The ggmap R library (https://sites.google.com/site/davidkahle/ggmap) might be an even better solution; I'm going to explore that next.
One of the benefits of using Microsoft SQL Server Reporting Services (SSRS) to distribute data to end users is that you can set parameters on a report that allow the end user to configure what data the report returns. For instance, we have a Course Detail report that returns section-by-section details of courses offered, including credits, number of students enrolled, instructor, meeting time, etc. That report has parameters that allow the end user to choose which terms to return data for, which course subjects, which instructors, and so on.
This particular report can run very slowly, especially when it is run for many terms at once. The SQL query behind the report runs fast, and we've optimized indices for the query, but the SSRS report runs slowly. Very slowly. We see this with many of our reports: we can retrieve the data quickly when we pull it directly from the SQL Server, but we have significant performance issues when we run the query via SSRS.
We stumbled across the culprit recently. When an SSRS parameter contains a string value, SSRS coerces the selected values to Unicode before passing them to the SQL Server. For instance, if we wanted to pull all our student registration records for 5 specific terms, we would write a query similar to:
select * from ST_COU where YRTR in ('20113', '20115', '20121', '20123', '20125')
But, if YRTR (year-term) were a parameter in an SSRS report, SSRS would configure the query as:
select * from ST_COU where YRTR in (N'20113', N'20115', N'20121', N'20123', N'20125')
Those “N” prefixes coerce the values to a Unicode type. But our YRTR column is a char(5), not an nchar(5), and because Unicode types have higher data type precedence, SQL Server must cast YRTR itself to the nchar format to do the comparison.
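Here is a rough sketch of what the comparison effectively becomes (an illustration of the implicit conversion, not the literal plan SQL Server produces):

-- the char(5) column is converted row by row, which prevents an index seek on YRTR:
select * from ST_COU where cast(YRTR as nchar(5)) in (N'20113', N'20115', N'20121', N'20123', N'20125')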
In a simple query like the one above it really doesn’t matter. But in a more complex query the effect on performance can be significant. Before we fixed our Course Detail report, it ran for over ten minutes before our DBA called and said the server’s CPU usage was at 95%. After adjusting for this issue, the report runs in about 10 seconds.
There are several possible fixes, and you might think of even more. One, obviously, is to refactor your database to use Unicode types like nchar and nvarchar. If you are already using those types, you are not experiencing this problem. Changing to Unicode, though, risks causing performance problems in existing applications.
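For completeness, that refactor is a one-line change per column (shown here as a sketch; in practice any indices and constraints on the column would have to be rebuilt):

-- change the column itself to a Unicode type
alter table ST_COU alter column YRTR nchar(5)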
Another fix is to re-write your queries to force the IN statements to work on a smaller set. For instance, while our ST_COU table is huge because it includes a record for every registration that has occurred at WSU, there is another much smaller table YRTR_CAL that contains one record for every term. So, the following query is far more efficient for us:
select * from ST_COU s where exists ( select * from YRTR_CAL where YRTR=s.YRTR and YRTR in (N'20113', N'20115', N'20121', N'20123', N'20125') )
Another technique we have used is to create temporary tables in our SSRS query, casting certain fields to nchar or nvarchar, and then using those temp tables to subset the data in the WHERE clause. We have even created indices on those temporary tables to improve performance.
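As a rough sketch of that technique (the temp table and index names here are made up for illustration), the small YRTR_CAL table can be staged with its key pre-cast to nchar(5), so the Unicode IN list from SSRS compares against an nchar column and the join back to ST_COU stays char-to-char:

-- stage the small term table with a pre-cast nchar key
create table #Terms ( YRTR char(5), YRTR_N nchar(5) )

insert into #Terms (YRTR, YRTR_N)
select YRTR, cast(YRTR as nchar(5)) from YRTR_CAL

create index IX_Terms_N on #Terms (YRTR_N)

-- the Unicode IN list now matches an nchar column, and the join
-- back to the big table compares char to char
select s.*
from ST_COU s
join #Terms t on t.YRTR = s.YRTR
where t.YRTR_N in (N'20113', N'20115', N'20121', N'20123', N'20125')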
We don’t want to change the types of existing fields in our database to address this problem, for fear of causing performance problems in other applications. But we are considering adding fields to some of our tables to address the most critical cases. For instance, we could create a YRTR_N field in our ST_COU table of type nchar(5) that otherwise holds the same values as the YRTR field, and use that new field in our WHERE statements. This has the advantage of not requiring extra joins in our queries, and it allows us to build indices on the new field to improve performance. Or, we could create a Unicode YRTR_N in our YRTR_CAL table, which would still be more efficient than the status quo because we could build indices on that new field.
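A minimal sketch of the YRTR_CAL variant (column and index names here are hypothetical):

-- add a Unicode shadow column to the small term table
alter table YRTR_CAL add YRTR_N nchar(5)
go

-- populate and index it
update YRTR_CAL set YRTR_N = YRTR
create index IX_YRTR_CAL_N on YRTR_CAL (YRTR_N)

-- report queries filter on the indexed nchar column, so no implicit
-- conversion happens on the big ST_COU table
select s.*
from ST_COU s
join YRTR_CAL c on c.YRTR = s.YRTR
where c.YRTR_N in (N'20113', N'20115', N'20121', N'20123', N'20125')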
This issue can sometimes just be ignored in reports where the IN statements are short and the query is not complex. However, in other reports the performance cost of ignoring it is huge. And Microsoft does not appear to be about to change how this feature works: as their article at http://support.microsoft.com/kb/2000524 states, this SSRS behavior is “by design”.
In the Fall of 2010, Winona State University implemented a cloud-based application from Hobsons® called Connect® - a CRM (Customer Relationship Management) tool designed to help manage student communications from the inquiry phase through application to enrollment. A significant challenge in the implementation was extracting and transforming information from ISRS – our student information system – and loading it into the Connect database.
Like most Hobsons’ clients, we initially used SFTP (a secure version of FTP that encrypts data during transmission) to upload data files to Connect. To obtain data for uploading, we used a set of views and stored procedures to extract and transform information from our student information system. We used a PowerShell® script to run the stored procedure, export the results to a file, and upload the file to Connect via SFTP. A diagram of this process is shown below:
Although this process worked initially, we quickly ran into a few problems and concerns. The first of these was just the sheer bulk of data. Use of Connect grew very rapidly, to the point where we had created well over 100 attributes and needed to upload data every day. Uploading this much data proved to be a challenge. We attempted to use local history files to upload only new or changed information, but ran into problems when Connect imports failed or rejected records. We also could not control the timing of Connect imports, which sometimes did not finish until long after a file was uploaded.
In addition to these issues, members of our IT Systems group voiced concerns regarding security. While SFTP encrypts files during transfer, an unencrypted copy of the file existed on a file share prior to transmission. Access to the file share was limited, but concerns lingered over so much data sitting in plain, unencrypted form.
To address these issues, we turned to using SOAP-compliant web services provided by Hobsons. A web service is a special communications protocol designed to allow two computers to exchange information over the Internet. Commands and data are encoded into XML messages and delivered to the other computer using HTTPS, the same Internet protocol many web sites use to exchange credit card and other sensitive information. In general, our new process works as follows:
The new process is very secure and has the advantage of supporting more granular updates and better logging. In my next Blog post, I’ll provide some actual examples of calling Hobsons’ web service using PowerShell.
Our IPAR group produces many online reports that help administrative offices at WSU do their jobs more effectively. These reports can quickly retrieve lists of students from ISRS, our central student data repository, who meet a variety of criteria. The results can be exported to Excel for further processing.
One example is our International Office's Admitted Students report. This report allows the user to select a term and retrieve a list of international students who have been admitted for that term. The report shows ID number, student name, visa status, some financial information, and so on, as stored in ISRS. Much of this data is meant to indicate whether the student has completed all the steps required to enroll at WSU.
The International Office finds this report quite useful, but there is also data not stored in ISRS that they need to track. Has the student booked an airline ticket? Have they checked in with the office after arriving? What day are they scheduled for an orientation session? There is a whole list of data they need to keep track of that isn't stored in ISRS, and doesn't need to be. This information will never be reported on, and is irrelevant after the student has enrolled for classes.
So the practice has been that at some point the International Office runs the Admitted Students report and exports it to Excel. They then add columns for the extra information they need to track. They sort the data, color code rows for various purposes, and generally use this Excel document to keep track of their students between the time they are admitted and the time they enroll. They are quite happy with this procedure, except that it is difficult for them to notice when new students have been admitted and need to be added to the Excel document. They also have a hard time knowing when data in ISRS changes and therefore needs to change in the Excel document.
IPAR considered several options for helping the International Office, from full CRM solutions and custom software to SharePoint lists and/or Access databases. Our primary motivation was to come up with a solution that was quick and easy to maintain and that would be a significant improvement for the International Office, if not a perfect one.
We and they are pretty happy with what we came up with. IPAR developed a macro-driven Excel document that can read (but not write) data from the ISRS database. The Excel document contains the same data as the Admitted Students report, but also allows the International Office to add their own data. Whenever they hit the “Refresh” button, any newly admitted students are added to the file. Any data from ISRS that has changed is updated and highlighted in green (indicating to the International Office that they should re-assess the student's status). Any students who are no longer admitted are highlighted in red, and the International Office can delete these students once they have assured themselves that the change in admit status wasn't made in error. The data can be sorted, color coded, and so on without breaking the functionality.
This is a pretty new tool for us, and I'm sure we have bugs to work out, but I am very optimistic. This type of “smart” Excel document can be used in many situations across our campus, especially on the administrative side of the house. My biggest concern at this point is that these spreadsheets will become too popular, and that supporting them will become too time-intensive. But our plan is to stick to just this one for now, learn from the experience, and take it slow from here.
Here at Winona State we have invested significant time and energy in campus-wide assessment efforts for many years. Online surveys are the backbone of those efforts, and Qualtrics (www.qualtrics.com) is our tool of choice for conducting surveys. One strength of the system is the reporting tool set available, and one unobtrusive Qualtrics feature that we have found very useful in combination with that tool set is the ability to import survey responses from a text file straight into a Qualtrics survey. If that doesn’t sound tremendously helpful at first blush, just bear with me.
Imagine being the administrator of two different surveys, both delivered to university sophomores, one in fall, and one in spring. Responses to both surveys have been collected, and the surveys have closed. At that late stage, you realize that it would be very helpful to explore the correlations between a question on Survey A and several questions on Survey B. It would also be helpful to include the college affiliation (e.g., College of Business versus College of Liberal Arts) of the respondents in your reporting. In our office, we frequently find ourselves in similar situations. Obviously, there are many solutions to the problem, but Qualtrics provides a good one.
It is relatively straightforward in Qualtrics to approach the problem this way: export the responses from both surveys, match them up in a spreadsheet on a common identifier, add a column for each respondent's college affiliation, build a new Qualtrics survey containing the combined set of questions, and import the merged file into that survey as responses. The full Qualtrics reporting tool set is then available against the combined data.
I will follow this post with an example of how we’ve used this technique for quick, ad hoc reporting that brings together data from multiple surveys with SIS information using only duct tape and common household items. Stay tuned.
For those of us who spend time querying Microsoft SQL Server, each new version of the product gives us new functionality that can be as helpful as a new belt full of tools to a carpenter. One tool (first available in SQL 2008) that has proved useful in our ad hoc queries is the expanded VALUES clause. I use it in three primary ways:
1. Inserting multiple rows with a single INSERT statement.
2. Creating a quick, inline derived table.
3. Applying an aggregate function over columns rather than rows, as an alternative to UNPIVOT.
Some SQL statements that spell out the details of those three techniques follow. The first two are fairly easy to understand by simply running the SQL code. The third technique is a bit more involved: for more explanation and code samples, refer to this article in SQL Server Magazine by Itzik Ben-Gan.
-- declare a table variable for use below
declare @TableVar table (
    TermId int,
    LicenseId int,
    MaleCount int,
    FemaleCount int,
    [Description] varchar(500)
)

-- first use of the values clause: one insert statement inserting multiple rows.
insert into @TableVar values
    (20083, 1, 12, 15, 'Sample data - bogus'),
    (20093, 4, 31, 47, 'Sample data - bogus'),
    (20103, 3, 50, 22, 'Sample data - bogus'),
    (20113, 2, 67, 83, 'Sample data - bogus'),
    (20123, 4, 39, 58, 'Sample data - bogus')

-- second use of values clause: inline, quick & dirty derived table
select t.TermId, t.LicenseId, t.MaleCount, t.FemaleCount, tmp.*
from (
    values -- use values to create derived table
        (1, '130301-N', 'Science Chemistry 9-12 + 5-8 General Science'),
        (2, '130302-N', 'Science Physics 9-12 + 5-8 General Science'),
        (3, '130400-N', 'Science Earth 9-12 + 5-8 General Science'),
        (4, '130600', 'Science General 5-8')
) as tmp (LicenseId, LicenseCode, LicenseDescription) -- name columns
join @TableVar t on tmp.LicenseId = t.LicenseId

-- third use of values clause: aggregate function over columns
-- rather than rows. Better performance than unpivot with large sets.
select TermId,
    ( select max(mycol)
      from (values ('FemaleCount', FemaleCount),
                   ('MaleCount', MaleCount)) as mytable(mylabel, mycol)
    ) maximum_value
from @TableVar

-- compare syntax above to case statement
select TermId,
    case when FemaleCount > MaleCount then FemaleCount else MaleCount end maximum_value
from @TableVar

-- now compare to unpivot
select TermId, max(col) as maximum_value
from (
    select TermId, col
    from @TableVar
    unpivot (col for cols in (FemaleCount, MaleCount)) as unpivott
) p
group by TermId