//
archives

Ed Callahan

I am IT staff assigned to Institutional Planning, Assessment and Research (IPAR) at Winona State University. On the best days my title is Data Analyst and I get to answer high level questions by exploring data and write-up an thorough analysis. Other appropriate titles, depending on our workload at the time, would be Reporting Specialist, Database Developer or Ad Hoc Data Request Satisfier. I'm also a boy scout leader, school board president and have an unnatural obsession with riding my motorcycle.
Ed Callahan has written 5 posts for The Winona State IPAR Blog

Where do our students come from, with Heat Maps

Our new President wanted to know where our students are coming from, geographically. We have reports that break down our New Entering Freshmen by geographic region, but they aren’t very compelling or easy to consume quickly. We thought it would be neat to map the information somehow.

I decided to use the R statistical package. It is free, has great graphical capabilities and powerful statistical tools (needed to generate distributions for the heat maps). It’s not that user-friendly though, unless you are the type that loves to immerse themselves in code. If you are that type of person, it’s a pretty fun software package to work with.

I found an example that I really just mimicked at:

http://pages.stern.nyu.edu/~achinco/programming_examples/Example__PlotGeographicDensity.html.

Also, I found a library of ZIP codes with their associated latitude/longitude coordinates available for R at:

http://www.r-bloggers.com/my-first-r-package-zipcode/.

So, I pumped a list of ZIP codes I took from our new student’s permanent address in ISRS (our student record system) over the past ten years into R. From those I was able to generate lat/long coordinates, and display them over a map downloaded from Google:

 

Winona NEF map

A heat map of the home (permanent) address from New Entering Freshmen admitted to our Winona Campus over the past 10 years.

 

Rochester NET Map


A heat map of the home (permanent) address from New Entering Transfer students admitted to our Rochester Campus over the past 10 years.

 

You can quickly see that while we draw many students to our Winona campus locally, we also draw heavily from the twin cities area. The transfer students coming to our Rochester campus are mostly from that immediate area.

The code I used to generate these maps is at R_HeatMap_Script_20121005. There are a couple helpful comments embedded in that code, but don’t hesitate to contact me with any questions. This code will not work with the most recent version of R, because the graphical library it relies on (ggplot2) has been re-written. But, this script will run using R version 2.12.2 available at http://cran.r-project.org/bin/windows/base/old/2.12.2/.

The ggmap R library (https://sites.google.com/site/davidkahle/ggmap) might be an even better solution, I’m going to explore that next.

How Microsoft Reporting Services Handles String Parameters Oddly

One of the benefits of using Microsoft SQL Server Reporting Services (SSRS) to distribute data to end-users is that you can set parameters on a report that allows the end user to configure what data is returned by the report. For instance, we have a Course Detail report that returns section-by-section details of courses offered include credits, number students enrolled, instructor, meeting time, etc. That report has parameters that allows the end user to choose what terms to return data for, what course subjects, which instructors, etc.

This particular report can run very slow, especially when it is run for a many terms all at once. The SQL query behind the report runs fast, and we’ve optimized indices for the query, but the SSRS report runs slowly. Very slowly. We see this with many of our reports: we can retrieve the data quickly when we pull it directly from the SQL Server, but we have significant performance issues when we run the query via SSRS.

We stumbled across the culprit recently. When an SSRS parameter contains a string value, SSRS coerces the selected values to unicode before passing them to the SQL Server. For instance, if we wanted to pull all our student registration records for 5 specific terms, we would write a query similar to:

select *  
     from ST_COU
     where YRTR in ('20113', '20115', '20121', '20123', '20125')

But, if YRTR (year-term) were a parameter in a SSRS report, SSRS would configure the query as:

select *
      from ST_COU
      where YRTR in (N'20113', N'20115', N'20121', N'20123', N'20125')

Those “N” prefixes coerce the values to a unicode type. But, our YRTR variable is a char(5), not a nchar(5), so to do the comparison SQL Server must cast YRTR to the nchar format.

In a simple query like the one above it really doesn’t matter. But in a more complex query the effect on performance can be significant. Before we fixed our Course Detail report, it ran for over ten minutes before our DBA called and said the server’s CPU usage was at 95%. After adjusting for this issue, the report runs in about 10 seconds.

There are several possible fixes, and you might think of even more. One, obviously, is to refactor your database to use Unicode types, like nchar and nvarchar. If you are already using those variable types, you are not experiencing this problem. Changing to Unicode though risks causing performance problems in existing applications.

Another fix is to re-write your queries to force the IN statements to work on a smaller set. For instance, while our ST_COU table is huge because it includes a record for every registration that has occurred at WSU, there is another much smaller table YRTR_CAL that contains one record for every term. So, the following query is far more efficient for us:

select *  
     from ST_COU s  
     where exists (     
          select *    
               from YRTR_CAL    
               where YTRT=s.YRTR    
               and YRTR in (N'20113', N'20115', N'20121', N'20123', N'20125')     
     )

Another technique we have used is to create temporary tables in our SSRS query, casting certain fields to nchar and nvarchar, and using those temp tables to subset with the WHERE statement. We have even created indices on those temporary tables to improve performance.

We don’t want to change the types of existing fields in our database to address this problem for fear of causing performance problems in other applications. But, we are considering adding additional fields to some of our tables to address the most critical issues. For instance, we could create a YRTR_N field in our ST_COU table of type nchar(5) that otherwise holds the same values as the YRTR field, and use that new field in our WHERE statements. This has the advantage of not requiring extra joins in our queries, and allows us to have existing indices on the new field that would improve performance. Or, we could create a Unicode YRTR_N in our YRTR_CAL table which would still be more efficient because we could build indices on that new field.

This issue can sometimes just be ignored in reports where the IN statements are short and the query is not complex. However, in other reports the cost of ignoring this issue in terms of performance is huge. And Microsoft doesn’t appear to be about to change how this feature works, as their article at http://support.microsoft.com/kb/2000524 states, this SSRS behavior is “by design”.

Using Excel to Offer Smarter Online Reports

Our IPAR group produces many online reports that help administrative offices at WSU to do their job more effectively. These reports can quickly retrieve lists of student data from our central student data repository, ISRS, that meet a variety of criteria. These results can be exported to Excel for further processing.

One example is our International Office’s Admitted Students report. This report allows the user to select a term and retrieve a list of international student’s that have been admitted for that term. The report shows ID number, student name, VISA status, some financial information, etc that is stored in ISRS. Much of this data is meant to indicate if the student has completed all the steps required for them to enroll at WSU.

The International Office finds this report quite useful, but there is data not stored in ISRS that they need to keep track of as well. Has the student booked an airline ticket? Have they checked into the office when they get here? What day are they scheduled for an orientation session? There is a whole list of data they need to keep track of that isn’t stored in ISRS, and doesn’t need to be. This information will never be reported on, and is irrelevant after the student has enrolled for classes.

So the practice has been that at some point the International Office runs the Admitted Student report and exports it to Excel. They then add columns for the extra information they need to keep track of. They sort the data, color code rows for various purposes and generally use this Excel document to keep track of their students between the time they are admitted and they enroll. They are quite happy with this procedure, except that it is difficult for them to keep track of when new students have been admitted and need to be added to this Excel document. They also have a hard time keeping track of when data in ISRS changes and therefore needs to change in the Excel document.

IPAR considered several options for helping the International Office here, from full CRM solutions, custom software, to utilizing SharePoint lists and/or Access databases. Our primary motivator was to come up with a solution that was quick and easy to maintain and that would be a significant improvement to the International Office’s situation, if not a perfect solution.

We and they are pretty happy with what we came up with. IPAR developed a macro-driven Excel document that can read (but not write) data from the ISRS database. The Excel document contains the same data as the Admitted Students report contains, but also allows the International Office to add their own data. Whenever they hit the “Refresh” button, any newly admitted students are added to the file. Any data from ISRS that has changed is updated, and highlighted in green (indicating to the International Office that they should re-assess the student’s status). Any student’s that are no longer admitted are highlighted in red, and the International Office can delete these students once they have assured themselves that the change in admit status wasn’t made in error. The data can be sorted, color coded and so on without breaking the functionality.

More technical details about how this spreadsheet works can be found here. A copy of the spreadsheet, including the macros we wrote but scrubbed of any actual student data, can be found here.

This is a pretty new tool for us, and I’m sure we have bugs to work out, but I am very optimistic. This type of “smart” Excel document can be used in many situations across our campus, especially on the administrative side of the house. My biggest concern at this point is that these spreadsheets become too popular, and that supporting them becomes too time-intensive. But, our plan is to stick to just this one for now and learn from the experience, and take it slow from here.

WSU Enrollment Projection Model

Winona State has an Enrollment Projection model we presented at the most recent AIRUM (Association for Institutional Research in the Upper Midwest) Conference. We use it primarily to set goals for Admissions for our target New Entering Freshmen (NEF) and New Entering Transfer Student (NET) counts in the fall. We have a pretty good feel for how many students we can accommodate comfortably and we want to set our Admissions goals so we don’t exceed that limit too much.

The model is really pretty simple. We estimate retention rates and advancement rates for Freshmen, Sophomores, Juniors, Seniors, Graduate Students and “Other” (PSEO students, Specials, etc.) based on observed values in previous years. We also estimate the number of new students entering the University in each of these classes using a combination of past data and inputs to the model: Fall NEF and NET counts. From there a simple formula using these values can calculate headcount and FTEs in future fiscal years.

Fall NEF and NET counts are model inputs; we don’t attempt to model them. That won’t make sense for an institution with an open admission policy, but at WSU our Admissions office modifies their practices to attempt to hit an Admission goal. The purpose of this model is to set that goal, not to predict what our incoming class sizes would be if we had an open admission policy.

Our model ignores a lot. We don’t directly model retention and graduation rates. We treat Seniors as one group, but we know we have 4th year, 5th year and 6th year Seniors in the system. Winona State has two campuses (Rochester and Winona), but we don’t include campus in the model. Nor do we look at major declaration or student demographics.

But for all we are ignoring, the model is very accurate. Our error rate for predicting overall FTEs in a fiscal year made in the Fall term of the previous fiscal year ranges from -1.9% to 0.4% over the past five years. We consider that “pretty good for what it is”.

If you want more information, we have it:

Do students know how much money their parents make?

In a New Entering Freshmen (NEF) survey in Fall 2010, WSU asked its incoming freshmen students what their “Best estimate of parents’ total income” was. They were able to select one of seven options:

  • Less than $10,000
  • $10,001 to 25,000
  • $25,001 to 50,000
  • $50,001 to 75,000
  • $75,001 to 100,000
  • $100,001 to 150,000
  • over $150,000

However, WSU also has this data from the FASFA financial aid application form the student/family submitted. Of the 1,633 NEF WSU took in last year, we had FASFA forms on file for 1,393 of the students and we had survey responses from 1,389 students. We had parental income data from both sources for 1,203 of these students.

For these 1,203 students for whom we had data to compare, we contrasted the parental income as reported on the NEF Survey with the Adjusted Gross Parental Income as reported on the FASFA.

Students did not know how much money their parents make.

Overall, 21% of students underestimated their parent’s income and 44% overestimated parental income, leaving only 35% of students that were correct.

Students tended to overestimate parental income, but got (slightly) better the higher their parent’s income was…

Distribution of parents income

It could be that students are estimating their parent’s net income, while the FASFA reflected the parent’s adjusted gross income. However, the percent of under-estimates along with how wildly off some students are in their guess still makes the survey question data, well, questionable.

We would have found a very different frequency distribution of parental income categories had we relied on FASFA data instead of the survey:

So, why do we ask this question on the New Entering Freshmen Survey? That’s a good question, perhaps we won’t anymore.

IPAR Posts

May 2013
S M T W T F S
« Jan    
 1234
567891011
12131415161718
19202122232425
262728293031  

The WSU IPAR Team

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 6 other followers

Follow

Get every new post delivered to your Inbox.