Event Report: Red Flags: What do you really need to know when buying Psychometrics?

Event Chair: Maria Gardner, Director of Operations, ABP

Presenters:

Ben Williams: Director, Sten10

Debbie Stevens-Gill: outsourcing expert and BPS Board member

Aaron Halliday: University of Western Ontario, Publisher and Executive Coach

Alan Redman: Head of Assessment, Criterion-Psycruit

Stewart Desson: CEO, Lumina Learning

Penny Moyle: Executive Coach, formerly of OPP

The meeting was opened by Dr. Alex Forsythe, ABP Chair, who welcomed the large number of attendees to the event.

The Event Chair described the background to the session: the idea for the meeting arose from a blog posted on LinkedIn on the pitfalls of buying psychometrics, and the subsequent deluge of posts in response. Maria invited the panel to introduce themselves and asked each of them a jargon-busting question to assist participants.

Debbie Stevens-Gill. Board Member with responsibility for Stage 2 of BPS qualifications. She is an outsourcer and a trainer for a range of psychometrics.

What is a Psychometric? A standardised measure of a psychological construct. This can be anything from personality to ability or attitudes that can be reported on. Outsourcing: Publishers should work to maintain their levels of competence. They should ensure that they comply with BPS recommended standards and submit their products for test reviews.

Ben Williams: Director Sten10. Undertakes assessments under contract.

Buyers need to scrutinise bold statements offered by publishers, which might be over-optimistic and might affect later performance.

What is the difference between an ability test, a personality questionnaire and an SJT? An ability test is a test of comprehension or reasoning, numerical or verbal. The higher the score the better. Personality tests measure feeling at work, evaluation of performance under pressure, and evaluation in specific settings. Situational Judgement Tests measure reasoning and personality in the context of a work environment. They can measure a two-way fit for a job.

Dr. Alan Redman, Head of Science and Technology, Criterion-Psycruit, Fellow of Birkbeck College. Formerly BPS head of Test Verification.

What is reliability and why is it important? Users of tests need to know that what they are using is fit for purpose. The difference between psychometrics and other forms of measurement is that what is being measured is internal and can be subject to greater margins of error, so considerable investment is needed to gather evidence and ensure consistency and reliability.
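As an illustration of the kind of reliability evidence a buyer might ask a publisher for, one widely used internal-consistency statistic is Cronbach's alpha. The sketch below is not something presented at the event; the function name and the sample responses are hypothetical, and a real reliability study would use a much larger sample and usually report other evidence (for example test-retest figures) as well.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Estimate internal consistency for one scale.

    responses: list of respondents, each a list of item scores
    (hypothetical data layout, one inner list per person).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")

    # Variance of each item across respondents.
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)

    # Variance of each respondent's total score.
    total_variance = variance([sum(person) for person in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Made-up answers from five people to a four-item scale (1-5 ratings).
sample = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(sample), 2))
```

Values closer to 1 suggest the items are measuring the same thing consistently; what counts as acceptable depends on how the tool will be used.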

Dr. Stewart Desson, CEO of Lumina Learning, leading edge publishers of a wide range of trait based psychometric products.

What is validity and does it really matter? This concerns what is being measured and the extent to which it correlates with desired outcomes in the context in which it purports to measure. A psychometric should align with the desired outcomes. There are two main forms: convergent and divergent validity. These measure the extent to which the psychometric correlates with other measures so as to achieve some consistency with other valid tests. The desired outcome is the prediction of performance at work. Evaluative bias can impede validity: for example, too much emphasis on extraversion over introversion, which can result from errors in the underlying construct and distort results.
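To make the idea of convergent and divergent validity concrete, the sketch below correlates scores on a hypothetical new scale with an established measure of the same construct (where a high correlation would be expected) and with a measure of an unrelated construct (where a low correlation would be expected). The scale names and all scores are invented for illustration; they are not data from any publisher or from the event.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores do not vary; correlation is undefined")
    return covariance / (spread_x * spread_y)


# Hypothetical scores for the same six people on three measures.
new_extraversion_scale = [12, 18, 9, 22, 15, 20]
established_extraversion_scale = [30, 41, 25, 48, 36, 44]
numerical_reasoning_test = [17, 14, 19, 15, 20, 13]

convergent = pearson_r(new_extraversion_scale, established_extraversion_scale)
divergent = pearson_r(new_extraversion_scale, numerical_reasoning_test)
print(f"convergent r = {convergent:.2f}, divergent r = {divergent:.2f}")
```

The same function could be used to correlate test scores with a later measure of job performance, which is the outcome the panel described as the desired one.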

An increasingly important aspect of validity is the usability of the results. Increased complexity means that this can be challenging: increased use of infographics enables increased levels of accessibility and understanding of results.

Aaron Halliday. Graduate of University of Western Ontario, Publisher and Executive Coach.

What is a norm reference group? A normative assessment is one which is set for a particular group, e.g. collectivistic Asian cultures in an international context. The groups to be assessed should be normed to the “Intended Use Case”. This is to establish consistency across the sample.
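The effect of the norm group can be sketched with a small calculation: the same raw score is converted to a standard ten (sten) score against two different comparison groups. The conversion used here (sten = 2z + 5.5, rounded and limited to 1-10) is the conventional one; the norm group names and scores are hypothetical, and real norm groups contain far more people.

```python
from math import floor
from statistics import mean, stdev


def sten_score(raw_score, norm_scores):
    """Convert a raw score to a sten (1-10) relative to a norm group."""
    spread = stdev(norm_scores)
    if spread == 0:
        raise ValueError("norm group scores do not vary")
    z = (raw_score - mean(norm_scores)) / spread
    # 2z + 5.5 rounded half-up, then limited to the 1-10 sten range.
    sten = floor(2 * z + 5.5 + 0.5)
    return max(1, min(10, sten))


# Hypothetical raw scores for two different comparison groups.
general_population_norms = [10, 12, 14, 16, 18]
senior_manager_norms = [16, 18, 20, 22, 24]

candidate_raw_score = 14
print(sten_score(candidate_raw_score, general_population_norms))
print(sten_score(candidate_raw_score, senior_manager_norms))
```

The candidate looks average against one group and well below average against the other, which is why the choice of norm group should match the intended use.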

Penny Moyle, Trainer and user of assessments

What is Type and Trait? Trait measures personality on a continuum/spectrum, whereas Type measures discrete “categories”. An interesting Type analogy is left-handedness vs right-handedness, not to be confused with manual dexterity, which can be measured on a spectrum. The Type method of measurement is useful in a development context, whereas the Trait approach is more appropriate for selection and assessment.

Participants were polled on three questions:

Who is in the room?
- 49% psychologists
How would you describe your level of experience using psychometrics?
- 33% users
- 23% experienced users
What do people use psychometrics for?

- Mainly Selection, Coaching and Development

Is it OK to use a poor-quality tool just to start the conversation?

Aaron Halliday considered that poor-quality tools can indeed be an ice breaker. With HR budgets increasingly stretched, it is tempting to complete the work using a cheap product and some internal verification resource, but the result is that a great deal of money is wasted. What the industry needs is a more jargon-busting approach, with an emphasis on products that are high on reliability, add value and accuracy in assessment, and are presented in a way that clients can understand. The industry has a responsibility for ethics and for communicating more strongly that poor decisions that cascade into other areas can at best be ineffectual and at worst damaging.

Ben Williams argued strongly that buyers should avoid tests where there are few means of assessing validity. Situational Judgement Tests (SJT) have their usefulness in specific contexts with caveats. However, it should be pointed out that not all unvalidated tests are poor quality. An excess of negativity can stifle innovation in new tools in the sector. Buyers should treat tests with an appropriate level of fairness while keeping an eye on potential adverse impacts. Ben also raised the matter of standards and rigour in assessment. BPS has produced sets of standards of reliability which link job tenure and performance to the Big Five.

Alan Redman considered the issue from an Evidence-Based approach, which utilises a systemic method for decision making. As an example, he is frequently asked by customers to produce an SJT: commercially producing a bespoke solution sounds an attractive proposition, but will it deliver what the customer wants? Does the customer actually know what they want? The decision-making process involves asking the right questions, obtaining the appropriate evidence and then assessing viability. The key point is to obtain a range of evidence and use this to make your decision.

Debbie Stevens-Gill echoed the advantages of Evidence-based practice and the importance of wise investment. She also raised the point about the need to avoid remaining in comfort zones with the wrong tools, and the consequent adverse impact.

Stewart Desson of Lumina talked about going beyond adverse impact into inclusivity. He argued that the most effective way for publishers to become more inclusive is to avoid a reductionist approach in favour of something more philosophically aligned with a humanistic/business view of the world, with the use of Business Psychology. Inclusivity should be included in the design of the instrument, not as an add-on, in order to minimise the risk of bias. A good example is that extraversion is overvalued in the West as a Trait, whereas some of the benefits of emotion in a profile are poorly understood. The construct in the psychometric needs to embed corrective features so as to compensate for these potential biases.

Alan Redman indicated how this linked into evidence-based practice and drew upon Diversity and Inclusion as a legal requirement which a psychometric has to accommodate. Any testing process should, amongst a range of issues, adhere to relevant local norms and also consider the pipeline of recruitment requirements. Care should be taken in the use of tests to avoid a narrow approach to recruitment which can reduce diversity and promote cloning.
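One simple way a buyer might monitor adverse impact in practice is to compare selection rates between groups. The sketch below uses the "four-fifths" heuristic from US employment guidelines as an assumed threshold; the panel did not cite this rule, legal requirements differ by country, and the group labels and numbers are hypothetical. A low ratio is a prompt for further investigation, not proof of unfairness.

```python
def adverse_impact_ratios(outcomes):
    """Compare each group's selection rate with the highest group's rate.

    outcomes: dict mapping a group label to (number selected, number assessed).
    Returns a dict mapping each group label to its impact ratio (0 to 1).
    """
    rates = {}
    for group, (selected, assessed) in outcomes.items():
        if assessed <= 0:
            raise ValueError(f"no candidates assessed for group {group!r}")
        rates[group] = selected / assessed

    highest_rate = max(rates.values())
    if highest_rate == 0:
        raise ValueError("nobody was selected in any group")
    return {group: rate / highest_rate for group, rate in rates.items()}


# Hypothetical outcomes from one stage of a recruitment pipeline.
stage_outcomes = {"group_a": (50, 100), "group_b": (30, 100)}

THRESHOLD = 0.8  # assumed four-fifths heuristic, not a universal legal test
for group, ratio in adverse_impact_ratios(stage_outcomes).items():
    flag = "review" if ratio < THRESHOLD else "ok"
    print(f"{group}: impact ratio {ratio:.2f} ({flag})")
```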

The Event Chair then invited each of the speakers to summarise their one main Red Flag when buying psychometrics.

- Check reliability and validity. If the publisher can’t provide appropriate evidence of reliability and validity for the purpose for which the assessment tool claims to be useful, then it should be avoided. (Penny Moyle)
- Focus on brand claims. Bold claims that all recruitment and talent management problems can be solved should flash the warning signals. (Ben Williams)
- Outsourcing of psychometrics. Check that the company you are outsourcing to has a certificate of competence. If you are the organisation supplying the service, check that your competence is current, and check the BPS test reviews. (Debbie)
- Caveat Emptor! In a consumer society the world is full of sales people. Get a second opinion, especially from local academics, who are all too willing to discuss their subjects, research and preferences. (Aaron)
- Validity and the extent to which people trust the process and the results. Watch out for unverifiable anecdotes and those users who will use phrases such as “It Works” or “We have been using it for a long time”. (Alan)
- Be aware of sidestepping comments, such as the following. (Stewart)
  - “We are using it just to start a conversation.”
  - A publisher might say “it’s not a personality model, it’s more a behavioural model”.
  - General comments used as a “Get out of Jail Free” card to avoid answering crucial questions about the psychometric properties.

What is the difference between Type and Trait?

Alan suggested that the key difference was that Trait measurement was the detailed measurement of personality, whereas Type measures core differences and can be more effective at identifying key headline areas of strengths and weaknesses. Rather than focusing on the difference, we should start with what we want to achieve with the analysis and gather appropriate data.

Stewart Desson said that when he was first involved with assessments, he used type-based tests. However, having completed a PhD in assessment, he now works in the area of trait-based assessments, which provide data which correlates well, has good validity, and seems to work for him and his clients.

Richard Owen indicated that Type measurement, previously rigid, had moved on. He likened it to a comparison between being left-handed and right-handed, where no one is exclusively one way or the other but most of us default to one preference. There is a kind of phenomenology behind Type, and when users talk about starting a conversation they are beginning a process of self-exploration which goes far deeper than any measurement process.

Alex Forsythe described how working with Roy Childs made her more aware of the different approaches offered by Type and Trait. Indeed, the BPS expects you to be aware of both and there are examples of good practice in both. A combination can make you aware of a particular strongly represented part of yourself, but then you can look at other parts which require development in a gradual and nuanced way.

Alan Redman agreed that both could have their place, and that some publishers who have produced good work have acknowledged their limitations. The problems often arise in implementation, where Trait tests can be more nuanced and meet expectations. It is often harder to achieve reliability in a Type product simply because you are measuring on a larger scale. But leadership programmes often restrict dramatically the amount of time devoted to psychometrics. A more appropriate question to users might be: why, in a leadership programme, is the amount of time restricted to, say, one hour?

Where should Type or Trait be used? It depends on the users’ starting point and, as Alex pointed out, follow the data. It depends on situations and where the tests have proved to be effective. In general Type is more developmental and there is more hard data which is particularly useful in more general careers discussions, whereas Trait has its use where more detailed analysis is required.

What trends have been observed in the industry?

Leadership. Ben Williams indicated that there have been a lot of requests for work in the leadership space. There is a patchwork of requirements, most of which require the use of a combination of tools for a different set of outcomes. Examples are progress in analysis of leadership styles, and specific purpose assessment where validity is crucial and where much progress has been made.

Another trend has been a growing interest in Cyberpsychology, given that we shall all be working more frequently from home in the future, and in how people's behaviour can change when they are in front of a screen compared to when they are face to face.

Stewart Desson felt that we should distinguish between fashionable trends and long-term trends. We should not be drawn in by short term trends. However, one longer term trend is that organisations are looking critically at evolving leadership styles and vendors need to be more agile in adapting their tools to meet the demands of these emerging requirements. After all, validity only works in a particular context and tools have to be bespoke in the sense that we have to be clear about what we are measuring.

A further trend has resulted from Covid and the shift to working from home. Virtual assessments, for so long regarded as the poor relation, have been forced centre stage, and a huge increase in demand has led this part of the industry to offer more professionally developed and delivered products.

The Chair observed that there is a trend around demand for assessment of learning agility. This has traditionally been a combination of other assessments but such is the interest, it is almost becoming a discipline in its own right.

A question was raised about the increasing complexity of tools. It was generally agreed that there should be more elementary explanations from publishers about their products thus making it easier to assimilate more complex material as users gain more understanding.

There was disagreement about whether vendors should sell their products on quality and let the market decide, or whether there should be some form of consumer protection or regulation of the sale of psychometrics, as is the case in South Africa.

In conclusion, however, there was general agreement that just because a test is expensive, it does not mean that it is better. As with consumer products, some are more effectively marketed than others and the price is reflected accordingly. With psychometrics, unlike many consumer products, price should take into account the level of service offered which can, to say the least, be variable. Buyers should insist on a workable minimum level of service and suppliers should offer this.

19Feb21.
