Hartpury Student Research Journal

Issue 3 (Summer 2017), Independent Study Articles

How effective are different environmental enrichment strategies for stabled horses?

Author Name: Sofia Saunders; BSc (Hons) Bioveterinary Science



For environmental enrichment strategies to be successful they must improve the biological functioning of captive animals. However, this is difficult to prove, and success is often measured by a reduction in abnormal behaviours. Recent literature that aims to enrich stables considers the use of mirrors and toys. The design of a study is critical to the reliability of the data it produces. Larger sample sizes give a more accurate representation of the whole population, but this is not always practical when the subjects are large animals. When convenience sampling is used, bias can occur. In addition, the age and breed of the animals should be considered when selecting a sample, as Thoroughbred-type horses are more likely to display stereotypies and youngsters are more susceptible to developing them. Longer trials are more likely to produce accurate results because they allow for the effect of novelty: behavioural changes during short trials are likely to be due to novelty, and behaviour may revert as the animal habituates. The management strategy should be consistent, as any changes not taken into consideration may compromise results. Observations by continuous sampling are the most accurate, but this is not always possible over a longer period, so scan and instantaneous sampling are often used. These are less accurate, as behaviours may be missed and therefore not included in results. Overall, the literature suggests that improving the visual horizons of stabled horses via mirrors and windows significantly reduces stereotypic behaviour. Although it is difficult to be sure of biological significance, the consistency of the data over time suggests that these animals are experiencing improved biological functioning. Stable toys, however, consistently result in no long-term behavioural changes and therefore provide minimal enrichment with no biological significance.


1.0 Introduction

Environmental enrichment is defined as ‘an improvement in the biological functioning of captive animals resulting from modifications to their environment’. Increased reproductive success, fitness and overall health are indicators of improved biological functioning. However, as there are no standardised methods for assessing the success of an enrichment technique, it is difficult to specify an appropriate endpoint (Newberry, 1995). It has been suggested by Leach et al., (2000) that any environmental changes should decrease the frequency of abnormal behaviours, increase the frequency of natural behaviours and maximise utilisation of the environment. Therefore, in many cases, improved biological functioning is measured by the reduction of abnormal and stereotypic behaviours.

Stereotypic behaviour is defined as a repetitive and unvarying behaviour with no obvious goal or function, induced by frustration, repeated attempts to cope and/or CNS dysfunction (Mason, 2006). Stereotypic behaviours often develop because of frustration caused by aspects of management, including time spent stabled (Bachmann, Audige and Stauffacher, 2003), amount of exercise and weaning methods (Waters, Nicol and French, 2002). It is important to treat the underlying problem rather than simply prevent the behaviour, as prevention alone causes further frustration and compromises welfare (Cooper and McGreevy, 2007). Therefore, it is important to enrich the environment of animals kept in unnatural situations to improve their welfare.

Numerous studies have researched the effects that various visual stimuli have on the behaviour of horses and other species, using techniques such as mirrors, pictures and windows, to evaluate whether they reduce stereotypy, increase positive behavioural indicators or decrease negative behavioural indicators. It has been known for decades that non-human primates can recognise themselves in mirrors (Hayes and Hayes, 1955; Hoyt, 1941), and more recently this has been shown in other species such as orcas (Delfour and Marten, 2001) and magpies (Prior, Schwarz and Güntürkün, 2008). However, in social species where mirror self-recognition has not been demonstrated, it has been suggested that mirrors may act as a ‘social substitute’. More recent literature considers the efficacy of novel items such as stable toys. If increasing visual horizons improves the biological functioning of stabled horses, is research into these newer techniques required?


2.0 Discussion

2.1 Sampling

Research investigating how mirrors affect the behaviour of horses whilst travelling and stabled has used a mixture of mares and geldings of a variety of ages. This creates a diverse sample so that a more accurate representation of the whole population can be obtained (Allmark, 2004). Sample sizes vary widely: Whisher et al., (2011) used 6 horses whereas Bulens et al., (2013) used 35. This is important as a wider range of data can be collected from a larger sample (DePaulo, 2000). However, when the subjects are large animals it would be impractical to use numbers such as those used by Hadley et al., (2006), who researched stereotypic behaviours in 83 mice.
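The effect of sample size on precision can be illustrated with the standard error of a mean, which falls with the square root of the number of animals. The sketch below is a minimal illustration only: the standard deviation of 10 is an arbitrary assumed value, not a figure from any of the studies cited, and the formula assumes independent, randomly selected animals, which a convenience sample may not satisfy. The three sample sizes are those reported above.

```python
import math

# Assumed, illustrative spread of a behavioural measure between animals.
# This value is hypothetical and is not taken from any cited study.
ASSUMED_SD = 10.0


def standard_error(sd, n):
    """Standard error of a sample mean for n independent animals."""
    return sd / math.sqrt(n)


if __name__ == "__main__":
    # Sample sizes reported for Whisher et al., Bulens et al. and Hadley et al.
    for n in (6, 35, 83):
        print(f"n = {n:>2}: standard error = {standard_error(ASSUMED_SD, n):.2f}")
```

Under these assumptions, moving from 6 to 35 animals reduces the standard error by a factor of about 2.4, whereas moving from 35 to 83 reduces it by a further factor of only about 1.5, so the gain from each additional animal diminishes as the sample grows.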

Mills has also carried out two relevant studies, one in partnership with Cooper and McDonald (2000) researching the effect of increased visual horizons via windows, and the other with Riezebos (2005) researching the role of a conspecific image. These studies were carried out on the same yard, and some of the same horses were used. Convenience sampling was used for simplicity; therefore, the level of sampling error is likely to be high and the research may be vulnerable to selection bias (Saunders, Lewis and Thornhill, 2012). Problems can also arise when using the same horses in repeated trials, as they can become familiar with the test situation (Martin and Bateson, 1993). Although it is important in studies of stereotypic behaviour reduction for the researcher to choose horses that most reliably display the behaviours, using a variety of different horses on different yards may have given a more accurate representation of the whole population. Horses used in these studies were predominantly geldings, with one study using no mares and the other using just one; therefore it is not possible to investigate the influence of sex, unlike in the study by McAfee, Mills and Cooper (2002), where equal numbers of mares and geldings were used.

The majority of horses used by Ninomiya et al., (2008) and Cooper, McDonald and Mills (2000) were Thoroughbreds, whilst various breeds were used by Kay and Hall (2009). Breed is important as it has been suggested that the Thoroughbred type is more likely to develop a stereotypy (Bachmann, Audige and Stauffacher, 2003; Waters, Nicol and French, 2002). Therefore, it is likely that in the studies using Thoroughbreds, more stereotypic behaviour could be found, and this could be harder to treat. Although it is useful to use a range of breeds to get a better overview of the whole population, there are other differences between breeds, such as temperament, which are not accounted for in most studies that use multiple breeds (Bulens et al., 2013).

Age is an important factor to consider when assessing the success of an enrichment strategy. Young horses are more susceptible to developing stereotypies, and adults which have displayed stereotypies for many years will be harder to treat (Strickland, 1997). Henry et al., (2008) used only young animals in comparison to Goodwin, Davidson and Harris (2002) who used animals of a variety of ages. Using animals of a range of ages will represent the population more accurately, as animals that are all at the same stage of development may be likely to react in a similar way.

McAfee, Mills and Cooper (2002) and Mills and Riezebos (2005) used horses that had all reliably weaved for at least 2 years, whilst in a similar study by Cooper, McDonald and Mills (2000) a horse which had only been weaving for 6 months was used. This is an important aspect to control for, as horses that have displayed stereotypies for shorter lengths of time may be easier to treat.

Studies with a control group are of higher value than those without, as the control gives the researcher a baseline against which to compare results (Moller, 2011). In studies researching the behaviour of individual animals, using the same subjects as their own control can give more reliable results, as behaviour varies greatly between individuals. Cooper, McDonald and Mills (2000) used 5 separate horses as a control group. This would not have given as accurate a baseline as that of McAfee, Mills and Cooper (2002), who used the same horses to run a one-week ‘pre-trial’ in which stereotypies were recorded without any treatment.
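The advantage of using each animal as its own control can be sketched with a small worked example. All figures below are invented for illustration (hypothetical weaving bouts per hour for five horses) and are not data from any cited study. The point shown is that when individuals differ greatly at baseline, the spread of within-horse changes can be far smaller than the spread between horses, so a treatment effect is easier to detect against each horse's own baseline.

```python
import statistics

# Hypothetical weaving bouts per hour for the same five horses,
# recorded in an untreated pre-trial and again with enrichment in place.
# These numbers are invented purely to illustrate the reasoning.
PRE_TRIAL = [120, 40, 80, 200, 60]
WITH_ENRICHMENT = [96, 30, 66, 158, 50]


def paired_differences(before, after):
    """Reduction in the measure for each animal (before minus after)."""
    return [b - a for b, a in zip(before, after)]


if __name__ == "__main__":
    differences = paired_differences(PRE_TRIAL, WITH_ENRICHMENT)
    print("mean reduction per horse:", statistics.mean(differences))
    print("spread between horses at baseline:", round(statistics.stdev(PRE_TRIAL), 1))
    print("spread of within-horse reductions:", round(statistics.stdev(differences), 1))
```

In this invented data set every horse weaves less with enrichment, yet the baseline differences between horses are several times larger than the changes themselves; comparing against a separate group of different horses would leave that between-horse variation mixed in with the treatment effect.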


2.2 Trial length and novelty

The period over which research is carried out is vital to the reliability of the results, especially if the study aims to use environmental enrichment to improve long-term biological functioning. Thorne et al., (2005) stated that they aimed to discover whether behaviours observed over short-term trials persist over longer periods; however, their trial only lasted 18 days, which is considerably shorter than other studies that lasted up to 12 weeks (McAfee, Mills and Cooper, 2002). Although they found that behavioural changes persisted over the 18 days, ultimately Thorne et al., (2005) failed to fulfil their aim, as the study period would need to be longer to have any biological significance.

When a trial is designed to allow the subjects to spend a longer period of time with the enrichment item, it will produce results that are more likely to hold in the long term. This is an important factor when considering whether there will be an improvement in biological functioning. Horses spent 5 weeks with a mirror in a study by McAfee, Mills and Cooper (2002). This was considerably longer than the time periods used in other research in the broader field, such as the study by Henry et al., (2008) in which starlings only had access to a mirror for 17 minutes. Therefore, it is more likely that the findings by McAfee, Mills and Cooper will persist long term; however, as with all enrichment studies, a suitable study end-point is difficult to determine (Newberry, 1995).

Novelty can cause a problem when researching the effects of items added to animals’ environments. If studies are extremely short, changes in behaviour that result from the new item may be short term and diminish over time as the animal habituates (Stansfield and Kirstein, 2006). Some studies carry out procedures pre-study to reduce the likelihood that the results are due to novelty. Kay and Hall (2009) exposed horses to the mirror before the study, which should reduce the chance that the horses’ reaction to the mirror is due to its novelty. A study by Goodwin, Davidson and Harris (2002) states that novelty was tested via the use of hay as a single forage, although it is not stated how this would detect any novel effects; therefore the efficacy of their technique is unclear.

The novel effect of items used in research where the testing time is very short is evident in a study by Stachurska et al., (2013), in which toys were presented to horses in stables for a total of 15 minutes. This is considerably shorter than in other studies (e.g. Bulens et al., 2013) where toys were presented for an entire week. As the toys were only presented for 15 minutes each, any behaviours directed towards the toy, or resulting from the toy’s appearance, are likely to be due to novelty and do not indicate improved biological functioning. Had the toys been left for longer, there would have been a chance for habituation to occur and for behaviour to return to its previous state.


2.3 Management

The management of animals involved in studies is important as any dramatic changes to normal management conditions could compromise the authenticity of the results. It is also important to keep the management of all participating horses the same and not to change management strategies during the research unless it is being tested, in which case it should be stated as an independent variable.

Bulens et al., (2013) studied horses in three different locations, and the management strategies varied across these locations; however, the management of each horse within each location was kept consistent throughout the trial. In contrast, Jørgensen, Liestøl, and Bøe (2011) and McAfee, Mills and Cooper (2002) only studied horses in a single location, but the management strategies varied, with some horses being kept stabled at night and some being allowed access to pasture 24 hours per day. This may compromise results, as it has been shown that keeping horses stabled has a strong effect on their behaviour (Lee et al., 2011). A limitation in the research by McAfee, Mills and Cooper (2002) is that horses were kept in different stables with varying amounts of visual stimulation, from being in a busy area of the yard to being visually isolated with no opportunity to view other horses or yard activities. The amount of social stimulation a horse receives could affect the time spent practising certain behaviours; therefore it should have been a priority to keep all horses in stables with an equal opportunity to see conspecifics. One set of stables also had built-in anti-weave bars, although it is unclear how large an effect this had, as the efficacy of anti-weave bars is disputed (McBride and Cuddeford, 2001). In addition, after week 8 of this study the management regime changed: horses went from having turnout 16 hours per day to being stabled almost full time. Neither of these management points was taken into consideration when results were being analysed, even though they could have had an effect on findings. However, it is difficult to assess how much of an effect this actually had on results, as a significant reduction in weaving was found and this is consistent with the results of other similar studies (e.g. Mills and Riezebos, 2005; Cooper, McDonald and Mills, 2000).

The time that horses are fed is important in relation to the timing of observations, as they are likely to display more stereotypic behaviour prior to feeding or at other potentially stimulating times of the day (Mills and Nankervis, 1999). Cooper, McDonald and Mills (2000) observed horses before, during and after feeding and took these timings into consideration when analysing results. In comparison, Mills and Riezebos (2005) only took observations directly prior to feeding or exercise, and Ninomiya et al., (2008) only observed the horses from 13:00 to 15:30, when they were left undisturbed. All three of these studies could potentially have produced different results had other observation periods been selected.


2.4 Observations

There are several ways that observations can be made over a period of time. Filming means an observer does not need to be present and therefore cannot influence behaviour (Stachurska et al., 2013; Whisher et al., 2011; Henry et al., 2008; Kay and Hall, 2009; Ninomiya et al., 2008; Thorne et al., 2005; Goodwin, Davidson and Harris, 2002). Goodwin, Davidson and Harris (2002), Thorne et al., (2005), Henry et al., (2008) and Kay and Hall (2009) used continuous sampling, whereas Whisher et al., (2011) and Ninomiya et al., (2008) recorded set time periods throughout the trial and used different sampling methods to assess the behaviour. This means that the data they collected are an estimate, as some behaviours could have been missed, making their data potentially less accurate than those from studies that recorded the whole time.

Although continuous sampling may give the most accurate representation of behaviours displayed, it is not always possible, as the studies may be carried out over several days. Therefore, a number of studies (e.g. Bulens et al., 2013; Mills and Riezebos, 2005; McAfee, Mills and Cooper, 2002; Cooper, McDonald and Mills, 2000) used scan sampling whilst others (e.g. Jørgensen, Liestøl, and Bøe, 2011; Whisher et al., 2011; Swaisgood et al., 2001) used instantaneous sampling. These terms are often used incorrectly and interchangeably, and both methods have their limitations. Unlike continuous sampling, they may be subject to bias, as the observer may be inclined to include conspicuous behaviours even if they do not occur exactly on the sample point. Bias may also occur as some individuals or behaviours are more apparent than others. Bulens et al., (2013) had sample points at four-minute intervals. This is longer than in other studies and means that their data are less reliable, as the shorter the interval, the more accurately the sampling estimates the duration of behaviours (Martin and Bateson, 1995). Their results show no significant behavioural changes; however, a different outcome may have been achieved if sample points had been closer together. Most studies use a single observer; however, Whisher et al., (2011) used multiple observers. In this case inter-observer reliability may affect results, as observers may disagree on the behaviours observed.
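The relationship between sample interval and accuracy can be sketched with a short simulation. The record of weaving bouts below is entirely invented for illustration and is not data from any cited study; it simply represents one hour in which a horse weaves in brief bouts. The code compares the proportion of time spent weaving calculated from the full (continuous) record with estimates from instantaneous sampling at 10-second, 60-second and four-minute intervals.

```python
# Hypothetical one-hour record of weaving bouts, as (start, end) in seconds.
# These values are invented for illustration; they are not study data.
SESSION_SECONDS = 3600
WEAVING_BOUTS = [
    (100, 160), (500, 530), (1000, 1090), (1500, 1520),
    (2000, 2045), (2500, 2600), (3000, 3015), (3300, 3340),
]


def is_weaving(t, bouts):
    """True if second t falls inside any bout (start inclusive, end exclusive)."""
    return any(start <= t < end for start, end in bouts)


def continuous_proportion(bouts, session_seconds):
    """Proportion of the session spent weaving, from the full record."""
    return sum(end - start for start, end in bouts) / session_seconds


def instantaneous_proportion(bouts, session_seconds, interval):
    """Proportion of sample points, one every `interval` seconds, showing weaving."""
    sample_points = range(0, session_seconds, interval)
    hits = sum(is_weaving(t, bouts) for t in sample_points)
    return hits / len(sample_points)


if __name__ == "__main__":
    true_value = continuous_proportion(WEAVING_BOUTS, SESSION_SECONDS)
    print(f"continuous record: {true_value:.3f}")
    for interval in (10, 60, 240):
        estimate = instantaneous_proportion(WEAVING_BOUTS, SESSION_SECONDS, interval)
        error = abs(estimate - true_value)
        print(f"{interval:>3} s interval: {estimate:.3f} (error {error:.3f})")
```

With this particular invented record, the 10-second estimate is close to the continuous value, the 60-second estimate overshoots it, and the four-minute sample points happen to miss every bout. The size and direction of the error depend on where the sample points fall relative to the bouts, which is why short, infrequent behaviours are especially vulnerable to long sample intervals.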


3.0 Conclusion

Results of studies researching the use of stable toys consistently show that they do not provide enrichment, as the novelty of the new toy soon wears off. As both short-term and longer studies using horses found no biologically significant enrichment, the industry need for these toys is minimal and there is no rationale for further research into this field.

On the other hand, improving visual horizons seems to be a promising way of enriching a stabled horse’s environment, as all the methods discussed produced a reduction in locomotor stereotypic behaviour. Some studies found the greatest reduction in the final days or weeks of the trial. This suggests that behaviours were still diminishing; therefore further research over a longer period of time is warranted. There is no way to say for certain that any biological significance was found in these studies, but as results consistently show a reduction in stereotypy, they suggest that animals are experiencing improved biological functioning through the reduction of abnormal behaviours and promotion of natural behaviours, which ultimately can only have a positive effect on welfare.



Allmark, P. (2004) Should research samples reflect the diversity of a population? J Med Ethics. 30, pp. 185-189.

Bachmann, I., Audige, L., Stauffacher, M. (2003) Risk factors associated with behavioural disorders of crib-biting, weaving and box-walking in Swiss horses. Equine Vet Journal (35) pp. 158–163.

Bulens, A., Van Beirendonck, S., Van Thielen, J. and Driessen, B. (2013) The enriching effect of non-commercial items in stabled horses. Applied Animal Behaviour Science. 143 (1), pp. 46-51.

Cooper, J. and McGreevy, P. (2007) Stereotypic behaviour in the stabled horse: Causes, Effects and Prevention without compromising horse welfare. The Welfare of Horses. 1, pp. 99-124.

Cooper, J.J., Albentosa, M.J. (2005) Behavioural adaption in the domestic horse: potential role of apparently abnormal responses including stereotypic behaviour. Livest. Prod. Sci. 92 (2), pp. 117-182.

Cooper, J.J., McDonald, L. and Mills, D.S. (2000) The effect of increasing visual horizons on stereotypic weaving: implications for the social housing of stabled horses. Applied Animal Behaviour Science. 69 (1), pp. 67-83.

Davison, G.C., Neale, J.M. (1998) Abnormal Psychology. New York: John Wiley & Sons Inc.

Delfour, F., Marten, K. (2001) Mirror image processing in three marine mammal species: Killer Whales (Orcinus orca), false killer whales (Pseudorca crassidens) and California sea lions (Zalophus californianus). Behavioural Processes, (53) pp. 181-190.

DePaulo, P. (2000) Sample Size for Qualitative Data. Available from: http://www.quirks.com/articles/sample-size-for-qualitative-research [Accessed 16.10.16]

Elzanowski, A. and Sergiel, A. (2006) Stereotypic Behavior of a Female Asiatic Elephant (Elephas maximus) in a Zoo. Journal of Applied Animal Welfare Science. 9 (3), pp. 223-232.

Goodwin, D., Davidson, H.P. and Harris, P. (2002) Foraging enrichment for stabled horses: effects on behaviour and selection. Equine Vet J. 34 (7), pp. 686-691.

Hayes, K.J., Hayes, C. (1955) The cultural capacity of chimpanzee. In: J. A. Gavan (Ed.), The non-human primates and human evolution (pp. 110-125). Detroit, MI: Wayne University Press.

Henry, L., Le Cars, K., Mathelier, M., Bruderer, C. and Hausberger, M. (2008) The use of a mirror as a ‘social substitute’ in laboratory birds. Comptes Rendus Biologies. 331 (7), pp. 526-531.

Hoyt, A.M.D. (1941) Toto and I: A Gorilla in the Family. 1st ed. Philadelphia: Lippincott.

Jørgensen, G.H.M., Liestøl, S.H.O. and Bøe, K.E. (2011) Effects of enrichment items on activity and social interactions in domestic horses (Equus caballus). Applied Animal Behaviour Science. 129 (2-4), pp. 100-110.

Kay, R. and Hall, C. (2009) The use of a mirror reduces isolation stress in horses being transported by trailer. Applied Animal Behaviour Science. 116, pp. 237-243.

Korff, S., Stein, D.J. and Harvey, B.H. (2008) Stereotypic behaviour in the deer mouse: Pharmacological validation and relevance for obsessive compulsive disorder. Progress in Neuro-psychopharmacology and Biological Psychiatry. 32 (2), pp. 348-355.

Leach, M.C., Ambrose, N., Bowell, V.J., Morton, D.B. (2000) The development of a new form of mouse cage enrichment. Journal of Applied Animal Welfare Science (3), pp. 81-91.

Lee J., Floyd T., Erb H., Houpt K. (2011) Preference and demand for exercise in stabled horses. Appl. Anim. Behav. Sci. 130, 91–100.

Luescher UA, McKeown DB, Halip, J. (1991) Reviewing the causes of obsessive-compulsive disorders in horses. Vet Med 86: 527-531.

Martin, P. and Bateson, P.P.G. (1993) Measuring Behaviour: An Introductory Guide. 2nd ed. Cambridge: Cambridge University Press.

Mason, G.J. (2006) Stereotypic behaviour in captive animals: fundamentals, and implications for welfare and beyond. In: Mason, G.J. (Ed.), Stereotypic Animal Behaviour: Fundamentals and Applications to Welfare. CAB International, Wallingford, pp. 325–356.

McAfee, L.M., Mills, D.S. and Cooper, J.J. (2002) The use of mirrors for the control of stereotypic weaving behaviour in the stabled horse. Applied Animal Behaviour Science. 78 (2-4), pp. 159-173.

McBride, S. D. and Cuddeford, D. (2001) The putative welfare-reducing effects of preventing equine stereotypic behaviour. Animal Welfare. 10, pp. 173-189.

McBride, S., Hemmings, A. (2009) A Neurologic Perspective of Equine Stereotypy. Journal of Equine Veterinary Science. 29 (1), pp. 10-16.

Mills, D. and Nankervis, K. (1999) Equine Behaviour: Principles and Practice. Blackwell Science, Oxford.

Mills, D.S. and Riezebos, M. (2005) The role of the image of a conspecific in the regulation of stereotypic head movements in the horse. Applied Animal Behaviour Science. 91 (1), pp. 155-165.

Moller, H.J. (2011) Effectiveness studies: advantages and disadvantages. Dialogues Clin Neuroscience. 13 (2), pp. 199-207.

Ninomiya, S., Kusunose, R., Obara, Y. and Sato, S. (2008) Effect of an open window and conspecifics within view on the welfare of stabled horses, estimated on the basis of positive and negative behavioural indicators. Animal Welfare. 17 (4), pp. 351-354.

Prior, H., Schwarz, A. and Güntürkün, O. (2008) Mirror-Induced Behavior in the Magpie (Pica pica): Evidence of Self-Recognition. PLoS Biology. 6 (8), pp. 1642-1650.

Ross, S.R. (2006) Issues of choice and control in the behaviour of a pair of captive polar bears (Ursus maritimus). Behavioural Processes. 73 (1), pp.117-120.

Saunders, M., Lewis, P. and Thornhill, A. (2012) Research Methods for Business Students. 6th ed. Pearson Education Limited

Stachurska, A., Pieta, M., Kloc, A., Bocian, K. and Cebera, M. (2013) Behavioural response to the toy in adult horses of various breeds, sexes and ages. Annales Universitatis Mariae Curie-Skłodowska Sectio EE Zootechnica. 31 (4), pp. 61-67.

Stansfield, K.H. and Kirstein, C.L. (2006) Effects of novelty on behavior in the adolescent and adult rat. Developmental Psychobiology. 48 (1), pp. 10-15.

Swaisgood, R.R., White, A.M., Zhou, X., Zhang, H., Zhang, G., Wei, R., Hare, V.J., Tepper, E.M. and Lindburg, D.G. (2001) A quantitative assessment of the efficacy of an environmental enrichment programme for giant pandas. Animal Behaviour. 61 (2), pp. 447-457.

Teijlingen, E. R. van and Hundley, V. (2001) The importance of pilot studies. University of Surrey.

Thorne, J.B., Goodwin, D., Kennedy, M.J., Davidson, H.P.B. and Harris, P. (2005) Foraging enrichment for individually housed horses: practicality and effects on behaviour in domestic horses (Equus caballus). Applied Animal Behaviour Science. 94 (1-2), pp. 149-164.

Visser, E.K., Ellis, A.D. and Van Reenen, C.G. (2008) The effect of two different housing conditions on the welfare of young horses stabled for the first time. Applied Animal Behaviour Science. 114 (3-4), pp. 521-533.

Waters, A. J., Nicol, C. J., French, N.P. (2002) Factors influencing the development of stereotypic and redirected behaviours in young horses: findings of a four year prospective epidemiological study. Equine Vet J. (34) pp. 572–579.

Whisher, L., Raum, M., Pina, L., Perez, L., Erb, H., Houpt, C. and Houpt, K. (2011) Effects of environmental factors on cribbing activity by horses. Applied Animal Behaviour Science. 135 (1-2), pp. 63-69.

Wilen, E. (2016) Impact on the emergence of ulcers and crib biting in horses.