This can be seen in Fig. (A) Standard deviation of the Von Mises component (blue) of the mixture model fitted to samples of the model shown in Fig. The theoretically-calculated Fisher information for the associated population codes is shown in green. (B) Mixture proportions of the mixture model fitted to the model samples. (C) P-value from a resampling-based estimate of the probability that the non-target mixture proportion differs from zero.

The concentration of the Von Mises component (blue) closely follows the theoretical Fisher information (green), albeit overestimating it. The Fisher information provides a good local estimate of the variability around a mode, as can be seen in Fig. The mixture proportions corresponding to the target, non-target and random responses are shown in Fig. The mixture proportion associated with the random component appears to be overestimated when compared visually to the distribution of the samples in Fig.

However, the mixture model characterizes the proportion of misbinding errors well. As a final check, we verified that the mixture model estimates of non-target proportions were reliable. To do this, we performed a resampling-based analysis of the proportion of non-target responses, by randomizing the assumed locations of the non-target angles and re-fitting the mixture model.

Using the empirical cumulative distribution over those resamples, we could then compute a p-value for the null hypothesis that the non-target mixture proportion is zero. The results are shown in Fig. We applied this resampling analysis to the human experimental data shown in Fig. The p-values for the data collapsed across subjects are shown above the histograms of biases towards non-target angles; all are significant. Redoing the analysis per subject indicates that for 2 items, 8 out of 12 subjects show significant misbinding errors; for 4 items, 7 out of 12; and for 6 items, 10 out of 12.
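The resampling logic can be sketched as follows. This is a minimal stand-in rather than the actual analysis pipeline: instead of re-fitting the full mixture model by EM, it uses a crude distance-based statistic for the non-target proportion, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def circ_dist(a, b):
    """Signed circular distance in (-pi, pi]."""
    return np.angle(np.exp(1j * (a - b)))

def nontarget_stat(responses, nontargets, width=0.5):
    """Crude stand-in for the fitted non-target mixture proportion:
    fraction of responses within `width` rad of the trial's non-target."""
    d = np.abs(circ_dist(responses[:, None], nontargets))
    return np.mean(d.min(axis=1) < width)

# Synthetic data: 30% of responses cluster on the non-target (misbinding).
n = 500
nontargets = rng.uniform(-np.pi, np.pi, size=(n, 1))
responses = rng.vonmises(0.0, 8.0, size=n)  # target at angle 0
swap = rng.random(n) < 0.3
responses[swap] = nontargets[swap, 0] + rng.vonmises(0.0, 8.0, size=swap.sum())

obs = nontarget_stat(responses, nontargets)

# Null distribution: randomise the assumed non-target locations, re-estimate.
null = np.array([
    nontarget_stat(responses, rng.uniform(-np.pi, np.pi, size=(n, 1)))
    for _ in range(1000)
])
p_value = (1 + np.sum(null >= obs)) / (1 + len(null))
print(f"observed stat {obs:.3f}, p = {p_value:.4f}")
```

With genuine misbinding present, the observed statistic lies far in the tail of the null distribution, giving a small p-value.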

The hierarchical code comprises two layers: the lower layer, which can be either a conjunctive or a feature population parametrised as described above, receives the input and is randomly connected to the upper layer, which can provide additional binding information. Bottom: layer one, consisting of either a feature population code or a conjunctive population code.

Receptive fields of units of a feature population code are shown at one standard deviation. Top: effective receptive fields of three layer-two units are shown. Layer-two units randomly sample a subset of the activity of layer-one units, and pass a weighted sum of their inputs through a nonlinearity. Such a hierarchical code can be considered an abstract representation of a layered neural architecture [ 38 ]. It allows us to check how structured the binding information must be for the results to hold.
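This random-sampling scheme can be sketched in a few lines. The layer sizes, sparsity and rectifying nonlinearity below are illustrative assumptions, not the parameters used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

M_lower, M_upper = 100, 30
sparsity = 0.1  # each upper unit samples roughly 10% of lower units

# Random, sparse, fixed weights from layer one to layer two.
W = rng.normal(size=(M_upper, M_lower)) * (rng.random((M_upper, M_lower)) < sparsity)

def lower_layer(theta,
                prefs=np.linspace(-np.pi, np.pi, M_lower, endpoint=False),
                kappa=4.0):
    """Feature population: Von Mises tuning curves over one angular feature."""
    return np.exp(kappa * (np.cos(theta - prefs) - 1.0))

def upper_layer(r_lower):
    """Each upper unit passes a weighted sum of its sampled inputs
    through a nonlinearity (here a rectifier)."""
    return np.maximum(W @ r_lower, 0.0)

r1 = lower_layer(0.3)
r2 = upper_layer(r1)
print(r1.shape, r2.shape)
```

Because `W` is fixed and random, each upper unit's effective receptive field is an arbitrary conjunction of lower-layer receptive fields, which is what supplies the unstructured binding information discussed in the text.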

The optimal arrangement changes markedly when multiple items must be stored. Having few random binding units is very efficient in the single item case, but this breaks down completely when multiple items are stored and interfere with each other. The dependence of the memory fidelity on the ratio of upper to lower units is similar for increasing number of items, with the exception of the overall scale.

Unsurprisingly, memory fidelity is lower for increasing numbers of items and increasing conjunctivity, as shown in Fig. Moreover, there is an optimal ratio of upper to lower units when storing multiple items, if one tries to optimise the proportion of correct target angle recalls.

Left: Memory fidelity based on model samples, while varying the ratio of lower to upper layer units in a hierarchical population code with a constant total number of units. The number of randomly placed items increases from top to bottom. The memory fidelity decays with increasing item number and conjunctivity. Right: Mixture proportions based on model samples. For a single item, the correct target angle is always retrieved (blue curve).

The drop for a high ratio of upper to lower layer units is expected, as few units are left in the lower layer to represent the item appropriately. For increasing numbers of items, non-target responses are prevalent (green curve), but including a suitable proportion of upper layer units does allow the appropriate angle to be retrieved. Random responses are marginal with the parameters used here. Despite its drastically different implementation of conjunctivity, the hierarchical code provides a good fit to the experimental data.

The hierarchical code is able to capture the trend of decay in both experiments to a greater extent than the mixed population code (main plot shows a fit to [ 7 ], inset shows a fit to [ 1 ]). However, the fit for 4 and 5 items for [ 1 ] does show discrepancies with the experimental data. The optimal parameters obtained for those fits resemble those for the mixed population code, namely a high ratio of higher-level binding units and large input noise. These results make this class of hierarchical codes promising.

Model fit (green; the penumbra represents one standard deviation) to the human experimental data (blue; data from [ 7 ]). These qualitative fits are similar to those obtained for a mixed population code, see Fig. Inset: fit for [ 1 ]. Notice the difference in performance for large numbers of items. The patterns of errors arising from specific choices of population codes can be used to help discriminate between different representations.

Misbinding, which we quantify via the mixture model approach of [ 7 ], is of particular value since, as observed, it is rare for conjunctive codes but ubiquitous for feature codes. We therefore compare the misbinding exhibited by human subjects with the output of our model based on different population codes (see Methods for details about the optimisation).

This can be seen in Fig. The graphs quantify the different sorts of errors in terms of the weights in a mixture model capturing local variability around an item, misbinding errors and random choices [ 7 ]. Human experimental curves are shown on the bottom right. This shows that misbinding errors are a crucial component of any fit to human performance.

As expected, the feature code makes a large number of misbinding errors when more than one item is stored. By contrast, the conjunctive code makes only a few errors, which appear to arise from random guesses; misbinding errors are highly unlikely in this configuration. Overall, a mixed code provides a better fit to the human data, matching the increase in non-target responses as well as a baseline random response rate.

This analysis suggests that stimuli specifically designed to induce patterns of misbinding could be useful for understanding representations in population codes. Consider three stimuli arranged on a diagonal, separated by a variable distance in feature space, as illustrated in Fig. These create clear interference patterns for feature codes, with multi-modal posteriors and misbinding errors. These errors are expected to change as a function of the characteristics of the population code. Feature-space representation of three stimuli used to study misbinding errors and characteristics of the population codes.

This set of items will generate interference patterns as shown by the dotted lines. The circles represent one standard deviation of the receptive field response levels.


The green circles represent a population code in which the three stimuli are well separated. The blue circle represents a code for which all the stimuli lie inside a single receptive field and would generate misbinding errors. The target is randomly chosen on each trial as one of the three items.

In particular, by making the stimuli close to each other in feature space, this pattern allows intra-receptive field misbinding to be examined.
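Such a stimulus set is easy to construct. A minimal sketch, assuming both features are treated as angular variables and the separation `delta` is a free parameter:

```python
import numpy as np

def diagonal_stimuli(delta):
    """Three items on the diagonal of (angle, colour) feature space,
    separated by `delta` along each feature and centred on the origin."""
    offsets = np.array([-1.0, 0.0, 1.0]) * delta
    return np.stack([offsets, offsets], axis=1)  # shape (3, 2)

small = diagonal_stimuli(0.2)  # all three inside one conjunctive receptive field
large = diagonal_stimuli(1.5)  # well separated
print(small, large, sep="\n")
```

Sweeping `delta` from small to large values moves the pattern from the intra-receptive-field regime, where misbinding is expected, to the well-separated regime, where it is not.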


This happens when the pattern lies entirely in a single receptive field of a conjunctive unit, and can thus provide a somewhat crude and indirect measure of the receptive field size of a mixed conjunctive code. Note, though, that hierarchical conjunctive codes cannot be expected to have such a simple signature; and indeed even mixed codes are ultimately likely to be multi-scale in character. In Fig. we report how the parameters of the mixture model we considered before vary with conjunctivity in several conditions, using a population code of M units, and allowing the ratio of conjunctive to feature units to vary from 0 to 1 (corresponding to full-feature and full-conjunctive, respectively).

The goal is to recall one of the three items, randomly chosen on different trials. Shaded regions correspond to one standard deviation around the mean over 10 repetitions. In this case, no amount of conjunctivity can help the discrimination between the three stimuli. This corresponds to a situation in which intra-receptive field misbinding occurs. For the single-scale receptive fields that we employed to create the mixed population code, it is possible to recover the scale from the error patterns as a function of the separation between the stimuli.

This is shown in Fig. This plots the target (blue) and non-target (green) mixture probabilities, normalized by their joint sum. These start at the same value, but diverge after the point at which conjunctive information becomes available, and hence intra-receptive field misbinding becomes less prevalent. The black vertical dotted line indicates half the size of the receptive field for the conjunctive subpopulation: misbindings stop being prevalent once the stimuli cover multiple receptive fields. Once this point is reached, each stimulus lies in its own receptive field, so misbinding should not happen.

This is again in agreement with the results, with very few non-target responses in this regime. This shows data as in Fig. We compute the ratio between the target mixture proportion and the sum of the target and non-target mixture proportions (blue). We do the same for the non-target mixture proportion (green). The black vertical bars show half the size of a conjunctive receptive field for each population.

We see that for separations smaller than the size of a receptive field, misbinding errors are prevalent. This changes as soon as the pattern of stimuli covers more than one receptive field. The vertical red dashed bar shows twice the size of a receptive field. In this situation, each stimulus occupies one receptive field, and misbinding should rarely occur.

We originally expected a hierarchical population code to perform differently, since it encodes binding information in a quite different manner. However, surprisingly, we find consistent results, as can be seen in Fig. When the separation is large, the hierarchical code also behaves in a regular fashion similar to that of the mixed code as the degree of conjunctivity increases.

When conjunctivity is low, the memory performs poorly, as no binding information is present. However, as conjunctivity increases, so does performance. Interestingly, performance with a hierarchical code increases monotonically with conjunctivity before dropping sharply, when the lower (input) layer population becomes too small to provide the precision needed to discriminate the stimuli. This architecture uses conjunctive information quite effectively, but does not attain the same maximum performance.

The situation is less clear for a small distance between stimuli. Having a large proportion of conjunctive units is actually detrimental in this case, as the lower (input) layer decreases in size, and the encoding precision decreases with it. Hence there is an optimal proportion of conjunctive units for a given required minimum discrimination.


Hence the hierarchical code seems to discriminate smaller patterns for a given population size, which is surprising for such a crude abstraction of a layered architecture. Thus we find that even this simple stimulus pattern can provide something of a formal window into misbinding and the structure of receptive fields. We built a model of short-term visual working memory, assuming a single population of units, an additive, palimpsest storage scheme and sample-based probabilistic recall. We showed how this model could qualitatively reproduce key aspects of human experimental data, including the decrease in performance with memory load, and also error distributions, including misbinding errors, which have not previously been the focus of theoretical study.

It is the next phase of this work to fit human data quantitatively, looking in detail at individual differences in performance and patterns of errors. We studied several different sorts of population code. The most critical question concerns binding, which in our case is performed by conjunctive units that are sensitive to specific combinations of two or more features.

Non-conjunctive, feature-based codes can be more efficient at storing single items, but fail catastrophically whenever multiple items are stored simultaneously. We considered including both single-feature and conjunctive units, and showed that a combination is likely to offer a better characterization of experimental data than either alone. Finally, we considered experiments that would offer useful guidance in discriminating between theories. The assumed error distribution was thus a mixture model with two components: a Von Mises centred around the target item, and a random uniform component.
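For concreteness, this two-component error distribution can be written as (our notation; the original symbols may differ):

```latex
p(\hat{\theta}) \;=\; p_t \,\mathrm{VM}\!\left(\hat{\theta};\,\theta_t,\kappa\right) \;+\; (1 - p_t)\,\frac{1}{2\pi},
```

where VM(·; θ_t, κ) is a Von Mises density centred on the target angle θ_t with concentration κ, and p_t is the target mixture proportion. The three-component model of [ 7 ] used elsewhere in this work additionally includes Von Mises components centred on the non-target angles.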

The alternative models are based on the notion of a finite resource [ 1 , 7 — 11 ], arguing against a fixed number of slots, but rather that there is a constraint on the whole collection, such that storage of multiple items leads to interference. By comparison, our model, as a palimpsest, can best be seen as abandoning the notion of slots altogether—be they finite or infinite—and so does not need a mechanism for allocating the slots.

There is a finite resource—the population of units that can be active—but this leads to two resource-like limitations on storage, rather than one. The first limitation is noise—this acts just like some of the resource limits in previous models. The second limitation is representational—the fact that the items overlap in the palimpsest in a way that depends on how they are encoded in the population implies a form of interference and interaction that leads to misbinding.

This explicit element has been missing in previous treatments. Other factors have also been implicated in this pattern, such as different memory encoding precision on different trials [ 10 , 41 ], or the limited width of neuronal tuning functions [ 15 ].


It would be straightforward to extend our scheme to allow for partial information about which item will have to be recalled. We have shown how our model can encode information about each feature separately, with the binding information being provided by another subpopulation. A model along related lines was recently proposed by Swan and Wyble [ 42 ]. However, one could think of other ways to encode and store this binding information, for example by using object-files.

If one were to limit how many object-files could be used at a given time, and if object-files made errors in binding features together, this would provide a hybrid slot-based treatment of the problem. Another related model has been suggested in the context of dynamic field theory [ 43 , 44 ]. These authors consider a population of rate-based units with temporal dynamics governed by first-order differential equations. Given specific layers and connectivity patterns, they simulate the evolution of bumps of activity through time, which can be used to store information for later recall.

In their model, feature binding is completely linked to space, in that each feature is stored in a different feature-space population bound only to location. A separate working memory population stores the locations of all items seen. Recall relies on using location to couple and constrain the possible features to their original values. Further, location cannot be the only variable determining binding, given experiments in which items are presented at the same location but at different times. Our model is agnostic about the source of binding in its input, lending itself to the study of different representations.

Nevertheless, it would be interesting to model richer aspects of the temporal evolution of the memory state. Here, we assumed that only two features were stored per item, namely colour and angle. However, we report in Section 5 of S1 Text the effect of using more than two features.

One feature that is particularly important is spatial location. In the actual experiments in [ 7 ], space (which, for simplicity and consistency with [ 1 ], we treated as another angular variable) was used as the cueing feature, with colour being recalled. It is possible, given the importance of space for object recognition, that spatial tuning has quite different characteristics from that of other cues. Hints of this are apparent in the properties of early visual neurons. This could make it a stronger cue for recall and recognition, something that it would be interesting to examine systematically through experiment and the model.

With more features, we could address directly one of the key findings that lent support to the slot models, namely the observation of an object benefit in recalling features. That is, despite the sometime fragility of episodic memory [ 47 ], which this functionally resembles, remembering a fixed number of features is easier when those features are parts of fewer conjunctive items.

The magnitude of that effect has been the subject of intense debate, but there is broad agreement about a significant object benefit [ 48 — 52 ]. In our model, such effects arise through two mechanisms: first, having fewer items will add less encoding noise to the final memory state, which will directly reduce the overall noise level in recall. Second, the conjunctive units also directly contribute to the storage precision for bound items.


Our model would thus also show an object benefit without additional machinery. Our model treats storage as a bottom-up, feedforward process. However, certain top-down effects are known, such as directed forgetting [ 53 , 54 ]. Such an effect could be accommodated in the model by considering a multiple-step process in which, following regular storage, recall would be executed based on the cue for the to-be-forgotten item, with the representation of whatever is retrieved being subtracted from the previous memory state. As this would still be a noisy process, the resulting precision for the other items would be less than if the forgotten item had never been stored at all, albeit still greater than if its main influence over the memory state remained.

We made a number of simplifying assumptions, notably to do with the noise model and the sampling process. For the former, we only considered additive isotropic Gaussian noise corrupting the encoding. This could be readily extended to more complex noise models, for example to a more neurally plausible Poisson noise model. The key difference from using Poisson noise would be its signal-dependence—storing larger numbers of items would lead to greater activities and thus a higher variance.
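The signal-dependence is easy to demonstrate numerically. In this illustrative sketch (not the model's actual noise parameters), superposed item representations are passed through Poisson noise, and the empirical noise variance grows with the number of stored items, whereas additive Gaussian noise would keep it constant:

```python
import numpy as np

rng = np.random.default_rng(2)

def memory_variance(n_items, rate_per_item=5.0, n_units=200, n_reps=2000):
    """Superpose n_items identical item representations, corrupt the sum
    with Poisson noise, and return the empirical noise variance."""
    signal = n_items * rate_per_item * np.ones(n_units)
    samples = rng.poisson(signal, size=(n_reps, n_units))
    return samples.var()

# For Poisson noise, variance scales with the mean, so with item number.
for n in (1, 2, 4):
    print(n, memory_variance(n))
```

For a Poisson channel the variance equals the mean, so the printed variances scale roughly linearly with `n_items`; this is the extra load-dependent noise the text refers to.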

Signal-dependent Gaussian noise is a related modelling choice [ 30 , 31 , 55 ]. Amongst other differences, this would reintroduce the second term in the equation for the Fisher information. This term can be large compared to the first [ 55 ] and adds extra inferential complexity [ 56 ], hence fully accounting for it can be complicated. We considered a process of recall that involves the full posterior distribution over the responses.

Determining how the brain would use and represent distributional information has been an active recent research topic. One set of ideas considers what amounts to a deterministic treatment albeit corrupted by noise [ 57 — 62 ]. However, there is a growing body of research showing how the brain might instead use samples [ 63 — 66 ], and we adopted this approach. Inference might involve combining together larger numbers of samples, and thus reporting some noisy function of the posterior other than just the samples.

However, such operations are currently underdetermined by the experimental data, as they would interact with other sources of noise. Sampling from the posterior, instead of simply reporting the maximum a posteriori (MAP) value, has the additional benefit of capturing variability around the mode itself, which varies depending on the representation used. Nevertheless, it is important to stress that this sampling scheme is not the main bottleneck in our model. Rather, it is the representation that constrains the nature and magnitude of the errors in recall.

The sampling scheme simply provides a mechanism for reporting on the ultimate posterior distribution. A more limited report, such as the MAP value, would likely lack the appropriate characteristics by reflecting too little of this distribution. One of the major tools that we used to analyse the population codes was the Fisher information and the associated Cramer-Rao lower bound.

However, this is only useful if the posterior distribution is close to being Gaussian, and, in particular, unimodal. This will almost always be true for a single item; and often be true when there are multiple items and a conjunctive population code that solves the binding problem. However, as we saw, feature codes lead to multimodality, rendering a direct application of the Cramer-Rao lower bound useless. What is still possible is to use the Fisher information as an indication of the variability around one of the modes. We have shown that it still produces a good approximation to the width of a mode, even in the presence of misbinding errors.

We characterized misbinding errors through a mixture model and a resampling-based estimator. It is also possible to assess the multimodality of the posterior itself directly, for example by fitting a parametric mixture model to the posterior; this analysis leads to similar results. It would then be possible to analyse this multimodality analytically, and perhaps obtain a closed-form expression for the proportion of misbinding errors expected from a given posterior.

We considered the case of recalling only a single item given a memory. It would be possible to treat recall differently, with a mixture model estimating the features associated with all items, and thereby answering the memory query directly. Total recall could be performed using a fixed finite mixture model. Approaches of this sort have been pursued by various recent authors [ 67 — 71 ].

For instance, [ 71 ] considered both encoding and recall to be implemented with a Dirichlet process mixture model. They show how this provides a natural account of ensemble statistics effects that can be seen in some experiments, such as regression to the mean of the presented samples. By contrast, our approach is closer to the experimental paradigm, as there is no evidence that subjects recall all features of all items when asked to recall a unique item. Regression to the mean still arises, but from local interactions between items in the representation.

Indeed, even for a conjunctive code, when items are close-by the recalled angle will be biased towards the mean of all items, as bumps of activity merge together. There is substantial precedence for the approximation of focusing on a single item, ignoring some or all of the statistical structure associated with other actual or potential items [ 72 — 75 ]. Our results depend crucially on the nature of the underlying population code. As a proof of principle, we tested two schemes—one mixing feature-based and conjunctive codes; the other building a hierarchy on top of feature codes.

However, many more sophisticated representations would also be possible—studies of population coding suggest that using multiple scales is particularly beneficial [ 76 , 77 ], and it would be interesting to test these. For our single-scale case, we suggested a particular pattern of three stimuli that we expect to be of particular value in discriminating between different population coding schemes. The pattern was designed to promote misbinding in a way that would also be revealing about the size of the receptive fields.

We also expect there to be a strong effect of distance in stimulus space on misbinding probability, if a mixed-like representation is used. On the other hand, by the very nature of our hierarchical population code, it is harder to make specific predictions about the dependence of misbinding probability on proximity and other features. If subjects were too proficient at recall from this pattern, as might be the case for just three items [ 1 ], it would be straightforward to complicate the scheme to include a larger number of items.

An interesting extension to this analysis would be to introduce an asymmetry in the pattern of stimuli, in order to displace the mean of the stimuli from the centre stimulus. This would in turn introduce asymmetric biases and deviations for the different items, depending on the sources of the errors. Indeed, as briefly mentioned above, it has been shown that the mean statistics of the stimuli have an effect in determining response characteristics.

Such an asymmetric pattern would indicate whether the variability is biased towards the mean of the stimuli or only towards close-by items. There is substantial work on population-based working memory with a foundation in persistent activity [ 78 ], and even in the gating of storage necessary to make such memories work efficiently [ 79 , 80 ]. It would be interesting to study the extra constraints that come from a more realistic neural implementation. In conclusion, we proposed a model which accounts for errors in working memory by considering explicitly the link between storage and representation.

We showed that it can successfully account for key aspects of the psychophysical data on visual short-term memory, and that it clarifies the trade-off between precise representation of single features and representation of the binding information across all the features of an item that is needed for cued recall. Based on observations of the form of the errors arising when recalling information from a palimpsest memory, we proposed a specific stimulus template that would produce different error patterns depending on the characteristics of the underlying representation, and which we therefore suggest as an attractive target for psychophysical investigation.

Here, we provide a complete description of the processes of storage and recall, repeating material from the main text as appropriate for convenience. We assume continuous firing-rate style units. They have bivariate Von Mises tuning curves, which define the firing rate of unit m, corrupted by isotropic additive Gaussian noise. The receptive field sizes were set automatically to achieve maximum coverage given a population of M units.

Given a fixed number of units with preferred stimuli arranged uniformly over the feature space, the receptive field sizes were modified such that one standard deviation of the receptive field would cover the space uniformly without redundancy.
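A minimal sketch of such tuning curves and the coverage heuristic follows; the specific kappa-from-spacing rule is an illustrative assumption, not the exact parametrisation used here:

```python
import numpy as np

def von_mises_tuning(theta, phi, pref_theta, pref_phi, kappa):
    """Bivariate Von Mises tuning curve (unnormalised, peak response 1):
    maximal where (theta, phi) matches the unit's preferred stimulus."""
    return np.exp(kappa * (np.cos(theta - pref_theta) - 1)
                  + kappa * (np.cos(phi - pref_phi) - 1))

M = 16  # units per feature dimension, giving M*M conjunctive units
prefs = np.linspace(-np.pi, np.pi, M, endpoint=False)

# Coverage heuristic: let one standard deviation match the spacing between
# preferred values, so sigma = 2*pi/M and kappa ~ 1/sigma^2 for large kappa.
kappa = 1.0 / (2 * np.pi / M) ** 2

# Responses of the full M-by-M grid to one (angle, colour) stimulus.
resp = von_mises_tuning(0.1, -0.2, prefs[:, None], prefs[None, :], kappa)
print(resp.shape, resp.max())
```

With this tiling, any stimulus falls within roughly one standard deviation of some unit's preferred value, so the space is covered without large gaps or heavy redundancy.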


The storage process for N items is probabilistic and follows the model of Equation 22: multiple items are summed to produce the final memory state y N , which is, in turn, corrupted by additional, independent Gaussian noise. Recall is based on the simplifying assumption that a single item is modelled, while the others are collapsed into a single source of noise. We obtain empirical estimates by sampling memory items from the storage process.
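The storage step can be sketched as follows; the population size, tuning widths and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def encode(item, prefs, kappa=6.0):
    """Population response to one (angle, colour) item, with bivariate
    Von Mises tuning around each unit's preferred stimulus."""
    theta, phi = item
    return np.exp(kappa * (np.cos(theta - prefs[:, 0]) - 1)
                  + kappa * (np.cos(phi - prefs[:, 1]) - 1))

M = 200
prefs = rng.uniform(-np.pi, np.pi, size=(M, 2))
items = [(0.5, -1.0), (2.0, 1.2), (-2.5, 0.3)]

# Palimpsest storage: item representations are summed, then the final
# memory state is corrupted by additive isotropic Gaussian noise.
y = sum(encode(it, prefs) for it in items)
y_noisy = y + rng.normal(scale=0.1, size=M)
print(y_noisy.shape)
```

Because the items are superposed in a single population vector, nearby items overlap in `y`, which is the representational source of interference and misbinding discussed in the text.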

In addition to the classical slice sampling algorithm, we introduce Metropolis-Hastings jumps, which can randomly set the sampler in another part of the state space. This allows the sampler to jump between modes of a multi-modal posterior. We discard the initial samples as burn-in steps for the slice sampler. This allows us to sample appropriately from the full posterior. We use the mixture model of [ 7 ], allowing for a mixture of target, non-target and random responses. We fit this mixture model (Equation 28) using the expectation-maximization algorithm. To check for the significance of a non-zero mixture proportion p nt , associated with non-target responses, we perform a resampling analysis.
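The role of the jumps can be illustrated on a toy bimodal posterior. This sketch replaces the slice-sampling step with a simple local Metropolis move (an assumption for brevity); the uniform jump proposal is what lets the chain hop between the target and misbinding modes:

```python
import numpy as np

rng = np.random.default_rng(3)

def wrap(a):
    """Wrap angles into (-pi, pi]."""
    return np.angle(np.exp(1j * a))

def log_post(theta, kappa=8.0, p_swap=0.3, target=0.0, nontarget=2.0):
    """Toy bimodal posterior over the recalled angle: one mode at the
    target, one at a non-target (a misbinding mode)."""
    return np.log((1 - p_swap) * np.exp(kappa * np.cos(theta - target))
                  + p_swap * np.exp(kappa * np.cos(theta - nontarget)))

theta, samples = 0.0, []
for step in range(30000):
    if rng.random() < 0.1:
        # Metropolis-Hastings jump: uniform proposal anywhere on the
        # circle, allowing hops between well-separated modes.
        prop = rng.uniform(-np.pi, np.pi)
    else:
        # Local move (stand-in for the slice-sampling step).
        prop = wrap(theta + rng.normal(scale=0.3))
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)
samples = np.array(samples[3000:])  # discard burn-in

frac_nontarget = np.mean(np.abs(wrap(samples - 2.0)) < 0.6)
print(f"samples near non-target mode: {frac_nontarget:.2f}")
```

Without the uniform jumps, a purely local sampler would rarely cross the low-probability region between the two modes and would badly misestimate the non-target mass.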

Given a set of responses, targets and non-target angles, we randomly resample the non-target angles and refit the mixture model. The Fisher information for a population code with Gaussian noise is given by Equation 30, where f is the mean response of the population and C the covariance of the population response. The FI about the angle is given by Equations 32 and 33. We perform a grid search over several population code parameters to provide a qualitative fit to human experiments. A full fit, which is the subject of future work, would require at least the consideration of heterogeneous and multi-scale population representations.
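For additive isotropic Gaussian noise (C = sigma^2 I, independent of the stimulus), the general form f'(theta)^T C^{-1} f'(theta) reduces to |f'(theta)|^2 / sigma^2, with no covariance-derivative term. A small numerical sketch with illustrative parameters:

```python
import numpy as np

def population_response(theta, prefs, kappa=6.0):
    """Mean response f(theta) of a Von Mises feature population."""
    return np.exp(kappa * (np.cos(theta - prefs) - 1))

def fisher_information(theta, prefs, kappa=6.0, sigma=0.1, eps=1e-5):
    """FI for additive isotropic Gaussian noise: |f'(theta)|^2 / sigma^2,
    with f'(theta) approximated by central finite differences."""
    df = (population_response(theta + eps, prefs, kappa)
          - population_response(theta - eps, prefs, kappa)) / (2 * eps)
    return np.sum(df ** 2) / sigma ** 2

prefs = np.linspace(-np.pi, np.pi, 50, endpoint=False)
fi = fisher_information(0.0, prefs)
print(f"Fisher information: {fi:.1f}")
```

With uniformly spaced preferred values, the FI is nearly independent of the probed angle, as expected from translation invariance of the code.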

Additional derivations and results omitted from the main manuscript. Derivations include the computation of the large-population limit of the Fisher information and the relation between memory fidelity and Fisher information. We report the stimuli-separation analysis for the hierarchical code, analogous to the analysis of Fig. Following the comments of a reviewer, we studied the relationship between the conjunctivity ratio and the population size in a mixed population code, as our parametrisation creates a dependence between them.

Finally, we show how increasing the number of features affects the ratio of conjunctivity for a fixed population size. We thank Nikos Gorgoraptis and Masud Husain for sharing data. We would like to thank colleagues who have read the paper, including Laurence Aitchison, Alexander Lerchner and Pedro Goncalves, for their thorough and very helpful comments. Amongst other changes, an additional section was added to the supplementary material in the light of their remarks. Performed the experiments: LM. Analyzed the data: LM PD.
