# Open, reliable, and useful evaluation

## *Open* evaluation (and rating)

Traditional peer review is a closed process, with reviewers' and editors' comments and recommendations hidden from the public.

In contrast, all *Unjournal* evaluations[^1] (along with authors' responses and evaluation manager summaries) are made public and easily accessible. We give each of these a separate DOI under our ISSN (3071-2173) and work to make sure each enters the literature and bibliometric databases. We aim further to curate these, making it easy to see the evaluators' comments in the context of the research project (e.g., with sidebar/hover annotation).

Open evaluation is more useful:

* to other researchers and students (especially those early in their careers). Seeing the dialogue helps them digest the research itself and understand its relationship to the wider field. It helps them understand the strengths and weaknesses of the methods and approaches used, and how much agreement there is over these choices. It gives an inside perspective on how evaluation works.
* to people *using* the research, providing further perspectives on its value, strengths and weaknesses, implications, and applications.

Publicly posting evaluations and responses may also lead to higher-quality, more reliable evaluation. Evaluators can choose whether or not they wish to remain anonymous; there are pros and cons to each choice, but in either case, the fact that all the content is *public* may encourage evaluators to express their reasoning and justifications more fully and transparently. (And where they fail to do so, readers of the evaluation can take this into account.)

The fact that we are asking for evaluations and ratings of all the projects in our system—and not using "accept/reject"—should also drive more careful and comprehensive evaluation and feedback. At a traditional top-ranked journal, a reviewer may limit themselves to a few vague comments implying that the paper is "not interesting or strong enough to merit publication." This would not make sense within the context of *The Unjournal*.

## More reliable, precise, and useful *metrics*

We do not "accept or reject" papers; we are *evaluating* research, not "publishing" it. But then, how do other researchers and students know whether the research is worth reading? How can policymakers know whether to trust it? How can it help a researcher advance their career? How can grantmakers and organizations know whether to fund more of this research?

As an alternative to the traditional measure of worth—asking, "what tier did a paper get published in?"—*The Unjournal* provides *metrics:* We ask evaluators to provide a specific set of ratings and predictions about *aspects* of the research, as well as aggregate measures. We make these public. We aim to synthesize and analyze these ratings in useful ways, as well as make this quantitative data accessible to meta-science researchers, meta-analysts, and tool builders.

Feel free to check out our [ratings metrics](https://open-2c.gitbook.com/url/globalimpact.gitbook.io/the-unjournal-project-and-communication-space/policies-projects-evaluation-workflow/evaluation/guidelines-for-evaluators#metrics-overall-assessment-categories) and [prediction metrics](https://open-2c.gitbook.com/url/globalimpact.gitbook.io/the-unjournal-project-and-communication-space/policies-projects-evaluation-workflow/evaluation/guidelines-for-evaluators#journal-prediction-metrics).

These metrics are separated into different categories designed to help researchers, readers, and users understand things like:

* How much can one believe the results stated by the authors (and why)?
* How relevant are these results for particular real-world choices and considerations?
* Is the paper written in a way that is clear and readable?
* How much does it advance our current knowledge?

We also request *overall* ratings and predictions of the credibility, importance, and usefulness of the work, which also help *benchmark* these evaluations to each other and to the current "journal tier" system.

Even here, *The Unjournal*'s metrics are precise in a sense that "journal publication tiers" are not. There is no agreed-upon metric of *exactly* how journals rank (e.g., within economics' "top-5" or "top field journals"). More importantly, there is no clear measure of the relative quality and trustworthiness of a paper *within a particular journal.*

In addition, there are issues of lobbying, career concerns, and timing, discussed elsewhere, which make the "tiers" system less reliable. An outsider doesn't know, for example:

* Was a paper published in a top journal because of a special relationship and connections? Was an editor trying to push a particular agenda?
* Was it published in a lower-ranked journal because the author needed to get some points quickly to fill their CV for an upcoming tenure decision?

In contrast, *The Unjournal* requires evaluators to give specific, precise, quantified ratings and predictions (along with explicit measures of their uncertainty about these appraisals).

Of course, our systems will not solve *all* problems associated with reviews and evaluations: power dynamics, human weaknesses, and limited resources will remain. But we hope our approach moves in the right direction.

## Better feedback

See also [Mapping the evaluation workflow](https://open-2c.gitbook.com/url/globalimpact.gitbook.io/the-unjournal-project-and-communication-space/policies-projects-evaluation-workflow/mapping-evaluation-workflow).

## Faster (public) evaluation

We want to reduce the time between *when research is done* (and a paper or other research format is released) and *when other people (academics, policymakers, journalists, etc.) have a credible measure* of "how much to believe the results" and "how useful this research is."

Here's how *The Unjournal* can do this.

1. *Early evaluation:* We will evaluate potentially impactful research soon after it is released (as a working paper, preprint, etc.). We will encourage authors to submit their work for our evaluation, and we will [directly commission](https://open-2c.gitbook.com/url/globalimpact.gitbook.io/the-unjournal-project-and-communication-space/policies-projects-evaluation-workflow/considering-projects/direct-evaluation-track) the evaluation of work from the highest-prestige authors.
2. *We will pay evaluators* with further incentives for timeliness (as well as carefulness, thoroughness, communication, and insight). [Evidence suggests](https://www.aeaweb.org/articles?id=10.1257/jep.28.3.169) that these incentives for promptness and other qualities are likely to work.
3. *Public evaluations and ratings*: Rather than waiting years to see "what tier journal a paper lands in," the public can simply consult *The Unjournal* to find credible evaluations and ratings.
4. See also: [Why should researchers and groups submit their work to, and engage with, *The Unjournal*?](https://open-2c.gitbook.com/url/globalimpact.gitbook.io/the-unjournal-project-and-communication-space/faq-interaction/for-researchers-authors#why-should-researchers-and-groups-submit-their-work-to-and-engage-with-the-unjournal)

<details>

<summary>Can <em>The Unjournal</em> "do feedback to authors better" than traditional journals?</summary>

**Maybe we can?**

* We pay evaluators.
* The evaluations are public, and some sign their evaluations.
  * → Evaluators may be more motivated to be careful and complete.

**On the other hand . . .**

* Because evaluations are public, evaluators may be overly cautious, softening their criticism.
* At standard journals, referees do want to impress editors, and often (but not always) leave very detailed comments and suggestions.

</details>

[^1]: Subject to some potential temporary embargoes and rare exceptions for early-career researchers in sensitive situations. We discuss these elsewhere. As of June 2023, there have been no such embargoes or exceptions.
