sci25006 — Announcement

Gemini Observatory Archive Updates

February 28, 2025

The Gemini Observatory Archive (GOA) is the primary means for the community to access Gemini data, including both authenticated PI access to proprietary data and open access to public data. The archive provides a light-weight, responsive, and simple interface, with a “single search form” approach and a “RESTful” interface that has proven popular with our user community.

All facility instrument data is transferred to the archive in near real time. We recently completed ingesting older raw data from the MAROON-X visiting instrument at Gemini North, and are ingesting reduced GHOST data from the U.S. National Gemini Office. We are also working towards automatic DRAGONS reduction of data within the archive system itself.

Development and upgrade of the GOA is a continuous process. It includes user-facing changes, such as new features and support for new instruments, and behind-the-scenes work, such as keeping up with software updates and the changing security and network environment in which we operate. Here we’d like to highlight a few features that were recently added.

Sign-in via ORCID

New science data at Gemini has a proprietary period during which it is available only to the program under which it was observed. The traditional method of accessing this data requires users to log in using a GOA-specific username and password.

To simplify this process, the GOA login page now has an option to sign in via ORCID. Selecting this link will redirect you to the ORCID website to either sign in or register. Any researcher can sign up for an ORCID iD which then acts as a persistent identifier that can be used across a wide range of services. Unlike an email address, an ORCID iD is not tied to a particular institution where a researcher is currently based.

Once authenticated, you will be redirected back to GOA, where you’ll now be signed in with your ORCID iD. The archive doesn’t have access to any details of your ORCID record; ORCID redirects you back to GOA with a cryptographic token that verifies your ORCID authentication. The software system that allows this (OAuth 2.0) also allows NOIRLab staff to sign in using their NOIRLab SSO credentials.
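For readers curious about what such a redirect looks like, here is a minimal Python sketch of the first and last steps of a generic OAuth 2.0 authorization-code flow. It is not the archive's actual implementation: only the ORCID authorization endpoint and the `/authenticate` scope come from ORCID's public documentation, while the client ID, the callback URL, and the function names `build_authorize_url` and `extract_code` are hypothetical placeholders.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# ORCID's public authorization endpoint. Everything else here is illustrative.
ORCID_AUTHORIZE_URL = "https://orcid.org/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, state):
    """Build the URL the browser is sent to when "Sign in via ORCID" is selected."""
    params = {
        "client_id": client_id,        # hypothetical: issued to the service by ORCID
        "response_type": "code",       # authorization-code flow
        "scope": "/authenticate",      # identity only, no access to the ORCID record
        "redirect_uri": redirect_uri,  # hypothetical callback URL on the service
        "state": state,                # random value echoed back on return
    }
    return f"{ORCID_AUTHORIZE_URL}?{urlencode(params)}"


def extract_code(callback_url, expected_state):
    """Check the echoed state and return the one-time code from the callback URL."""
    query = parse_qs(urlparse(callback_url).query)
    returned_state = query.get("state", [""])[0]
    if not secrets.compare_digest(returned_state, expected_state):
        raise ValueError("state mismatch: possible forged redirect")
    return query["code"][0]
```

In a full flow, the service would then exchange the returned code for a token in a server-to-server request; that step is omitted here because it requires real credentials.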

Currently, even if signed in by ORCID, PIs still need to use their OT program key to register for access to their proprietary data. However, the new Gemini Program Platform (GPP) software will use ORCID, and once this is deployed the program key registration step will no longer be necessary.

Transitioning an existing GOA account to ORCID

If you have an existing GOA account and you simply “Sign in via ORCID”, the archive has no way to know that your existing username and password correspond to your ORCID iD. You will therefore end up with two accounts, and things like your registered programs won’t transfer to your ORCID account.

To associate an ORCID iD with an existing GOA account, first log in using your GOA username and password, and then “Sign in via ORCID.” If you don’t already have an archive account, the simplest option is to sign in via ORCID. If you do end up with two accounts, please file a helpdesk ticket and we will merge them for you.

Spam Robots and the dreaded “Blocked” message

The open and link-rich nature of the GOA web interface has made it an inviting target for nefarious software agents, aka “robots.” There are defined protocols on the web for telling those robots where they are, and are not, welcome. However, we have recently seen a massive increase in the number of queries to the archive from web robots that do not follow these rules.

Unfortunately, this makes the archive, specifically the search results pages, susceptible to rampaging bots that find an archive results page and then start blindly following every link on the page, ignoring the web directives that tell them not to.
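As an illustration of the protocol these bots ignore, the Python sketch below shows how a well-behaved crawler would consult robots.txt before following a link, using the standard library's `urllib.robotparser`. The rules, the `/results/` path, the `archive.example` host, and the `may_fetch` helper are all hypothetical and do not reflect the archive's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the archive's real rules may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /results/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt rules permit this agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A compliant crawler would skip any link for which `may_fetch` returns False. The bots described here skip the check entirely.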

These bots act in a coordinated and distributed manner, with obvious sequences of related queries originating from multiple IP addresses. In late 2024, it became obvious from the load on the archive servers, and a cursory examination of the server logs, that the vast majority of queries the archive was serving were not from legitimate users.

Automatically detecting and blocking these nefarious requests is not trivial: they do not respect robots.txt rules, they use user agent strings claiming to be from common desktop web browsers, and they are distributed such that no single IP address is responsible for an excessive number of requests. So, rather than requiring all users to log in to search the archive, or introducing a “CAPTCHA” step, we have implemented software that identifies offending IP address allocations and blocks requests from them. This has already significantly cut down the number of bot requests.

There will inevitably be occasional legitimate users who find their IP address range has been blocked, for which we apologize. The basics are explained in the message you see if your address range is blocked. To bypass this block simply log in to the archive. We never block requests from logged-in users, and we don’t block access to the login page. We’d also appreciate a helpdesk ticket to let us know that this happened as this will allow us to refine our filtering rules.
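The general idea of blocking by address range rather than by individual address can be sketched in a few lines of Python. This is an assumption-laden illustration, not the archive's actual software: the fixed /24 IPv4 grouping stands in for real address allocations, and the threshold, the `/login` path, and all function names are hypothetical.

```python
import ipaddress
from collections import Counter


def network_of(address, prefix_len=24):
    """Return the network containing address (a fixed /24 is an assumption)."""
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


def find_offending_networks(anonymous_request_ips, threshold, prefix_len=24):
    """Count anonymous requests per network and return those over threshold."""
    counts = Counter(network_of(ip, prefix_len) for ip in anonymous_request_ips)
    return {net for net, n in counts.items() if n > threshold}


def is_allowed(address, logged_in, path, blocked_networks, login_path="/login"):
    """Logged-in users and the login page are never blocked."""
    if logged_in or path.startswith(login_path):
        return True
    ip = ipaddress.ip_address(address)
    return not any(ip in net for net in blocked_networks)
```

Grouping by range is what catches distributed bots whose individual addresses each stay under any per-address limit. It is also why a legitimate user who shares a range with them can be caught, and why logging in bypasses the block.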

We don’t know for sure what these robots are, as they’re not search engines in the traditional sense. Many of these requests originate from cloud compute companies in Asia, while a fair number also originate in South America. Some of the origin addresses suggest that the queries are coming from malware on infected consumer devices, which is the most likely reason that legitimate users may get blocked. If many addresses on your ISP are sending millions of bogus requests, your entire ISP will be blocked. Our current hypothesis is that these robots are blindly harvesting web pages to train Generative AI Large Language Models.

About the Announcement

Id: sci25006

Images

Timelapse of the night sky over Gemini North

Credit: International Gemini Observatory/NOIRLab/NSF/AURA/J.Chu/J.Pollard

Gemini Observatory Archive

Credit: International Gemini Observatory/NOIRLab/NSF/AURA