diff --git a/docs/developers.rst b/docs/developers.rst
index 9144c03..de32fff 100644
--- a/docs/developers.rst
+++ b/docs/developers.rst
@@ -10,7 +10,7 @@ Information for Developers

This documentation is for people who want to install a test version of FASTDB on their local machine, edit the FASTDB code, or try to install FASTDB somewhere else. (It is currently woefully incomplete for the last purpose.)

-The FASDTB code can be checked out from https://github.com/LSSTDESC/FASTDB ; that is currently the only place to get the code. (There are no plans to make it pip installable or anything like that.)
+The FASTDB code can be checked out from https://github.com/LSSTDESC/FASTDB ; that is currently the only place to get the code. (There are no plans to make it pip installable or anything like that.)

Submodules

@@ -22,7 +22,7 @@ FASTDB uses at least one submodule. These are checked out in the ``extern`` subd

That command will check out the appropriate commit of all needed submodules.

-If later you pull a new revision, ``git status`` may show your submodule as modified, if somebody else has bumped the submodule to a newer verion. In that case, just run::
+If later you pull a new revision, ``git status`` may show your submodule as modified, if somebody else has bumped the submodule to a newer version. In that case, just run::

   git submodule update

@@ -43,13 +43,13 @@ If you've edited a ``Makefile.am`` file in any directory, or the ``configure.ac`

   ./configure --with-installdir=[DIR] --with-smtp-server=[SERVER] --with-smtp-port=[PORT]
   make install

-The ``[DIR]`` parameter is the directory where you want to install the code. The SMTP server setup requires you to know what you're doing. (FASTDB uses smtp to send password reset messages.) You can run::
+The ``[DIR]`` parameter is the directory where you want to install the code. The SMTP server setup requires you to know what you're doing. (FASTDB uses SMTP to send password reset messages.) You can run::

   ./configure --help

as usual with GNU autotools to see what other options are available. If you're making a production install of FASTDB somewhere, you will definitely want to do things like configure the database connection.

-It's possible that after running either the ``./configure`` or ``make`` commands, you'll get errors about ``aclocal-1.16 is missing on your system`` or something similar. There are two possibilites; one is that you do legimiately need to rebuild the autotools file, in which case see :ref:`autoreconf-install` below. However, if you haven't touched the files ``aclocal.m4``, ``configure``, or, in any subdirectory, ``Makefile.in`` or ``Makefile.am``, then this error may be result of an unfortunate interaction between autotools and git; autotools (at least some versions) looks at timestamps, but git checkouts do not restore timestamps of files committed to the archive. In this case, you can run::
+It's possible that after running either the ``./configure`` or ``make`` commands, you'll get errors about ``aclocal-1.16 is missing on your system`` or something similar. There are two possibilities; one is that you do legitimately need to rebuild the autotools file, in which case see :ref:`autoreconf-install` below.
However, if you haven't touched the files ``aclocal.m4``, ``configure``, or, in any subdirectory, ``Makefile.in`` or ``Makefile.am``, then this error may be the result of an unfortunate interaction between autotools and git; autotools (at least some versions) looks at timestamps, but git checkouts do not restore timestamps of files committed to the archive. In this case, you can run::

   touch aclocal.m4 configure
   find . -name Makefile.am -exec touch \{\} \;

@@ -63,12 +63,12 @@ and then retry the ``./configure`` and ``make`` commands above.

Local Test Environment
=======================

-The file ``docker-compose.yaml`` in the top-level directory contains (almost) everything necessary to bring up a test/development FASTDB environment on your local machine. You'll need to have some form of docker installed, with a new enough version of ``docker compose``. Rob is able to get things to work with Docker 20.10.24 (run ``docker --version``) and docker compose 2.36.2 (run ``docker compose version``). If you have older versions and something doesn't work, try upgrading. You'll need to have the docker container runtime going; how that works depends on exactly which docker you install. On a Linux, we rcommend `installing Docker Engline `_. On a Mac, you can also try that, but people have had success with `Docker Desktop `_.
+The file ``docker-compose.yaml`` in the top-level directory contains (almost) everything necessary to bring up a test/development FASTDB environment on your local machine. You'll need to have some form of docker installed, with a new enough version of ``docker compose``. Rob is able to get things to work with Docker 20.10.24 (run ``docker --version``) and docker compose 2.36.2 (run ``docker compose version``). If you have older versions and something doesn't work, try upgrading. You'll need to have the docker container runtime going; how that works depends on exactly which docker you install. On Linux, we recommend `installing Docker Engine `_. On a Mac, you can also try that, but people have had success with `Docker Desktop `_.

.. _test-build-docker-images:

-Buildng the Docker images
-------------------------
+Building the Docker images
+--------------------------

You can build all the docker images necessary to create a development/test environment by running the following in the top level directory of your git checkout::

@@ -81,7 +81,7 @@ If all is well, it should tell you that several images were built.

Installing for tests
--------------------

-Before running all the docker containers, you have to install the code in the location that the containers will be expecting to find it. :ref:`installing-the-code` above describes the general procedure for installing the code. If you want to install the code on your local test enviroment for use with the tests in the docker compose environment, cd into the top level of your ``FASTDB`` checkout and run::
+Before running all the docker containers, you have to install the code in the location that the containers will be expecting to find it. :ref:`installing-the-code` above describes the general procedure for installing the code.
If you want to install the code on your local test environment for use with the tests in the docker compose environment, cd into the top level of your ``FASTDB`` checkout and run::

   ./configure --with-installdir=$PWD/install \
       --with-smtp-server=mailhog \

@@ -174,7 +174,7 @@ That will bring up a shell server you can connect to and work with that will hav

Please Don't Docker Push
------------------------

-The `docker-compose.yaml` file will build docker images set up so that they can easily be pushed to Perlmutter's container image registrly. Please do *not* run any docker push commands to push those images, unless you've tagged them differently and know what you're doing. (If you really know what you're doing, you're always allowed to do *anything*.)
+The `docker-compose.yaml` file will build docker images set up so that they can easily be pushed to Perlmutter's container image registry. Please do *not* run any docker push commands to push those images, unless you've tagged them differently and know what you're doing. (If you really know what you're doing, you're always allowed to do *anything*.)

Working With the Test Installation

@@ -215,7 +215,7 @@ However, there may be one more step. If you modified code that the webserver us

   docker compose up -d webap
   docker compose logs webap

-The last step show not show any errors or tracebacks; if it did, then you broke the code an the webserver can't start. Fix the code, install again, and then do the three steps above again until it works.
+The last step should not show any errors or tracebacks; if it did, then you broke the code and the webserver can't start. Fix the code, install again, and then do the three steps above again until it works.

.. _autoreconf-install:

@@ -246,7 +246,7 @@ in order create the expected test data on your local machine. You only need to

Exiting the test environment
----------------------------

-If you're inside the container, you can exit with ``exit`` (just like any other shell). Once outside the container, assuming you're still in the ``tests`` subdirectory, you re-enter the (still-running) test container with another ``docker compose exec -it shell /bin/bash``. If you want to tear down the test enviornment, run::
+If you're inside the container, you can exit with ``exit`` (just like any other shell). Once outside the container, assuming you're still in the ``tests`` subdirectory, you re-enter the (still-running) test container with another ``docker compose exec -it shell /bin/bash``. If you want to tear down the test environment, run::

   docker compose down -v

@@ -266,7 +266,7 @@ that will run all of the tests and tell you how they're doing. As usually with

**WARNING**: it's possible the tests do not currently clean up after themselves (especially if some tests fail), so you may need to restart your environment after running tests before running them again. If you hit ``CTRL-C`` while ``pytest`` is running, tests will almost certainly not have cleaned up after themselves.

-What's more, right now, if you're running all of the tests, if an early test fails, it can cause a later test to fail, even though that later test wouldn't actually fail if the earlier tests had passed. This is bad behvaior; if tests properly cleaned up after themselves (which they're supposed to do even if they fail), then the later tests shouldn't fail just because an earlier one does. Until we get this behavior fixed, when looking at lots of tests at once, work on them in order, as the later tests might not "really" have failed.
+What's more, right now, if you're running all of the tests, if an early test fails, it can cause a later test to fail, even though that later test wouldn't actually fail if the earlier tests had passed. This is bad behavior; if tests properly cleaned up after themselves (which they're supposed to do even if they fail), then the later tests shouldn't fail just because an earlier one does. Until we get this behavior fixed, when looking at lots of tests at once, work on them in order, as the later tests might not "really" have failed.

You can always exit any shells running on containers, and tear down the whole environment with ``docker compose down -v``. That will allow you to start up a new test environment (see :ref:`local-test-env`) and start over with empty databases.

@@ -298,9 +298,9 @@ or run::

   cd /code/tests
   RUN_FULL90DAYS=1 pytest -v --trace services/test_sourceimporter.py::test_full90days

-Both of these start tests with test fixtures that create a database user and load data into the database. The ``--trace`` command tells pytest to stop at the begining of a test, after the fixture has run. The shell where you run this will dump you into a ``(Pdb)`` prompt. Just leave that shell sitting there. At this point, you have a loaded database. You can look at ``localhost:8080`` in your web browser to see the web ap, and log in with user ``test`` and password ``test_password``.
+Both of these start tests with test fixtures that create a database user and load data into the database. The ``--trace`` command tells pytest to stop at the beginning of a test, after the fixture has run. The shell where you run this will dump you into a ``(Pdb)`` prompt. Just leave that shell sitting there. At this point, you have a loaded database. You can look at ``localhost:8080`` in your web browser to see the web app, and log in with user ``test`` and password ``test_password``.

-The ``test_full90days_fast`` test runs a lot faster, loading up the main postgres tables with the test data. It does *not* load anyting into the mongo database. The ``test_full90days`` test takes up to a minute or so to run, because what it's really doing is testing a whole bunch of different servers, an there are built in sleeps so that each step of the test can be sure that other servers have had time to do their stuff. This one loads the full test data set into the "ppdb" tables, and runs a 90 simulated days of alerts through some test brokers. When it's done, the sources from those 90 simulated days will be in the main postgrest ables, and the mongo database will be populated with the test broker messages. (The test brokers aren't doing anything real, but are just assigning random classifications for purposes of testing the plubming.)
+The ``test_full90days_fast`` test runs a lot faster, loading up the main postgres tables with the test data. It does *not* load anything into the mongo database. The ``test_full90days`` test takes up to a minute or so to run, because what it's really doing is testing a whole bunch of different servers, and there are built-in sleeps so that each step of the test can be sure that other servers have had time to do their stuff. This one loads the full test data set into the "ppdb" tables, and runs 90 simulated days of alerts through some test brokers. When it's done, the sources from those 90 simulated days will be in the main postgres tables, and the mongo database will be populated with the test broker messages.
(The test brokers aren't doing anything real, but are just assigning random classifications for purposes of testing the plumbing.)

When you're done futzing around with the web app, go to the shell where you ran ``pytest ...`` and just press ``c`` and hit Enter at the ``(Pdb)`` prompt. The test will complete, exit, and (ideally) clean up after itself.

@@ -325,10 +325,10 @@ TODO

Notes and Tips for Development and Testing
==========================================

-Running tests on github CI
--------------------------
+Running tests on GitHub CI
+--------------------------

-The tests on github CI require up-to-date docker images. They don't change very often, so usually you don't have to do anything. However, if they have changed, then you need to do edit ``docker-compose.yaml`` and bump the default version of all the images. You'll see that all the images end in ``${DOCKER_VERSION:-test20250815}`` (or some other yyyymmdd). Bump the date to the current date on all the images. Then do::
+The tests on GitHub CI require up-to-date docker images. They don't change very often, so usually you don't have to do anything. However, if they have changed, then you need to edit ``docker-compose.yaml`` and bump the default version of all the images. You'll see that all the images end in ``${DOCKER_VERSION:-test20250815}`` (or some other yyyymmdd). Bump the date to the current date on all the images. Then do::

   yyyymmdd=20250815   # replace this with the yyyymmdd you put in docker-compose.yaml
   docker compose build

@@ -338,13 +338,13 @@ The tests on github CI require up-to-date docker images. They don't change very

      do docker push ghcr.io/lsstdesc/${i}:test${yyyymmdd} ; \
   done

-After you've done this, do a ``git push``, or create a pull request, or do whatever it is you normally do that triggers the running of the automated tests on github.
+After you've done this, do a ``git push``, or create a pull request, or do whatever it is you normally do that triggers the running of the automated tests on GitHub.

Changing database structures
----------------------------

-If you change database sturctures (adding fields, etc.), it's possible that some of the tests will start failing because cached test data no longer matches what's expected. This will happen (at least) to tests that use the ``alerts_90days_sent_received_and_imported`` fixture in ``tests/fixtures/alertcycle.py``. If you're seeing something you think is this error, look at all the comments above and below that test in that file for information on rebuilding the cached test data.
+If you change database structures (adding fields, etc.), it's possible that some of the tests will start failing because cached test data no longer matches what's expected. This will happen (at least) to tests that use the ``alerts_90days_sent_received_and_imported`` fixture in ``tests/fixtures/alertcycle.py``. If you're seeing something you think is this error, look at all the comments above and below that test in that file for information on rebuilding the cached test data.

If the ``services/test_sourceimporter.py::test_full90days_fast`` test fails
---------------------------------------------------------------------------

TODO

@@ -362,7 +362,7 @@

Updating Docker Images
----------------------

-Hopefully you don't have to do this. In the rare case where you do (which will be if you've edited anything in the ``docker`` subdirectory), you need to build and push new docker images for the automated tests on github to use.
+Hopefully you don't have to do this.
In the rare case where you do (which will be if you've edited anything in the ``docker`` subdirectory), you need to build and push new docker images for the automated tests on GitHub to use.

First, edit ``docker-compose.yaml`` and find all lines that start with ``image:`` (after several spaces). At the end of that line you should see something like ``${DOCKER_VERSION:-test20250815}``. Bump the date after ``test`` to the current date. Make sure *not* to remove either the colon, or the dash right after the colon. (We're assuming two people won't be doing this on the same day....) Then, at the top level of your archive, run::

@@ -387,9 +387,9 @@ Database Migrations

Database migrations are all in the ``db`` subdirectory. They are a series of ``.sql`` files which contain PostgreSQL commands. If you look, you will notice that the files are named by date. This is important, because the migrations in general do not commute; they must always be applied in the same order.

-Normally, when you bring up a :ref:`local-test-env`, the database migrations are automatically applied. As such, once the test environment is going, the database already has all the necessarry tables created.
+Normally, when you bring up a :ref:`local-test-env`, the database migrations are automatically applied. As such, once the test environment is going, the database already has all the necessary tables created.

-On a production system, when updating the code, you may need to apply databse migrations to update your database. This will happen when you update to a new version, and the database schema have changed. In general, it's a good idea to run this every time you update the code for an installed FASTDB instance. **Backup your current database before doing this**, just in case something horrible happens. You apply the migrations by going into an environment where the code is running (e.g. a shell on the productionwebserver) and running::
+On a production system, when updating the code, you may need to apply database migrations to update your database. This will happen when you update to a new version and the database schema has changed. In general, it's a good idea to run this every time you update the code for an installed FASTDB instance. **Back up your current database before doing this**, just in case something horrible happens. You apply the migrations by going into an environment where the code is running (e.g. a shell on the production webserver) and running::

   cd /code/db
   python apply_migrations.py

@@ -410,7 +410,7 @@ Adding new migrations

If you need to make changes to the database, you must write a migration for the database. Do this by creating a file in the ``db`` subdirectory whose name is ``yyyy-mm-dd_nnn_text.sql``. In this name, ``nnn`` is just a number; usually this can just be 000 or 001. It's there to preserve the order in case you need to create more than one migration file on the same day. ``text`` can be anything. It should be a very short description of the changes made. Look at the existing files for guidance. Do not put any spaces in ``text``; just use things you'd normally want to use in a Unix filename. (That's a subset of what's legal in a Unix filename....)

-When creating the migration, be aware that this needs to be applied to production database. You can't just think about changing the table structure; you also have to think about preserving the data. That means you don't drop a column and add a new column, you have to rename a column.
If the table structure is changing alot, the SQL code needed to do the migration while preserving the data could potentially be complicated. (You may need, for instance, to use temporary tables.)
+When creating the migration, be aware that this needs to be applied to the production database. You can't just think about changing the table structure; you also have to think about preserving the data. That means you don't drop a column and add a new column, you have to rename a column. If the table structure is changing a lot, the SQL code needed to do the migration while preserving the data could potentially be complicated. (You may need, for instance, to use temporary tables.)

**WARNING**: Pay attention when merging branches. If two branches have made database migrations, you may need to rename the migration to a later date to keep things in the right order. (Of course, if the migrations are inconsistent, you have to resolve that, but that can happen with any code in any migration.)

@@ -437,14 +437,14 @@ The base installation directory is::

   /global/cfs/cdirs/lsst/groups/TD/SOFTWARE/fastdb_deployment/rknop_dev

-In that directory, make sure there are subdirectories ``install``, ``query_results``, and ``sessions``, in additon to the ``FASTDB`` checkout generated with::
+In that directory, make sure there are subdirectories ``install``, ``query_results``, and ``sessions``, in addition to the ``FASTDB`` checkout generated with::

   git clone git@github.com:LSSTDESC/FASTDB
   cd FASTDB
   git checkout
   git submodule update --init

-The ``.yaml`` files defining the Spin workloads are in ``admin/spin/rknop_dev`` in the git archive. (Note that, unless I've screwed up (...which has happend...), the files ``secrets.yaml`` and ``webserver-cert.yaml`` will not be complete, because those are the kinds of things you don't want to commit to a public git archive. Edit those files to put in the actual passwords and SSL key/certificates before using them, and **make sure to remove the secret stuff before committing anything to git**. If you screw up, you have to change **all** the secrets.) To install the code to work with those ``.yaml`` files, run::
+The ``.yaml`` files defining the Spin workloads are in ``admin/spin/rknop_dev`` in the git archive. (Note that, unless I've screwed up (...which has happened...), the files ``secrets.yaml`` and ``webserver-cert.yaml`` will not be complete, because those are the kinds of things you don't want to commit to a public git archive. Edit those files to put in the actual passwords and SSL key/certificates before using them, and **make sure to remove the secret stuff before committing anything to git**. If you screw up, you have to change **all** the secrets.) To install the code to work with those ``.yaml`` files, run::

   cd /global/cfs/cdirs/lsst/groups/TD/SOFTWARE/fastdb_deployment/rknop_dev/FASTDB
   touch aclocal.m4 configure

@@ -457,4 +457,4 @@ The ``.yaml`` files defining the Spin workloads are in ``admin/spin/rknop_dev``

      --with-email-from=raknop@lbl.gov
   make install

-This is necessary because the docker image for the web ap does *not* have the fastdb code baked into it. Rather, it bind mounds the ``install`` directory and uses the code there. (This allows development without having to rebuild the docker image.)
+This is necessary because the docker image for the web app does *not* have the FASTDB code baked into it. Rather, it bind mounts the ``install`` directory and uses the code there. (This allows development without having to rebuild the docker image.)
diff --git a/docs/filters.rst b/docs/filters.rst
new file mode 100644
index 0000000..27784bb
--- /dev/null
+++ b/docs/filters.rst
@@ -0,0 +1,275 @@
+.. _creating-filters:
+
+================
+Filters Overview
+================
+
+This page discusses filters as they are used in the context of the LSST alert stream. Essentially, a filter takes a stream of alerts from a broker, and returns a subset of those alerts based on some scientific criteria. This is useful for narrowing down the vast stream of millions of alerts a day that the Rubin Observatory outputs to something that can be more easily digested and used for specific science cases. For example, a filter could output only objects that look like supernovae, or only objects in a certain area on the sky.
+
+Some of the requirements for filters include:
+
+* **reproducible:** they should return the same objects if they were to be run multiple times on the same set of objects
+* **broker-level:** filters should be applied at the broker level (i.e. within its pipeline), and create their own stream topic of alerts. If that is not possible, the filter may be in a separate location, but it should *not* be located in the FASTDB repository.
+* **provide certain alert data:** as there is no centrally stored database of all LSST alerts, each of the alerts being output from a filter should have all of the data from the `DiaSource `_ and the `DiaObject `_ schema, and all the data from the ``prvDiaSources`` and ``prvDiaForcedSources`` `arrays `_. Ideally, the alerts should have *all* of the original data from the Rubin alert, in addition to any new data that was added by the broker or the filter itself. But at a *minimum*, the following parameters are required in order to get some sense of the alert:
+
+
+.. table:: From the DiaSource schema and from the ``prvDiaSources`` array:
+   :align: center
+
+   +--------------------+----------------------------------------------------------+
+   | parameter          | description                                              |
+   +====================+==========================================================+
+   | ``diaSourceId``    | unique identifier for the source                         |
+   +--------------------+----------------------------------------------------------+
+   | ``diaObjectId``    | id of the object this source was associated with, if any |
+   +--------------------+----------------------------------------------------------+
+   | ``midpointMjdTai`` | Modified Julian Date of visit                            |
+   +--------------------+----------------------------------------------------------+
+   | ``apFlux``         | flux in nJy                                              |
+   +--------------------+----------------------------------------------------------+
+   | ``apFluxErr``      | estimated flux uncertainty in nJy                        |
+   +--------------------+----------------------------------------------------------+
+   | ``visit``          | id of the visit where the source was measured            |
+   +--------------------+----------------------------------------------------------+
+   | ``ra``             | Right ascension of the center of this source (deg)      |
+   +--------------------+----------------------------------------------------------+
+   | ``dec``            | Declination coordinate of the center of the source (deg)|
+   +--------------------+----------------------------------------------------------+
+
+.. table:: From the DiaObject schema:
+   :align: center
+
+   +-----------------+--------------------------------------------------------------------------+
+   | parameter       | description                                                              |
+   +=================+==========================================================================+
+   | ``diaObjectId`` | unique identifier for the object                                         |
+   +-----------------+--------------------------------------------------------------------------+
+   | ``ra``          | Right ascension of the center of this source (deg)                      |
+   +-----------------+--------------------------------------------------------------------------+
+   | ``dec``         | Declination coordinate of the center of the source (deg)                |
+   +-----------------+--------------------------------------------------------------------------+
+   | ``raErr``       | Uncertainty of ra *(Can be omitted if absolutely necessary)*             |
+   +-----------------+--------------------------------------------------------------------------+
+   | ``decErr``      | Uncertainty of dec *(Can be omitted if absolutely necessary)*            |
+   +-----------------+--------------------------------------------------------------------------+
+   | ``ra_dec_cov``  | Covariance between ra and dec *(Can be omitted if absolutely necessary)* |
+   +-----------------+--------------------------------------------------------------------------+
+
+
+.. table:: From the ``prvDiaForcedSources`` array (see `LSST alert packet schema `_):
+   :align: center
+
+   +-----------------------+----------------------------------------------------------+
+   | parameter             | description                                              |
+   +=======================+==========================================================+
+   | ``diaForcedSourceId`` | unique identifier for the source                         |
+   +-----------------------+----------------------------------------------------------+
+   | ``diaObjectId``       | id of the object this source was associated with, if any |
+   +-----------------------+----------------------------------------------------------+
+   | ``midpointMjdTai``    | Modified Julian Date of visit                            |
+   +-----------------------+----------------------------------------------------------+
+   | ``psfFlux``           | flux in nJy                                              |
+   +-----------------------+----------------------------------------------------------+
+   | ``psfFluxErr``        | estimated flux uncertainty in nJy                        |
+   +-----------------------+----------------------------------------------------------+
+   | ``visit``             | id of the visit where the source was measured            |
+   +-----------------------+----------------------------------------------------------+
+   | ``ra``                | Right ascension of the center of this source (deg)      |
+   +-----------------------+----------------------------------------------------------+
+   | ``dec``               | Declination coordinate of the center of the source (deg)|
+   +-----------------------+----------------------------------------------------------+
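+
+As a quick sanity check on incoming data, something like the following sketch can be used. This is illustrative only, not FASTDB code; it assumes the alert has already been deserialized into a dict laid out like the LSST alert packet (with ``diaSource``, ``diaObject``, ``prvDiaSources``, and ``prvDiaForcedSources`` keys), and the field lists just follow the tables above:
+
+.. code:: python
+
+    # Illustrative only -- not FASTDB code.
+    MIN_SOURCE = { "diaSourceId", "diaObjectId", "midpointMjdTai",
+                   "apFlux", "apFluxErr", "visit", "ra", "dec" }
+    MIN_OBJECT = { "diaObjectId", "ra", "dec" }
+    MIN_FORCED = { "diaForcedSourceId", "diaObjectId", "midpointMjdTai",
+                   "psfFlux", "psfFluxErr", "visit", "ra", "dec" }
+
+    def has_minimum_fields( alert ):
+        """Return True if a deserialized alert dict carries the minimum fields."""
+        return ( MIN_SOURCE <= set( alert["diaSource"] )
+                 and MIN_OBJECT <= set( alert["diaObject"] )
+                 and all( MIN_SOURCE <= set(s) for s in alert.get("prvDiaSources") or [] )
+                 and all( MIN_FORCED <= set(f) for f in alert.get("prvDiaForcedSources") or [] ) )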
+
+
+
+Creating new filters
+====================
+
+
+This section details how to create new filters at the broker level for FASTDB to subscribe to, for all of the LSST brokers where that is available. Once you have created your filter, let Rob know the broker and the topic name to get FASTDB subscribed to it.
+
+**NOTE:** Much of the broker code is still in progress (as of the writing of this), so make sure to check the linked tutorials for possible changes if you run into any difficulties.
+
+
+ALeRCE
+------
+
+**Current status as of April 2026:** no immediate way to create new filters at the broker level. We think filtering would be handled through the 'step' mechanism, but this is unclear.
+
+ALeRCE is a Kafka-based broker that provides Kafka topic streams that users can subscribe to via a variety of methods. They also have an API interface, a Python client, and a web-based explorer that allow you to access the last 48 hours of data on demand.
+
+
+Useful Links:
+^^^^^^^^^^^^^
+* `ALeRCE `_
+* `Creating a step `_
+
+
+AMPEL
+-----
+
+**Current status as of April 2026:** you have to contact the broker maintainers in order to implement filters. At the moment it looks like filters are implemented in 'Tier 0', but FASTDB might want to have an option to have filters implemented in an additional post-existing-pipeline stage (unless you can implement a filter in Tier 0 and also get all the preprocessing info).
+
+Useful Links:
+^^^^^^^^^^^^^
+* `AMPEL Github `_
+* `AMPEL Documentation `_
+
+
+ANTARES
+-------
+
+The ANTARES broker runs an algorithm on its alerts that associates the alert with the nearest point of known past measurements, called a Locus. This is the object they use instead of the Alert object within the filters, and it is what they send out via the stream. They also filter out poor quality and bogus alerts, associate gravitational wave events, and look up associated objects. Finally, they apply the existing filters to the Locus object. The messages in the stream are the Locus objects, which have all `Locus properties `_, as well as the alert and all past alerts associated with this object.
+
+Useful Links:
+^^^^^^^^^^^^^
+* `Filter creation tutorial notebook `_
+* `Existing ANTARES filters `_
+
+Steps to create a new LSST filter for ANTARES:
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+1. Create a `GitLab `_ account if you don't already have one. You can use your GitHub account to create your GitLab account.
+2. Fork and clone https://gitlab.com/nsf-noirlab/csdc/antares/devkit
+3. Pip install the package in editable mode: ``pip install -e .``
+4. Create a new folder for your filter in ``/antares_devkit/filters/``, and create an ``__init__.py`` file where all your code will go.
+5. Create your filter class. It should be a class based on the ``BaseFilter`` class, and should at a minimum have a ``_run(self,locus)`` method, which is where the filter logic should go. It should run ``locus.tag('[tag_name]')`` on the loci that have been chosen by your filter code, where ``[tag_name]`` is the name you want for the stream topic that your filter creates. Start with this `filter template `_ and work from there (the Slack channel/id is optional, you should not need to worry about that). See `The Locus Object `_ for a description of what the 'locus' is and a reference for some of its methods and properties. There is a `full list of the locus and alert object properties `_ for reference as well. Take a look at the `uniform_random_sample `_ filter for an example implementation.
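+
+   For orientation, here is a stripped-down sketch of what such a class can look like, in the spirit of the ``uniform_random_sample`` filter just mentioned. It is illustrative only, not a working filter: the ``BaseFilter`` import and the metadata that the filter template requires are omitted, and the tag name is made up.
+
+   .. code:: python
+
+       import random
+
+       class ExampleRandomFilter( BaseFilter ):   # BaseFilter comes from the devkit (see the template)
+           """Sketch: tag roughly 1% of loci at random."""
+
+           def _run( self, locus ):
+               # A real filter would inspect the locus properties here and
+               # tag only loci that pass its science criteria.
+               if random.random() < 0.01:
+                   locus.tag( "example_random_sample" )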
+6. Install the ANTARES client in order to get sample data for your tests by running: ``pip install antares-client``
+7. Test out your filter. You can use the sample code below using the ``antares-client`` ``search.get_random_loci(n)`` function. You can also make use of the `other existing search functions `_ to get your test data, for example ``search.cone_search()`` which searches for loci in a certain region. For more detail, take a look at the ANTARES tutorial notebook section on `testing your filter `_.
+
+.. code:: python
+
+    from antares_client import search
+    from antares_devkit.models import DevKitLocus
+    from antares_devkit.utils import filter_report
+
+
+    # Execute your_filter_class filter on 10 random loci
+    for client_locus in search.get_random_loci(10):
+        devkit_locus = DevKitLocus.model_validate(client_locus.to_devkit())
+        report = filter_report([your_filter_class], devkit_locus)
+
+        # `filter_report()` returns a report of what the filter did. Take a look at it:
+        print(report)
+
+8. Once you have successfully run your test, create a pull request of your forked repository. Use the following template to write out your pull request:
+
+.. code:: markdown
+
+    ### Summary
+    Provide a brief summary of the changes introduced in this Merge Request.
+
+    ### Changes Added
+    - List the key changes included in this MR.
+    - Explain why these changes are necessary.
+
+    ### New Filter Information (if applicable)
+    - **What does the filter do?** Describe its purpose and functionality
+    - **Any dependencies or configuration required?** List any additional setup needed.
+
+    ### Testing
+    - Describe how the changes were tested
+    - Provide any code you used to test the filter
+    - Provide any test cases or steps to verify functionality (optional)
+
+    ### Additional Notes
+
+
+9. Once your filter pull request has been approved and merged, send the topic name and broker to Rob.
+
+Babamul
+-------
+
+**Current status as of April 2026:** no immediate way to create filters on Babamul. You need an account to access some of their API and their Kafka documentation, and to use their Python client to consume alerts. There is some API documentation and minimal client documentation.
+
+
+Babamul is a Kafka-based broker, written in Rust. It seems to have a specific set of `filter 'workers' `_, which is likely where new filters would be added.
+
+Useful Links:
+^^^^^^^^^^^^^
+* `Babamul `_
+* `Babamul client documentation `_
+* `Babamul streaming examples `_
+
+Fink
+----
+
+The Fink broker is Kafka based. It streams alert data that has been enriched, for example with data from other catalogues and machine learning classification scores.
+
+Useful Links:
+^^^^^^^^^^^^^
+* `Creating a new Fink filter `_
+* `Existing Fink filters `_
+
+Steps to create a new LSST filter for Fink:
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+1. Fork and clone https://github.com/astrolabsoftware/fink-filters.git
+2. Make a new folder in ``/fink_filters/rubin/livestream`` called ``filter_[name]``, where you replace ``[name]`` with the name of your filter. Make sure that your filter name isn't already taken by taking a look at the filters that already exist.
+3. Create empty ``filter.py`` and ``__init__.py`` files in that folder.
+4. Create a function in ``filter.py`` that performs the filtering. See `filter_uniform_sample `_ for a simple example filter.
+
+   * The inputs should be existing data defined in the LSST Alert or Fink-specific properties (you can find a list of them on the `Schemas page `_, under the **Data Transfer, Livestream & Xmatch** heading), and these should be the quantities you want to filter on.
+   * The output should be a ``pd.Series`` of Booleans that is True for your chosen Alerts and False otherwise.
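+
+   For example, a minimal sketch of such a function (the column names and the threshold here are made up for illustration; use real alert or Fink columns from the Schemas page):
+
+   .. code:: python
+
+       import pandas as pd
+
+       def example_bright_south_filter( psfFlux: pd.Series, dec: pd.Series ) -> pd.Series:
+           """Illustrative only: keep alerts brighter than 1e5 nJy south of the equator."""
+           return ( psfFlux > 1.e5 ) & ( dec < 0. )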
+5. Create the test by copying the following code into your ``filter.py`` file:
+
+.. code-block:: python
+
+    if __name__ == "__main__":
+        from fink_filters.tester import spark_unit_tests
+
+        globs = globals()
+        spark_unit_tests(globs, load_rubin_df=True)
+
+
+This will load in the test dataset in ``datatest/rubin_test_data_10_0.parquet`` and use that to test your filter. If this dataset doesn't contain representative data for your test, you can `download your own data from the Fink Data Transfer service `_ and add it to the test. This will require you to install ``fink-client`` and email them to get access.
+
+6. Set up the development environment by pulling the docker image and running it (see `how to get docker `_ if you don't already have it):
+
+.. code-block:: bash
+
+    # 2.3GB compressed
+    docker pull gitlab-registry.in2p3.fr/astrolabsoftware/fink/fink-deps-sentinel-rubin:latest
+
+    # Assuming you are in /path/to/fink-filters
+    docker run -t -i --rm -v \
+        $PWD:/home/libs/fink-filters \
+        gitlab-registry.in2p3.fr/astrolabsoftware/fink/fink-deps-sentinel-rubin:latest bash
+
+7. Once in the docker container, you can run the test on your filter to make sure that it works using the following command: ``./run_tests.sh --single_module fink_filters/rubin/livestream/filter_[name]/filter.py``. If it works, there will be some Spark UserWarnings, and then it will generate a coverage report if there are no errors in the tests.
+8. If the filter test is working, then you can create a pull request for the ``fink-filters`` repository with your new filter.
+9. Once your filter pull request has been approved and merged, send the topic name and broker to Rob.
+
+
+Lasair
+------
+
+**Current status as of April 2026:** you can make filters using their online builder, using an SQL-style query. To convert this to an active filter, you need a Lasair account. This filter will then output a Kafka topic that you can subscribe to. There is an option to send only the fields that you have filtered on, or the whole alert (without the cutout images).
+
+Useful Links:
+^^^^^^^^^^^^^
+* `Lasair `_
+* `Making a Lasair filter `_
+
+
+Pitt-Google
+-----------
+
+**Current status as of April 2026:**
+
+Pitt-Google operates differently than the other brokers, running on Google Cloud's Pub/Sub service instead of Kafka. This means that, unlike other brokers, where Python is used to create filters that build upon a Kafka package, Pitt-Google filters use the Pub/Sub-native JavaScript. As yet it is unclear whether these filters will need to be upstreamed to Pitt-Google, who would then create a new Pub/Sub topic for FASTDB to listen to; or whether there will be some other middleman "broker" which listens to the un-filtered Pitt-Google stream and re-broadcasts a set of filtered topics, and FASTDB will poll from there.
+
+At present there are only a few attributes which can easily be filtered on, which are best accessed by downloading a test alert from Pitt-Google with their Python client, and viewing the ``downloaded_alert.msg.attributes`` dictionary.
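+
+For example, assuming ``downloaded_alert`` was obtained with their Python client as in the tutorial linked below, you can list the filterable attributes with:
+
+.. code:: python
+
+    # Assumes downloaded_alert came from the Pitt-Google Python client
+    # (see the tutorial linked below).
+    for key, value in downloaded_alert.msg.attributes.items():
+        print( key, "=", value )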
+The Pitt-Google and Google Pub/Sub documentation both discuss string-based attribute filters; however, given the limited options available within that method of filtering, and the expected desire for more complex filters, the JavaScript UDF method should be used.
+
+Links:
+^^^^^^
+* `Pitt-Google tutorial on pulling and filtering alerts `_
+* `Pitt-Google client documentation `_
+* `Pitt-Google broker documentation `_
+
diff --git a/docs/index.rst b/docs/index.rst
index b8e291f..ec6018b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -10,6 +10,7 @@ FASTDB — DESC's time-domain database
    overview
    usage
    developers
+   filters

Indices and tables

diff --git a/docs/overview.rst b/docs/overview.rst
index 1cc29ec..aa4b024 100644
--- a/docs/overview.rst
+++ b/docs/overview.rst
@@ -5,9 +5,9 @@
FASTDB Overview
===============

-FASTDB runs with two database backends, a PostgreSQL server and a Mongodb server. Neither database server is directly accessible; rather, you access FASTDB through a webserver. As of this writing, a few instances of FASTDB exist; not all of them are running the latest verson of the code....
+FASTDB runs with two database backends, a PostgreSQL server and a MongoDB server. Neither database server is directly accessible; rather, you access FASTDB through a webserver. As of this writing, a few instances of FASTDB exist; not all of them are running the latest version of the code....

-* ``https://desc-fastb.lbl.gov`` is the prouction instance. As of this writing, it has alerts from two filters from Fink: the first is a Fink-defined filter of extragalactic transient candidates near known host galaxies, the second is a filter with ~1% of all of the first 9M LSST alerts. (Technically, it's those with ``diaSourceId%113==0``, but that should effectively be a random sample.) It's listening on the Fink "near host" channel, and on a Pitt-Google channel that will get all LSST alerts going forward. As time goes by, it will listen to other brokers and with other filters.
+* ``https://desc-fastdb.lbl.gov`` is the production instance. As of this writing, it has alerts from two filters from Fink: the first is a Fink-defined filter of extragalactic transient candidates near known host galaxies; the second is a filter with ~1% of all of the first 9M LSST alerts. (Technically, it's those with ``diaSourceId%113==0``, but that should effectively be a random sample.) It's listening on the Fink "near host" channel, and on a Pitt-Google channel that will get all LSST alerts going forward. As time goes by, it will listen to other brokers and other filters.

* ``https://fastdb-dp1.lbl.gov`` has the differential imaging catalogs (diaobjects, diasources, diaforcedsources) from DP1 loaded into it. If you want an account on this, talk to Rob. It has an old version of FASTDB and the documentation is wrong for it.

@@ -27,7 +27,7 @@ For developers wanting to set up a test installation on their own machine, see :

Database Tables Overview
========================

-The core tables of the database are ``diaobject``, ``diasource``, and ``diaforcedsource``. These nomenclature follow LSST terminology (or, at least, an earlier verson of LSST terminology). A ``diaobject`` is a single transient or varaible object somewhere on the sky; it's a supernova, or a quasar, or something like that. (``dia`` = "differential imaging analysis.) A ``diasource`` is a single detection of a diaobject. When the LSST project does difference imaging, they will scan the difference images for detections. When they find one, they create a ``diasource``. They then look to see if there is already a ``diaobject`` at the position on the sky of this ``diasource``; if so, this ``diasource`` is associated with that ``diaobject``.
If not, a new ``diaobject`` is created, and the ``diasource`` is associated with that new ``diaobject``. Finally, a ``diaforcedsource`` represents forced photometry, where an aperture (or PSF model) is placed down at a predetermined known position of a ``diaobject`` on a single image.
+The core tables of the database are ``diaobject``, ``diasource``, and ``diaforcedsource``. This nomenclature follows LSST terminology (or, at least, an earlier version of LSST terminology). A ``diaobject`` is a single transient or variable object somewhere on the sky; it's a supernova, or a quasar, or something like that. (``dia`` = "differential imaging analysis".) A ``diasource`` is a single detection of a diaobject. When the LSST project does difference imaging, they will scan the difference images for detections. When they find one, they create a ``diasource``. They then look to see if there is already a ``diaobject`` at the position on the sky of this ``diasource``; if so, this ``diasource`` is associated with that ``diaobject``. If not, a new ``diaobject`` is created, and the ``diasource`` is associated with that new ``diaobject``. Finally, a ``diaforcedsource`` represents forced photometry, where an aperture (or PSF model) is placed down at a predetermined known position of a ``diaobject`` on a single image.

There will be multiple ``diaobject`` entries for the same physical object on the sky, because each LSST release will assign new ``diaobjectid`` values. (Indeed, some of the bits of the 64-bit integer ``diaobjectid`` value encode the release.) What's more, empirically, in the alert stream there can be multiple ``diaobjectid`` values within 1" of each other, and sometimes the same ``diasource`` is associated with different ``diaobjectid``. (This led to some serious refactoring work in the database.) As such, FASTDB also has a ``root_diaobject`` table indexed by a UUID. Ideally, one physical object on the sky will only ever have one ``root_diaobject``. The ``diaobject`` table has a ``rootid`` field that points back to the ``root_diaobject`` table. We strongly recommend that when trying to refer to individual objects in FASTDB, you use ``rootid``, because it is more approximately unique than ``diaobjectid`` is.

@@ -65,11 +65,11 @@ When searching the database for a lightcurve, externally you specify a processin

Processing Versions for Data Uploaders and Developers
-----------------------------------------------------

-The database defines the concepts of *processing version* and *base processing version*. The *processing version* is what interfaces to the outside world; it's what users will specify when calling the various web APIs. The *base processing version* is what each row in one of the photometry tables is associated with.
(So, the ``diaobject``, ``diasource``, and ``diaforcedsource`` tables (at least) all have a ``base_procver_id`` column, which is a foreign key into the ``base_processing_version`` table.) Finally, there is a table ``base_procver_of_procver`` that holds a prioritized list of base procesising versions that go with each processing version for each table.
+The database defines the concepts of *processing version* and *base processing version*. The *processing version* is what interfaces to the outside world; it's what users will specify when calling the various web APIs. The *base processing version* is what each row in one of the photometry tables is associated with. (So, the ``diaobject``, ``diasource``, and ``diaforcedsource`` tables (at least) all have a ``base_procver_id`` column, which is a foreign key into the ``base_processing_version`` table.) Finally, there is a table ``base_procver_of_procver`` that holds a prioritized list of base processing versions that go with each processing version for each table.

In the web API, database queries take this processing version, and figure out which base processing versions go with it. It will then pull photometry from the database, ensuring that a given ``(rootid,visit)`` combination only shows up once in the lightcurve. (That is, the returned lightcurve will not include redundant photometry from the multiple different versions that are stored in the database.)

It's possible that there may be multiple base processing versions associated with a single processing version. For example, suppose that DESC uploads a set of SMP photometry and wants this to be processing version ``pv_smp1``. The first time it's uploaded, we create a base processing version ``bpv_smp1`` and a processing version ``pv_smp1``. (One entry in each of two different tables.) Later, we realize we have to redo 5% of the photometry. Rather than delete the old photometry (which would be bad if we ever decided we want to reproduce something), we would upload the replacement photometry for just those 5% of lightcurves with base processing version ``bpv_smp1a``. We would then set ``pv_smp1`` to be associated with base processing versions ``(bpv_smp1a, bpv_smp1)``. This is a priority-ordered list. When pulling lightcurves from the database, the queries need to pull the photometry with base processing version ``bpv_smp1a`` where it exists, and ``bpv_smp1`` where there is no corresponding ``bpv_smp1a`` photometry.

-As you can imagine, this leads to rather subtle and complicated database queries. It's not a simple matter of pulling all the values from the ``diaforcedsource`` table for a given set of ``diaobjectid`` values and a given processing version. Rather, the query will need to join to the table that tracks which base processing versions go with which processing versions, use the necesary subqueries to make sure photometry is not duplicated, and ensure that the highest priority base processing version is extracted for each point. Because it's easy for users to look at the table schema and come up with "obvious" queries that do the wrong thing, and because the right queries are potentially error prone (and, even if you manage to do it right, hard to write efficiently), we avoid having users make direct SQL queriers to the database. Rather, we provide web APIs where the user need only specify the processing version, and the complicated business of sorting through base processing versions is handled behind the scenes for them.
+As you can imagine, this leads to rather subtle and complicated database queries. It's not a simple matter of pulling all the values from the ``diaforcedsource`` table for a given set of ``diaobjectid`` values and a given processing version. Rather, the query will need to join to the table that tracks which base processing versions go with which processing versions, use the necessary subqueries to make sure photometry is not duplicated, and ensure that the highest priority base processing version is extracted for each point. Because it's easy for users to look at the table schema and come up with "obvious" queries that do the wrong thing, and because the right queries are potentially error prone (and, even if you manage to do it right, hard to write efficiently), we avoid having users make direct SQL queries to the database. Rather, we provide web APIs where the user need only specify the processing version, and the complicated business of sorting through base processing versions is handled behind the scenes for them.
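+To make the dedup rule concrete, here is a purely illustrative Python sketch (not FASTDB code, and not how the real SQL works) of how a priority-ordered list like ``(bpv_smp1a, bpv_smp1)`` resolves which row survives for each ``(rootid,visit)``:
+
+.. code:: python
+
+    def dedup_lightcurve( rows ):
+        """rows: (rootid, visit, base_procver, priority) tuples, where a
+        larger priority wins.  Keep one row per (rootid, visit)."""
+        best = {}
+        for rootid, visit, bpv, priority in rows:
+            key = ( rootid, visit )
+            if key not in best or priority > best[key][3]:
+                best[key] = ( rootid, visit, bpv, priority )
+        return sorted( best.values() )
+
+    rows = [ ( "objA", 101, "bpv_smp1",  1 ),
+             ( "objA", 101, "bpv_smp1a", 2 ),   # redone photometry wins for visit 101
+             ( "objA", 102, "bpv_smp1",  1 ) ]  # no replacement for visit 102
+    print( dedup_lightcurve( rows ) )
+    # -> visit 101 comes from bpv_smp1a; visit 102 from bpv_smp1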
-Note that the base processing version of ``diaobject`` is a bit complicated. To first order, you should just ignore the processing version of ``diaobject``. If you select a base processing version of ``diasource`` or ``diaforcedsource``, those rows will link back to the *right* ``diaobject``, but it's entirely possible that that ``diaobject`` will be in a different processing version than the photometry. Again, consider the example of DESC doing SMP photometry. They will do it for existing diaobjects from an exsiting processing version, but the photometry points themselves will be uploaded under a new processing version. Ideally, the API does this all right, but you can shoot yourself in the foot by specifing some options to the API.
+Note that the base processing version of ``diaobject`` is a bit complicated. To first order, you should just ignore the processing version of ``diaobject``. If you select a base processing version of ``diasource`` or ``diaforcedsource``, those rows will link back to the *right* ``diaobject``, but it's entirely possible that that ``diaobject`` will be in a different processing version than the photometry. Again, consider the example of DESC doing SMP photometry. They will do it for existing diaobjects from an existing processing version, but the photometry points themselves will be uploaded under a new processing version. Ideally, the API does this all right, but you can shoot yourself in the foot by specifying some options to the API.

diff --git a/docs/usage.rst b/docs/usage.rst
index 2380ee1..596d204 100644
--- a/docs/usage.rst
+++ b/docs/usage.rst
@@ -17,10 +17,10 @@ The FASTDB Client

While you can access the FASTDB web API using any standard way of accessing web APIs (e.g. the python ``requests`` module), there is a FASTDB client designed to make this a little bit easier.

-Getting Set Up to Use the FASDTB Client
+Getting Set Up to Use the FASTDB Client
----------------------------------------

-The FASDTB client is entirely contained in the file ``client/fastdb_client.py`` in the github checkout. You can just refer to this directly in your checkout by adding something to your `PYTHONPATH`, or you can copy it somewhere. (**Warning**: if you copy it somewhere, then be aware that eventually stuff might break as your copied version falls out of date!)
+The FASTDB client is entirely contained in the file ``client/fastdb_client.py`` in the GitHub checkout. You can just refer to this directly in your checkout by adding something to your `PYTHONPATH`, or you can copy it somewhere. (**Warning**: if you copy it somewhere, then be aware that eventually stuff might break as your copied version falls out of date!)

The `fastdb_client.py` requires some python modules that aren't always installed in various environments.
The specific packages required that may not be included in base python installs are:

@@ -53,7 +53,7 @@ Having done that, thereafter in order to use FASTDB from Perlmutter, each time y

     source /global/cfs/cdirs/lsst/groups/TD/setup_td.sh

- * Add the ``fastdb_client`` diretory to your python path with::
+ * Add the ``fastdb_client`` directory to your python path with::

     export PYTHONPATH=/dvs_ro/cfs/cdirs/desc-td/SOFTWARE/fastdb_deployment/fastdb_client:$PYTHONPATH

@@ -98,9 +98,9 @@ Top-Level Endpoints

.. _webap-getprocvers:

``/getprocvers``
-***************
+****************

-Returns a list of known procesing versions and processing version aliases. You get back a JSON-encoded dictionary with keys:
+Returns a list of known processing versions and processing version aliases. You get back a JSON-encoded dictionary with keys:

* ``status``: string, value ``ok``
* ``procvers`` : list of string; the processing version names.
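+
+For example, a minimal sketch of hitting this endpoint with the python ``requests`` module (this assumes authentication has already been dealt with, which the FASTDB client otherwise handles for you, and it assumes the endpoint accepts a plain POST with no parameters):
+
+.. code:: python
+
+    import requests
+
+    # Sketch only: auth is elided, and the production URL is used for illustration.
+    res = requests.post( "https://desc-fastdb.lbl.gov/getprocvers" )
+    res.raise_for_status()
+    data = res.json()
+    assert data["status"] == "ok"
+    print( data["procvers"] )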
.. _webap-baseprocver:

``/baseprocver``
-***************
+****************

Hit this API endpoint with ``/baseprocver/<baseprocver>/<table>``, where ``<baseprocver>`` is either the name or the UUID (as a string) of the base processing version you want information about; if this is a UUID, then you should omit the ``/<table>``, but if it's a string, then ``<table>`` should be one of ``diaobject``, ``diasource``, or ``diaforcedsource``. You will get back a JSON dictionary with keys:
@@ -150,7 +150,7 @@ Use this API endpoint to count how many objects, sources, or forced sources ther

* ``/count/<table>``
* ``/count/<table>/<procver>``

-In both of these ``<table>`` is one of ``diaobject``, ``diasource``, or ``diaforcedsource``; it indicates the table whose rows you want to convert. ``<procver>`` is the name or string UUID of the processing version you want to count rows for. If you omit it, it will use ``default`` as the procesing version. (Note: as of this writing, the ``default`` processing version is not actually defined in the production FASTDB; the only one defined is ``realtime``.)
+In both of these ``<table>`` is one of ``diaobject``, ``diasource``, or ``diaforcedsource``; it indicates the table whose rows you want to count. ``<procver>`` is the name or string UUID of the processing version you want to count rows for. If you omit it, it will use ``default`` as the processing version. (Note: as of this writing, the ``default`` processing version is not actually defined in the production FASTDB; the only one defined is ``realtime``.)

You will get back a JSON dictionary with keys:

@@ -172,7 +172,7 @@ Call this with one of:

* ``/getdiaobjectinfo/objid``
* ``/getdiaobjectinfo/procver/objid``

-Where procver is the prcoessing version; it can either be the database's UUID, or the human-readable processing version, or an alias for the processing version. If not given, it assumes "default". ``objid`` is either the rootid, or the diaobjectid, of the object you want information for.
+Where procver is the processing version; it can either be the database's UUID, or the human-readable processing version, or an alias for the processing version. If not given, it assumes "default". ``objid`` is either the rootid, or the diaobjectid, of the object you want information for.

You can include in the ``json=`` dictionary a single parameter, ``columns``, which is a list of the columns you want back. (The query *may* be slightly faster if you don't ask for any position information, but realistically it should be pretty fast in either case.)

@@ -185,26 +185,26 @@ You get back a dictionary. Each key of the dictionary is a string, the name of

 * ``ra`` : decimal degrees, the position of the object. This might not actually be the best position estimate for the object; it's *probably* something like the position of the first diasource that was detected for the object. You can get better positions by getting a lightcurve for the object, and then either doing a weighted average of diaobject positions yourself, or taking the one that FASTDB can give you.
 * ``dec`` : decimal degrees, the position of the object.
 * ``raerr`` : uncertainty on ra, as reported by LSST
- * ``decerr`` : undertainty on dec, as reported by LSST
+ * ``decerr`` : uncertainty on dec, as reported by LSST
 * ``ra_dec_cov`` : covariance between ra and dec, as reported by LSST
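A sketch of what a call might look like (made-up URL and object id; a POST with a ``json=`` dictionary, as described above)::

    import requests

    # Ask for just the columns we need for one object.
    resp = requests.post( "https://fastdb.example.org/getdiaobjectinfo/realtime/1234567",
                          json={ "columns": [ "ra", "dec", "raerr" ] } )
    resp.raise_for_status()
    objinfo = resp.json()   # a dictionary whose keys are column names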
.. _webap-objectsearch:

``/objectsearch``
-****************
+*****************

WARNING : this query right now can be very slow, the web proxy may time out. Rob will work on this.

MORE IMPORTANT WARNING : this endpoint is currently broken.

-Find objects according to criteria. Hit this API endpoint with either just ``/objectsearch`` or with ``/objectsearch/<procver>``. In the latter case, ``<procver>`` is either the name or the UUID (as a string) of the processing version you want to search. In the former case, it will search the ``default`` processing verson.
+Find objects according to criteria. Hit this API endpoint with either just ``/objectsearch`` or with ``/objectsearch/<procver>``. In the latter case, ``<procver>`` is either the name or the UUID (as a string) of the processing version you want to search. In the former case, it will search the ``default`` processing version.

Search criteria are passed as a JSON-encoded dictionary in the body of the POST. (A sketch of a full call follows the list of returned columns below.) Keywords that may be included are:

* ``object_processing_version`` : Use this with great care, because it's complicated and confusing; probably you want to omit it. However, if you know what you're doing, it's possible you'll make the search faster by including the right thing here.

-* ``position_processing_Version`` : ...even more complciated and confusing than ``object_processing_version``. Unless you know what you're doing, don't specify this option.
+* ``position_processing_Version`` : ...even more complicated and confusing than ``object_processing_version``. Unless you know what you're doing, don't specify this option.

* ``fall_back_to_root_position`` : bool, default True; documentation TBD

@@ -214,7 +214,7 @@ Search criteria are passed as a JSON-encoded dictionary in the body of the POST.

* ``noforced`` : bool. Normally, you will get back the last forced photometry point for each object (see below). If ``noforced`` is True, then you will not get that back. This can make the search faster. Ignored if either ``min_lastmag`` or ``max_lastmag`` are True.

-* ``mjd_now`` : float. Normally, the search will look through all photometry when trying to find objects that match your specified criteria. If you pass a value here, it will only look at photometry taken at this MJD or earlier. Use this for tests and simuilations when you want to pretend that the current date is different from the real current date.
+* ``mjd_now`` : float. Normally, the search will look through all photometry when trying to find objects that match your specified criteria. If you pass a value here, it will only look at photometry taken at this MJD or earlier. Use this for tests and simulations when you want to pretend that the current date is different from the real current date.

* ``ra``, ``dec`` : floats. The RA and Dec, in decimal degrees, for the center of a cone search. If you pass these, both are required, and ``radius`` is also required.

@@ -242,7 +242,7 @@ Search criteria are passed as a JSON-encoded dictionary in the body of the POST.

* ``mint_maxdetection`` : float. The brightest detection must be on or after this MJD.

-* ``maxt_maxdetection`` : float. The brightest detecton must be on or before this MJD.
+* ``maxt_maxdetection`` : float. The brightest detection must be on or before this MJD.

* ``minmag_maxdetection`` : float. The brightest detection must be no brighter than this. This is often the one you will want to use to throw out too-bright objects.

@@ -252,15 +252,15 @@ Search criteria are passed as a JSON-encoded dictionary in the body of the POST.

* ``mindt_firstlastdetection`` : float. The time between the first and last *detections* must be at least this many days.

-* ``maxdt_firstlastdetection`` : float. The time between the first and last *detections* m ust be at most this many days. Be careful with this. If you're trying to find stuff whose lightcurve only lasts a week, and a cosmic ray hit the objects' host galaxy a year later, and somehow that cosmic ray didn't get properly filtered out, then the ``dt`` between the first and last detections will be a year.
+* ``maxdt_firstlastdetection`` : float. The time between the first and last *detections* must be at most this many days. Be careful with this. If you're trying to find stuff whose lightcurve only lasts a week, and a cosmic ray hit the objects' host galaxy a year later, and somehow that cosmic ray didn't get properly filtered out, then the ``dt`` between the first and last detections will be a year.
* ``min_lastmag`` : The most recent photometric measurement (including both detections and forced photometry) must be no brighter than this.

* ``max_lastmag`` : The most recent photometry measurement must be no dimmer than this.

-* ``statbands`` : list of string. Normally, all of the cuts based on detection dates, detection counts, magnitudes, etc., consider all bands equally. If you only want to consider some bands, list those here. For instance, if you're only interested in cutting on measurements of the g, r, and i bands, pass ``['g', 'r', 'i']`` here. This parameter also affects what is inclued in the returned data; it will ignore any measurements of bands that aren't in this list.
+* ``statbands`` : list of string. Normally, all of the cuts based on detection dates, detection counts, magnitudes, etc., consider all bands equally. If you only want to consider some bands, list those here. For instance, if you're only interested in cutting on measurements of the g, r, and i bands, pass ``['g', 'r', 'i']`` here. This parameter also affects what is included in the returned data; it will ignore any measurements of bands that aren't in this list.

-You get back a dictionary-encoded table of data. Each key of the dictionary is a column in the table, and each value is a list of values in that column. The columns are as follows. (Note first, last, max detections all implicilty include "within ``statbands``" if that parmeters was passed.) "Detections" below are from the ``diasource`` able. It's possible that the brightest point on the lightcurve isn't a "detection", because for whatever reason it didn't end up in the list of detections by LSST differential imaging.
+You get back a dictionary-encoded table of data. Each key of the dictionary is a column in the table, and each value is a list of values in that column. The columns are as follows. (Note first, last, max detections all implicitly include "within ``statbands``" if that parameter was passed.) "Detections" below are from the ``diasource`` table. It's possible that the brightest point on the lightcurve isn't a "detection", because for whatever reason it didn't end up in the list of detections by LSST differential imaging.

* ``diaobjectid`` : Object ID
* ``ra`` : RA in decimal degrees

@@ -276,7 +276,7 @@ You get back a dictionary-encoded table of data. Each key of the dictionary is

* ``lastdetflux`` : flux (nJy) of last detection
* ``lastdetfluxerr`` : uncertainty on ``lastdetflux``
* ``maxdetmjd`` : MJD of brightest detection
-* ``maxdetband`` : Band of brighest detection
+* ``maxdetband`` : Band of brightest detection
* ``maxdetflux`` : flux (nJy) of brightest detection
* ``maxdetfluxerr`` : uncertainty on ``maxdetflux``
* ``lastforcedmjd`` : MJD of the latest forced-photometry measurement
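Once the endpoint works again, a search might look something like this sketch (made-up URL and numbers)::

    import requests

    # Cone search for objects whose brightest detection isn't too bright,
    # only considering the g, r, and i bands.  All values here are made up.
    criteria = { "ra": 52.5,
                 "dec": -27.8,
                 "radius": 0.1,                  # check the units the server expects
                 "statbands": [ "g", "r", "i" ],
                 "minmag_maxdetection": 17.0 }
    resp = requests.post( "https://fastdb.example.org/objectsearch/realtime", json=criteria )
    resp.raise_for_status()
    table = resp.json()                          # column name -> list of values
    print( len( table["diaobjectid"] ), "objects found" )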
If "patch", you get back forced photometry where it's available, with detections filled in where forced photometry is not available (see below). The default is ``patch``, which is often not what you want (but often is). -* ``mjd_now`` : float. Normally, you will get back all relevant photometry. For normal usage, that means photomtery from before the current time, because the future hasen't happened yet. If you specify this value, you only get back photometry from this MJD or earlier. Use this during tests and simulations. +* ``mjd_now`` : float. Normally, you will get back all relevant photometry. For normal usage, that means photometry from before the current time, because the future hasn't happened yet. If you specify this value, you only get back photometry from this MJD or earlier. Use this during tests and simulations. * ``return_object_info`` : See below @@ -337,44 +337,44 @@ What you get depends on whether you included ``return_object_info``. If you inc * ``mjd`` : float, the MJD of this point on the lightcurve * ``diasourceid`` : int, the diaSourceId, or null if this was not a detection * [ ``diaforcedsourceid`` : bigint, the diaForcedSourceId, or null if FASTDB does not have forced photometry for this object at this epoch. Only included you didn't pass ``which`` as ``detections``. ] - * ``source_diaobjectid`` : int, the diaObjectId that was associated with this diaSource, or null if this was not a detection. It *is* possible that this will be different for different rows for the *same* diaobject, because at least the alert stream, LSST sometimes identifies more than one ``diaobjectid`` for the same actual physical transient or variable. What' smore, the ``diaobjectid`` associated with a given ``diasource`` can *change* in different alerts froM LSST. So, treat this with care; within FASTDB, the ``rootid`` is what you want to use. However, if you want to compare objects to things that are reported elsewhere, you will probably need to use the LSST ``diaobjectid`` values; in that case, use, at least, the full collection of ``diaobjectid`` values identified for a single object's lightcurve. + * ``source_diaobjectid`` : int, the diaObjectId that was associated with this diaSource, or null if this was not a detection. It *is* possible that this will be different for different rows for the *same* diaobject, because at least the alert stream, LSST sometimes identifies more than one ``diaobjectid`` for the same actual physical transient or variable. What's more, the ``diaobjectid`` associated with a given ``diasource`` can *change* in different alerts froM LSST. So, treat this with care; within FASTDB, the ``rootid`` is what you want to use. However, if you want to compare objects to things that are reported elsewhere, you will probably need to use the LSST ``diaobjectid`` values; in that case, use, at least, the full collection of ``diaobjectid`` values identified for a single object's lightcurve. * [ ``forced_diaobjectid`` : int, the diaObjectId that was associated with this diaForcedSource, or null if this was not a detection. All the same caveats apply as for ``source_diaobjectid``. This column is not included if ``which`` was ``detections``. ] * ``visit`` : int, the visit number (as defined by LSST) for this observation * ``band`` : the filter/band of the this point. 
- * ``flux`` : float, the flux in nJy of this point on the lightcruve
+ * ``flux`` : float, the flux in nJy of this point on the lightcurve
 * ``fluxerr`` : float, the uncertainty on flux
 * ``isdet`` : int: 1 if this was detected (i.e. a diaSource exists), 0 if not
- * [ ``ispatch`` : see below ; only inclucded you passed ``which`` as ``patch`` (the default) ]
+ * [ ``ispatch`` : see below ; only included if you passed ``which`` as ``patch`` (the default) ]
 * [ ``base_procver_s`` : TBD; only included if you set ``include_base_procver`` ]
 * [ ``base_procver_f`` : TBD; only included if you set ``include_base_procver`` and ``which`` wasn't ``detections`` ]
- * [ ``det_ra`` : the RA where this source was detected by LSST on the difference image. Only incluced if you specified ``include_source_positions`` ]
- * [ ``det_dec`` : the Dec where this source was detected by LSST on the difference image. Only incluced if you specified ``include_source_positions`` ]
- * [ ``det_ra_err`` : uncetainty on RA. Only incluced if you specified ``include_source_positions`` ]
- * [ ``det_dec_err`` : uncertainty on Dec. Only incluced if you specified ``include_source_positions`` ]
- * [ ``det_ra_dec_cov`` : covariance between ra and dec. Only incluced if you specified ``include_source_positions`` ]
+ * [ ``det_ra`` : the RA where this source was detected by LSST on the difference image. Only included if you specified ``include_source_positions`` ]
+ * [ ``det_dec`` : the Dec where this source was detected by LSST on the difference image. Only included if you specified ``include_source_positions`` ]
+ * [ ``det_ra_err`` : uncertainty on RA. Only included if you specified ``include_source_positions`` ]
+ * [ ``det_dec_err`` : uncertainty on Dec. Only included if you specified ``include_source_positions`` ]
+ * [ ``det_ra_dec_cov`` : covariance between ra and dec. Only included if you specified ``include_source_positions`` ]

If you also get back ``objinfo``, then ROB DOCUMENT.
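A sketch of pulling the detections out of one returned lightcurve, using the field names above. (The two-point lightcurve here is made up; because the exact shape isn't pinned down above, the helper accepts either a list of per-point dictionaries or a dictionary of parallel lists.)::

    # A made-up two-point lightcurve, in the list-of-dictionaries shape:
    ltcv = [ { "mjd": 60500., "band": "r", "flux": 1500., "fluxerr": 30., "isdet": 1 },
             { "mjd": 60503., "band": "g", "flux": 250.,  "fluxerr": 40., "isdet": 0 } ]

    def as_columns( ltcv ):
        """Return the lightcurve as a dictionary of parallel lists."""
        if isinstance( ltcv, list ):
            # A list of per-point dictionaries; transpose it.
            return { key: [ point[key] for point in ltcv ] for key in ltcv[0] }
        return ltcv

    cols = as_columns( ltcv )
    detections = [ ( mjd, flux ) for mjd, flux, isdet
                   in zip( cols["mjd"], cols["flux"], cols["isdet"] ) if isdet ]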
About the flux values you get back
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-There are two kinds of photometry that is stored for object lightcurves. A ``diaSource`` stores *detections*. LSST does image subtractions, and then scans the difference image for soruces that patch detection thresholds. Anything found is a ``diaSource``.
+There are two kinds of photometry that are stored for object lightcurves. A ``diaSource`` stores *detections*. LSST does image subtractions, and then scans the difference image for sources that pass detection thresholds. Anything found is a ``diaSource``.

-A ``diaForcedSource`` stores *forced photometry*. When objects are known, LSST goes back and does image subractions and measures the brightness at the know object positions, regardless of whether they would have been deteted or not when scanning that difference image.
+A ``diaForcedSource`` stores *forced photometry*. When objects are known, LSST goes back and does image subtractions and measures the brightness at the known object positions, regardless of whether they would have been detected or not when scanning that difference image.

If you set ``which`` to ``detections``, you only get back ``diaSource`` values. The fluxes come from there, and forced sources are ignored.

If you set ``which`` to ``forced``, you only get back ``diaForcedSource`` values. The fluxes come from there, and *mostly* diasources are ignored, except that the ``isdet`` column tells you if there was a ``diaSource`` at this visit for this root object.

-If you set ``which`` to ``patch``... it's more complicated. You might think that (a) mostly what you want is forced sources, because it includes nondetections, and because the position is consistent so the lightcurve fluxes are less biased (**warning** it's totally unclear, however, exactly what this means for forced source values that come in the alert stream!). Howevever, forced photometry is performed by LSST at a delay, and we only find out about it if there is a later detection that triggers an alert. So, FASTDB will have some diasources where it does not have any forced photometry, and, normally you would expect this to be the most recent points. If you're planning follow-up, you want those most recent points. In this case, use ``patch``. You will get forced photometry, but if there are visits for the object where FASTDB has a detection but does not have forced photometry, it will "patch in" the photometry from the detection.
+If you set ``which`` to ``patch``... it's more complicated. You might think that mostly what you want is forced sources, because they include nondetections, and because the position is consistent so the lightcurve fluxes are less biased (**warning** it's totally unclear, however, exactly what this means for forced source values that come in the alert stream!). However, forced photometry is performed by LSST at a delay, and we only find out about it if there is a later detection that triggers an alert. So, FASTDB will have some diasources where it does not have any forced photometry, and, normally, you would expect these to be the most recent points. If you're planning follow-up, you want those most recent points. In this case, use ``patch``. You will get forced photometry, but if there are visits for the object where FASTDB has a detection but does not have forced photometry, it will "patch in" the photometry from the detection.

-**TLDR short summary**: ``patch`` is what you want for knowing what we've got and planning follow-up. If you're trying to do any kind of high precision analysis with the phtometry from the alert stream, you're doing it wrong.
+**TLDR short summary**: ``patch`` is what you want for knowing what we've got and planning follow-up. If you're trying to do any kind of high precision analysis with the photometry from the alert stream, you're doing it wrong.

.. _ltcv-getltcv:

``/ltcv/getltcv``
-****************
+*****************

Get the lightcurve of a single object. Hit this with one of:

@@ -385,14 +385,14 @@ Get the lightcurve of a single object. Hit this with one of:

``<procver>`` is the processing version of the photometry to fetch. If not given, it will assume ``default``.

-You can optionally include a JSON-encoded dictionary as POST data with any of the keys ``bands``, ``which``, or ``mjd_now``. See the docuemtnation on :ref:`ltcv-getmanyltcvs` for what these mean.
+You can optionally include a JSON-encoded dictionary as POST data with any of the keys ``bands``, ``which``, or ``mjd_now``. See the documentation on :ref:`ltcv-getmanyltcvs` for what these mean.

-What you get back is the same as what youg et back from :ref:`ltcv-getmanyltcvs`, except that instead of ``ltcvs`` as a list of dictionaries, get get a single dictionary with the lightcurve for the one object.
+What you get back is the same as what you get back from :ref:`ltcv-getmanyltcvs`, except that instead of ``ltcvs`` as a list of dictionaries, you get a single dictionary with the lightcurve for the one object.
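For instance, a sketch of fetching one lightcurve (the URL form, processing version, and object id are all made up; the POST keys are the ones listed above)::

    import requests

    # Get the patched lightcurve of one object, pretending it is MJD 60500.
    resp = requests.post( "https://fastdb.example.org/ltcv/getltcv/realtime/1234567",
                          json={ "which": "patch", "mjd_now": 60500. } )
    resp.raise_for_status()
    ltcv = resp.json()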
``/ltcv/getrandomltcv``
-**********************
+***********************

* ``/ltcv/getrandomltcv``
* ``/ltcv/getrandomltcv/<procver>``

@@ -405,7 +405,7 @@ Randomly choose an object from the given processing version (using "default" if

.. _ltcv-gethottransients:

``/ltcv/gethottransients``
-*************************
+**************************

Call this with one of:

@@ -420,7 +420,7 @@ Additional options that you can included in the ``json=`` dictionary are:

* ``detected_in_last_days`` : Only return lightcurves of sources that have been detected by LSST between this many days ago and now. Do not include both this and ``detected_since_mjd``, because they are two different ways of asking the same question. If you don't include either, it defaults to 30 (*I think*) for ``detected_in_last_days``.

-* ``mjd_now`` : Normally, it includes all data until the current mjd. If you're doing simulations, or if you want to (sort of) reconstruct what we knew earlier, pass this parameter with the MJD to pretend it is. This will affect the time window that ``detected_in_last_days`` specifies, and what photometry is returned.
+* ``mjd_now`` : Normally, it includes all data until the current MJD. If you're doing simulations, or if you want to (sort of) reconstruct what we knew earlier, pass this parameter with the MJD to pretend it is. This will affect the time window that ``detected_in_last_days`` specifies, and what photometry is returned.

* ``position_processing_version`` : TBD, and you probably don't want to include this
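A sketch of a typical call (made-up URL; options as above)::

    import requests

    # Lightcurves of transients with a detection in the last 10 days.
    resp = requests.post( "https://fastdb.example.org/ltcv/gethottransients",
                          json={ "detected_in_last_days": 10 } )
    resp.raise_for_status()
    hot = resp.json()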
@@ -474,7 +474,7 @@ Spectrum Endpoints

This is the web API endpoint you use to register your desire for spectroscopic follow-up of a transient. (Or, ideally, host, but the system is not yet designed to distinguish the two.) You pass to it JSON-encoded POST data which is a dictionary with keys:

-* ``requester`` : string. Who wants this specrum? In the case of RESSEPCT instances, this should indicate that it was RESSPECT, and which running algorithm / instance of RESSPECT is making the request.
+* ``requester`` : string. Who wants this spectrum? In the case of RESSPECT instances, this should indicate that it was RESSPECT, and which running algorithm / instance of RESSPECT is making the request.

* ``objectids`` : The *root* object ids of the objects whose spectra you want. This is a list of uuids (or strings formatted from UUIDs); it is *not* a list of integers. Do *not* use the ``diaobjectid`` field to fill this out!

@@ -500,7 +500,7 @@ POST to the endpoint with dictionary in a JSON payload. This may be an empty di

* ``no_spectra_in_last_days``: int; only return objects that have not had spectrum information reported in this many days. This is also for coordination. If you don't want to consider just what is planned, but what somebody actually claims to have observed, then use this. If not given, it defaults to 7. (This may be combined with ``not_claimed_in_last_days``. It's entirely possible that people will report spectra that they have not claimed.) To disable consideration of existing spectra, as with ``not_claimed_in_last_days``, set this parameter to ``None``.

-* ``detected_since_mjd`` : float. Only return objects that have been *detected* (i.e. found as a source in DIA scanning) by Rubin since this MJD. Be aware that an object may not have been detected in the last few days simply because it's field hasn't been observed! If not passed, then the server will use ``detected_in_last_days`` (below) instead. Pass ``None`` to explicilty disable consideration of recent detections.
+* ``detected_since_mjd`` : float. Only return objects that have been *detected* (i.e. found as a source in DIA scanning) by Rubin since this MJD. Be aware that an object may not have been detected in the last few days simply because its field hasn't been observed! If not passed, then the server will use ``detected_in_last_days`` (below) instead. Pass ``None`` to explicitly disable consideration of recent detections.

* ``detected_in_last_days``: float. Only return objects that have been *detected* within this many previous days by LSST DIA. Ignored if ``detected_since_mjd`` is specified. If neither this nor ``detected_since_mjd`` is given, defaults to 14.

@@ -517,7 +517,7 @@ You will get back a JSON-encoded list. Each element of the list is a dictionary

* ``requester`` : a string, the name of the person or system who requested the spectrum
* ``priority`` : an integer in the range [0,5]: the priority of the spectrum. Higher means higher priority. This is defined fuzzily, so consider it advisory rather than rigorous; different requesters may use this differently.
* ``ra`` : RA in degrees of the object (from the ``diaobject`` table)
-* ``dec`` : Dec in degrees of hte object (from the ``diaobject`` table )
+* ``dec`` : Dec in degrees of the object (from the ``diaobject`` table)
* ``latest_source_mjd`` : the MJD of the latest *detection* of this object
* ``latest_source_band`` : the band of the latest *detection* of this object
* ``latest_source_mag`` : the AB magnitude of the latest *detection*

@@ -539,9 +539,9 @@ POST to the api endpoint with a JSON payload that is a dict. Required keys are:

* ``root_diaobject_id``: string UUID; the object ID of the object you're going to take a spectrum of. These UUIDs are returned by ``ltcv/gethottransients``.

-* ``facility``: string; the name of the telescope or facility where you will take the spectrm.
+* ``facility``: string; the name of the telescope or facility where you will take the spectrum.

-* ``plantime``: string ``YYYY-MM-DD`` or ``YYYY-MM-DD HH:MM:SS``; when you expect to actuallyobtain the spectrum.
+* ``plantime``: string ``YYYY-MM-DD`` or ``YYYY-MM-DD HH:MM:SS``; when you expect to actually obtain the spectrum.

You may also include one optional key:

@@ -552,7 +552,7 @@ If all is well, you will get back a dictionary with a single key: ``{'status': '

``spectrum/removespectrumplan``
*******************************

-Use this to remove a spectrum plan. This isn't strictly necessary if you succesfully took a spectrum and reported the info with ``spectrum/reportspectruminfo`` (see below), but you may still use it. The real use case is if you planned a spectrum, but for whatever reason (e.g. the night was cloudy), you didn't actually get that spectrum. In that case, you probably want to remove your spectrum plan from FASTDB so that other people won't skip that object thinking you are going to do it.
+Use this to remove a spectrum plan. This isn't strictly necessary if you successfully took a spectrum and reported the info with ``spectrum/reportspectruminfo`` (see below), but you may still use it. The real use case is if you planned a spectrum, but for whatever reason (e.g. the night was cloudy), you didn't actually get that spectrum. In that case, you probably want to remove your spectrum plan from FASTDB so that other people won't skip that object thinking you are going to do it.

POST to the api endpoint with a JSON payload that is a dict.
There are two required keywords:

* ``root_diaobject_id``: string UUID

@@ -572,9 +572,9 @@ POST to the api endpoint with a JSON payload that is a dict, with keys:

* ``root_diaobject_id``: string UUID; the id of the object, the same value that all the previous URLs have used

-* ``facility``: string; the name of the facility. If you submitted a plan, this should match the facililty that you sent to ``spectrum/planspectrum``. (It's OK to report spectra that you didn't declare a plan for ahead of time!)
+* ``facility``: string; the name of the facility. If you submitted a plan, this should match the facility that you sent to ``spectrum/planspectrum``. (It's OK to report spectra that you didn't declare a plan for ahead of time!)

-* ``mjd``: float; the mjd of when the spectrum was taken. (Beginning, middle, or end of exposure, doesn't matter.)
+* ``mjd``: float; the MJD of when the spectrum was taken. (Beginning, middle, or end of exposure, doesn't matter.)

* ``z``: float; the redshift of the supernova from the spectrum. Leave this blank ("" or None) if it cannot be determined.
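A sketch of reporting one (made-up URL and values; the payload keys are the ones just listed)::

    import requests

    # Report a spectrum that was actually taken.  Every value here is made up.
    resp = requests.post( "https://fastdb.example.org/spectrum/reportspectruminfo",
                          json={ "root_diaobject_id": "00000000-0000-0000-0000-000000000000",
                                 "facility": "Example 4m Telescope",
                                 "mjd": 60500.25,
                                 "z": 0.0712 } )
    resp.raise_for_status()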
@@ -592,11 +592,11 @@ POST to the api endpoint a JSON-encoded dict. All keys are optional; possibilit

* ``facility``: str; if included, only get spectrum information from this facility. Otherwise, include spectrum information from all facilities.

-* ``mjd_min``: float; if included, only get information about spectra taken at this mjd or later.
+* ``mjd_min``: float; if included, only get information about spectra taken at this MJD or later.

-* ``mjd_max``: float; if included, only get information about spectra taken at this mjd or earlier.
+* ``mjd_max``: float; if included, only get information about spectra taken at this MJD or earlier.

-* ``classid``: float; if included, only get information about spectra tagged with this cass id.
+* ``classid``: float; if included, only get information about spectra tagged with this class id.

* ``z_min``: float; if included, only get information about spectra at this redshift or higher.

@@ -606,19 +606,19 @@ POST to the api endpoint a JSON-encoded dict. All keys are optional; possibilit

If you include no keys, you'll get information about all spectra that the database knows about, which may be overwhelming. (The API may also time out.)

-If all is well, the response you get back is a json-encoded list (which might be empty). Each element of the list is a dictionary with keys:
+If all is well, the response you get back is a JSON-encoded list (which might be empty). Each element of the list is a dictionary with keys:

* ``specinfo_id``: string UUID; you can safely ignore this

* ``root_diaobject_id``: string UUID; the same UUID you've been using all along

-* ``facility``: string; the facility that reported the spectrumn
+* ``facility``: string; the facility that reported the spectrum

-* ``inserted_at``: datatime; the time at which the spectrum was reported to the database
+* ``inserted_at``: datetime; the time at which the spectrum was reported to the database

* ``mjd``: float, the MJD the spectrum was taken

-* ``z``: float or None, the redshift from the spectrum. If None, it means that the redshfit wasn't able to be determined from the spectrum.
+* ``z``: float or None, the redshift from the spectrum. If None, it means that the redshift wasn't able to be determined from the spectrum.

* ``classid``: the reported class id.

@@ -627,8 +627,8 @@ Direct SQL Queries

**WARNING** : this API is currently broken and not working.

-**Warning**: We strongly recommend *against* using custom-built SQL queries to the database. The reason is that the table structure surrounding :ref:`processing-versions` is complicated enough that it's very easy to construct a query that will give you results that to casual inspecton look right but that are in fact wrong. If you can't find a web API to do what you need to do, please talk to Rob. If you *must* do direct SQL queries, make sure you really understand how processing versions work.
+**Warning**: We strongly recommend *against* using custom-built SQL queries to the database. The reason is that the table structure surrounding :ref:`processing-versions` is complicated enough that it's very easy to construct a query that will give you results that to casual inspection look right but that are in fact wrong. If you can't find a web API to do what you need to do, please talk to Rob. If you *must* do direct SQL queries, make sure you really understand how processing versions work.

-The FASDTB web interface includes a front-end for direct read-only SQL queries to the backend PostgreSQL database. (Note that "read-only" means that you can't commit changes to the database. You *can* use temporary tables with this interface, and that is often a very useful thing to do.)
+The FASTDB web interface includes a front-end for direct read-only SQL queries to the backend PostgreSQL database. (Note that "read-only" means that you can't commit changes to the database. You *can* use temporary tables with this interface, and that is often a very useful thing to do.)

-TODO document this. In the mean time, see the `examples FASDTB client Juypyter notebook `_ for documentation on this interface.
+TODO document this. In the meantime, see the `examples FASTDB client Jupyter notebook `_ for documentation on this interface.

diff --git a/tests/services/test_sourceimporter.py b/tests/services/test_sourceimporter.py
index b345603..4e95319 100644
--- a/tests/services/test_sourceimporter.py
+++ b/tests/services/test_sourceimporter.py
@@ -1,17 +1,17 @@
-import pytest
 import datetime
-import time
+import itertools
 import numbers
 import random
 import textwrap
-import psycopg.errors
-import itertools
-from psycopg import sql
+import time

 import db
-from util import env_as_bool, FDBLogger, datetime_to_utc
-from services.source_importer import SourceImporter
+import psycopg.errors
+import pytest
+from psycopg import sql
 from services.brokerconsumer import FinkConsumer
+from services.source_importer import SourceImporter
+from util import FDBLogger, datetime_to_utc, env_as_bool

 # Ordering of these tests matters, because they use module scope
 # fixtures from tests/fixtures/alertcycle.py (the "alerts*" fixtures).