
Commit cb10d04

Merge pull request #108 from astrolabsoftware/ui-refactoring
Large API refactoring: introducing spark3D 0.3
2 parents: 2d22e66 + c047d36

80 files changed: 3282 additions, 6163 deletions


.gitignore

Lines changed: 5 additions & 0 deletions

```diff
@@ -10,3 +10,8 @@ __pycache__
 htmlcov
 cov.txt
 .coverage
+build/
+dist/
+MANIFEST*
+pyspark3d.egg-info/
+scripts/
```

README.md

Lines changed: 21 additions & 3 deletions

````diff
@@ -13,17 +13,35 @@
 - [08/2018] **Release**: version 0.2.0, 0.2.1 (pyspark3d)
 - [09/2018] **Release**: version 0.2.2
 
-<p align="center"><img width="500" src="https://github.com/astrolabsoftware/spark3D/raw/master/pic/spark3d_lib_0.2.2.png"/>
+<p align="center"><img width="500" src="https://github.com/astrolabsoftware/spark3D/raw/master/pic/spark3d_newapi.png"/>
 </p>
 
+## Rationale
+
+spark3D should be viewed as an extension of the Apache Spark framework, and more specifically of the Spark SQL module, focusing on the manipulation of three*-dimensional data sets.
+
+Why would you use spark3D? If you often need to repartition large spatial 3D data sets, or perform spatial queries (neighbour search, window queries, cross-match, clustering, ...), spark3D is for you. It contains optimised classes and methods to do so, and it spares you the implementation time! In addition, a big advantage of all those extensions is that they let you efficiently visualise large data sets by quickly building a representation of your data (see more [here](https://astrolabsoftware.github.io/spark3D/)).
+
+spark3D exposes two APIs: Scala (spark3D) and Python (pyspark3d). The core development is done in Scala and interfaced with Python using the great [py4j](https://www.py4j.org/) package. This means pyspark3d might not contain all the features present in spark3D.
+In addition, due to differences between Scala and Python, there might be subtle differences between the two APIs.
+
+While we try to stick to the latest Apache Spark developments, spark3D started with the RDD API and slowly migrated to the DataFrame API. This process left a huge imprint on the code structure, and low-level layers in spark3D often still use RDDs to manipulate the data. Do not be surprised if things move around: the package is under active development, but we try to keep the user interface as stable as possible!
+
+Last but not least: spark3D is by no means complete, and you are welcome to suggest changes, report bugs or inconsistent implementations, and contribute directly to the package!
+
+Cheers,
+Julien
+
+*Why 3? Because there are already plenty of very good packages dealing with 2D data sets (e.g. [geospark](http://geospark.datasyslab.org/), [geomesa](https://www.geomesa.org/), [magellan](https://magellan.ghost.io/), [GeoTrellis](https://github.com/locationtech/geotrellis), and others), but they were not suitable for many applications, such as in astronomy!*
+
 ## Installation and tutorials
 
 ### Scala
 
 You can link spark3D to your project (either spark-shell or spark-submit) by specifying the coordinates:
 
 ```
-spark-submit --packages "com.github.astrolabsoftware:spark3d_2.11:0.2.2"
+spark-submit --packages "com.github.astrolabsoftware:spark3d_2.11:0.3.0"
 ```
 
 ### Python
@@ -38,7 +56,7 @@ Note that we release the assembly JAR with it.
 
 ### More information
 
-See our amazing [website](https://astrolabsoftware.github.io/spark3D/)!
+See our [website](https://astrolabsoftware.github.io/spark3D/)!
 
 ## Contributors
````
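To make the "neighbour search" query from the new Rationale section concrete, here is a deliberately naive, pure-Python brute-force version (a sketch only; it does not use spark3D at all, whose whole point is to run such queries at scale with spatial partitioning instead of a full scan):

```python
import math

def knn_brute_force(points, target, k):
    """Return the k points closest to `target` by Euclidean distance.

    Brute force over all points: fine for toy data, hopeless for the
    large distributed data sets spark3D targets.
    """
    # math.dist (Python 3.8+) computes the Euclidean distance
    # between two same-length coordinate sequences.
    return sorted(points, key=lambda p: math.dist(p, target))[:k]

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(knn_brute_force(points, (0.9, 0.0, 0.0), 2))
# [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
```

A spatial partitioner (as used by spark3D) avoids this O(n) scan per query by only examining points in nearby partitions.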

docs/01_installation.md

Lines changed: 0 additions & 77 deletions

````diff
@@ -191,83 +191,6 @@ toto:~$ PYSPARK_DRIVER_PYTHON=ipython pyspark \
   --jars /path/to/target/scala-2.11/spark3D-assembly-0.2.2.jar
 ```
 
-You should be able to import objects:
-
-```python
-In [1]: import pyspark3d
-In [2]: from pyspark3d.geometryObjects import Point3D
-In [3]: Point3D?
-Signature: Point3D(x:float, y:float, z:float, isSpherical:bool) -> py4j.java_gateway.JavaObject
-Docstring:
-Binding around Point3D.scala. For full description,
-see `$spark3d/src/main/scala/com/spark3d/geometryObjects/Point3D.scala`.
-
-By default, the input coordinates are supposed euclidean,
-that is (x, y, z). The user can also work with spherical input coordinates
-(x=r, y=theta, z=phi) by setting the argument isSpherical=true.
-
-Parameters
-----------
-x : float
-    Input X coordinate in Euclidean space, and R in spherical space.
-y : float
-    Input Y coordinate in Euclidean space, and THETA in spherical space.
-z : float
-    Input Z coordinate in Euclidean space, and PHI in spherical space.
-isSpherical : bool
-    If true, it assumes that the coordinates of the Point3D
-    are (r, theta, phi). Otherwise, it assumes cartesian
-    coordinates (x, y, z).
-
-Returns
-----------
-p3d : Point3D instance
-    An instance of the class Point3D.
-
-Example
-----------
-Instantiate a point with spherical coordinates (r, theta, phi)
->>> p3d = Point3D(1.0, np.pi, 0.0, True)
-
-The returned type is JavaObject (Point3D instance)
->>> print(type(p3d))
-<class 'py4j.java_gateway.JavaObject'>
-
-You can then call the method associated, for example
->>> p3d.getVolume()
-0.0
-
-Return the point coordinates
->>> p3d = Point3D(1.0, 1.0, 0.0, False)
->>> p3d.getCoordinatePython()
-[1.0, 1.0, 0.0]
-
-It will be a JavaList by default
->>> coord = p3d.getCoordinatePython()
->>> print(type(coord))
-<class 'py4j.java_collections.JavaList'>
-
-Make it a python list
->>> coord_python = list(coord)
->>> print(type(coord_python))
-<class 'list'>
-
-[Astro] Convert the (theta, phi) in Healpix pixel index:
->>> p3d = Point3D(1.0, np.pi, 0.0, True) # (r, theta, phi)
->>> p3d.toHealpix(2048, True)
-50331644
-
-To see all the available methods:
->>> print(sorted(p3d.__dir__())) # doctest: +NORMALIZE_WHITESPACE
-['center', 'distanceTo', 'equals', 'getClass', 'getCoordinate',
-'getCoordinatePython', 'getEnvelope', 'getHash', 'getVolume',
-'hasCenterCloseTo', 'hashCode', 'intersects', 'isEqual', 'isSpherical',
-'notify', 'notifyAll', 'toHealpix', 'toHealpix$default$2', 'toString',
-'wait', 'x', 'y', 'z']
-File: ~/Documents/workspace/myrepos/spark3D/pyspark3d/geometryObjects.py
-Type: function
-```
-
 # Batch mode and provided examples
 
 You can follow the different tutorials:
````
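The deleted docstring above relies on the (r, theta, phi) spherical convention. As a reminder, here is the standard physics-convention conversion to cartesian coordinates as a standalone sketch in plain Python, independent of spark3D and of however Point3D converts internally:

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """Physics convention: theta is the polar angle (0..pi),
    phi the azimuthal angle (0..2*pi)."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z

# The point used in the docstring example, (r=1, theta=pi, phi=0),
# sits at the south pole of the unit sphere: roughly (0, 0, -1).
x, y, z = spherical_to_cartesian(1.0, math.pi, 0.0)
```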

docs/02_introduction_python.md

Lines changed: 0 additions & 207 deletions
This file was deleted.
