This repository was archived by the owner on May 20, 2025. It is now read-only.

Commit bdb9506

Merge branch 'django_updates'
2 parents e051257 + 3999702 commit bdb9506

13 files changed

Lines changed: 195 additions & 156 deletions

README.md

Lines changed: 66 additions & 38 deletions
@@ -22,27 +22,29 @@ For more information on the PHP implementation please see the [readme](../master
 keep reading...


-ElasticSearch
-==============
+ElasticSearch Support
+=====================
+
+<b>Important pyDat 3.0 ElasticSearch Notes</b>:

-<b>The ElasticSearch backend code is still under testing, please consider the following before using ES as a backend:</b>
+Note this is the only release (and overdue) for 3.0 as work is under way for pyDat 4.0.
+pyDat 4.0 will remove support for MongoDB and requires a minimum of ElasticSearch 5.2 but
+should be easier to work with and considerably faster due to significant improvements in
+ElasticSearch 5.x. It will also, more than likely, require a full re-ingestion of source
+data.

-- Some things might be broken
-- I.e., some error handling might be non-existent
-- There might be random debug output printed out
-- The search language might not be complete
-- The data template used with ElasticSearch might change
-- Which means you might have ot re-ingest all of your data at some point!
+This release supports only ElasticSearch 2.x !!


 <b>PreReqs to run with ElasticSearch</b>:

 - ElasticSearch installed somewhere
-- python elasticsearch library (pip install elasticsearch)
+- python elasticsearch library (pip install elasticsearch>=2.0.0,<3.0.0)
 - python lex yacc library (pip install ply)
 - below specified prereqs too

 <b>ElasticSearch Scripting</b>
+
 ElasticSearch comes with dynamic Groovy scripting disabled due to potential sandbox breakout issues with the Groovy container. Unfortunately, the only way to do certain things in ElasticSearch is via this scripting language. Because the default installation of ES does not have a work-around, there is a setting called ES_SCRIPTING_ENABLED in the pyDat settings file which is set to False by default. When set to True, the pyDat advanced search capability will expose an extra feature called 'Unique Domains' which given search results that will return multiple results for a given domain (e.g., due to multiple versions of a domain matching) will return only the latest entry instead of all entries. Before setting this option to True, you must install a script server-side on every ES node -- to do this, please copy the file called \_score.groovy from the es_scripts directory to your scripts directory located in the elasticsearch configuration directory. On package-based installs of ES on RedHat/CentOS or Ubuntu this should be /etc/elasticsearch/scripts. If the scripts directory does not exist, please create it. Note you have to restart the Node for it to pick up the script.

 <b> ElasticSearch Plugins</b>
@@ -76,48 +78,74 @@ all data is ingested properly. Anyone setting up their database, should read the
 script before running it to ensure they've tweaked it for their setup. The following is the output from
 elasticsearch_populate -h

-<pre>
-Usage: elasticsearch_populate.py [options]
+Version 3.0 introduces ElasticSearch 2.x as a backend for whois data

-Options:
+<pre>
+usage: elasticsearch_populate.py [-h] (-f FILE | -d DIRECTORY) [-e EXTENSION]
+[-r] [-v] [--vverbose] [-s]
+[-x EXCLUDE | -n INCLUDE] [-o COMMENT]
+[-u [ES_URI [ES_URI ...]]] [-p INDEX_PREFIX]
+[-i IDENTIFIER] [-B BULK_SIZE]
+[--optimize-import] [-t THREADS]
+[--bulk-serializers BULK_SERIALIZERS]
+[--bulk-threads BULK_THREADS]
+[--enable-delta-indexes]
+
+optional arguments:
 -h, --help show this help message and exit
--f FILE, --file=FILE Input CSV file
--d DIRECTORY, --directory=DIRECTORY
-Directory to recursively search for CSV files -
-prioritized over 'file'
--e EXTENSION, --extension=EXTENSION
+-f FILE, --file FILE Input CSV file
+-d DIRECTORY, --directory DIRECTORY
+Directory to recursively search for CSV files --
+mutually exclusive to '-f' option
+-e EXTENSION, --extension EXTENSION
 When scanning for CSV files only parse files with
 given extension (default: 'csv')
--i IDENTIFIER, --identifier=IDENTIFIER
-Numerical identifier to use in update to signify
-version (e.g., '8' or '20140120')
--t THREADS, --threads=THREADS
-Number of workers, defaults to 2. Note that each
-worker will increase the load on your ES cluster
--B BULK_SIZE, --bulk-size=BULK_SIZE
-Size of Bulk Insert Requests
+-r, --redo Attempt to re-import a failed import or import more
+data, uses stored metatdata from previous import (-o,
+-n, and -x not required and will be ignored!!)
 -v, --verbose Be verbose
 --vverbose Be very verbose (Prints status of every domain parsed,
 very noisy)
 -s, --stats Print out Stats after running
--x EXCLUDE, --exclude=EXCLUDE
+-x EXCLUDE, --exclude EXCLUDE
 Comma separated list of keys to exclude if updating
 entry
--n INCLUDE, --include=INCLUDE
+-n INCLUDE, --include INCLUDE
 Comma separated list of keys to include if updating
 entry (mutually exclusive to -x)
--o COMMENT, --comment=COMMENT
+-o COMMENT, --comment COMMENT
 Comment to store with metadata
--r, --redo Attempt to re-import a failed import or import more
-data, uses stored metatdata from previous import (-o
-and -x not required and will be ignored!!)
--u ES_URI, --es-uri=ES_URI
-Location of ElasticSearch Server (e.g.,
-foo.server.com:9200)
--p INDEX_PREFIX, --index-prefix=INDEX_PREFIX
+-u [ES_URI [ES_URI ...]], --es-uri [ES_URI [ES_URI ...]]
+Location(s) of ElasticSearch Server (e.g.,
+foo.server.com:9200) Can take multiple endpoints
+-p INDEX_PREFIX, --index-prefix INDEX_PREFIX
 Index prefix to use in ElasticSearch (default: whois)
---bulk-threads=BULK_THREADS
-How many threads to use for making bulk requests to ES
+-i IDENTIFIER, --identifier IDENTIFIER
+Numerical identifier to use in update to signify
+version (e.g., '8' or '20140120')
+-B BULK_SIZE, --bulk-size BULK_SIZE
+Size of Bulk Elasticsearch Requests
+--optimize-import If enabled, will change ES index settings to speed up
+bulk imports, but if the cluster has a failure, data
+might be lost permanently!
+-t THREADS, --threads THREADS
+Number of workers, defaults to 2. Note that each
+worker will increase the load on your ES cluster since
+it will try to lookup whatever record it is working on
+in ES
+--bulk-serializers BULK_SERIALIZERS
+How many threads to spawn to combine messages from
+workers. Only increase this if you're are running a
+lot of workers and one cpu is unable to keep up with
+the load
+--bulk-threads BULK_THREADS
+How many threads to spawn to send bulk ES messages.
+The larger your cluster, the more you can increase
+this
+--enable-delta-indexes
+If enabled, will put changed entries in a separate
+index. These indexes can be safely deleted if space is
+an issue, also provides some other improvements
 </pre>

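Reviewer note: the new usage text in the README hunk implies a required mutually exclusive pair (`-f` or `-d`), an optional mutually exclusive pair (`-x` or `-n`), and a `-u` option that accepts several endpoints. The sketch below shows how such an interface could be declared with `argparse`. It is not taken from `elasticsearch_populate.py`; `build_parser` is a hypothetical name, only a subset of the flags is reproduced, and the `-u` default is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of part of the CLI shown in the help text."""
    parser = argparse.ArgumentParser(prog="elasticsearch_populate.py")

    # Exactly one of -f / -d must be given: "(-f FILE | -d DIRECTORY)".
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--file", help="Input CSV file")
    source.add_argument("-d", "--directory",
                        help="Directory to recursively search for CSV files")

    # At most one of -x / -n may be given: "[-x EXCLUDE | -n INCLUDE]".
    keys = parser.add_mutually_exclusive_group()
    keys.add_argument("-x", "--exclude",
                      help="Comma separated list of keys to exclude")
    keys.add_argument("-n", "--include",
                      help="Comma separated list of keys to include")

    # nargs="*" accepts zero or more endpoints after -u.
    parser.add_argument("-u", "--es-uri", nargs="*", dest="es_uri",
                        default=["localhost:9200"])  # default is an assumption
    parser.add_argument("-p", "--index-prefix", default="whois")
    parser.add_argument("-t", "--threads", type=int, default=2)
    return parser
```

With this declaration, passing both `-f` and `-d` makes `argparse` exit with a usage error, which matches the "mutually exclusive to '-f' option" wording in the new help text.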

docker/apache.config

Lines changed: 1 addition & 1 deletion
@@ -197,7 +197,7 @@ WSGIScriptAlias "/" "/opt/WhoDat/pydat/pydat/wsgi.py" process-group=pydat applic

 # Static content - CSS, Javascript, images, etc.
 Alias /static/ /opt/WhoDat/pydat/pydat/static/
-<Directory /opt/WhoDat/pydat/pydat/static>
+<Directory /opt/WhoDat/pydat/extras/www/static>
 Order allow,deny
 Allow from all
 </Directory>

docker/requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -2,6 +2,6 @@ pymongo
 requests
 unicodecsv
 markdown
-django
-elasticsearch
+django<=1.11.12
+elasticsearch>=2.0.0,<3.0.0
 ply
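Reviewer note: the new pins restrict Django to at most 1.11.12 and the ElasticSearch client to the 2.x series. As a rough illustration of what those ranges accept, here is a minimal sketch of a range check on dotted numeric versions. `parse_version` and `satisfies` are hypothetical helpers, not pip's resolver; they handle only the operators that appear in this file and ignore pre-release tags.

```python
import re


def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    """Right-pad the shorter tuple with zeros so 2.0 compares equal to 2.0.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        match = re.match(r"\s*(>=|<=|==|<|>)\s*([\d.]+)\s*$", clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, bound = match.groups()
        left, right = _pad(parse_version(version), parse_version(bound))
        if not _OPS[op](left, right):
            return False
    return True
```

In a requirements file the specifier can be written bare, as in this hunk. On a shell command line it needs quoting (for example `pip install "elasticsearch>=2.0.0,<3.0.0"`), because `>` and `<` are otherwise treated as redirection operators.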

pydat/pydat/ajax.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@

 from django.conf import settings
 from django.template import RequestContext
-from django.core.urlresolvers import reverse
+from django.urls import reverse
 from django.shortcuts import render_to_response, HttpResponse
 import urllib
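Reviewer note: `django.urls` was introduced in Django 1.10, and `django.core.urlresolvers` was removed in Django 2.0, so the new import works on the pinned 1.11.x series. If the code ever had to run on an older Django as well, a fallback import would be one option. The sketch below shows that pattern with a hypothetical helper, `import_first`, which is not part of this commit.

```python
import importlib


def import_first(candidates, attr):
    """Return `attr` from the first module in `candidates` that imports."""
    last_error = None
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            last_error = exc
            continue
        return getattr(module, attr)
    raise ImportError(
        "none of %r could be imported" % (list(candidates),)
    ) from last_error


# With Django installed, the call for this file would look like:
#   reverse = import_first(["django.urls", "django.core.urlresolvers"], "reverse")
```

Since the commit also pins `django<=1.11.12`, the plain `from django.urls import reverse` in the hunk is sufficient as long as the installed version is at least 1.10.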

pydat/pydat/settings.py

Lines changed: 31 additions & 26 deletions
@@ -10,8 +10,6 @@

 DEBUG = False

-TEMPLATE_DEBUG = DEBUG
-
 SITE_ROOT = os.path.dirname(os.path.realpath(__file__))

 HANDLER = 'mongo'
@@ -135,56 +133,63 @@
 STATIC_URL = '/static/'

 # Additional locations of static files
-STATICFILES_DIRS = (
+STATICFILES_DIRS = [
 # Put strings here, like "/home/html/static" or "C:/www/django/static".
 # Always use forward slashes, even on Windows.
 # Don't forget to use absolute paths, not relative paths.
-os.path.join(SITE_ROOT, 'static'),
-)
+]

 # List of finder classes that know how to find static files in
 # various locations.
-STATICFILES_FINDERS = (
+STATICFILES_FINDERS = [
 'django.contrib.staticfiles.finders.FileSystemFinder',
 'django.contrib.staticfiles.finders.AppDirectoriesFinder',
-# 'django.contrib.staticfiles.finders.DefaultStorageFinder',
-)
+]

 # Make this unique, and don't share it with anybody.
 SECRET_KEY = 'o=skwv+igf2%#6n&p!nd##w(a*wqugkcq4-2=wugz0(715*!l#'

-# List of callables that know how to import templates from various sources.
-TEMPLATE_LOADERS = (
-'django.template.loaders.filesystem.Loader',
-'django.template.loaders.app_directories.Loader',
-# 'django.template.loaders.eggs.Loader',
-)
-
 TEST_RUNNER = 'django.test.runner.DiscoverRunner'

-MIDDLEWARE_CLASSES = (
+MIDDLEWARE = [
 'django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
-'django.middleware.csrf.CsrfViewMiddleware',
+#'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 # Uncomment the next line for simple clickjacking protection:
 # 'django.middleware.clickjacking.XFrameOptionsMiddleware',
-)
+]

 ROOT_URLCONF = 'pydat.urls'

 # Python dotted path to the WSGI application used by Django's runserver.
 WSGI_APPLICATION = 'pydat.wsgi.application'

-TEMPLATE_DIRS = (
-# Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
-# Always use forward slashes, even on Windows.
-# Don't forget to use absolute paths, not relative paths.
-os.path.join(SITE_ROOT, 'templates'),
-)

-INSTALLED_APPS = (
+_TEMPLATE_DIRS_ =[os.path.join(SITE_ROOT, 'templates')]
+TEMPLATES = [
+{
+"BACKEND": "django.template.backends.django.DjangoTemplates",
+"DIRS": _TEMPLATE_DIRS_,
+"OPTIONS":{
+"context_processors":[
+'django.contrib.auth.context_processors.auth',
+'django.template.context_processors.debug',
+'django.template.context_processors.i18n',
+'django.template.context_processors.media',
+'django.template.context_processors.static',
+'django.template.context_processors.tz',
+'django.contrib.messages.context_processors.messages',
+'django.template.context_processors.csrf'
+],
+'debug': DEBUG,
+},
+
+},
+]
+
+INSTALLED_APPS = [
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
@@ -196,7 +201,7 @@
 # Uncomment the next line to enable admin documentation:
 # 'django.contrib.admindocs',
 'pydat',
-)
+]

 # A sample logging configuration. The only tangible logging
 # performed by this configuration is to send an email to
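Reviewer note: the settings hunks fold the separate `TEMPLATE_DEBUG`, `TEMPLATE_LOADERS` and `TEMPLATE_DIRS` settings into a single `TEMPLATES` list. The sketch below shows that mapping as a small function. `build_templates` is a hypothetical helper written for illustration, and its `app_dirs` parameter is an assumption: the committed `TEMPLATES` entry does not set `APP_DIRS` (Django's default is `False`), although the removed `TEMPLATE_LOADERS` did include the app-directories loader. The commit does not say whether app-directory lookup is still needed.

```python
def build_templates(template_dirs, debug=False, context_processors=None,
                    app_dirs=False):
    """Build a one-engine TEMPLATES value from old-style settings values.

    template_dirs      -- what TEMPLATE_DIRS used to hold
    debug              -- what TEMPLATE_DEBUG used to hold
    context_processors -- dotted paths of context processors
    app_dirs           -- assumption: stands in for the old app_directories
                          loader; the commit itself leaves APP_DIRS unset
    """
    return [
        {
            "BACKEND": "django.template.backends.django.DjangoTemplates",
            "DIRS": list(template_dirs),
            "APP_DIRS": app_dirs,
            "OPTIONS": {
                "context_processors": list(context_processors or []),
                "debug": debug,
            },
        },
    ]
```

A settings module could then assign `TEMPLATES = build_templates([os.path.join(SITE_ROOT, 'templates')], debug=DEBUG, ...)`, though writing the literal dictionary, as the commit does, is equally valid.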

pydat/pydat/templates/base.html

Lines changed: 8 additions & 7 deletions
@@ -1,23 +1,24 @@
+{% load static %}
 <!DOCTYPE HTML>
 <html>
 <head>
 <title>pyDat: {% block title %}WHOIS exploration{% endblock %}</title>
-<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery-ui-1.10.4.css">
-<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery.dataTables.css">
-<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/pydat.css">
+<link rel="stylesheet" type="text/css" href="{% static '/css/jquery-ui-1.10.4.css' %}">
+<link rel="stylesheet" type="text/css" href="{% static '/css/jquery.dataTables.css' %}">
+<link rel="stylesheet" type="text/css" href="{% static 'css/pydat.css' %}">
 {% block css %}
 {% endblock %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-1.11.0.min.js"></script>
-<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-ui-1.10.4.js"></script>
-<script type="text/javascript" src="{{STATIC_URL}}/js/jquery.dataTables.js"></script>
+<script type="text/javascript" src="{% static '/js/jquery-1.11.0.min.js' %}"></script>
+<script type="text/javascript" src="{% static '/js/jquery-ui-1.10.4.js' %}"></script>
+<script type="text/javascript" src="{% static '/js/jquery.dataTables.js' %}"></script>
 <script type="text/javascript">
 var resolve_url = "{% url 'ajax_resolve' %}";
 var csrf_token = '{{ csrf_token }}';
 var latest_version = '{{ latest_version }}';
 </script>
 {% block js_constants %}
 {% endblock %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/pydat.js"></script>
+<script type="text/javascript" src="{% static '/js/pydat.js' %}"></script>
 {% block js %}
 {% endblock %}
 </head>
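Reviewer note: this hunk mixes `{% static '/css/...' %}` (leading slash) with `{% static 'css/pydat.css' %}` (no leading slash). How Django's `static` tag treats a leading slash depends on the Django version and the configured storage backend, so the sketch below does not model Django itself. It only uses the standard library's `urljoin` to show why the two spellings can diverge under generic URL joining, and what the old `{{STATIC_URL}}/css/...` concatenation produced. The `STATIC_URL` value is the one from `settings.py`; the function names are hypothetical.

```python
from urllib.parse import urljoin

STATIC_URL = "/static/"  # value taken from pydat/pydat/settings.py


def old_style(path):
    """Mimic the removed templates: '{{STATIC_URL}}' followed by '/css/...'."""
    return STATIC_URL + path


def joined(path):
    """Generic RFC 3986 joining; NOT necessarily what Django's tag does."""
    return urljoin(STATIC_URL, path)


print(old_style("/css/pydat.css"))  # /static//css/pydat.css (double slash)
print(joined("css/pydat.css"))      # /static/css/pydat.css
print(joined("/css/pydat.css"))     # /css/pydat.css (base path is discarded)
```

Using relative paths without a leading slash, as the `css/pydat.css` line already does, avoids depending on how a particular backend normalises the argument.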

pydat/pydat/templates/domain_results.html

Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,5 @@
 {% extends 'base.html' %}
+{% load static %}

 {% block title %}Domain Search{% endblock %}

@@ -24,9 +25,9 @@

 {% block js %}
 {% if legacy_search %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/domain.js"></script>
+<script type="text/javascript" src="{% static '/js/domain.js' %}"></script>
 {% else %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/domain_advanced.js"></script>
+<script type="text/javascript" src="{% static '/js/domain_advanced.js' %}"></script>
 {% endif %}
 {%endblock %}

pydat/pydat/templates/nosearchbase.html

Lines changed: 8 additions & 7 deletions
@@ -1,23 +1,24 @@
+{% load static %}
 <!DOCTYPE HTML>
 <html>
 <head>
 <title>pyDat: {% block title %}WHOIS exploration{% endblock %}</title>
-<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery-ui-1.10.4.css">
-<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery.dataTables.css">
-<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/pydat.css">
+<link rel="stylesheet" type="text/css" href="{% static '/css/jquery-ui-1.10.4.css' %}">
+<link rel="stylesheet" type="text/css" href="{% static '/css/jquery.dataTables.css' %}">
+<link rel="stylesheet" type="text/css" href="{% static '/css/pydat.css' %}">
 {% block css %}
 {% endblock %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-1.11.0.min.js"></script>
-<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-ui-1.10.4.js"></script>
-<script type="text/javascript" src="{{STATIC_URL}}/js/jquery.dataTables.js"></script>
+<script type="text/javascript" src="{% static '/js/jquery-1.11.0.min.js' %}"></script>
+<script type="text/javascript" src="{% static '/js/jquery-ui-1.10.4.js' %}"></script>
+<script type="text/javascript" src="{% static '/js/jquery.dataTables.js' %}"></script>
 <script type="text/javascript">
 var resolve_url = "{% url 'ajax_resolve' %}";
 var csrf_token = '{{ csrf_token }}';
 var latest_version = '{{ latest_version }}';
 </script>
 {% block js_constants %}
 {% endblock %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/pydat.js"></script>
+<script type="text/javascript" src="{% static '/js/pydat.js' %}"></script>
 {% block js %}
 {% endblock %}
 </head>

pydat/pydat/templates/pdns_results.html

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 {% extends 'base.html' %}
+{% load static %}

 {% block title %}pDNS{% endblock %}

@@ -10,7 +11,7 @@
 {% endblock %}

 {% block js %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/pdns.js"></script>
+<script type="text/javascript" src="{% static '/js/pdns.js' %}"></script>
 {% endblock %}

 {% block searchBar %}

pydat/pydat/templates/rpdns_results.html

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 {% extends 'base.html' %}
+{% load static %}

 {% block title %}pDNS{% endblock %}

@@ -10,7 +11,7 @@
 {% endblock %}

 {% block js %}
-<script type="text/javascript" src="{{STATIC_URL}}/js/pdns.js"></script>
+<script type="text/javascript" src="{% static '/js/pdns.js' %}"></script>
 {% endblock %}

 {% block searchBar %}
