Merge branch 'django_updates'

Mraoul · Mraoul · commit bdb950637c8f · 2017-06-15T11:22:53.000-04:00
diff --git a/README.md b/README.md
@@ -22,27 +22,29 @@ For more information on the PHP implementation please see the [readme](../master
 keep reading...
 
 
-ElasticSearch
-==============
+ElasticSearch Support
+=====================
+
+<b>Important pyDat 3.0 ElasticSearch Notes</b>:
 
-<b>The ElasticSearch backend code is still under testing, please consider the following before using ES as a backend:</b>
+Note that this is the only (and overdue) release of pyDat 3.0, as work is under way on pyDat 4.0.
+pyDat 4.0 will remove support for MongoDB and requires a minimum of ElasticSearch 5.2 but
+should be easier to work with and considerably faster due to significant improvements in
+ElasticSearch 5.x. It will also, more than likely, require a full re-ingestion of source
+data.
 
-- Some things might be broken
-    - I.e., some error handling might be non-existent
-- There might be random debug output printed out
-- The search language might not be complete
-- The data template used with ElasticSearch might change
-    - Which means you might have ot re-ingest all of your data at some point!
+This release supports only ElasticSearch 2.x !!
 
 
 <b>PreReqs to run with ElasticSearch</b>:
 
 - ElasticSearch installed somewhere
-- python elasticsearch library (pip install elasticsearch)
+- python elasticsearch library (pip install "elasticsearch>=2.0.0,<3.0.0")
 - python lex yacc library (pip install ply)
 - below specified prereqs too 
 
 <b>ElasticSearch Scripting</b>
+
 ElasticSearch ships with dynamic Groovy scripting disabled because of potential sandbox breakout issues in the Groovy container. Unfortunately, some operations in ElasticSearch can only be done through this scripting language. Because a default ES installation offers no work-around, the pyDat settings file contains a setting called ES_SCRIPTING_ENABLED, which is set to False by default. When set to True, pyDat's advanced search exposes an extra feature called 'Unique Domains': when a search matches multiple entries for the same domain (e.g., because several versions of that domain match), only the latest entry is returned instead of all of them. Before setting this option to True, you must install a script server-side on every ES node: copy the file called \_score.groovy from the es_scripts directory into the scripts directory inside the ElasticSearch configuration directory. On package-based installs of ES on RedHat/CentOS or Ubuntu this should be /etc/elasticsearch/scripts; if the scripts directory does not exist, create it. Note that you have to restart each node for it to pick up the script.
 
 <b> ElasticSearch Plugins</b>
@@ -76,48 +78,74 @@ all data is ingested properly. Anyone setting up their database, should read the
 script before running it to ensure they've tweaked it for their setup. The following is the output from
 elasticsearch_populate -h
 
-<pre>
-Usage: elasticsearch_populate.py [options]
+Version 3.0 introduces ElasticSearch 2.x as a backend for WHOIS data.
 
-Options:
+<pre>
+usage: elasticsearch_populate.py [-h] (-f FILE | -d DIRECTORY) [-e EXTENSION]
+                                 [-r] [-v] [--vverbose] [-s]
+                                 [-x EXCLUDE | -n INCLUDE] [-o COMMENT]
+                                 [-u [ES_URI [ES_URI ...]]] [-p INDEX_PREFIX]
+                                 [-i IDENTIFIER] [-B BULK_SIZE]
+                                 [--optimize-import] [-t THREADS]
+                                 [--bulk-serializers BULK_SERIALIZERS]
+                                 [--bulk-threads BULK_THREADS]
+                                 [--enable-delta-indexes]
+
+optional arguments:
   -h, --help            show this help message and exit
-  -f FILE, --file=FILE  Input CSV file
-  -d DIRECTORY, --directory=DIRECTORY
-                        Directory to recursively search for CSV files -
-                        prioritized over 'file'
-  -e EXTENSION, --extension=EXTENSION
+  -f FILE, --file FILE  Input CSV file
+  -d DIRECTORY, --directory DIRECTORY
+                        Directory to recursively search for CSV files --
+                        mutually exclusive to '-f' option
+  -e EXTENSION, --extension EXTENSION
                         When scanning for CSV files only parse files with
                         given extension (default: 'csv')
-  -i IDENTIFIER, --identifier=IDENTIFIER
-                        Numerical identifier to use in update to signify
-                        version (e.g., '8' or '20140120')
-  -t THREADS, --threads=THREADS
-                        Number of workers, defaults to 2. Note that each
-                        worker will increase the load on your ES cluster
-  -B BULK_SIZE, --bulk-size=BULK_SIZE
-                        Size of Bulk Insert Requests
+  -r, --redo            Attempt to re-import a failed import or import more
+                        data, uses stored metadata from previous import (-o,
+                        -n, and -x not required and will be ignored!!)
   -v, --verbose         Be verbose
   --vverbose            Be very verbose (Prints status of every domain parsed,
                         very noisy)
   -s, --stats           Print out Stats after running
-  -x EXCLUDE, --exclude=EXCLUDE
+  -x EXCLUDE, --exclude EXCLUDE
                         Comma separated list of keys to exclude if updating
                         entry
-  -n INCLUDE, --include=INCLUDE
+  -n INCLUDE, --include INCLUDE
                         Comma separated list of keys to include if updating
                         entry (mutually exclusive to -x)
-  -o COMMENT, --comment=COMMENT
+  -o COMMENT, --comment COMMENT
                         Comment to store with metadata
-  -r, --redo            Attempt to re-import a failed import or import more
-                        data, uses stored metatdata from previous import (-o
-                        and -x not required and will be ignored!!)
-  -u ES_URI, --es-uri=ES_URI
-                        Location of ElasticSearch Server (e.g.,
-                        foo.server.com:9200)
-  -p INDEX_PREFIX, --index-prefix=INDEX_PREFIX
+  -u [ES_URI [ES_URI ...]], --es-uri [ES_URI [ES_URI ...]]
+                        Location(s) of ElasticSearch Server (e.g.,
+                        foo.server.com:9200) Can take multiple endpoints
+  -p INDEX_PREFIX, --index-prefix INDEX_PREFIX
                         Index prefix to use in ElasticSearch (default: whois)
-  --bulk-threads=BULK_THREADS
-                        How many threads to use for making bulk requests to ES
+  -i IDENTIFIER, --identifier IDENTIFIER
+                        Numerical identifier to use in update to signify
+                        version (e.g., '8' or '20140120')
+  -B BULK_SIZE, --bulk-size BULK_SIZE
+                        Size of Bulk Elasticsearch Requests
+  --optimize-import     If enabled, will change ES index settings to speed up
+                        bulk imports, but if the cluster has a failure, data
+                        might be lost permanently!
+  -t THREADS, --threads THREADS
+                        Number of workers, defaults to 2. Note that each
+                        worker will increase the load on your ES cluster since
+                        it will try to lookup whatever record it is working on
+                        in ES
+  --bulk-serializers BULK_SERIALIZERS
+                        How many threads to spawn to combine messages from
+                        workers. Only increase this if you are running a
+                        lot of workers and one cpu is unable to keep up with
+                        the load
+  --bulk-threads BULK_THREADS
+                        How many threads to spawn to send bulk ES messages.
+                        The larger your cluster, the more you can increase
+                        this
+  --enable-delta-indexes
+                        If enabled, will put changed entries in a separate
+                        index. These indexes can be safely deleted if space is
+                        an issue, also provides some other improvements
 </pre>
 
 
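The usage line above appears to use argparse notation: `(-f FILE | -d DIRECTORY)` is a required either/or choice, `[-x EXCLUDE | -n INCLUDE]` is an optional either/or choice, and `-u [ES_URI [ES_URI ...]]` accepts zero or more endpoints. As a minimal sketch (a hypothetical reconstruction of a subset of the options, not the script's actual source), those constraints can be expressed like this; only the flag names and the `whois` and `2` defaults come from the help text.

```python
import argparse


def build_parser():
    """Hypothetical subset of elasticsearch_populate.py's command line."""
    parser = argparse.ArgumentParser(prog="elasticsearch_populate.py")

    # "(-f FILE | -d DIRECTORY)": exactly one input source must be given
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--file", help="Input CSV file")
    source.add_argument("-d", "--directory",
                        help="Directory to recursively search for CSV files")

    # "[-x EXCLUDE | -n INCLUDE]": optional, but never both at once
    keys = parser.add_mutually_exclusive_group()
    keys.add_argument("-x", "--exclude",
                      help="Comma separated list of keys to exclude")
    keys.add_argument("-n", "--include",
                      help="Comma separated list of keys to include")

    # "-u [ES_URI [ES_URI ...]]": zero or more ElasticSearch endpoints
    parser.add_argument("-u", "--es-uri", nargs="*",
                        help="Location(s) of ElasticSearch Server")
    parser.add_argument("-p", "--index-prefix", default="whois")
    parser.add_argument("-t", "--threads", type=int, default=2)
    parser.add_argument("--enable-delta-indexes", action="store_true")
    return parser


args = build_parser().parse_args(
    ["-d", "/data/whois", "-u", "es1:9200", "es2:9200"])
print(args.es_uri)
```

With these groups, passing both `-f` and `-d` (or both `-x` and `-n`) makes argparse print an error and exit, which matches the "mutually exclusive" wording in the help text.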
diff --git a/docker/apache.config b/docker/apache.config
@@ -197,7 +197,7 @@ WSGIScriptAlias "/" "/opt/WhoDat/pydat/pydat/wsgi.py" process-group=pydat applic
 
     # Static content - CSS, Javascript, images, etc.
     Alias /static/ /opt/WhoDat/pydat/pydat/static/
-    <Directory /opt/WhoDat/pydat/pydat/static>
+    <Directory /opt/WhoDat/pydat/extras/www/static>
         Order allow,deny
         Allow from all
     </Directory>
diff --git a/docker/requirements.txt b/docker/requirements.txt
@@ -2,6 +2,6 @@ pymongo
 requests
 unicodecsv
 markdown
-django
-elasticsearch
+django<=1.11.12
+elasticsearch>=2.0.0,<3.0.0
 ply
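The requirements now pin `django<=1.11.12` and `elasticsearch>=2.0.0,<3.0.0`. The sketch below, using only the standard library, shows how such bounds read when checked against an installed version string. It is a simplified comparison (numeric dot-separated components only, no pre-release tags), not pip's PEP 440 logic, and the function names are hypothetical.

```python
def parse_version(text, width=3):
    """Turn '2.4' into (2, 4, 0); pads to `width` so '3.0' == '3.0.0'."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def in_range(version, lower, upper):
    """True if lower <= version < upper, e.g. the '>=2.0.0,<3.0.0' pin."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def satisfies_pins(es_version, django_version):
    """Check both pins from requirements.txt against given version strings."""
    es_ok = in_range(es_version, "2.0.0", "3.0.0")
    django_ok = parse_version(django_version) <= parse_version("1.11.12")
    return es_ok and django_ok


print(satisfies_pins("2.4.1", "1.11.2"))
print(satisfies_pins("5.4.0", "1.11.2"))
```

Padding matters: without it, the tuple `(3, 0)` would compare as less than `(3, 0, 0)` and a 3.0 client would wrongly pass the `<3.0.0` bound.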
diff --git a/pydat/pydat/ajax.py b/pydat/pydat/ajax.py
@@ -3,7 +3,7 @@
 
 from django.conf import settings
 from django.template import RequestContext
-from django.core.urlresolvers import reverse
+from django.urls import reverse
 from django.shortcuts import render_to_response, HttpResponse
 import urllib
 
diff --git a/pydat/pydat/settings.py b/pydat/pydat/settings.py
@@ -10,8 +10,6 @@
 
 DEBUG = False
 
-TEMPLATE_DEBUG = DEBUG
-
 SITE_ROOT = os.path.dirname(os.path.realpath(__file__))
 
 HANDLER = 'mongo'
@@ -135,56 +133,63 @@
 STATIC_URL = '/static/'
 
 # Additional locations of static files
-STATICFILES_DIRS = (
+STATICFILES_DIRS = [
     # Put strings here, like "/home/html/static" or "C:/www/django/static".
     # Always use forward slashes, even on Windows.
     # Don't forget to use absolute paths, not relative paths.
-    os.path.join(SITE_ROOT, 'static'),
-)
+]
 
 # List of finder classes that know how to find static files in
 # various locations.
-STATICFILES_FINDERS = (
+STATICFILES_FINDERS = [
     'django.contrib.staticfiles.finders.FileSystemFinder',
     'django.contrib.staticfiles.finders.AppDirectoriesFinder',
-#    'django.contrib.staticfiles.finders.DefaultStorageFinder',
-)
+]
 
 # Make this unique, and don't share it with anybody.
 SECRET_KEY = 'o=skwv+igf2%#6n&p!nd##w(a*wqugkcq4-2=wugz0(715*!l#'
 
-# List of callables that know how to import templates from various sources.
-TEMPLATE_LOADERS = (
-    'django.template.loaders.filesystem.Loader',
-    'django.template.loaders.app_directories.Loader',
-#     'django.template.loaders.eggs.Loader',
-)
-
 TEST_RUNNER = 'django.test.runner.DiscoverRunner'
 
-MIDDLEWARE_CLASSES = (
+MIDDLEWARE = [
     'django.middleware.common.CommonMiddleware',
     'django.contrib.sessions.middleware.SessionMiddleware',
-    'django.middleware.csrf.CsrfViewMiddleware',
+    #'django.middleware.csrf.CsrfViewMiddleware',
     'django.contrib.auth.middleware.AuthenticationMiddleware',
     'django.contrib.messages.middleware.MessageMiddleware',
     # Uncomment the next line for simple clickjacking protection:
     # 'django.middleware.clickjacking.XFrameOptionsMiddleware',
-)
+]
 
 ROOT_URLCONF = 'pydat.urls'
 
 # Python dotted path to the WSGI application used by Django's runserver.
 WSGI_APPLICATION = 'pydat.wsgi.application'
 
-TEMPLATE_DIRS = (
-    # Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
-    # Always use forward slashes, even on Windows.
-    # Don't forget to use absolute paths, not relative paths.
-    os.path.join(SITE_ROOT, 'templates'),
-)
 
-INSTALLED_APPS = (
+_TEMPLATE_DIRS_ = [os.path.join(SITE_ROOT, 'templates')]
+
+TEMPLATES = [
+    {
+        'BACKEND': 'django.template.backends.django.DjangoTemplates',
+        'DIRS': _TEMPLATE_DIRS_,
+        'OPTIONS': {
+            'context_processors': [
+                'django.contrib.auth.context_processors.auth',
+                'django.template.context_processors.debug',
+                'django.template.context_processors.i18n',
+                'django.template.context_processors.media',
+                'django.template.context_processors.static',
+                'django.template.context_processors.tz',
+                'django.contrib.messages.context_processors.messages',
+                'django.template.context_processors.csrf',
+            ],
+            'debug': DEBUG,
+        },
+    },
+]
+
+INSTALLED_APPS = [
     'django.contrib.auth',
     'django.contrib.contenttypes',
     'django.contrib.sessions',
@@ -196,7 +201,7 @@
     # Uncomment the next line to enable admin documentation:
     # 'django.contrib.admindocs',
     'pydat',
-)
+]
 
 # A sample logging configuration. The only tangible logging
 # performed by this configuration is to send an email to
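The settings.py changes replace the pre-1.8 `TEMPLATE_DIRS`, `TEMPLATE_DEBUG` and `TEMPLATE_LOADERS` settings with a single `TEMPLATES` list. As a sketch of that mapping (a hypothetical helper, not part of pyDat or Django), old-style values can be folded into the new structure. When no `loaders` option is given, the Django template backend uses the filesystem loader plus the app-directories loader only if `APP_DIRS` is true, so leaving `APP_DIRS` unset, as this diff does, means only the listed `DIRS` are searched.

```python
APP_DIRS_LOADER = "django.template.loaders.app_directories.Loader"


def legacy_to_templates(template_dirs, debug, loaders=(), context_processors=()):
    """Build a Django 1.8+ style TEMPLATES list from pre-1.8 style values.

    Hypothetical illustration of the mapping:
      template_dirs      -> "DIRS"                           (was TEMPLATE_DIRS)
      debug              -> OPTIONS["debug"]                 (was TEMPLATE_DEBUG)
      loaders            -> "APP_DIRS" is True only if the app_directories
                            loader was listed                (was TEMPLATE_LOADERS)
      context_processors -> OPTIONS["context_processors"]
                            (was TEMPLATE_CONTEXT_PROCESSORS)
    """
    return [
        {
            "BACKEND": "django.template.backends.django.DjangoTemplates",
            "DIRS": list(template_dirs),
            "APP_DIRS": APP_DIRS_LOADER in loaders,
            "OPTIONS": {
                "context_processors": list(context_processors),
                "debug": debug,
            },
        }
    ]


legacy = legacy_to_templates(
    ["/opt/WhoDat/pydat/pydat/templates"],
    False,
    loaders=("django.template.loaders.filesystem.Loader", APP_DIRS_LOADER),
)
print(legacy[0]["APP_DIRS"])
```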
diff --git a/pydat/pydat/templates/base.html b/pydat/pydat/templates/base.html
@@ -1,23 +1,24 @@
+{% load static %}
 <!DOCTYPE HTML>
 <html>
   <head>
     <title>pyDat: {% block title %}WHOIS exploration{% endblock %}</title>
-		<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery-ui-1.10.4.css">
-		<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery.dataTables.css">
-		<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/pydat.css">
+		<link rel="stylesheet" type="text/css" href="{% static 'css/jquery-ui-1.10.4.css' %}">
+		<link rel="stylesheet" type="text/css" href="{% static 'css/jquery.dataTables.css' %}">
+		<link rel="stylesheet" type="text/css" href="{% static 'css/pydat.css' %}">
 	{% block css %}
 	{% endblock %}
-		<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-1.11.0.min.js"></script>
-		<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-ui-1.10.4.js"></script>
-        <script type="text/javascript" src="{{STATIC_URL}}/js/jquery.dataTables.js"></script>
+		<script type="text/javascript" src="{% static 'js/jquery-1.11.0.min.js' %}"></script>
+		<script type="text/javascript" src="{% static 'js/jquery-ui-1.10.4.js' %}"></script>
+        <script type="text/javascript" src="{% static 'js/jquery.dataTables.js' %}"></script>
         <script type="text/javascript">
           var resolve_url = "{% url 'ajax_resolve' %}";
           var csrf_token = '{{ csrf_token }}';
           var latest_version = '{{ latest_version }}';
         </script>
     {% block js_constants %}
     {% endblock %}
-		<script type="text/javascript" src="{{STATIC_URL}}/js/pydat.js"></script>
+		<script type="text/javascript" src="{% static 'js/pydat.js' %}"></script>
     {% block js %}
     {% endblock %}
   </head>
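Django's `{% static %}` tag and the static file finders build their result by joining the given path onto `STATIC_URL` or onto a static directory, which is why asset paths are normally written without a leading slash (as in `'css/pydat.css'`). Whether a leading slash survives depends on the storage backend and Django version, so the snippet below only demonstrates the standard-library behavior that makes it risky; it is not Django's own code, and `STATIC_DIR` simply reuses the Apache `Alias` target as an example path.

```python
import posixpath
from urllib.parse import urljoin

# Example values: STATIC_URL matches settings.py; STATIC_DIR is only an example.
STATIC_URL = "/static/"
STATIC_DIR = "/opt/WhoDat/pydat/pydat/static"


def join_static_url(path):
    """Resolve an asset path against STATIC_URL using RFC 3986 joining."""
    return urljoin(STATIC_URL, path)


def join_static_dir(path):
    """Join an asset path onto a static directory with POSIX semantics."""
    return posixpath.join(STATIC_DIR, path)


for asset in ("css/pydat.css", "/css/pydat.css"):
    print(asset, "->", join_static_url(asset), "|", join_static_dir(asset))
# css/pydat.css -> /static/css/pydat.css | /opt/WhoDat/pydat/pydat/static/css/pydat.css
# /css/pydat.css -> /css/pydat.css | /css/pydat.css
```

In both joins an absolute second argument replaces the base instead of extending it, so `'/css/pydat.css'` silently loses the `/static/` prefix and the static directory.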
diff --git a/pydat/pydat/templates/domain_results.html b/pydat/pydat/templates/domain_results.html
@@ -1,4 +1,5 @@
 {% extends 'base.html' %}
+{% load static %}
 
 {% block title %}Domain Search{% endblock %}
 
@@ -24,9 +25,9 @@
 
 {% block js %}
     {% if legacy_search %}
-        <script type="text/javascript" src="{{STATIC_URL}}/js/domain.js"></script>
+        <script type="text/javascript" src="{% static 'js/domain.js' %}"></script>
     {% else %}
-        <script type="text/javascript" src="{{STATIC_URL}}/js/domain_advanced.js"></script>
+        <script type="text/javascript" src="{% static 'js/domain_advanced.js' %}"></script>
     {% endif %}
 {%endblock %}
 
diff --git a/pydat/pydat/templates/nosearchbase.html b/pydat/pydat/templates/nosearchbase.html
@@ -1,23 +1,24 @@
+{% load static %}
 <!DOCTYPE HTML>
 <html>
   <head>
     <title>pyDat: {% block title %}WHOIS exploration{% endblock %}</title>
-		<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery-ui-1.10.4.css">
-		<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery.dataTables.css">
-		<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/pydat.css">
+		<link rel="stylesheet" type="text/css" href="{% static 'css/jquery-ui-1.10.4.css' %}">
+		<link rel="stylesheet" type="text/css" href="{% static 'css/jquery.dataTables.css' %}">
+		<link rel="stylesheet" type="text/css" href="{% static 'css/pydat.css' %}">
 	{% block css %}
 	{% endblock %}
-		<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-1.11.0.min.js"></script>
-		<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-ui-1.10.4.js"></script>
-        <script type="text/javascript" src="{{STATIC_URL}}/js/jquery.dataTables.js"></script>
+		<script type="text/javascript" src="{% static 'js/jquery-1.11.0.min.js' %}"></script>
+		<script type="text/javascript" src="{% static 'js/jquery-ui-1.10.4.js' %}"></script>
+        <script type="text/javascript" src="{% static 'js/jquery.dataTables.js' %}"></script>
         <script type="text/javascript">
           var resolve_url = "{% url 'ajax_resolve' %}";
           var csrf_token = '{{ csrf_token }}';
           var latest_version = '{{ latest_version }}';
         </script>
     {% block js_constants %}
     {% endblock %}
-		<script type="text/javascript" src="{{STATIC_URL}}/js/pydat.js"></script>
+		<script type="text/javascript" src="{% static 'js/pydat.js' %}"></script>
     {% block js %}
     {% endblock %}
   </head>
diff --git a/pydat/pydat/templates/pdns_results.html b/pydat/pydat/templates/pdns_results.html
@@ -1,4 +1,5 @@
 {% extends 'base.html' %}
+{% load static %}
 
 {% block title %}pDNS{% endblock %}
 
@@ -10,7 +11,7 @@
 {% endblock %}
 
 {% block js %}
-    <script type="text/javascript" src="{{STATIC_URL}}/js/pdns.js"></script>
+    <script type="text/javascript" src="{% static 'js/pdns.js' %}"></script>
 {% endblock %}
 
 {% block searchBar %}
diff --git a/pydat/pydat/templates/rpdns_results.html b/pydat/pydat/templates/rpdns_results.html
@@ -1,4 +1,5 @@
 {% extends 'base.html' %}
+{% load static %}
 
 {% block title %}pDNS{% endblock %}
 
@@ -10,7 +11,7 @@
 {% endblock %}
 
 {% block js %}
-    <script type="text/javascript" src="{{STATIC_URL}}/js/pdns.js"></script>
+    <script type="text/javascript" src="{% static 'js/pdns.js' %}"></script>
 {% endblock %}
 
 {% block searchBar %}
diff --git a/pydat/pydat/templates/stats.html b/pydat/pydat/templates/stats.html
diff --git a/pydat/pydat/urls.py b/pydat/pydat/urls.py
diff --git a/pydat/pydat/views.py b/pydat/pydat/views.py