.. _project-itel:

Ingress Intelligence Scraper
============================

This script scrapes COMM data (plexts) from the Ingress Intel API.

Process
-------

The script goes through the following steps:

1. Read ``/etc/scrape_plexts.yml`` (or ``./scrape_plexts.yml``) and select a random user from the ``users`` list
2. Open the Intel site without any session data
3. Follow the Google SSO process and prime Intel requests to obtain a valid session
4. Begin requesting Intel data from the best available timestamp
5. Optionally write the data to a file (``output_file``)
6. Optionally send the data to Logstash (``output_logstash``)
7. Periodically perform random requests to mimic a typical client (``random_requests``)

Configuration File
------------------

The configuration file must contain, at a minimum, a single user.

Location: ``/etc/scrape_plexts.yml`` or ``./scrape_plexts.yml``

Sample file with defaults:

.. code-block:: yaml

    # Performs random requests to mimic a typical client
    random_requests: True

    # Writes/reads the last timestamp to/from timestamp_file
    save_state: True

    # File to write the timestamp to
    timestamp_file: '/tmp/intel_epoc'

    # Standard request headers
    std_headers:
      User-Agent: 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.7.1'
      Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
      Accept-Language: 'en-US,en;q=0.5'
      Accept-Encoding: 'gzip, deflate'
      DNT: 1

    # Area to scan
    area:
      min:
        lon: -180000000
        lat: -90000000
      max:
        lon: 180000000
        lat: 90000000

    # A user will be randomly selected from this list and used for the whole session
    users:
      : ''
      : ''
      : ''

    # If set, a JSON output file will be created / appended to
    #output_file: /tmp/plexts.json

    # If set, data will be pushed to this Logstash server
    #output_logstash: ls.domain.tld

Minimum required configuration:

.. code-block:: yaml

    output_file: /tmp/plexts.json
    users:
      :

Requisites
----------

* python-bs4
* python-requests
* python-yaml
* ``/etc/scrape_plexts.yml`` or ``./scrape_plexts.yml``

Notes
-----

* An attempt is made to read/write the file named by ``timestamp_file`` (``/tmp/intel_epoc`` by default) to retain state between executions
* The typical client waits 90 seconds between plext requests
* If 50 plexts are returned, a new request is made immediately
* If only a few plexts are returned, the client waits 2 minutes

Locations
---------

Coordinates are integer microdegrees (E6), i.e. decimal degrees multiplied by 1,000,000.

Sioux Falls Area:

.. code-block:: yaml

    area:
      min:
        lon: -100640259
        lat: 42931493
      max:
        lon: -93367310
        lat: 44944585

Los Angeles Area:

.. code-block:: yaml

    area:
      min:
        lon: -118801346
        lat: 33635774
      max:
        lon: -117043533
        lat: 34402944

Global:

.. code-block:: yaml

    area:
      min:
        lon: -180000000
        lat: -90000000
      max:
        lon: 180000000
        lat: 90000000

Authors
-------

* Michael Lustfield
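
Example: Loading the Configuration
----------------------------------

The configuration rules above amount to: parse the YAML, fill in defaults, insist on at least one user, then pick one user at random for the whole session. The following is a minimal Python sketch of that behaviour under stated assumptions, not the script's actual code. ``apply_defaults``, ``pick_user``, ``load_config``, ``DEFAULTS`` and ``CONFIG_PATHS`` are hypothetical names, only a subset of the defaults is reproduced, and trying ``/etc`` before the current directory is an assumption.

.. code-block:: python

    import random

    # Hypothetical subset of the documented defaults; only "users" is mandatory.
    DEFAULTS = {
        "random_requests": True,
        "save_state": True,
        "timestamp_file": "/tmp/intel_epoc",
        "area": {
            "min": {"lon": -180000000, "lat": -90000000},
            "max": {"lon": 180000000, "lat": 90000000},
        },
    }

    # Documented locations; trying /etc first is an assumption.
    CONFIG_PATHS = ("/etc/scrape_plexts.yml", "./scrape_plexts.yml")


    def apply_defaults(raw):
        """Merge a parsed config mapping over DEFAULTS and require at least one user."""
        # Shallow merge: a user-supplied "area" replaces the default area entirely.
        config = dict(DEFAULTS)
        config.update(raw or {})
        if not config.get("users"):
            raise ValueError("configuration must contain at least one user")
        return config


    def pick_user(config, rng=random):
        """Select one (username, password) pair to use for the whole session."""
        username = rng.choice(sorted(config["users"]))
        return username, config["users"][username]


    def load_config(paths=CONFIG_PATHS):
        """Read the first config file that exists (requires PyYAML, i.e. python-yaml)."""
        import yaml  # imported lazily so the helpers above work without PyYAML

        for path in paths:
            try:
                with open(path) as handle:
                    return apply_defaults(yaml.safe_load(handle))
            except FileNotFoundError:
                continue
        raise FileNotFoundError("no configuration file found in %s" % (paths,))

Passing a seeded ``random.Random`` instance as ``rng`` makes the user selection reproducible, which is convenient when testing.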
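
Example: Saving the Timestamp
-----------------------------

With ``save_state`` enabled, the script keeps the last timestamp in ``timestamp_file`` so a later run can resume from it. A minimal Python sketch of that state handling follows. It assumes, hypothetically, that the file holds a single integer; ``read_timestamp`` and ``write_timestamp`` are illustrative names, not functions from the script.

.. code-block:: python

    import os


    def read_timestamp(path):
        """Return the saved timestamp, or None if the file is missing or unreadable."""
        # Assumed file format: one integer, optionally followed by whitespace.
        try:
            with open(path) as handle:
                return int(handle.read().strip())
        except (OSError, ValueError):
            return None


    def write_timestamp(path, timestamp):
        """Persist the newest timestamp via write-then-rename."""
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as handle:
            handle.write(str(int(timestamp)))
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_path, path)

Returning ``None`` rather than raising lets a first run, or a run after ``/tmp`` was cleared, fall back to whatever starting timestamp the caller chooses.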
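
Example: Polling Cadence
------------------------

The Notes describe three waiting behaviours: no wait after 50 plexts, 90 seconds normally, and 2 minutes when only a few plexts arrive. The sketch below encodes that rule in Python. The notes do not say how many plexts count as "only a few", so ``FEW_PLEXTS = 10`` is a purely hypothetical threshold, and ``next_delay`` is an illustrative name rather than part of the script.

.. code-block:: python

    # Values from the Notes section, except FEW_PLEXTS, which is hypothetical.
    PAGE_SIZE = 50      # plexts returned when more data is likely waiting
    NORMAL_DELAY = 90   # seconds between plext requests for a typical client
    QUIET_DELAY = 120   # seconds to wait when only a few plexts came back
    FEW_PLEXTS = 10     # hypothetical cut-off for "only a few"


    def next_delay(plext_count):
        """Return how many seconds to wait before the next plext request."""
        if plext_count >= PAGE_SIZE:
            return 0  # 50 plexts returned: request again immediately
        if plext_count < FEW_PLEXTS:
            return QUIET_DELAY
        return NORMAL_DELAY

A polling loop would then call ``time.sleep(next_delay(len(plexts)))`` after each response.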
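
Example: Building an Area
-------------------------

Because ``area`` values are integer microdegrees, a new location can be derived from ordinary decimal-degree coordinates. The following Python sketch converts two corner points into an ``area`` mapping; ``to_e6`` and ``make_area`` are hypothetical helper names. It sorts each axis so that ``min`` never exceeds ``max``, and it does not handle boxes that cross the antimeridian.

.. code-block:: python

    def to_e6(degrees):
        """Convert decimal degrees to integer microdegrees (E6)."""
        return round(degrees * 1_000_000)


    def make_area(lat1, lon1, lat2, lon2):
        """Build an 'area' mapping from two corners, ordering them so min <= max."""
        lats = sorted((to_e6(lat1), to_e6(lat2)))
        lons = sorted((to_e6(lon1), to_e6(lon2)))
        return {
            "min": {"lon": lons[0], "lat": lats[0]},
            "max": {"lon": lons[1], "lat": lats[1]},
        }

The resulting mapping can be dumped with ``yaml.safe_dump({"area": ...})`` and pasted into the configuration file.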