Abstract
Network log files from different sources often need to be analyzed together to assess the severity of a cyber threat more accurately. For example, with command-line tools, each log file can only be reviewed in isolation. A log management system allows searching across log files, but the relationships between different network activities may still be hard to establish from these separate files. Relational databases can establish such relationships, for example through complex queries involving multiple join operations to link the tables. In recent years, graph databases have increasingly been used to manage data for semantic queries, for example by importing a fixed amount of log data for subsequent analysis. In this paper, we propose a new approach to analyzing network log files using a graph database. Specifically, we emphasize the importance of continuously monitoring log files for new entries, processing and analyzing them immediately, and importing the results into the graph database. To evaluate the proposed approach, we use the Zeek network security monitor to produce log files from monitored network traffic in real time. We then explain how graph databases can be used to analyze network log files in near-real time within a network security monitoring environment. Our findings demonstrate the utility of graph databases for analyzing log data.
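The abstract's central step, continuously watching a Zeek log for new entries so they can be processed as soon as they are written, can be sketched as a small polling reader. This is a minimal illustration in Python, not the authors' implementation: `LogFollower` and `watch` are hypothetical names, polling is assumed instead of OS file-change notifications, and each returned line would still need to be parsed and imported into the graph database.

```python
import os
import time
from typing import Callable, List


class LogFollower:
    """Return the lines appended to a log file since the previous poll.

    Hypothetical helper (not part of Zeek): it remembers the byte offset it
    has read up to and holds back a partially written trailing line until
    the writer completes it.
    """

    def __init__(self, path: str, from_start: bool = False) -> None:
        self.path = path
        # By default skip existing content, so only new entries are processed.
        self.offset = 0 if from_start else os.path.getsize(path)
        self.partial = b""

    def poll(self) -> List[str]:
        if os.path.getsize(self.path) < self.offset:
            # The file shrank (e.g. truncated or replaced): start over.
            self.offset, self.partial = 0, b""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        # The last element is an incomplete line (or b"") kept for next time.
        *complete, self.partial = (self.partial + data).split(b"\n")
        return [line.decode("utf-8") for line in complete if line]


def watch(path: str, handle: Callable[[str], None], interval: float = 1.0) -> None:
    """Poll `path` forever, passing each new data line to `handle`."""
    follower = LogFollower(path)
    while True:
        for line in follower.poll():
            if not line.startswith("#"):  # skip Zeek '#'-prefixed header/footer lines
                handle(line)
        time.sleep(interval)
```

Calling `watch("conn.log", print)` (the path is an assumption) would print each new record as Zeek appends it. Holding back an incomplete last line matters because a writer may flush a record in pieces; log rotation is handled only crudely here, by restarting when the file shrinks.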
References
Bejtlich, R.: The Practice of Network Security Monitoring: Understanding Incident Detection and Response. No Starch Press (2013)
MIT Lincoln Laboratory. DARPA Intrusion Detection Evaluation. http://www.ll.mit.edu/ideval/data/1999data.html. Accessed 4 Jun 2017
National CyberWatch Center. MACCDC—Home of National CyberWatch Mid Atlantic CCDC (2017). https://www.maccdc.org. Accessed 27 Jul 2017
Neise, P.: Intrusion Detection Through Relationship Analysis. SANS Institute InfoSec Reading Room (2016). https://www.sans.org/reading-room/whitepapers/detection/intrusion-detection-relationship-analysis-37352. Accessed 18 Mar 2017
Neo4j. Neo4j, the world’s leading graph database. https://neo4j.com/. Accessed 21 Aug 2017
Netresec. PCAP files from the US National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) (2017). https://www.netresec.com/?page=MACCDC. Accessed 20 Apr 2017
Paxson, V.: Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23), 2435–2463 (1999)
Py2neo. The py2neo v3 Handbook. http://py2neo.org/v3/. Accessed 11 Mar 2017
Robinson, I., Webber, J., Eifrem, E.: Graph Databases - New Opportunities for Connected Data, 2nd edn. O’Reilly Media Inc., Sebastopol (2015)
Sanders, C., Smith, J.: Applied Network Security Monitoring: Collection, Detection, and Analysis. Elsevier (2013)
Roesch, M.: Snort: lightweight intrusion detection for networks. In: LISA, vol. 99, no. 1, pp. 229–238 (Nov 1999)
Snort - Network Intrusion Detection & Prevention System. http://www.snort.org/. Accessed 21 Aug 2017
Suricata. Suricata—Open Source IDS/IPS/NSM engine. https://suricata-ids.org/. Accessed 21 Aug 2017
Zeek.org. The Zeek Network Security Monitor. https://www.bro.org. Accessed 15 Jan 2019
Schindler, T.: Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats. In: Gesellschaft für Informatik e.V. (Hrsg.) Informatik 2017. Lecture Notes in Informatics (LNI). Gesellschaft für Informatik, Bonn (2017)
Uetz, R., Benthin, L., Hemminghaus, C., Krebs, S., Yilmaz, T.: BREACH: a framework for the simulation of cyber attacks on company’s networks. In: Digital Forensics Research Conference Europe (Poster) (2017)
Djanali, S., et al.: Coro: graph-based automatic intrusion detection system signature generator for evoting protection. J. Theor. Appl. Inf. Technol. 81(3), 535–546 (2015)
Appendix
1.1 A. conn.log [5]
(1) ts: Timestamp of the first packet of the connection,
(2) uid: Unique identifier (UID) of the connection,
(3) id.orig_h: IP address of the source host,
(4) id.orig_p: Source port,
(5) id.resp_h: IP address of the destination host,
(6) id.resp_p: Destination port,
(7) proto: Transport layer protocol,
(8) service: Application protocol identified on the connection,
(9) duration: Duration of the connection in seconds,
(10) orig_bytes: Number of payload bytes the originator sent,
(11) resp_bytes: Number of payload bytes the responder sent,
(12) conn_state: Summary of the connection state,
(13) local_orig: Whether the connection was originated locally,
(14) local_resp: Whether the connection was responded to locally,
(15) missed_bytes: Number of bytes missed in content gaps, which indicates packet loss,
(16) history: State history of the connection,
(17) orig_pkts: Number of packets the originator sent,
(18) orig_ip_bytes: Number of IP-level bytes the originator sent (taken from the IP total length header field),
(19) resp_pkts: Number of packets the responder sent,
(20) resp_ip_bytes: Number of IP-level bytes the responder sent,
(21) tunnel_parents: If the connection was carried over a recognized tunnel, the UIDs of any encapsulating parent connections used over the lifetime of this inner connection.
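Zeek writes conn.log by default as tab-separated text, with `#`-prefixed header lines such as `#fields` (the column names) and `#unset_field` (the marker for missing values, `-` by default). Below is a minimal sketch, assuming that default ASCII format, of turning those lines into records and each record into a parameterized Cypher statement for Neo4j. The `Host`/`Connection` labels, the `ORIGINATED`/`RESPONDED_BY` relationship types, and the helper names are hypothetical illustrations, not the paper's schema.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def parse_zeek_tsv(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict per data line of a Zeek TSV log such as conn.log.

    Column names come from the '#fields' header line; values equal to the
    '#unset_field' marker (default '-') become None.
    """
    fields: List[str] = []
    unset = "-"
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            key, _, value = line[1:].partition("\t")
            if key == "fields":
                fields = value.split("\t")
            elif key == "unset_field":
                unset = value
            continue  # other header/footer lines (#path, #types, #close, ...) are ignored
        if line and fields:
            values = line.split("\t")
            yield {name: (None if v == unset else v) for name, v in zip(fields, values)}


# Hypothetical graph model: (:Host)-[:ORIGINATED]->(:Connection)-[:RESPONDED_BY]->(:Host)
CONN_CYPHER = (
    "MERGE (src:Host {ip: $orig_h}) "
    "MERGE (dst:Host {ip: $resp_h}) "
    "MERGE (c:Connection {uid: $uid}) "
    "SET c.ts = $ts, c.proto = $proto, c.service = $service, "
    "c.resp_port = $resp_p, c.conn_state = $conn_state "
    "MERGE (src)-[:ORIGINATED]->(c) "
    "MERGE (c)-[:RESPONDED_BY]->(dst)"
)


def conn_to_cypher(rec: Record) -> Tuple[str, Dict[str, object]]:
    """Return a parameterized Cypher statement that upserts one conn.log record."""
    params: Dict[str, object] = {
        "uid": rec["uid"],
        "ts": float(rec["ts"]),
        "orig_h": rec["id.orig_h"],
        "resp_h": rec["id.resp_h"],
        "resp_p": int(rec["id.resp_p"]),
        "proto": rec["proto"],
        "service": rec.get("service"),  # None when Zeek logged '-'
        "conn_state": rec.get("conn_state"),
    }
    return CONN_CYPHER, params
```

The returned query and parameters could be executed with, for example, the official neo4j Python driver's `session.run(query, params)`. `MERGE` is used instead of `CREATE` so that re-reading a line, or seeing the same host in many connections, does not duplicate nodes.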
1.2 B. dns.log [5]
(1) ts: Earliest time at which a DNS protocol message over the associated connection was observed,
(2) uid: UID of the connection over which the DNS messages are transferred,
(3) id.orig_h: IP address of the source host,
(4) id.orig_p: Source port,
(5) id.resp_h: IP address of the destination host,
(6) id.resp_p: Destination port,
(7) proto: Transport layer protocol,
(8) trans_id: 16-bit identifier assigned by the program that generated the DNS query and echoed in responses to match replies to outstanding queries,
(9) rtt: Round-trip time for the query and response, i.e. the delay between when the request was seen and when the answer started,
(10) query: Domain name that is the subject of the DNS query,
(11) qclass: QCLASS value specifying the class of the query,
(12) qclass_name: Descriptive name for the class of the query,
(13) qtype: QTYPE value specifying the type of the query,
(14) qtype_name: Descriptive name for the type of the query,
(15) rcode: Response code value in DNS response messages,
(16) rcode_name: Descriptive name for the response code value,
(17) AA: Authoritative Answer bit for response messages,
(18) TC: Truncation bit, specifying whether the message was truncated,
(19) RD: Recursion Desired bit, indicating in a request message whether the client wants recursive service for this query,
(20) RA: Recursion Available bit, indicating in a response message that the server supports recursive queries,
(21) Z: Reserved field that is usually “0” in queries and responses,
(22) answers: Set of resolved IP addresses and domain names in the query answer,
(23) TTLs: Caching intervals of the associated resource records described in the query answer,
(24) rejected: Whether the DNS server rejected the query.
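Two dns.log fields are containers: `answers` and `TTLs` hold several values joined by Zeek's set separator (`,` by default), with `(empty)` for an empty container and `-` for an unset field. The sketch below, assuming those defaults, splits them and builds a Cypher statement linking the querying host, the queried domain and each answer. The labels `Host`, `Domain`, `Answer`, the relationship types and the function names are hypothetical; pairing each answer with the TTL at the same position is an assumption about the column order.

```python
from typing import Dict, List, Optional, Tuple

SET_SEPARATOR = ","      # Zeek default '#set_separator'
EMPTY_FIELD = "(empty)"  # Zeek default '#empty_field'
UNSET_FIELD = "-"        # Zeek default '#unset_field'


def split_vector(value: Optional[str]) -> List[str]:
    """Split a Zeek set/vector column (e.g. answers, TTLs) into its elements."""
    if value is None or value in (UNSET_FIELD, EMPTY_FIELD):
        return []
    return value.split(SET_SEPARATOR)


# Hypothetical graph model:
# (:Host)-[:QUERIED {uid}]->(:Domain)-[:RESOLVES_TO {ttl}]->(:Answer)
DNS_CYPHER = (
    "MERGE (h:Host {ip: $orig_h}) "
    "MERGE (d:Domain {name: $query}) "
    "MERGE (h)-[q:QUERIED {uid: $uid}]->(d) "
    "SET q.ts = $ts, q.trans_id = $trans_id, q.qtype = $qtype_name, q.rcode = $rcode_name "
    "WITH d "
    "UNWIND $answers AS a "
    "MERGE (r:Answer {value: a.value}) "
    "MERGE (d)-[res:RESOLVES_TO]->(r) "
    "SET res.ttl = a.ttl"
)


def dns_to_cypher(rec: Dict[str, Optional[str]]) -> Optional[Tuple[str, Dict[str, object]]]:
    """Build a Cypher upsert for one dns.log record, or None if no query name was logged."""
    def opt(key: str) -> Optional[str]:
        v = rec.get(key)
        return None if v in (None, UNSET_FIELD) else v

    if opt("query") is None:
        return None  # MERGE on a null name would fail in Neo4j
    answers = split_vector(rec.get("answers"))
    ttls = [float(t) for t in split_vector(rec.get("TTLs"))]
    trans_id = opt("trans_id")
    params: Dict[str, object] = {
        "ts": float(rec["ts"]),
        "uid": rec["uid"],
        "orig_h": rec["id.orig_h"],
        "query": opt("query"),
        "trans_id": int(trans_id) if trans_id is not None else None,
        "qtype_name": opt("qtype_name"),
        "rcode_name": opt("rcode_name"),
        # Assumption: TTLs are listed in the same order as answers.
        "answers": [
            {"value": a, "ttl": ttls[i] if i < len(ttls) else None}
            for i, a in enumerate(answers)
        ],
    }
    return DNS_CYPHER, params
```

Records without a logged query name are skipped, because Neo4j's `MERGE` cannot match on a null property value. `UNWIND` over an empty answer list simply produces no rows, so an unanswered query still yields its `QUERIED` edge but no `RESOLVES_TO` edges.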
1.3 C. http.log [5]
(1) ts: Timestamp of when the request happened,
(2) uid: UID of the connection,
(3) id.orig_h: IP address of the source host,
(4) id.orig_p: Source port,
(5) id.resp_h: IP address of the destination host,
(6) id.resp_p: Destination port,
(7) trans_depth: Pipelined depth of this request/response transaction within the connection,
(8) method: Method used in the HTTP request (e.g. GET, POST),
(9) host: Value of the HTTP Host header,
(10) uri: URI used in the request,
(11) referrer: Value of the HTTP “referer” header,
(12) version: Version portion of the HTTP request,
(13) user_agent: Value of the HTTP User-Agent header,
(14) request_body_len: Uncompressed size in bytes of the data transferred from the client,
(15) response_body_len: Uncompressed size in bytes of the data transferred from the server,
(16) status_code: HTTP status code returned by the server,
(17) status_msg: Human-readable HTTP status message returned by the server,
(18) info_code: Last 1xx informational reply code returned by the server,
(19) info_msg: Last 1xx informational reply message returned by the server,
(20) tags: Set of indicators of various attributes discovered and related to a particular request/response pair,
(21) username: HTTP Basic Authentication user name (if found),
(22) password: HTTP Basic Authentication password (if found),
(23) proxied: All headers that may indicate that the request was proxied,
(24) orig_fuids: List of unique file IDs (FUIDs) in the request,
(25) orig_filenames: List of filenames in the request,
(26) orig_mime_types: MIME types of the request objects,
(27) resp_fuids: List of FUIDs in the response,
(28) resp_filenames: List of filenames in the response,
(29) resp_mime_types: MIME types of the response objects.
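Every http.log record carries the `uid` of the connection it belongs to, and `trans_depth` distinguishes pipelined requests within that connection, so the pair `(uid, trans_depth)` can key an HTTP request node. Below is a minimal sketch, assuming Zeek's default `-` unset marker, of an upsert that attaches each request to a `Connection` node keyed by `uid`; the labels, relationship types and function names are hypothetical, not taken from the paper.

```python
from typing import Dict, Optional, Tuple

UNSET_FIELD = "-"  # Zeek default '#unset_field' marker

# Hypothetical graph model:
# (:Connection {uid})-[:CARRIED]->(:HttpRequest {uid, trans_depth})-[:FOR_HOST]->(:Domain)
HTTP_CYPHER = (
    "MERGE (c:Connection {uid: $uid}) "
    "MERGE (r:HttpRequest {uid: $uid, trans_depth: $trans_depth}) "
    "SET r.ts = $ts, r.method = $method, r.uri = $uri, "
    "r.status_code = $status_code, r.user_agent = $user_agent "
    "MERGE (c)-[:CARRIED]->(r) "
    # Only link a Domain node when a Host header was logged (MERGE rejects null keys).
    "FOREACH (h IN CASE WHEN $host IS NULL THEN [] ELSE [$host] END | "
    "MERGE (d:Domain {name: h}) MERGE (r)-[:FOR_HOST]->(d))"
)


def http_to_cypher(rec: Dict[str, Optional[str]]) -> Tuple[str, Dict[str, object]]:
    """Build a Cypher upsert for one http.log record.

    (uid, trans_depth) identifies a request/response pair, since several
    pipelined requests can share one connection.
    """
    def opt(key: str) -> Optional[str]:
        v = rec.get(key)
        return None if v in (None, UNSET_FIELD) else v

    status = opt("status_code")
    params: Dict[str, object] = {
        "uid": rec["uid"],
        "trans_depth": int(rec["trans_depth"]),
        "ts": float(rec["ts"]),
        "method": opt("method"),
        "uri": opt("uri"),
        "status_code": int(status) if status is not None else None,
        "user_agent": opt("user_agent"),
        "host": opt("host"),
    }
    return HTTP_CYPHER, params
```

Because conn.log and dns.log records carry the same `uid`, merging them onto the same `Connection` nodes would let one traversal from a connection reach its HTTP requests, for example `MATCH (c:Connection {uid: $uid})-[:CARRIED]->(r:HttpRequest) RETURN r.method, r.uri`, without the multi-table joins a relational schema would need.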
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Diederichsen, L., Choo, KK.R., Le-Khac, NA. (2019). A Graph Database-Based Approach to Analyze Network Log Files. In: Liu, J., Huang, X. (eds) Network and System Security. NSS 2019. Lecture Notes in Computer Science, vol. 11928. Springer, Cham. https://doi.org/10.1007/978-3-030-36938-5_4
DOI: https://doi.org/10.1007/978-3-030-36938-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36937-8
Online ISBN: 978-3-030-36938-5
eBook Packages: Computer Science, Computer Science (R0)