A Graph Database-Based Approach to Analyze Network Log Files

  • Conference paper
Network and System Security (NSS 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11928)

Abstract

Network log files from different sources often need to be analyzed together to assess the severity of a cyber threat more accurately. With command-line tools, however, each log file can be reviewed only in isolation. A log management system allows searching across different log files, but the relationships between different network activities may still not be easy to establish from these files. Relational databases can establish these relationships, for example through complex queries involving multiple join operations to link the tables. In recent years, there has been a trend of using graph databases to manage data for semantic queries (e.g. importing a fixed amount of log data for subsequent analysis). Hence, in this paper, we propose a new approach to analyzing network log files using a graph database. Specifically, we posit the importance of constantly monitoring log files for new entries, so that they can be processed and analyzed immediately and the results imported into the graph database. To facilitate the evaluation of our proposed approach, we use the Zeek network security monitor to produce log files from monitored network traffic in real time. We then explain how graph databases can be used to analyze network log files in near-real time within a network security monitoring environment. Findings from our research demonstrate the utility of graph data in analyzing log data.
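The idea of constantly monitoring a log file for new entries can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function name `read_new_entries` is hypothetical, and the sketch assumes only that the log is a line-oriented text file in which metadata lines start with `#`, as in Zeek's default tab-separated output.

```python
from typing import Iterator, TextIO


def read_new_entries(handle: TextIO) -> Iterator[str]:
    """Yield complete lines appended to `handle` since the last read.

    Hypothetical helper: lines starting with '#' are skipped as metadata,
    and a trailing line without a newline is left unread so that a
    half-written entry is picked up on the next call.
    """
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line.endswith("\n"):
            # Either end of file or a partially written entry: rewind and stop.
            handle.seek(position)
            return
        if line.startswith("#"):
            continue
        yield line.rstrip("\n")
```

A monitoring loop would call `read_new_entries` repeatedly on the same open file handle, sleeping briefly between calls, and pass each returned line on to parsing and import.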

References

  1. Bejtlich, R.: The Practice of Network Security Monitoring: Understanding Incident Detection and Response. No Starch Press (2013)

  2. MIT Lincoln Laboratory. DARPA Intrusion Detection Evaluation. http://www.ll.mit.edu/ideval/data/1999data.html. Accessed 4 June 2017

  3. National CyberWatch Center. MACCDC—Home of National CyberWatch Mid Atlantic CCDC (2017). https://www.maccdc.org. Accessed 27 July 2017

  4. Neise, P.: Intrusion Detection Through Relationship Analysis. SANS Institute InfoSec Reading Room (2016). https://www.sans.org/reading-room/whitepapers/detection/intrusion-detection-relationship-analysis-37352. Accessed 18 March 2017

  5. Neo4j. Neo4j, the world’s leading graph database. https://neo4j.com/. Accessed 21 Aug 2017

  6. Netresec. PCAP files from the US National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) (2017). https://www.netresec.com/?page=MACCDC. Accessed 20 Apr 2017

  7. Paxson, V.: Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23), 2435–2463 (1999)

  8. Py2neo. The py2neo v3 Handbook. http://py2neo.org/v3/. Accessed 11 Mar 2017

  9. Robinson, I., Webber, J., Eifrem, E.: Graph Databases - New Opportunities for Connected Data, 2nd edn. O’Reilly Media Inc., Sebastopol (2015)

  10. Sanders, C., Smith, J.: Applied Network Security Monitoring: Collection, Detection, and Analysis. Elsevier (2013)

  11. Roesch, M.: Snort: lightweight intrusion detection for networks. In: LISA, vol. 99, no. 1, pp. 229–238 (1999)

  12. Snort - Network Intrusion Detection & Prevention System. http://www.snort.org/. Accessed 21 Aug 2017

  13. Suricata. Suricata—Open Source IDS/IPS/NSM engine. https://suricata-ids.org/. Accessed 21 Aug 2017

  14. Zeek.org. The Zeek Network Security Monitor. https://www.bro.org. Accessed 15 Jan 2019

  15. Schindler, T.: Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats. In: Gesellschaft für Informatik e.V. (Hrsg.) Informatik 2017. Lecture Notes in Informatics (LNI). Gesellschaft für Informatik, Bonn (2017)

  16. Uetz, R., Benthin, L., Hemminghaus, C., Krebs, S., Yilmaz, T.: BREACH: a framework for the simulation of cyber attacks on company’s networks. In: Digital Forensics Research Conference Europe (Poster) (2017)

  17. Djanali, S., et al.: Coro: graph-based automatic intrusion detection system signature generator for evoting protection. J. Theor. Appl. Inf. Technol. 81(3), 535–546 (2015)

Author information

Corresponding author

Correspondence to Kim-Kwang Raymond Choo.

Appendix

1.1 A. conn.log [5]

  1. ts: Timestamp of the first packet of the connection,

  2. uid: Unique Identifier (UID) for the connection,

  3. id.orig_h: IP address of source host,

  4. id.orig_p: Source port,

  5. id.resp_h: IP address of destination host,

  6. id.resp_p: Destination port,

  7. proto: Transport layer protocol,

  8. service: Identification of an application protocol,

  9. duration: Duration of the connection in seconds,

  10. orig_bytes: Number of payload bytes the originator sent,

  11. resp_bytes: Number of payload bytes the responder sent,

  12. conn_state: Summary of the connection state,

  13. local_orig: Whether the connection originated locally,

  14. local_resp: Whether the connection was responded to locally,

  15. missed_bytes: Number of missed bytes, representing packet loss,

  16. history: State history of the connection,

  17. orig_pkts: Number of packets that the originator sent,

  18. orig_ip_bytes: Number of IP level bytes that the originator sent (this is taken from the IP total length header field),

  19. resp_pkts: Number of packets that the responder sent,

  20. resp_ip_bytes: Number of IP level bytes that the responder sent,

  21. tunnel_parents: If this connection was over a recognized tunnel, this indicates the UID values of any encapsulating parent connections used over the lifetime of this inner connection.
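As an illustration of how the 21 fields listed above map onto a single log entry, the following Python sketch splits one tab-separated line into a dictionary. It assumes Zeek's default ASCII output, where fields are separated by tabs and an unset field is written as `-`. The names `CONN_FIELDS` and `parse_conn_line` are hypothetical, and any sample values used with it are made up.

```python
from typing import Dict, List, Optional

# Field order as listed in this appendix (assumed to match the log's #fields header).
CONN_FIELDS: List[str] = [
    "ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
    "proto", "service", "duration", "orig_bytes", "resp_bytes",
    "conn_state", "local_orig", "local_resp", "missed_bytes", "history",
    "orig_pkts", "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
    "tunnel_parents",
]


def parse_conn_line(line: str,
                    fields: List[str] = CONN_FIELDS) -> Dict[str, Optional[str]]:
    """Split one tab-separated entry into a field-name -> value mapping.

    Values stay strings; an unset field ('-') becomes None.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return {name: (None if value == "-" else value)
            for name, value in zip(fields, values)}
```

Keeping the values as strings leaves type conversion (for example of `ts` or `duration`) to the import step, where the target property types are known.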

1.2 B. dns.log [5]

  1. ts: Timestamp of the earliest time at which the DNS protocol message over the associated connection is observed,

  2. uid: UID of the connection over which DNS messages are being transferred,

  3. id.orig_h: IP address of source host,

  4. id.orig_p: Source port,

  5. id.resp_h: IP address of destination host,

  6. id.resp_p: Destination port,

  7. proto: Transport layer protocol,

  8. trans_id: A 16-bit identifier that is assigned by the program that generated the DNS query and that is also used in responses to match up replies to outstanding queries,

  9. rtt: Round trip time for the query and response, indicating the delay between the moment the request was seen and the moment the answer started,

  10. query: Domain name that is the subject of the DNS query,

  11. qclass: QCLASS value specifying the class of the query,

  12. qclass_name: Descriptive name for the class of the query,

  13. qtype: QTYPE value specifying the type of the query,

  14. qtype_name: Descriptive name for the type of the query,

  15. rcode: Response Code value in DNS response messages,

  16. rcode_name: Descriptive name for the response code value,

  17. AA: Authoritative Answer bit for response messages,

  18. TC: Truncation bit that specifies whether the message was truncated,

  19. RD: Recursion Desired bit that indicates in a request message whether the client wants recursive service for this query,

  20. RA: Recursion Available bit that indicates in a response message that the server supports recursive queries,

  21. Z: A reserved field that is usually “0” in queries and responses,

  22. answers: Set of resolved IP addresses and domains in the query answer,

  23. TTLs: Caching intervals of the associated resources described in the query answer,

  24. rejected: Rejected bit indicating whether the server rejected the DNS query.
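To show how a DNS entry with the fields above could be turned into graph relationships, here is a minimal Python sketch. The graph model is an assumption made for illustration only and is not taken from the paper: the relationship names `QUERIED` and `RESOLVED_TO` and the function `dns_record_to_edges` are hypothetical, and `answers` is assumed to arrive as a comma-separated string, as in Zeek's default tab-separated output.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (start node key, relationship type, end node key)


def dns_record_to_edges(record: Dict[str, Optional[str]]) -> List[Edge]:
    """Derive illustrative graph edges from one parsed dns.log record.

    Assumed model: the querying host points to the queried domain, and
    the domain points to every entry in the answer set.
    """
    edges: List[Edge] = []
    client = record.get("id.orig_h")
    query = record.get("query")
    if client and query:
        edges.append((client, "QUERIED", query))
    answers = record.get("answers")
    if query and answers:
        for answer in answers.split(","):
            edges.append((query, "RESOLVED_TO", answer))
    return edges
```

Because the domain name becomes a node of its own, every host that queried the same domain ends up connected to it, which is the kind of relationship that would otherwise require a join across log entries.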

1.3 C. http.log [5]

  1. ts: Timestamp for when the request happened,

  2. uid: UID for the connection,

  3. id.orig_h: IP address of source host,

  4. id.orig_p: Source port,

  5. id.resp_h: IP address of destination host,

  6. id.resp_p: Destination port,

  7. trans_depth: Number representing the pipelined depth into the connection of this request/response transaction,

  8. method: Method used in the HTTP request (e.g. GET, POST),

  9. host: HTTP Host header value,

  10. uri: URI used in the request,

  11. referrer: HTTP “referer” header,

  12. version: Version portion of the HTTP request,

  13. user_agent: HTTP User-Agent header value,

  14. request_body_len: Uncompressed content size of the data transferred from the client in bytes,

  15. response_body_len: Uncompressed content size of the data transferred from the server in bytes,

  16. status_code: HTTP status code returned by the server,

  17. status_msg: Human-readable HTTP status message,

  18. info_code: Reply code returned by the server,

  19. info_msg: Human-readable reply message,

  20. tags: Set of indicators of various attributes discovered and related to a particular request/response pair,

  21. username: HTTP Basic Authentication user name (if found),

  22. password: HTTP Basic Authentication password (if found),

  23. proxied: All of the headers that may indicate if the request was proxied,

  24. orig_fuids: List of unique file IDs (FUIDs) in the request,

  25. orig_filenames: List of filenames in the request,

  26. orig_mime_types: MIME types for request objects,

  27. resp_fuids: List of FUIDs in the response,

  28. resp_filenames: List of filenames in the response,

  29. resp_mime_types: MIME types for response objects.
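Since every HTTP entry carries the `uid` of its connection, an HTTP request can be attached to its connection node when it is imported. The sketch below builds a parameterized Cypher statement for this in Python. It is an assumption-laden illustration rather than the paper's import code: the node labels `Connection` and `HttpRequest`, the relationship type `CARRIED`, and the function `http_record_to_statement` are hypothetical, and no database driver is called, so the statement and its parameters are simply returned.

```python
from typing import Any, Dict, Optional, Tuple

# Illustrative Cypher: MERGE avoids duplicate nodes when the same uid appears
# in several logs. Labels and relationship type are assumptions.
HTTP_CYPHER = (
    "MERGE (c:Connection {uid: $uid}) "
    "MERGE (r:HttpRequest {uid: $uid, trans_depth: $trans_depth}) "
    "SET r.method = $method, r.host = $host, r.uri = $uri, "
    "r.status_code = $status_code "
    "MERGE (c)-[:CARRIED]->(r)"
)


def http_record_to_statement(
    record: Dict[str, Optional[str]],
) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and parameter map for one parsed http.log record."""
    status = record.get("status_code")
    params: Dict[str, Any] = {
        "uid": record["uid"],
        "trans_depth": int(record["trans_depth"]),
        "method": record.get("method"),
        "host": record.get("host"),
        "uri": record.get("uri"),
        "status_code": int(status) if status else None,
    }
    return HTTP_CYPHER, params
```

Passing values as parameters instead of formatting them into the query string keeps untrusted log content, such as URIs, out of the Cypher text itself.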

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Diederichsen, L., Choo, KK.R., Le-Khac, NA. (2019). A Graph Database-Based Approach to Analyze Network Log Files. In: Liu, J., Huang, X. (eds) Network and System Security. NSS 2019. Lecture Notes in Computer Science, vol 11928. Springer, Cham. https://doi.org/10.1007/978-3-030-36938-5_4

  • DOI: https://doi.org/10.1007/978-3-030-36938-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36937-8

  • Online ISBN: 978-3-030-36938-5

  • eBook Packages: Computer Science, Computer Science (R0)
