General

What causes Deadlocks?

How to manage Deadlocks?

  • Prevent:
    • Lock records at the beginning of a transaction
    • Use two-phase locking protocol
      • growing phase
      • shrinking phase
  • Resolve:
    • Allow them to occur, then detect them and abort one of the involved transactions
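
The prevention bullets above can be sketched in a few lines. This is a minimal illustration, not a database implementation: the `balances` dictionary, the per-record `threading.Lock` objects and the `transfer` function are hypothetical names, and acquiring locks in sorted order is an extra assumption of this sketch that keeps two transactions from waiting on each other.

```python
import threading

# Hypothetical in-memory "records": each balance is guarded by its own lock.
balances = {"A": 100, "B": 100}
locks = {name: threading.Lock() for name in balances}


def transfer(src, dst, amount):
    """Move `amount` from src to dst using two-phase locking."""
    needed = sorted({src, dst})  # fixed global order (assumption of this sketch)
    acquired = []
    try:
        # Growing phase: take every lock before touching any record.
        for name in needed:
            locks[name].acquire()
            acquired.append(name)
        balances[src] -= amount
        balances[dst] += amount
    finally:
        # Shrinking phase: only release; no new lock is taken from here on.
        for name in reversed(acquired):
            locks[name].release()


def repeat_transfer(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)


def run_demo():
    # Two threads transfer in opposite directions at the same time.
    threads = [
        threading.Thread(target=repeat_transfer, args=("A", "B", 1000)),
        threading.Thread(target=repeat_transfer, args=("B", "A", 1000)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(balances)


final_balances = run_demo()
```

Because both threads request the locks in the same order, neither can hold one lock while waiting for the other, so the demo finishes and the balances end where they started.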

Alternative to Locking: MultiVersioning

  • Logging of steps:
  1. Read
  2. Manipulate
  3. Commit
    • Commit only works if the version that was read and manipulated is still the current version of the file
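
The read / manipulate / commit steps can be sketched with a version counter. `VersionedStore` and `VersionConflict` are hypothetical names for illustration only; real systems keep more than a single counter.

```python
class VersionConflict(Exception):
    """Raised when a commit is based on an outdated version."""


class VersionedStore:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # Step 1: read the value together with the version it belongs to.
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Step 3: accept the write only if nobody committed in between.
        if read_version != self.version:
            raise VersionConflict("the file changed since it was read")
        self.value = new_value
        self.version += 1


store = VersionedStore(10)

# Two transactions read the same version.
value_a, version_a = store.read()
value_b, version_b = store.read()

# Step 2 (manipulate) happens locally; the first commit succeeds.
store.commit(value_a + 1, version_a)

# The second commit is based on a version that is no longer current.
try:
    store.commit(value_b + 5, version_b)
    conflict = False
except VersionConflict:
    conflict = True
```

No lock is held while manipulating; the cost is that the losing transaction has to read again and retry.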

Sequential Consistency

  • = Interpreting parallel manipulations as if they had been executed sequentially
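
One simplified way to read this definition: a parallel execution is acceptable if its result matches the result of some sequential order of the same operations. The sketch below compares final values only, and all function names are hypothetical.

```python
from itertools import permutations


def add_ten(x):
    return x + 10


def double(x):
    return x * 2


def serial_outcomes(start, operations):
    """Results of running the operations one after another, in every order."""
    outcomes = set()
    for order in permutations(operations):
        value = start
        for operation in order:
            value = operation(value)
        outcomes.add(value)
    return outcomes


def matches_some_serial_order(start, operations, observed):
    return observed in serial_outcomes(start, operations)


operations = [add_ten, double]
possible = serial_outcomes(5, operations)

# Both operations read 5 in parallel and one write overwrites the other:
# the final value is 15 (or 10), which no sequential order could produce.
lost_update_ok = matches_some_serial_order(5, operations, 15)
serial_ok = matches_some_serial_order(5, operations, 30)
```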

Big Data

  • Volume
    • Large amounts
  • Variety
    • many different kinds of files
  • Velocity
    • a lot of new data arriving quickly
  • Veracity
  • Value
    • worth of the data

Schema on Read Vs. Schema on Write

  • Schema on Write (traditional)
    • Design Schema before writing data
  • Schema on Read (big data)
    • Design Schema after writing data
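
The difference can be sketched with plain Python lists standing in for storage. `REQUIRED_FIELDS` is a hypothetical schema and the function names are made up for this illustration.

```python
import json

# Hypothetical schema: field name -> expected type.
REQUIRED_FIELDS = {"id": int, "name": str}


def write_with_schema(table, record):
    """Schema on write: reject anything that does not fit before storing it."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"record violates schema at field {field!r}")
    table.append(record)


def write_raw(lake, line):
    """Schema on read: store the raw line, whatever it contains."""
    lake.append(line)


def read_with_schema(lake):
    """Apply the schema only when reading; skip lines that do not fit."""
    rows = []
    for line in lake:
        record = json.loads(line)
        if all(isinstance(record.get(f), t) for f, t in REQUIRED_FIELDS.items()):
            rows.append({f: record[f] for f in REQUIRED_FIELDS})
    return rows


table = []
write_with_schema(table, {"id": 1, "name": "Ada"})

lake = []
write_raw(lake, '{"id": 2, "name": "Bob", "extra": true}')
write_raw(lake, '{"name": "no id"}')  # stored anyway, filtered at read time
rows = read_with_schema(lake)
```

With schema on read, the second raw line is still kept in storage, so a different schema applied later could make use of it.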

Not Only SQL (NOSQL)

  • Do not need to understand relations before making changes
  • -> Better performance, because relations are optional
  • -> Data that’s not defined in the DB can still be saved
  • data storage/retrieval technologies natural for cloud environment
  • generally not ACID compliant
  • BASE
    • basically available
    • soft state
    • eventually consistent

NOSQL Classifications

  • Key-value stores
    • simple key-value map
  • Document stores
    • Values -> Documents
  • Wide-Column stores
    • Rows and columns
  • Graph
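
To make the four classes concrete, the same kind of data can be laid out in plain Python structures. This only illustrates the data shapes, not any specific product; all keys and names are invented.

```python
# Key-value store: an opaque value looked up by its key.
key_value = {"user:1": "Ada"}

# Document store: the value is a structured document that can be queried into.
document = {"user:1": {"name": "Ada", "languages": ["Python", "SQL"]}}

# Wide-column store: rows and columns, but rows need not share the same columns.
wide_column = {
    "user:1": {"name": "Ada", "city": "London"},
    "user:2": {"name": "Bob"},
}

# Graph: nodes plus labelled edges between them.
graph_nodes = {"Ada", "Bob"}
graph_edges = [("Ada", "FOLLOWS", "Bob")]


def followers_of(person):
    """Traverse the edges to find who follows `person`."""
    return [src for src, rel, dst in graph_edges
            if rel == "FOLLOWS" and dst == person]


bob_followers = followers_of("Bob")
```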

HADOOP

  • = open-source framework that implements MapReduce
  • How to analyze data if it is stored across multiple computers (cloud)?

Hadoop Distributed File System (HDFS)

  • File system for data stored in cloud
  • Data -> broken into blocks -> blocks stored on nodes -> nodes grouped into clusters
  • Cluster:
    • consists of NameNode (master server) and DataNodes (slaves)
  • Overall control through YARN (cluster resource management)
  • No in-place updates, only appending
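
A toy model of the block idea, under heavy simplification: this is not the HDFS API. The `NameNode` class here is hypothetical, it writes blocks itself (in real HDFS clients send data to DataNodes and the NameNode only keeps metadata), the block size is tiny on purpose, and replication is left out.

```python
BLOCK_SIZE = 4  # bytes; tiny for the demo, real blocks are far larger


class NameNode:
    """Keeps track of which block of which file lives on which DataNode."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes  # node name -> {block id: bytes}
        self.block_map = {}           # file name -> [(block id, node name)]

    def append(self, filename, data):
        # Files only grow: new data becomes new blocks, old blocks stay untouched.
        entries = self.block_map.setdefault(filename, [])
        node_names = sorted(self.data_nodes)
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = f"{filename}#{len(entries)}"
            node = node_names[len(entries) % len(node_names)]  # round-robin
            self.data_nodes[node][block_id] = data[offset:offset + BLOCK_SIZE]
            entries.append((block_id, node))

    def read(self, filename):
        # Reassemble the file by fetching its blocks in order.
        return b"".join(self.data_nodes[node][block_id]
                        for block_id, node in self.block_map[filename])


name_node = NameNode({"dn1": {}, "dn2": {}})
name_node.append("log.txt", b"hello world")
name_node.append("log.txt", b"!!")
content = name_node.read("log.txt")
```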

MapReduce Design Pattern

  • Requirement for HADOOP
  • Enables parallel processing of data that is stored across multiple servers
  1. Map
    • Divide the task so that multiple nodes can work on it in parallel
  2. Reduce
    • integrate two results into one
    • repeat until a single result remains
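
The two steps can be sketched as a word count in plain Python, without Hadoop. The chunks stand in for data held on different nodes; `map_chunk` and `reduce_counts` are hypothetical names, and the map step here already counts per chunk, which is a simplification of emitting individual key-value pairs.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk):
    """Map: each node counts the words of its own chunk independently."""
    return Counter(chunk.lower().split())


def reduce_counts(left, right):
    """Reduce: integrate two partial results into one."""
    return left + right


# Each string stands in for the part of the data stored on one node.
chunks = ["big data big", "data value", "big value value"]

partial_counts = [map_chunk(chunk) for chunk in chunks]  # could run in parallel
total_counts = reduce(reduce_counts, partial_counts)     # repeated pairwise merge
```

Because `reduce_counts` only ever combines two results, the merge can be repeated in any grouping until one result remains.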

Important:

  • Map-Reduce
  • Schema read/write

Resources: