Join or Merge

Forums Tips and Tricks Join or Merge

Viewing 0 reply threads
  • Author
    Posts
    • #293
      adminpaul
      Keymaster

      Suppose you wish to add to your dataset new variables that have been recorded for the same individuals. For example, we might want to extend our table of Burger King menu items to include prices in different cities around the world. If our new variables covered the same cases in the same order, we can simply add them to the existing data, importing the data and then dragging the variable icons into the original data relation.

      It is more common that we want to extend the data with new information about some, but not all of the cases. To join new variables to a dataset, we need a unique index for each case; something the identifies the case in each set of the data so that Data desk knows how to match the variables. A row or case name can serve this purpose provided it is unique.

      Two Data desk commands are important here:
      GetCase(y, x), which expects the x argument to be a case number and returns the case value at that case in the variable y. If x is a variable full of case numbers, then it returns an equally long variable full of values extracted from y.

      LookUp(y, x), which returns the case number of the case in y whose value is x. If x is itself a variable, it returns a variable full of case numbers.

      Used together, these commands can do index matching. Thus,
      LookUp(index2, index1) returns a variable with the case numbers in the relation 2 with indices that match index values in the relation 1. Then we can use those case numbers to find cases in relation 2 to match to those in the first one.
      GetCase(Rel2var, LookUp(index2, index1))

      There are many options and special cases to think about for this operation, including versions of Lookup that find the first or last occurrence of a value. Consult the DD docs for those details.

      One more thing: the relation in which a derived variable belongs often is not clear until the variable is computed. You may have to move icons to appropriate relation windows to work with them.

      A subtle point: GetCase doesn’t care whether the values it is retrieving are text or numbers. Indeed, Data desk never assigns a “type” to a variable until it is used. So that isn’t a concern here.

Viewing 0 reply threads
  • You must be logged in to reply to this topic.