Bulk metadata export and update using MS Excel spreadsheets

Within the Content Server system, it is often necessary to edit a set of metadata documents quickly and easily.

In this recipe, Module Suite has been used to extract a predefined set of metadata attributes from all documents within a specific space to an excel spreadsheet, which can be later modified and used to update the original metadata in bulk.

The sample metadata fields included in the “Controlled Document” category we will be using are the following:

  • Doc ID
  • Title
  • Author
  • Owner
  • Status
  • Department

Since users will be editing the MS Excel file manually, we will consider using a template xlsx file where we will append the extracted metadata. In this way, we can add styling, column titles and any additional information the users might need to perform the update.

The sample code below shows how the export functionality could be implemented. Note that we are exporting metadata from all documents within a folder, but this could be easily extended to consider only a selection of documents (using a multi-action button, for example) o a list of documents resulting from a different functionality, such as a search. Also, note that we will be always exporting the document ID on the system to be later able to map back the document data, if an import is requested.

def exportMetadata = { folderID ->
    
    def documentTemplateID = 123456 // this will be the dataID of an xlsx file you want to use as template
    def rowIndex = 3 // this will be the index of the first row in the template you want to write data to
    def worksheetName = "Controlled Documents" // the name of the worksheet within the template where you want to write data
    
    def sourceFolder = docman.getNodeFast( folderID )
    
    // Get the xlsx template for export (Document templates.xlsx)
    def xlsxDoc = docman.getNodeFast( documentTemplateID ) 

    // Get Spreadsheet from template
    def spreadsheet = xlsx.loadSpreadsheet( xlsxDoc )

    // Get worksheet metadata from excel template
    def worksheetMetadata = spreadsheet.getWorksheetByName(worksheetName) 
    
    // For each document with the category 'Controlled Document' save the metadata in the excel document
    sourceFolder.children.each{ doc ->

        if( doc.subtype == 144 && doc."Controlled Document" ){

            def rowData = [ 
                doc.ID, // Column A
                doc.name, // B
                doc."Controlled Document"."Doc ID".toString(),  // C
                doc."Controlled Document"."Title".toString(), // D
                doc."Controlled Document"."Author".toString(), // E
                doc."Controlled Document"."Owner".toString(), // F
                doc."Controlled Document"."Status".toString(), // G
                doc."Controlled Document"."Department".toString() // H
            ]
            
            worksheetMetadata.writeRow( rowData, rowIndex++ )  
        }
    }

    def res = docman.getTempResource( "exportXLS", ".xlsx" )
    
    res.content = xlsx.save( spreadsheet )
    res.name = "Export metadata.xlsx"
    
    return res
}

Once the xlsx file has been generated and modified, it can be reimported in the system using a similar approach. This time, we will be scanning the xlsx file, locating the nodes one by one, and updating the category metadata in case of changes.

The import function could be implemented as follows. Note that we are referencing the single metadata fields by excel column ID (A, B, C, etc.) as the structure has been defined by the export function

def importMetadata = { file ->

    def rowIndex = 3 // First row in the file where data can be found
    def worksheetName = "Controlled Documents" // the name of the worksheet where data can be found
    
    // Get Spreadsheet from template
    def spreadsheet = xlsx.loadSpreadsheet(file)

    // Get worksheet metadata from excel template
    def worksheetMetadata = spreadsheet.getWorksheetByName(worksheetName)

    worksheetMetadata.readRows(rowIndex).each{ row->

        ownerID = 1000 // Default on Admin user if necessary.

        rowData = row.value

        try{

            def dataID = rowData["A"] as Long
            def doc = docman.getNode(dataID)
            if(doc){

                // Set metadata to document

                doc."Controlled Document"?."Doc ID"     = [(rowData["C"] as String)]
                doc."Controlled Document"?."Title"      = [(rowData["D"] as String)]     
                doc."Controlled Document"?."Author"     = [(rowData["E"] as String)]

                try{
                    ownerID = users.getUserById((rowData["F"] as Long)).ID
                }catch(e){}

                doc."Controlled Document"?."Owner"      = [ownerID]
                doc."Controlled Document"?."Status"     = [(rowData["G"] as String)]
                doc."Controlled Document"?."Department" = [(rowData["H"] as String)]
                
                doc.update()

                // Rename document, if name has changed
                def documentName = (rowData["B"] as String)
                
                if(doc.name != documentName && documentName != ""){
                    doc.rename(documentName)
                }  
            }

        }catch(e){
            log.error("Unable set metadata to document of row: ${row.key}",e)
        }

    }

}