Data Integration Via Universal Keys

Robert Grossman

University of Illinois at Chicago

September 3, 2006

1 Introduction

We describe an infrastructure called DataSpace [3] for integrating distributed data.¹ In addition to supporting data and metadata, the infrastructure supports globally unique keys for integrating data, which we call universal keys. Unlike some of the standard approaches to data integration, DataSpace does not try to achieve full semantic integration of distributed data. Instead, it provides the minimum infrastructure necessary to integrate distributed data that is attached to universal keys. We also describe some applications built with this infrastructure in astronomy, bioinformatics, and earth science.

TCP-based web and grid services, as usually deployed, have been shown to have difficulty integrating very large data sets over wide-area, high-performance networks [1]. Recently, we implemented a peer-to-peer version of DataSpace, called Sector, that is designed for working with large data sets over wide-area, high-performance networks [7]. Sector has been used to distribute the terabyte-size catalog data of the Sloan Digital Sky Survey (SDSS) from Chicago to locations in the U.S., Europe, and Asia. This paper is based in part on [2].

2 Key Concepts

In the DataSpace approach to data integration, we assume that data is distributed, has attached metadata, and is associated with globally unique identifiers called universal keys.

Data. DataSpace is designed to work with several different types of data, including relational data, distributed columns of data, semi-structured data, and blobs of data. We refer to the components of data as elements. For example, elements of relational data are fields, elements of columns of data are columns, elements of semi-structured XML data are XML elements, and so on.
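To make the notion of elements concrete, the following Python sketch shows one way to enumerate the elements of three of the data types listed above. The function names (`relational_elements`, `columnar_elements`, `xml_elements`) are hypothetical and not part of DataSpace; blobs are omitted because the text gives no example of their elements.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


# Hypothetical helpers, not DataSpace APIs: each returns the "elements"
# of one kind of data, following the examples in the text.

def relational_elements(row: Dict[str, Any]) -> List[str]:
    """Elements of a relational record are its fields."""
    return list(row.keys())


def columnar_elements(columns: Dict[str, List[Any]]) -> List[str]:
    """Elements of columnar data are the columns themselves."""
    return list(columns.keys())


def xml_elements(document: str) -> List[str]:
    """Elements of semi-structured XML data are its XML elements.

    Element.iter() walks the tree in document order, root first.
    """
    root = ET.fromstring(document)
    return [node.tag for node in root.iter()]


if __name__ == "__main__":
    print(relational_elements({"obj_id": 1, "ra": 180.0, "dec": 2.5}))
    # prints ['obj_id', 'ra', 'dec']
    print(columnar_elements({"ra": [180.0, 181.2], "dec": [2.5, -1.0]}))
    # prints ['ra', 'dec']
    print(xml_elements("<gene><symbol>TP53</symbol><chrom>17</chrom></gene>"))
    # prints ['gene', 'symbol', 'chrom']
```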

Metadata. We assume that there is descriptive information about data sets, and about components of data sets such as columns or elements. Examples of metadata include data types, associated schemas and taxonomies, and provenance information.
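A minimal sketch of metadata attached to the components of a data set, assuming metadata is held as a small record per column. The `Metadata` class, the `column_metadata` function, and the type-inference rule (use the shared Python type name when all values agree) are illustrative assumptions, not DataSpace's actual metadata format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Metadata:
    """Hypothetical record describing a data set or one of its components."""
    data_type: Optional[str] = None   # e.g. "float", "int", "str"
    schema: Optional[str] = None      # name of an associated schema or taxonomy
    provenance: List[str] = field(default_factory=list)  # origin of the data


def column_metadata(columns: Dict[str, List[Any]], source: str) -> Dict[str, Metadata]:
    """Attach metadata to each column: an inferred type plus provenance."""
    result: Dict[str, Metadata] = {}
    for name, values in columns.items():
        # Record a type only when every value has the same Python type;
        # empty or mixed-type columns get data_type=None.
        types = {type(v).__name__ for v in values}
        inferred = types.pop() if len(types) == 1 else None
        result[name] = Metadata(data_type=inferred, provenance=[source])
    return result


if __name__ == "__main__":
    # "survey-site-A" is a made-up source label for illustration.
    meta = column_metadata({"ra": [180.0, 181.2], "n_obs": [3, 5]}, source="survey-site-A")
    print(meta["ra"].data_type, meta["n_obs"].data_type, meta["ra"].provenance)
    # prints float int ['survey-site-A']
```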

Universal keys. We assume that data, metadata, and their components may have, but are not required to have, globally unique identifiers (GUIDs) associated with them. We refer to these GUIDs as universal keys. Universal keys are used to define the integration of data.
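The following sketch illustrates how universal keys can drive integration: records from different sources that share a key are merged, and records without a key are left out, since keys are optional. The `integrate` function, the `universal_key` field name, and the rule that later sources overwrite earlier ones on conflicting attributes are hypothetical choices for illustration; DataSpace's own protocol is not shown here.

```python
import uuid
from typing import Any, Dict, List

Record = Dict[str, Any]


def integrate(*sources: List[Record], key: str = "universal_key") -> Dict[str, Record]:
    """Merge records from several sources that carry the same universal key.

    Hypothetical helper. Records without a universal key are skipped, because
    keys are optional and only keyed data takes part in integration. When two
    sources supply the same attribute, the later source's value wins.
    """
    merged: Dict[str, Record] = {}
    for source in sources:
        for record in source:
            k = record.get(key)
            if k is None:
                continue  # unkeyed record: not integrated
            merged.setdefault(k, {}).update(record)
    return merged


if __name__ == "__main__":
    star = str(uuid.uuid4())  # a fresh GUID standing in for a universal key
    catalog = [{"universal_key": star, "ra": 180.0, "dec": 2.5}]
    spectra = [{"universal_key": star, "redshift": 0.02},
               {"redshift": 0.5}]  # no universal key, so it is skipped
    print(integrate(catalog, spectra)[star])
```

Because the merge relies only on the shared GUID, the two sources need no schema mapping between them, in keeping with the point that DataSpace does not attempt full semantic integration.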

Derived data. DataSpace allows new data to be derived from existing data in two ways, as described below.

DataSpace supports the following operations on data:

¹ In 2006, Halevy, Franklin and Maier [8] introduced an approach to data integration, also called dataspace, that likewise does not require the full semantic integration of information. In contrast, the DataSpace infrastructure introduced in 2002 in [3] is less ambitious and does not require support for constraints, consistency, or recovery.