SNAP Library 4.0, User Reference  2017-07-27 13:18:06
SNAP, a general purpose, high performance system for analysis and manipulation of large networks
TTable Class Reference

Table class: Relational table with columnar data storage. More...

#include <table.h>

Classes

class  TLoadVecInit
 

Public Member Functions

void AddIntCol (const TStr &ColName)
 Adds an integer column with name ColName. More...
 
void AddFltCol (const TStr &ColName)
 Adds a float column with name ColName. More...
 
void AddStrCol (const TStr &ColName)
 Adds a string column with name ColName. More...
 
void GroupByIntColMP (const TStr &GroupBy, THashMP< TInt, TIntV > &Grouping, TBool UsePhysicalIds=true) const
 Groups/hashes by a single column with integer values, using OpenMP multi-threading. More...
 
 TTable ()
 
 TTable (TTableContext *Context)
 
 TTable (const Schema &S, TTableContext *Context)
 
 TTable (TSIn &SIn, TTableContext *Context)
 
 TTable (const THash< TInt, TInt > &H, const TStr &Col1, const TStr &Col2, TTableContext *Context, const TBool IsStrKeys=false)
 Constructor to build table out of a hash table of int->int. More...
 
 TTable (const THash< TInt, TFlt > &H, const TStr &Col1, const TStr &Col2, TTableContext *Context, const TBool IsStrKeys=false)
 Constructor to build table out of a hash table of int->float. More...
 
 TTable (const TTable &Table)
 Copy constructor. More...
 
 TTable (const TTable &Table, const TIntV &RowIds)
 
void SaveSS (const TStr &OutFNm)
 Saves table schema and content to a TSV file. More...
 
void SaveBin (const TStr &OutFNm)
 Saves table schema and content to a binary file. More...
 
void Save (TSOut &SOut)
 Saves table schema and content to a binary format. More...
 
void Dump (FILE *OutF=stdout) const
 Prints table contents to a text file. More...
 
void AddRow (const TTableRow &Row)
 Adds row with values taken from given TTableRow. More...
 
TTableContext * GetContext ()
 Returns the context. More...
 
TTableContext * ChangeContext (TTableContext *Context)
 Changes the current context. Moves all object items to the new context. More...
 
TInt GetColIdx (const TStr &ColName) const
 Gets index of column ColName among columns of the same type in the schema. More...
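GetColIdx returns a position among columns of the same type. That makes sense under a columnar layout where each column is a separate vector, columns are grouped by type, and a name map records each column's type and per-type index. The sketch below models that layout with standard containers only. `MiniTable`, `AttrType`, `ColTypeMap` and `AddCol` are hypothetical names used for illustration and are not SNAP's internal fields. The protected AddColType and GetColTypeMap entries do suggest a similar name-to-(type, index) map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column kinds, mirroring the int, float and string columns of TTable.
enum class AttrType { Int, Flt, Str };

// Minimal columnar table: each column is its own vector, grouped by type.
struct MiniTable {
  std::vector<std::vector<long>> IntCols;
  std::vector<std::vector<double>> FltCols;
  std::vector<std::vector<int>> StrCols;  // string columns store integer keys
  // Column name -> (type, index among columns of that type).
  std::map<std::string, std::pair<AttrType, int>> ColTypeMap;

  void AddCol(const std::string& Name, AttrType Type) {
    assert(ColTypeMap.count(Name) == 0 && "duplicate column name");
    int Idx = 0;
    switch (Type) {
      case AttrType::Int: Idx = static_cast<int>(IntCols.size()); IntCols.emplace_back(); break;
      case AttrType::Flt: Idx = static_cast<int>(FltCols.size()); FltCols.emplace_back(); break;
      case AttrType::Str: Idx = static_cast<int>(StrCols.size()); StrCols.emplace_back(); break;
    }
    ColTypeMap[Name] = {Type, Idx};
  }
  void AddIntCol(const std::string& Name) { AddCol(Name, AttrType::Int); }
  void AddFltCol(const std::string& Name) { AddCol(Name, AttrType::Flt); }
  void AddStrCol(const std::string& Name) { AddCol(Name, AttrType::Str); }

  // Position of the column among columns of the same type.
  int GetColIdx(const std::string& Name) const { return ColTypeMap.at(Name).second; }
  AttrType GetColType(const std::string& Name) const { return ColTypeMap.at(Name).first; }
};
```

In this model `dst` gets index 1 even though it is the third column overall, because it is the second integer column.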
 
TInt GetIntVal (const TStr &ColName, const TInt &RowIdx)
 Gets the value of integer attribute ColName at row RowIdx. More...
 
TFlt GetFltVal (const TStr &ColName, const TInt &RowIdx)
 Gets the value of float attribute ColName at row RowIdx. More...
 
TStr GetStrVal (const TStr &ColName, const TInt &RowIdx) const
 Gets the value of string attribute ColName at row RowIdx. More...
 
TInt GetStrMapById (TInt ColIdx, TInt RowIdx) const
 Gets the integer mapping of the string at column ColIdx at row RowIdx. More...
 
TInt GetStrMapByName (const TStr &ColName, TInt RowIdx) const
 Gets the integer mapping of the string at column ColName at row RowIdx. More...
 
TStr GetStrValById (TInt ColIdx, TInt RowIdx) const
 Gets the value of the string attribute at column ColIdx at row RowIdx. More...
 
TStr GetStrValByName (const TStr &ColName, const TInt &RowIdx) const
 Gets the value of the string attribute at column ColName at row RowIdx. More...
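String columns are described in terms of an integer mapping. GetStrMapById returns the integer key stored in the column, and GetStrValById resolves that key to the string. The following is a minimal sketch, assuming the context interns each distinct string once and hands out dense integer keys. `StringPool`, `Intern` and `StrColumn` are hypothetical names and do not reproduce TTableContext's actual pool.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shared string pool: each distinct string is stored once and
// identified by a dense integer key.
class StringPool {
 public:
  // Returns the key for S, adding S to the pool if it is new.
  int Intern(const std::string& S) {
    auto It = KeyOf.find(S);
    if (It != KeyOf.end()) return It->second;
    int Key = static_cast<int>(Strs.size());
    Strs.push_back(S);
    KeyOf.emplace(S, Key);
    return Key;
  }
  // Resolves a key back to its string.
  const std::string& GetStr(int Key) const { return Strs.at(Key); }

 private:
  std::vector<std::string> Strs;
  std::unordered_map<std::string, int> KeyOf;
};

// A string column stores only keys; equal strings share one key.
struct StrColumn {
  std::vector<int> Keys;
  void Add(StringPool& Pool, const std::string& S) { Keys.push_back(Pool.Intern(S)); }
  // Integer mapping of the string at RowIdx (cf. GetStrMapById).
  int GetStrMap(int RowIdx) const { return Keys.at(RowIdx); }
  // String value at RowIdx (cf. GetStrValById).
  const std::string& GetStrVal(const StringPool& Pool, int RowIdx) const {
    return Pool.GetStr(Keys.at(RowIdx));
  }
};
```

Storing keys instead of strings keeps string columns fixed-width, and it turns string equality tests into integer comparisons.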
 
TIntV GetIntRowIdxByVal (const TStr &ColName, const TInt &Val) const
 Gets the rows containing Val in int column ColName. More...
 
TIntV GetStrRowIdxByMap (const TStr &ColName, const TInt &Map) const
 Gets the rows containing int mapping Map in str column ColName. More...
 
TIntV GetFltRowIdxByVal (const TStr &ColName, const TFlt &Val) const
 Gets the rows containing Val in flt column ColName. More...
 
TInt RequestIndexInt (const TStr &ColName)
 Creates Index for Int Column ColName. More...
 
TInt RequestIndexFlt (const TStr &ColName)
 Creates Index for Flt Column ColName. More...
 
TInt RequestIndexStrMap (const TStr &ColName)
 Creates Index for Str Column ColName. More...
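The RequestIndex* functions build an index that the Get*RowIdxByVal lookups can use. The sketch below assumes a hash index that maps each value to the row indices holding it. The names `IntIndex`, `BuildIntIndex` and `GetRowIdxByVal` are hypothetical, and SNAP's actual index structure may differ.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical value index for an int column: maps each value to the
// ascending list of row indices that contain it.
using IntIndex = std::unordered_map<long, std::vector<int>>;

// A single pass over the column builds the index.
IntIndex BuildIntIndex(const std::vector<long>& Col) {
  IntIndex Index;
  for (int Row = 0; Row < static_cast<int>(Col.size()); ++Row) {
    Index[Col[Row]].push_back(Row);
  }
  return Index;
}

// Average O(1) lookup; returns an empty vector when Val does not occur.
std::vector<int> GetRowIdxByVal(const IntIndex& Index, long Val) {
  auto It = Index.find(Val);
  return It == Index.end() ? std::vector<int>{} : It->second;
}
```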
 
TStr GetStr (const TInt &KeyId) const
 Gets the string with KeyId. More...
 
TInt GetIntValAtRowIdx (const TInt &ColIdx, const TInt &RowIdx)
 Gets the integer value at column ColIdx and row RowIdx. More...
 
TFlt GetFltValAtRowIdx (const TInt &ColIdx, const TInt &RowIdx)
 Gets the float value at column ColIdx and row RowIdx. More...
 
Schema GetSchema ()
 Gets the schema of this table. More...
 
TVec< PNEANet > ToGraphSequence (TStr SplitAttr, TAttrAggr AggrPolicy, TInt WindowSize, TInt JumpSize, TInt StartVal=TInt::Mn, TInt EndVal=TInt::Mx)
 Creates a sequence of graphs based on values of column SplitAttr and windows specified by JumpSize and WindowSize. More...
 
TVec< PNEANet > ToVarGraphSequence (TStr SplitAttr, TAttrAggr AggrPolicy, TIntPrV SplitIntervals)
 Creates a sequence of graphs based on values of column SplitAttr and intervals specified by SplitIntervals. More...
 
TVec< PNEANet > ToGraphPerGroup (TStr GroupAttr, TAttrAggr AggrPolicy)
 Creates a sequence of graphs based on grouping specified by GroupAttr. More...
 
PNEANet ToGraphSequenceIterator (TStr SplitAttr, TAttrAggr AggrPolicy, TInt WindowSize, TInt JumpSize, TInt StartVal=TInt::Mn, TInt EndVal=TInt::Mx)
 Creates the graph sequence one at a time. More...
 
PNEANet ToVarGraphSequenceIterator (TStr SplitAttr, TAttrAggr AggrPolicy, TIntPrV SplitIntervals)
 Creates the graph sequence one at a time. More...
 
PNEANet ToGraphPerGroupIterator (TStr GroupAttr, TAttrAggr AggrPolicy)
 Creates the graph sequence one at a time. More...
 
PNEANet NextGraphIterator ()
 Returns the next graph in the sequence. Calls to this must be preceded by a call to one of the ToGraph*Iterator functions above. More...
 
TBool IsLastGraphOfSequence ()
 Checks if the end of the graph sequence is reached. More...
 
TStr GetSrcCol () const
 Gets the name of the column to be used as src nodes in the graph. More...
 
void SetSrcCol (const TStr &Src)
 Sets the name of the column to be used as src nodes in the graph. More...
 
TStr GetDstCol () const
 Gets the name of the column to be used as dst nodes in the graph. More...
 
void SetDstCol (const TStr &Dst)
 Sets the name of the column to be used as dst nodes in the graph. More...
 
void AddEdgeAttr (const TStr &Attr)
 Adds column to be used as graph edge attribute. More...
 
void AddEdgeAttr (TStrV &Attrs)
 Adds columns to be used as graph edge attributes. More...
 
void AddSrcNodeAttr (const TStr &Attr)
 Adds column to be used as src node attribute of the graph. More...
 
void AddSrcNodeAttr (TStrV &Attrs)
 Adds columns to be used as src node attributes of the graph. More...
 
void AddDstNodeAttr (const TStr &Attr)
 Adds column to be used as dst node attribute of the graph. More...
 
void AddDstNodeAttr (TStrV &Attrs)
 Adds columns to be used as dst node attributes of the graph. More...
 
void AddNodeAttr (const TStr &Attr)
 Handles the common case where src and dst both belong to the same "universe" of entities. More...
 
void AddNodeAttr (TStrV &Attrs)
 Handles the common case where src and dst both belong to the same "universe" of entities. More...
 
void SetCommonNodeAttrs (const TStr &SrcAttr, const TStr &DstAttr, const TStr &CommonAttrName)
 Sets the columns to be used as both src and dst node attributes. More...
 
TStrV GetSrcNodeIntAttrV () const
 Gets src node int attribute name vector. More...
 
TStrV GetDstNodeIntAttrV () const
 Gets dst node int attribute name vector. More...
 
TStrV GetEdgeIntAttrV () const
 Gets edge int attribute name vector. More...
 
TStrV GetSrcNodeFltAttrV () const
 Gets src node float attribute name vector. More...
 
TStrV GetDstNodeFltAttrV () const
 Gets dst node float attribute name vector. More...
 
TStrV GetEdgeFltAttrV () const
 Gets edge float attribute name vector. More...
 
TStrV GetSrcNodeStrAttrV () const
 Gets src node str attribute name vector. More...
 
TStrV GetDstNodeStrAttrV () const
 Gets dst node str attribute name vector. More...
 
TStrV GetEdgeStrAttrV () const
 Gets edge str attribute name vector. More...
 
TAttrType GetColType (const TStr &ColName) const
 Gets type of column ColName. More...
 
TInt GetNumRows () const
 Gets total number of rows in this table. More...
 
TInt GetNumValidRows () const
 Gets number of valid, i.e. not deleted, rows in this table. More...
 
THash< TInt, TInt > GetRowIdMap () const
 Gets a map of logical to physical row ids. More...
 
TRowIterator BegRI () const
 Gets iterator to the first valid row of the table. More...
 
TRowIterator EndRI () const
 Gets the end iterator, positioned one past the last valid row of the table. More...
 
TRowIteratorWithRemove BegRIWR ()
 Gets iterator with remove to the first valid row. More...
 
TRowIteratorWithRemove EndRIWR ()
 Gets the end iterator with remove, positioned one past the last valid row. More...
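Deleted rows are not removed from the column vectors until Defrag is called. Iterators skip them instead. The model below assumes each physical row carries a link to the next valid row, with a sentinel marking the end of the chain. This explains why GetNumRows (physical rows) and GetNumValidRows can differ, and why the end iterator sits past the last valid row. `RowChain`, `kLast` and `kInvalid` are hypothetical names. Only the RemoveRow (RowIdx, PrevRowIdx) shape is taken from the protected member list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical row-validity model: Next[i] is the index of the next valid
// row after valid row i; kLast ends the chain, kInvalid marks deleted rows.
constexpr int kLast = -1;
constexpr int kInvalid = -2;

struct RowChain {
  std::vector<int> Next;
  int FirstValidRow = kLast;

  // Builds a chain over NumRows rows, all initially valid.
  explicit RowChain(int NumRows) : Next(NumRows) {
    for (int i = 0; i < NumRows; ++i) Next[i] = (i + 1 < NumRows) ? i + 1 : kLast;
    FirstValidRow = NumRows > 0 ? 0 : kLast;
  }

  // Unlinks RowIdx given its valid predecessor (kLast if RowIdx is the first
  // valid row). The physical slot stays allocated until defragmentation.
  void RemoveRow(int RowIdx, int PrevRowIdx) {
    if (PrevRowIdx == kLast) FirstValidRow = Next[RowIdx];
    else Next[PrevRowIdx] = Next[RowIdx];
    Next[RowIdx] = kInvalid;
  }

  // Walks from the first valid row until the end sentinel.
  std::vector<int> ValidRows() const {
    std::vector<int> Rows;
    for (int r = FirstValidRow; r != kLast; r = Next[r]) Rows.push_back(r);
    return Rows;
  }
};
```

Keeping the physical slots avoids shifting every column's data on each deletion. The trade-off is memory that only a later defragmentation pass reclaims.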
 
void GetPartitionRanges (TIntPrV &Partitions, TInt NumPartitions) const
 Partitions the table into NumPartitions and populates Partitions with the ranges. More...
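The following sketch splits the rows into NumPartitions contiguous ranges of near-equal size, which is the kind of split one would make before handing each range to a thread. It assumes ranges over row indices 0..NumRows-1 with inclusive ends, and it gives the leftover rows to the first ranges. Whether GetPartitionRanges uses physical row ids, inclusive ends, or the same remainder rule is an assumption. The function below is a hypothetical free-standing version.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits [0, NumRows) into NumPartitions contiguous, near-equal ranges,
// returned as inclusive (start, end) pairs. Empty ranges are omitted.
std::vector<std::pair<int, int>> GetPartitionRanges(int NumRows, int NumPartitions) {
  assert(NumPartitions > 0);
  std::vector<std::pair<int, int>> Ranges;
  const int Base = NumRows / NumPartitions;
  const int Extra = NumRows % NumPartitions;
  int Start = 0;
  for (int p = 0; p < NumPartitions; ++p) {
    const int Len = Base + (p < Extra ? 1 : 0);  // the first Extra ranges get one more row
    if (Len > 0) Ranges.emplace_back(Start, Start + Len - 1);
    Start += Len;
  }
  return Ranges;
}
```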
 
void Rename (const TStr &Column, const TStr &NewLabel)
 Renames a column. More...
 
void Unique (const TStr &Col)
 Removes rows with duplicate values in given column. More...
 
void Unique (const TStrV &Cols, TBool Ordered=true)
 Removes rows with duplicate values in given columns. More...
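Unique keeps one row per distinct key. The following is a minimal sketch over rows stored as vectors of int values. It assumes that the first row carrying each key is the one kept. The function name `UniqueRows` and the row layout are hypothetical, and the Ordered flag is not modelled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns the indices of rows to keep when removing duplicates on the
// columns listed in KeyCols: the first row with each key survives.
// Rows are modelled as vectors of int values (a hypothetical layout).
std::vector<int> UniqueRows(const std::vector<std::vector<long>>& Rows,
                            const std::vector<int>& KeyCols) {
  std::set<std::vector<long>> Seen;
  std::vector<int> Keep;
  for (int r = 0; r < static_cast<int>(Rows.size()); ++r) {
    std::vector<long> Key;
    for (int c : KeyCols) Key.push_back(Rows[r][c]);
    if (Seen.insert(Key).second) Keep.push_back(r);  // insert() fails on a repeated key
  }
  return Keep;
}
```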
 
void Select (TPredicate &Predicate, TIntV &SelectedRows, TBool Remove=true)
 Selects rows that satisfy given Predicate. More...
 
void Select (TPredicate &Predicate)
 
void Classify (TPredicate &Predicate, const TStr &LabelName, const TInt &PositiveLabel=1, const TInt &NegativeLabel=0)
 
void SelectAtomic (const TStr &Col1, const TStr &Col2, TPredComp Cmp, TIntV &SelectedRows, TBool Remove=true)
 Selects rows using atomic compare operation. More...
 
void SelectAtomic (const TStr &Col1, const TStr &Col2, TPredComp Cmp)
 
void ClassifyAtomic (const TStr &Col1, const TStr &Col2, TPredComp Cmp, const TStr &LabelName, const TInt &PositiveLabel=1, const TInt &NegativeLabel=0)
 
void SelectAtomicConst (const TStr &Col, const TPrimitive &Val, TPredComp Cmp, TIntV &SelectedRows, PTable &SelectedTable, TBool Remove=true, TBool Table=true)
 Selects rows where the value of Col matches given primitive Val. More...
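SelectAtomicConst compares each value of Col against a constant using a comparison operator. The sketch assumes six operators: less-than, less-or-equal, equal, not-equal, greater-or-equal and greater-than. The `Cmp` enum and the helper names are hypothetical stand-ins for TPredComp. The sketch returns only the selected row indices; the Remove and SelectedTable variants are not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical comparison operators for an atomic predicate.
enum class Cmp { LT, LTE, EQ, NEQ, GTE, GT };

// Evaluates (A Op B).
template <class T>
bool Compare(const T& A, const T& B, Cmp Op) {
  switch (Op) {
    case Cmp::LT:  return A < B;
    case Cmp::LTE: return A <= B;
    case Cmp::EQ:  return A == B;
    case Cmp::NEQ: return A != B;
    case Cmp::GTE: return A >= B;
    case Cmp::GT:  return A > B;
  }
  return false;  // unreachable for a valid Op
}

// Returns indices of rows whose value in Col satisfies (Col[r] Op Val).
template <class T>
std::vector<int> SelectAtomicConst(const std::vector<T>& Col, const T& Val, Cmp Op) {
  std::vector<int> Selected;
  for (int r = 0; r < static_cast<int>(Col.size()); ++r)
    if (Compare(Col[r], Val, Op)) Selected.push_back(r);
  return Selected;
}
```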
 
template<class T >
void SelectAtomicConst (const TStr &Col, const T &Val, TPredComp Cmp)
 
template<class T >
void SelectAtomicConst (const TStr &Col, const T &Val, TPredComp Cmp, PTable &SelectedTable)
 
template<class T >
void ClassifyAtomicConst (const TStr &Col, const T &Val, TPredComp Cmp, const TStr &LabelName, const TInt &PositiveLabel=1, const TInt &NegativeLabel=0)
 
void SelectAtomicIntConst (const TStr &Col, const TInt &Val, TPredComp Cmp)
 
void SelectAtomicIntConst (const TStr &Col, const TInt &Val, TPredComp Cmp, PTable &SelectedTable)
 
void SelectAtomicStrConst (const TStr &Col, const TStr &Val, TPredComp Cmp)
 
void SelectAtomicStrConst (const TStr &Col, const TStr &Val, TPredComp Cmp, PTable &SelectedTable)
 
void SelectAtomicFltConst (const TStr &Col, const TFlt &Val, TPredComp Cmp)
 
void SelectAtomicFltConst (const TStr &Col, const TFlt &Val, TPredComp Cmp, PTable &SelectedTable)
 
void Group (const TStrV &GroupBy, const TStr &GroupColName, TBool Ordered=true, TBool UsePhysicalIds=true)
 Groups rows depending on values of GroupBy columns. More...
 
void Count (const TStr &CountColName, const TStr &Col)
 Counts number of unique elements. More...
 
void Order (const TStrV &OrderBy, TStr OrderColName="", TBool ResetRankByMSC=false, TBool Asc=true)
 Orders the rows lexicographically by the values in the OrderBy columns (ascending by default, descending if Asc is false). More...
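Ordering by several columns is a lexicographic comparison: the first column in OrderBy decides the order, and later columns break ties. The following is a minimal sketch over int columns using std::stable_sort. `OrderRows` and the column layout are hypothetical, and the OrderColName and ResetRankByMSC parameters are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical layout: Cols[c][r] is the value of column c at row r.
// Returns row indices sorted lexicographically by the columns in OrderBy
// (the first column is the primary key), ascending if Asc, else descending.
std::vector<int> OrderRows(const std::vector<std::vector<long>>& Cols,
                           const std::vector<int>& OrderBy, bool Asc = true) {
  const int NumRows = Cols.empty() ? 0 : static_cast<int>(Cols[0].size());
  std::vector<int> Ids(NumRows);
  std::iota(Ids.begin(), Ids.end(), 0);
  auto Less = [&](int R1, int R2) {
    for (int c : OrderBy) {
      if (Cols[c][R1] != Cols[c][R2])
        return Asc ? Cols[c][R1] < Cols[c][R2] : Cols[c][R1] > Cols[c][R2];
    }
    return false;  // equal on all keys: stable_sort keeps the original order
  };
  std::stable_sort(Ids.begin(), Ids.end(), Less);
  return Ids;
}
```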
 
void Aggregate (const TStrV &GroupByAttrs, TAttrAggr AggOp, const TStr &ValAttr, const TStr &ResAttr, TBool Ordered=true)
 Aggregates values of ValAttr after grouping with respect to GroupByAttrs. The result is stored in a new attribute ResAttr. More...
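Aggregate combines grouping with a reduction inside each group. The following sketch assumes, as the description suggests, that every row receives the aggregate of its own group in the new column. The `Aggr` enum is a hypothetical stand-in for TAttrAggr. The sketch handles only one int grouping column and one float value column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical aggregation policies.
enum class Aggr { Sum, Min, Max, Mean };

// Aggregates a non-empty vector into a single value.
double AggregateVector(const std::vector<double>& V, Aggr Policy) {
  assert(!V.empty());
  switch (Policy) {
    case Aggr::Min: return *std::min_element(V.begin(), V.end());
    case Aggr::Max: return *std::max_element(V.begin(), V.end());
    case Aggr::Sum:
    case Aggr::Mean: {
      double S = 0;
      for (double x : V) S += x;
      return Policy == Aggr::Sum ? S : S / V.size();
    }
  }
  return 0;  // unreachable for a valid Policy
}

// Groups rows by Key[r], aggregates Val within each group, and returns a
// new column where every row holds the aggregate of its own group.
std::vector<double> Aggregate(const std::vector<long>& Key,
                              const std::vector<double>& Val, Aggr Policy) {
  assert(Key.size() == Val.size());
  std::map<long, std::vector<double>> Groups;
  for (std::size_t r = 0; r < Key.size(); ++r) Groups[Key[r]].push_back(Val[r]);
  std::map<long, double> GroupResult;
  for (const auto& [K, Vals] : Groups) GroupResult[K] = AggregateVector(Vals, Policy);
  std::vector<double> Res(Key.size());
  for (std::size_t r = 0; r < Key.size(); ++r) Res[r] = GroupResult[Key[r]];
  return Res;
}
```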
 
void AggregateCols (const TStrV &AggrAttrs, TAttrAggr AggOp, const TStr &ResAttr)
 Aggregates attributes in AggrAttrs across columns. More...
 
TVec< PTable > SpliceByGroup (const TStrV &GroupByAttrs, TBool Ordered=true)
 Splices table into subtables according to a grouping statement. More...
 
PTable Join (const TStr &Col1, const TTable &Table, const TStr &Col2)
 Performs equijoin. More...
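An equijoin pairs every row of this table with every row of Table whose join values are equal. A common way to compute it is a hash join: hash one column, then probe the hash table with the other. The sketch returns the matching (row, row) pairs instead of building the joined table (compare AddJointRow). `EquiJoin` is a hypothetical name, and this page does not say which side SNAP hashes.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hash equijoin on two int columns: returns all (row in Col1, row in Col2)
// pairs whose join values are equal. Col1 is hashed; Col2 probes the table.
std::vector<std::pair<int, int>> EquiJoin(const std::vector<long>& Col1,
                                          const std::vector<long>& Col2) {
  std::unordered_multimap<long, int> Build;  // join value -> row in Col1
  for (int r = 0; r < static_cast<int>(Col1.size()); ++r) Build.emplace(Col1[r], r);
  std::vector<std::pair<int, int>> Out;
  for (int r2 = 0; r2 < static_cast<int>(Col2.size()); ++r2) {
    auto [Beg, End] = Build.equal_range(Col2[r2]);
    for (auto It = Beg; It != End; ++It) Out.emplace_back(It->second, r2);
  }
  // The order among equal keys in a multimap is unspecified; sort for determinism.
  std::sort(Out.begin(), Out.end());
  return Out;
}
```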
 
PTable Join (const TStr &Col1, const PTable &Table, const TStr &Col2)
 
PTable ThresholdJoin (const TStr &KeyCol1, const TStr &JoinCol1, const TTable &Table, const TStr &KeyCol2, const TStr &JoinCol2, TInt Threshold, TBool PerJoinKey=false)
 
PTable SelfJoin (const TStr &Col)
 Joins table with itself, on values of Col. More...
 
PTable SelfSimJoin (const TStrV &Cols, const TStr &DistanceColName, const TSimType &SimType, const TFlt &Threshold)
 
PTable SelfSimJoinPerGroup (const TStr &GroupAttr, const TStr &SimCol, const TStr &DistanceColName, const TSimType &SimType, const TFlt &Threshold)
 Performs join if the distance between two rows is less than the specified threshold. More...
 
PTable SelfSimJoinPerGroup (const TStrV &GroupBy, const TStr &SimCol, const TStr &DistanceColName, const TSimType &SimType, const TFlt &Threshold)
 Performs join if the distance between two rows is less than the specified threshold. More...
 
PTable SimJoin (const TStrV &Cols1, const TTable &Table, const TStrV &Cols2, const TStr &DistanceColName, const TSimType &SimType, const TFlt &Threshold)
 Performs join if the distance between two rows is less than the specified threshold. More...
 
void SelectFirstNRows (const TInt &N)
 Selects first N rows from the table. More...
 
void Defrag ()
 Releases memory of deleted rows and defragments the table. More...
 
void StoreIntCol (const TStr &ColName, const TIntV &ColVals)
 Adds entire int column to table. More...
 
void StoreFltCol (const TStr &ColName, const TFltV &ColVals)
 Adds entire flt column to table. More...
 
void StoreStrCol (const TStr &ColName, const TStrV &ColVals)
 Adds entire str column to table. More...
 
void UpdateFltFromTable (const TStr &KeyAttr, const TStr &UpdateAttr, const TTable &Table, const TStr &FKeyAttr, const TStr &ReadAttr, TFlt DefaultFltVal=0.0)
 
void UpdateFltFromTableMP (const TStr &KeyAttr, const TStr &UpdateAttr, const TTable &Table, const TStr &FKeyAttr, const TStr &ReadAttr, TFlt DefaultFltVal=0.0)
 
void SetFltColToConstMP (TInt UpdateColIdx, TFlt DefaultFltVal)
 
PTable Union (const TTable &Table)
 Returns union of this table with given Table. More...
 
PTable Union (const PTable &Table)
 
PTable UnionAll (const TTable &Table)
 Returns union of this table with given Table, preserving duplicates. More...
 
PTable UnionAll (const PTable &Table)
 
void UnionAllInPlace (const TTable &Table)
 Same as TTable::ConcatTable. More...
 
void UnionAllInPlace (const PTable &Table)
 
PTable Intersection (const TTable &Table)
 Returns intersection of this table with given Table. More...
 
PTable Intersection (const PTable &Table)
 
PTable Minus (TTable &Table)
 Returns table with rows that are present in this table but not in given Table. More...
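Union, UnionAll, Intersection and Minus follow relational set semantics over whole rows. The sketch models a row as a vector of int values, and all names in it are hypothetical. Duplicate rows inside a single table are handled by assumption: Minus keeps them, while Union and Intersection collapse them. UnionAll would simply concatenate the two row lists.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Row = std::vector<long>;  // hypothetical: a row as a tuple of int values

// Rows of A, then rows of B, skipping any row already emitted (set union).
std::vector<Row> Union(const std::vector<Row>& A, const std::vector<Row>& B) {
  std::set<Row> Seen;
  std::vector<Row> Out;
  auto Add = [&](const std::vector<Row>& T) {
    for (const Row& R : T)
      if (Seen.insert(R).second) Out.push_back(R);
  };
  Add(A);
  Add(B);
  return Out;
}

// Rows of A that also appear in B, each emitted once.
std::vector<Row> Intersection(const std::vector<Row>& A, const std::vector<Row>& B) {
  std::set<Row> InB(B.begin(), B.end()), Seen;
  std::vector<Row> Out;
  for (const Row& R : A)
    if (InB.count(R) && Seen.insert(R).second) Out.push_back(R);
  return Out;
}

// Rows of A that do not appear in B.
std::vector<Row> Minus(const std::vector<Row>& A, const std::vector<Row>& B) {
  std::set<Row> InB(B.begin(), B.end());
  std::vector<Row> Out;
  for (const Row& R : A)
    if (!InB.count(R)) Out.push_back(R);
  return Out;
}
```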
 
PTable Minus (const PTable &Table)
 
PTable Project (const TStrV &ProjectCols)
 Returns table with only the columns in ProjectCols. More...
 
void ProjectInPlace (const TStrV &ProjectCols)
 Keeps only the columns specified in ProjectCols. More...
 
void ColGenericOp (const TStr &Attr1, const TStr &Attr2, const TStr &ResAttr, TArithOp op)
 Performs columnwise arithmetic operation. More...
 
void ColGenericOpMP (TInt ArgColIdx1, TInt ArgColIdx2, TAttrType ArgType1, TAttrType ArgType2, TInt ResColIdx, TArithOp op)
 
void ColAdd (const TStr &Attr1, const TStr &Attr2, const TStr &ResultAttrName="")
 Performs columnwise addition. See TTable::ColGenericOp. More...
 
void ColSub (const TStr &Attr1, const TStr &Attr2, const TStr &ResultAttrName="")
 Performs columnwise subtraction. See TTable::ColGenericOp. More...
 
void ColMul (const TStr &Attr1, const TStr &Attr2, const TStr &ResultAttrName="")
 Performs columnwise multiplication. See TTable::ColGenericOp. More...
 
void ColDiv (const TStr &Attr1, const TStr &Attr2, const TStr &ResultAttrName="")
 Performs columnwise division. See TTable::ColGenericOp. More...
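The Col* operations apply one arithmetic operator row by row. The sketch below works on float columns. It assumes, without confirmation from this page, that an empty result name writes the result back into Attr1, which the default ResultAttrName="" hints at. The `ArithOp` enum stands in for TArithOp, and `FltTable` is a hypothetical map from names to columns. Integer columns and the floatCast variants are not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operator set covering the ColAdd ... ColMax family.
enum class ArithOp { Add, Sub, Mul, Div, Mod, Min, Max };

double Apply(double A, double B, ArithOp Op) {
  switch (Op) {
    case ArithOp::Add: return A + B;
    case ArithOp::Sub: return A - B;
    case ArithOp::Mul: return A * B;
    case ArithOp::Div: return A / B;  // IEEE semantics on division by zero
    case ArithOp::Mod: return std::fmod(A, B);
    case ArithOp::Min: return std::fmin(A, B);
    case ArithOp::Max: return std::fmax(A, B);
  }
  return 0;  // unreachable for a valid Op
}

// Hypothetical table: float columns by name.
using FltTable = std::map<std::string, std::vector<double>>;

// Row-wise (Attr1 Op Attr2). Assumption: an empty ResAttr overwrites Attr1;
// otherwise the result goes to the (possibly new) column ResAttr.
void ColGenericOp(FltTable& T, const std::string& Attr1, const std::string& Attr2,
                  const std::string& ResAttr, ArithOp Op) {
  const std::vector<double>& A = T.at(Attr1);
  const std::vector<double>& B = T.at(Attr2);
  assert(A.size() == B.size());
  std::vector<double> Res(A.size());
  for (std::size_t r = 0; r < A.size(); ++r) Res[r] = Apply(A[r], B[r], Op);
  T[ResAttr.empty() ? Attr1 : ResAttr] = std::move(Res);
}
```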
 
void ColMod (const TStr &Attr1, const TStr &Attr2, const TStr &ResultAttrName="")
 Performs columnwise modulus. See TTable::ColGenericOp. More...
 
void ColMin (const TStr &Attr1, const TStr &Attr2, const TStr &ResultAttrName="")
 Performs min of two columns. See TTable::ColGenericOp. More...
 
void ColMax (const TStr &Attr1, const TStr &Attr2, const TStr &ResultAttrName="")
 Performs max of two columns. See TTable::ColGenericOp. More...
 
void ColGenericOp (const TStr &Attr1, TTable &Table, const TStr &Attr2, const TStr &ResAttr, TArithOp op, TBool AddToFirstTable)
 Performs columnwise arithmetic operation with column of given table. More...
 
void ColAdd (const TStr &Attr1, TTable &Table, const TStr &Attr2, const TStr &ResAttr="", TBool AddToFirstTable=true)
 Performs columnwise addition with column of given table. More...
 
void ColSub (const TStr &Attr1, TTable &Table, const TStr &Attr2, const TStr &ResAttr="", TBool AddToFirstTable=true)
 Performs columnwise subtraction with column of given table. More...
 
void ColMul (const TStr &Attr1, TTable &Table, const TStr &Attr2, const TStr &ResAttr="", TBool AddToFirstTable=true)
 Performs columnwise multiplication with column of given table. More...
 
void ColDiv (const TStr &Attr1, TTable &Table, const TStr &Attr2, const TStr &ResAttr="", TBool AddToFirstTable=true)
 Performs columnwise division with column of given table. More...
 
void ColMod (const TStr &Attr1, TTable &Table, const TStr &Attr2, const TStr &ResAttr="", TBool AddToFirstTable=true)
 Performs columnwise modulus with column of given table. More...
 
void ColGenericOp (const TStr &Attr1, const TFlt &Num, const TStr &ResAttr, TArithOp op, const TBool floatCast)
 Performs arithmetic op of column values and given Num. More...
 
void ColGenericOpMP (const TInt &ColIdx1, const TInt &ColIdx2, TAttrType ArgType, const TFlt &Num, TArithOp op, TBool ShouldCast)
 
void ColAdd (const TStr &Attr1, const TFlt &Num, const TStr &ResultAttrName="", const TBool floatCast=false)
 Performs addition of column values and given Num. More...
 
void ColSub (const TStr &Attr1, const TFlt &Num, const TStr &ResultAttrName="", const TBool floatCast=false)
 Performs subtraction of column values and given Num. More...
 
void ColMul (const TStr &Attr1, const TFlt &Num, const TStr &ResultAttrName="", const TBool floatCast=false)
 Performs multiplication of column values and given Num. More...
 
void ColDiv (const TStr &Attr1, const TFlt &Num, const TStr &ResultAttrName="", const TBool floatCast=false)
 Performs division of column values and given Num. More...
 
void ColMod (const TStr &Attr1, const TFlt &Num, const TStr &ResultAttrName="", const TBool floatCast=false)
 Performs modulus of column values and given Num. More...
 
void ColConcat (const TStr &Attr1, const TStr &Attr2, const TStr &Sep="", const TStr &ResAttr="")
 Concatenates two string columns. More...
 
void ColConcat (const TStr &Attr1, TTable &Table, const TStr &Attr2, const TStr &Sep="", const TStr &ResAttr="", TBool AddToFirstTable=true)
 Concatenates string column with column of given table. More...
 
void ColConcatConst (const TStr &Attr1, const TStr &Val, const TStr &Sep="", const TStr &ResAttr="")
 Concatenates column values with given string value. More...
 
void ReadIntCol (const TStr &ColName, TIntV &Result) const
 Reads values of entire int column into Result. More...
 
void ReadFltCol (const TStr &ColName, TFltV &Result) const
 Reads values of entire float column into Result. More...
 
void ReadStrCol (const TStr &ColName, TStrV &Result) const
 Reads values of entire string column into Result. More...
 
void InitIds ()
 Adds explicit row ids and initializes the hash set mapping ids to physical rows. More...
 
PTable IsNextK (const TStr &OrderCol, TInt K, const TStr &GroupBy, const TStr &RankColName="")
 Distance-based filter. More...
 
void PrintSize ()
 
void PrintContextSize ()
 
TSize GetMemUsedKB ()
 Returns approximate memory used by table in [KB]. More...
 
TSize GetContextMemUsedKB ()
 Returns approximate memory used by table context in [KB]. More...
 

Static Public Member Functions

static void SetMP (TInt Value)
 
static TInt GetMP ()
 
static TStr NormalizeColName (const TStr &ColName)
 Adds a suffix to the column name if it does not already have one. More...
 
static TStrV NormalizeColNameV (const TStrV &Cols)
 Adds a suffix to each column name in Cols that does not already have one. More...
 
static PTable New ()
 
static PTable New (TTableContext *Context)
 
static PTable New (const Schema &S, TTableContext *Context)
 
static PTable New (const THash< TInt, TInt > &H, const TStr &Col1, const TStr &Col2, TTableContext *Context, const TBool IsStrKeys=false)
 Returns pointer to a table constructed from given int->int hash. More...
 
static PTable New (const THash< TInt, TFlt > &H, const TStr &Col1, const TStr &Col2, TTableContext *Context, const TBool IsStrKeys=false)
 Returns pointer to a table constructed from given int->float hash. More...
 
static PTable New (const PTable Table)
 Returns pointer to a new table created from given Table. More...
 
static void GetSchema (const TStr &InFNm, Schema &S, const char &Separator= '\t')
 Infers the schema of the file InFNm, whose columns are separated by Separator, and stores it in S. More...
 
static PTable LoadSS (const Schema &S, const TStr &InFNm, TTableContext *Context, const char &Separator= '\t', TBool HasTitleLine=false)
 Loads table from a spreadsheet file (TSV, CSV, etc.). Note: HasTitleLine = true is not supported; comment out title lines instead. More...
 
static PTable LoadSS (const Schema &S, const TStr &InFNm, TTableContext *Context, const TIntV &RelevantCols, const char &Separator= '\t', TBool HasTitleLine=false)
 Loads table from a spreadsheet file, loading only the columns specified by RelevantCols. Note: HasTitleLine = true is not supported; comment out title lines instead. More...
 
static PTable Load (TSIn &SIn, TTableContext *Context)
 Loads table from a binary format. More...
 
static PTable LoadShM (TShMIn &ShMIn, TTableContext *Context)
 Static constructor to load table from shared memory. More...
 
static PTable TableFromHashMap (const THash< TInt, TInt > &H, const TStr &Col1, const TStr &Col2, TTableContext *Context, const TBool IsStrKeys=false)
 Builds table from hash table of int->int. More...
 
static PTable TableFromHashMap (const THash< TInt, TFlt > &H, const TStr &Col1, const TStr &Col2, TTableContext *Context, const TBool IsStrKeys=false)
 Builds table from hash table of int->float. More...
 
static PTable GetNodeTable (const PNEANet &Network, TTableContext *Context)
 Extracts node TTable from PNEANet. More...
 
static PTable GetEdgeTable (const PNEANet &Network, TTableContext *Context)
 Extracts edge TTable from PNEANet. More...
 
static PTable GetEdgeTablePN (const PNGraphMP &Network, TTableContext *Context)
 Extracts edge TTable from parallel graph PNGraphMP. More...
 
static PTable GetFltNodePropertyTable (const PNEANet &Network, const TIntFltH &Property, const TStr &NodeAttrName, const TAttrType &NodeAttrType, const TStr &PropertyAttrName, TTableContext *Context)
 Extracts a node property TTable from the THash Property. More...
 

Protected Member Functions

void InvalidatePhysicalGroupings ()
 
void InvalidateAffectedGroupings (const TStr &Attr)
 
void IncrementNext ()
 Increments the next vector and sets last, NumRows and NumValidRows. More...
 
void ClassifyAux (const TIntV &SelectedRows, const TStr &LabelName, const TInt &PositiveLabel=1, const TInt &NegativeLabel=0)
 Adds a label attribute with positive labels on selected rows and negative labels on the rest. More...
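Judging from its description, ClassifyAux keeps every row and adds a label column rather than deleting the rows that do not match. The following is a minimal sketch of building that label column from the selected row indices. `ClassifyLabels` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Builds a label column of length NumRows: PositiveLabel on the rows listed
// in SelectedRows, NegativeLabel on every other row.
std::vector<int> ClassifyLabels(int NumRows, const std::vector<int>& SelectedRows,
                                int PositiveLabel = 1, int NegativeLabel = 0) {
  std::vector<int> Labels(NumRows, NegativeLabel);
  for (int r : SelectedRows) {
    assert(r >= 0 && r < NumRows);
    Labels[r] = PositiveLabel;
  }
  return Labels;
}
```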
 
const char * GetContextKey (TInt Val) const
 Gets the key of the context's StringVals pool. Used by the ToGraph method in conv.cpp. More...
 
TStr GetStrVal (TInt ColIdx, TInt RowIdx) const
 Gets the value in column with id ColIdx at row RowIdx. More...
 
void AddStrVal (const TInt &ColIdx, const TStr &Val)
 Adds Val to the column with id ColIdx. More...
 
void AddStrVal (const TStr &Col, const TStr &Val)
 Adds Val to the column named Col. More...
 
TStr GetIdColName () const
 Gets name of the id column of this table. More...
 
TStr GetSchemaColName (TInt Idx) const
 Gets name of the column with index Idx in the schema. More...
 
TAttrType GetSchemaColType (TInt Idx) const
 Gets type of the column with index Idx in the schema. More...
 
void AddSchemaCol (const TStr &ColName, TAttrType ColType)
 Adds column with name ColName and type ColType to the schema. More...
 
TBool IsColName (const TStr &ColName) const
 
void AddColType (const TStr &ColName, TPair< TAttrType, TInt > ColType)
 Adds column with name ColName and type ColType to the ColTypeMap. More...
 
void AddColType (const TStr &ColName, TAttrType ColType, TInt Index)
 Adds column with name ColName and type ColType to the ColTypeMap. More...
 
void DelColType (const TStr &ColName)
 Deletes column ColName from the ColTypeMap. More...
 
TPair< TAttrType, TInt > GetColTypeMap (const TStr &ColName) const
 Gets column type and index of ColName. More...
 
TStr RenumberColName (const TStr &ColName) const
 Returns a re-numbered column name based on number of existing columns with conflicting names. More...
 
TStr DenormalizeColName (const TStr &ColName) const
 Removes the suffix from the column name, if it exists. More...
 
Schema DenormalizeSchema () const
 Removes the suffixes from the column names in the Schema. More...
 
TBool IsAttr (const TStr &Attr)
 Checks if Attr is an attribute of this table schema. More...
 
void AddTable (const TTable &T)
 Adds all the rows of the input table. Allows duplicate rows (not a union). More...
 
void ConcatTable (const PTable &T)
 Appends all rows of T to this table and recalculates indices. More...
 
void AddRow (const TRowIterator &RI)
 Adds row corresponding to RI. More...
 
void AddRow (const TIntV &IntVals, const TFltV &FltVals, const TStrV &StrVals)
 Adds row with values corresponding to the given vectors by type. More...
 
void AddGraphAttribute (const TStr &Attr, TBool IsEdge, TBool IsSrc, TBool IsDst)
 Adds names of columns to be used as graph attributes. More...
 
void AddGraphAttributeV (TStrV &Attrs, TBool IsEdge, TBool IsSrc, TBool IsDst)
 Adds vector of names of columns to be used as graph attributes. More...
 
void CheckAndAddIntNode (PNEANet Graph, THashSet< TInt > &NodeVals, TInt NodeId)
 Checks if given NodeId has been seen earlier; if not, adds it to Graph and to the hash set NodeVals. More...
 
template<class T >
TInt CheckAndAddFltNode (T Graph, THash< TFlt, TInt > &NodeVals, TFlt FNodeVal)
 Checks if given NodeVal has been seen earlier; if not, adds it to Graph and to the hash map NodeVals. More...
 
void AddEdgeAttributes (PNEANet &Graph, int RowId)
 Adds attributes of edge corresponding to RowId to the Graph. More...
 
void AddNodeAttributes (TInt NId, TStrV NodeAttrV, TInt RowId, THash< TInt, TStrIntVH > &NodeIntAttrs, THash< TInt, TStrFltVH > &NodeFltAttrs, THash< TInt, TStrStrVH > &NodeStrAttrs)
 Takes as parameters, and updates, the maps NodeXAttrs: node id -> (attribute name -> vector of attribute values). More...
 
PNEANet BuildGraph (const TIntV &RowIds, TAttrAggr AggrPolicy)
 Makes a single pass over the rows in the given row id set, and creates nodes, edges, assigns node and edge attributes. More...
 
void InitRowIdBuckets (int NumBuckets)
 Initializes the RowIdBuckets vector which will be used for the graph sequence creation. More...
 
void FillBucketsByWindow (TStr SplitAttr, TInt JumpSize, TInt WindowSize, TInt StartVal, TInt EndVal)
 Fills RowIdBuckets with sets of row ids. More...
 
void FillBucketsByInterval (TStr SplitAttr, TIntPrV SplitIntervals)
 Fills RowIdBuckets with sets of row ids. More...
 
TVec< PNEANet > GetGraphsFromSequence (TAttrAggr AggrPolicy)
 Returns a sequence of graphs. More...
 
PNEANet GetFirstGraphFromSequence (TAttrAggr AggrPolicy)
 Returns the first graph of the sequence. More...
 
PNEANet GetNextGraphFromSequence ()
 Returns the next graph in sequence corresponding to RowIdBuckets. More...
 
template<class T >
T AggregateVector (TVec< T > &V, TAttrAggr Policy)
 Aggregates vector into a single scalar value according to a policy. More...
 
void GroupingSanityCheck (const TStr &GroupBy, const TAttrType &AttrType) const
 Checks if grouping key exists and matches given attr type. More...
 
template<class T >
void GroupByIntCol (const TStr &GroupBy, T &Grouping, const TIntV &IndexSet, TBool All, TBool UsePhysicalIds=true) const
 Groups/hashes by a single column with integer values. More...
 
template<class T >
void GroupByFltCol (const TStr &GroupBy, T &Grouping, const TIntV &IndexSet, TBool All, TBool UsePhysicalIds=true) const
 Groups/hashes by a single column with float values. Returns hash table with grouping. More...
 
template<class T >
void GroupByStrCol (const TStr &GroupBy, T &Grouping, const TIntV &IndexSet, TBool All, TBool UsePhysicalIds=true) const
 Groups/hashes by a single column with string values. Returns hash table with grouping. More...
 
template<class T >
void UpdateGrouping (THash< T, TIntV > &Grouping, T Key, TInt Val) const
 Template for utility function to update a grouping hash map. More...
 
template<class T >
void UpdateGrouping (THashMP< T, TIntV > &Grouping, T Key, TInt Val) const
 Template for utility function to update a parallel grouping hash map. More...
 
void PrintGrouping (const THash< TGroupKey, TIntV > &Grouping) const
 
TInt CompareRows (TInt R1, TInt R2, const TAttrType &CompareByType, const TInt &CompareByIndex, TBool Asc=true)
 Returns positive value if R1 is bigger, negative value if R2 is bigger, and 0 if they are equal (strcmp semantics). More...
 
TInt CompareRows (TInt R1, TInt R2, const TVec< TAttrType > &CompareByTypes, const TIntV &CompareByIndices, TBool Asc=true)
 Returns positive value if R1 is bigger, negative value if R2 is bigger, and 0 if they are equal (strcmp semantics). More...
 
TInt GetPivot (TIntV &V, TInt StartIdx, TInt EndIdx, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc)
 Gets pivot element for QSort. More...
 
TInt Partition (TIntV &V, TInt StartIdx, TInt EndIdx, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc)
 Partitions vector for QSort. More...
 
void ISort (TIntV &V, TInt StartIdx, TInt EndIdx, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc=true)
 Performs insertion sort on given vector V. More...
 
void QSort (TIntV &V, TInt StartIdx, TInt EndIdx, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc=true)
 Performs QSort on given vector V. More...
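QSort, ISort, GetPivot and Partition together suggest a standard hybrid quicksort. The sketch assumes that small ranges are handed to insertion sort. The cutoff of 16 elements, the middle-element pivot and the Lomuto partition are all hypothetical choices, and SNAP's pivot choice and cutoff may differ. Only an ascending sort of row ids by one int key is shown. StartIdx/EndIdx-style bounds are treated as inclusive here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical cutoff at or below which insertion sort is used.
constexpr int kISortCutoff = 16;

// Insertion sort of row ids V[Start..End] (inclusive) by Key[V[i]].
void ISort(std::vector<int>& V, int Start, int End, const std::vector<long>& Key) {
  for (int i = Start + 1; i <= End; ++i) {
    int Val = V[i], j = i - 1;
    while (j >= Start && Key[V[j]] > Key[Val]) { V[j + 1] = V[j]; --j; }
    V[j + 1] = Val;
  }
}

// Lomuto partition around the middle element; returns the pivot's final index.
int Partition(std::vector<int>& V, int Start, int End, const std::vector<long>& Key) {
  std::swap(V[Start + (End - Start) / 2], V[End]);  // move the pivot to the end
  const long Pivot = Key[V[End]];
  int Store = Start;
  for (int i = Start; i < End; ++i)
    if (Key[V[i]] < Pivot) std::swap(V[i], V[Store++]);
  std::swap(V[Store], V[End]);
  return Store;
}

// Quicksort of V[Start..End] that hands small ranges to insertion sort.
void QSort(std::vector<int>& V, int Start, int End, const std::vector<long>& Key) {
  if (End - Start + 1 <= kISortCutoff) { ISort(V, Start, End, Key); return; }
  int P = Partition(V, Start, End, Key);
  QSort(V, Start, P - 1, Key);
  QSort(V, P + 1, End, Key);
}
```

Insertion sort has low overhead on short, nearly sorted ranges, so switching to it near the leaves of the recursion usually pays off.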
 
void Merge (TIntV &V, TInt Idx1, TInt Idx2, TInt Idx3, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc=true)
 Helper function for parallel QSort. More...
 
void QSortPar (TIntV &V, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc=true)
 Performs QSort in parallel on given vector V. More...
 
bool IsRowValid (TInt RowIdx) const
 Checks if RowIdx corresponds to a valid (i.e. not deleted) row. More...
 
TInt GetLastValidRowIdx ()
 Gets the id of the last valid row of the table. More...
 
void RemoveFirstRow ()
 Removes first valid row of the table. More...
 
void RemoveRow (TInt RowIdx, TInt PrevRowIdx)
 Removes row with id RowIdx. More...
 
void KeepSortedRows (const TIntV &KeepV)
 Removes all rows that are not mentioned in the SORTED vector KeepV. More...
 
void SetFirstValidRow ()
 Sets the first valid row of the TTable. More...
 
PTable InitializeJointTable (const TTable &Table)
 Initializes an empty table for the join of this table with the given table. More...
 
void AddJointRow (const TTable &T1, const TTable &T2, TInt RowIdx1, TInt RowIdx2)
 Adds joint row T1[RowIdx1]<=>T2[RowIdx2]. More...
 
void ThresholdJoinInputCorrectness (const TStr &KeyCol1, const TStr &JoinCol1, const TTable &Table, const TStr &KeyCol2, const TStr &JoinCol2)
 
void ThresholdJoinCountCollisions (const TTable &TB, const TTable &TS, const TIntIntVH &T, TInt JoinColIdxB, TInt KeyColIdxB, TInt KeyColIdxS, THash< TIntPr, TIntTr > &Counters, TBool ThisIsSmaller, TAttrType JoinColType, TAttrType KeyType)
 
PTable ThresholdJoinOutputTable (const THash< TIntPr, TIntTr > &Counters, TInt Threshold, const TTable &Table)
 
void ThresholdJoinCountPerJoinKeyCollisions (const TTable &TB, const TTable &TS, const TIntIntVH &T, TInt JoinColIdxB, TInt KeyColIdxB, TInt KeyColIdxS, THash< TIntTr, TIntTr > &Counters, TBool ThisIsSmaller, TAttrType JoinColType, TAttrType KeyType)
 
PTable ThresholdJoinPerJoinKeyOutputTable (const THash< TIntTr, TIntTr > &Counters, TInt Threshold, const TTable &Table)
 
void ResizeTable (int RowCount)
 Resizes the table to hold RowCount rows. More...
 
int GetEmptyRowsStart (int NewRows)
 Gets the start index to a chunk of empty rows of size NewRows. More...
 
void AddSelectedRows (const TTable &Table, const TIntV &RowIDs)
 Adds rows from Table that correspond to ids in RowIDs. More...
 
void AddNRows (int NewRows, const TVec< TIntV > &IntColsP, const TVec< TFltV > &FltColsP, const TVec< TIntV > &StrColMapsP)
 Adds NewRows rows from the given vectors for each column type. More...
 
void AddNJointRowsMP (const TTable &T1, const TTable &T2, const TVec< TIntPrV > &JointRowIDSet)
 Adds rows from T1 and T2 to this table in a parallel manner. Used by Join. More...
 
void UpdateTableForNewRow ()
 Updates table state after adding one or more rows. More...
 
void GroupAux (const TStrV &GroupBy, THash< TGroupKey, TPair< TInt, TIntV > > &Grouping, TBool Ordered, const TStr &GroupColName, TBool KeepUnique, TIntV &UniqueVec, TBool UsePhysicalIds=true)
 Helper function for grouping. More...
 
void StoreGroupCol (const TStr &GroupColName, const TVec< TPair< TInt, TInt > > &GroupAndRowIds)
 Parallel helper function for grouping. Parallel grouping by complex keys is currently not supported. More...
 
void Reindex ()
 Reinitializes row ids. More...
 
void AddIdColumn (const TStr &IdColName)
 Adds a column of explicit integer identifiers to the rows. More...
 
void GetCollidingRows (const TTable &T, THashSet< TInt > &Collisions)
 Gets set of row ids of rows common with table T. More...
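The sorting helpers listed above (Partition, ISort, QSort, Merge, QSortPar) reorder a vector of row indices. Rows are compared column by column, in the order given by SortByTypes and SortByIndices. Below is a minimal standalone sketch of that comparison, driven by an insertion sort, which is the simplest of the listed routines. It does not use SNAP. The names Columns, AttrType, CompareRows and ISortRows are hypothetical stand-ins. The half-open range [start, end) is a convention of the sketch, and string columns are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class AttrType { Int, Flt };  // hypothetical stand-in for TAttrType

// Hypothetical columnar storage: one vector per column, indexed by row.
struct Columns {
  std::vector<std::vector<int>> intCols;
  std::vector<std::vector<double>> fltCols;
};

// Compares rows r1 and r2 key by key; returns -1, 0 or 1.
int CompareRows(const Columns& t, int r1, int r2,
                const std::vector<AttrType>& types,
                const std::vector<int>& indices) {
  for (std::size_t k = 0; k < types.size(); ++k) {
    if (types[k] == AttrType::Int) {
      int a = t.intCols[indices[k]][r1], b = t.intCols[indices[k]][r2];
      if (a != b) return a < b ? -1 : 1;
    } else {
      double a = t.fltCols[indices[k]][r1], b = t.fltCols[indices[k]][r2];
      if (a != b) return a < b ? -1 : 1;
    }
  }
  return 0;  // equal on every key
}

// Stable insertion sort of the row indices in v[start, end).
void ISortRows(std::vector<int>& v, int start, int end, const Columns& t,
               const std::vector<AttrType>& types,
               const std::vector<int>& indices, bool asc = true) {
  assert(types.size() == indices.size());
  for (int i = start + 1; i < end; ++i) {
    int row = v[i];
    int j = i - 1;
    // Shift rows that belong after `row` one slot to the right.
    while (j >= start) {
      int c = CompareRows(t, v[j], row, types, indices);
      if (asc ? c <= 0 : c >= 0) break;
      v[j + 1] = v[j];
      --j;
    }
    v[j + 1] = row;
  }
}
```

With an integer key followed by a float key, rows with equal integers end up ordered by their float values, and asc = false reverses the order on both keys. A quicksort would typically reuse the same row comparison and change only how the range is rearranged.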
 

Static Protected Member Functions

static void LoadSSPar (PTable &NewTable, const Schema &S, const TStr &InFNm, const TIntV &RelevantCols, const char &Separator, TBool HasTitleLine)
 Loads data in parallel from the input file InFNm into NewTable. Only works when NewTable has no string columns. More...
 
static void LoadSSSeq (PTable &NewTable, const Schema &S, const TStr &InFNm, const TIntV &RelevantCols, const char &Separator, TBool HasTitleLine)
 Sequentially loads data from input file at InFNm into NewTable. More...
 
static TInt CompareKeyVal (const TInt &K1, const TInt &V1, const TInt &K2, const TInt &V2)
 
static TInt CheckSortedKeyVal (TIntV &Key, TIntV &Val, TInt Start, TInt End)
 
static void ISortKeyVal (TIntV &Key, TIntV &Val, TInt Start, TInt End)
 
static TInt GetPivotKeyVal (TIntV &Key, TIntV &Val, TInt Start, TInt End)
 
static TInt PartitionKeyVal (TIntV &Key, TIntV &Val, TInt Start, TInt End)
 
static void QSortKeyVal (TIntV &Key, TIntV &Val, TInt Start, TInt End)
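The KeyVal helpers sort two parallel vectors, Key and Val, as if they were one vector of (key, value) pairs. The sketch below illustrates the idea under two assumptions: pairs are compared lexicographically (key first, then value), and the Start/End bounds are inclusive. The names CmpKV, SwapKV, PartitionKV and QSortKV are hypothetical. The sketch uses a plain Lomuto partition with the last pair as the pivot, not whatever pivot GetPivotKeyVal selects. It also has no insertion-sort cutoff for short ranges.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Orders (k1, v1) against (k2, v2) by key, then by value: -1, 0 or 1.
int CmpKV(int k1, int v1, int k2, int v2) {
  if (k1 != k2) return k1 < k2 ? -1 : 1;
  if (v1 != v2) return v1 < v2 ? -1 : 1;
  return 0;
}

// Swaps positions i and j in both vectors so each key stays with its value.
void SwapKV(std::vector<int>& key, std::vector<int>& val, int i, int j) {
  std::swap(key[i], key[j]);
  std::swap(val[i], val[j]);
}

// Lomuto partition of the inclusive range [start, end] around pair `end`.
int PartitionKV(std::vector<int>& key, std::vector<int>& val, int start, int end) {
  const int pk = key[end], pv = val[end];
  int store = start;
  for (int i = start; i < end; ++i) {
    if (CmpKV(key[i], val[i], pk, pv) < 0) SwapKV(key, val, i, store++);
  }
  SwapKV(key, val, store, end);
  return store;  // final position of the pivot pair
}

// Quicksort of key[start..end] and val[start..end] as (key, value) pairs.
void QSortKV(std::vector<int>& key, std::vector<int>& val, int start, int end) {
  assert(key.size() == val.size());
  if (start >= end) return;
  const int p = PartitionKV(key, val, start, end);
  QSortKV(key, val, start, p - 1);
  QSortKV(key, val, p + 1, end);
}
```

Swapping both vectors in lockstep keeps each pair intact without building a temporary vector of pairs. A last-element pivot degrades to quadratic time on already sorted input, which is one reason a dedicated pivot-selection step is useful.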
 

Protected Attributes

TTableContext * Context
 Execution Context. More...
 
Schema Sch
 Table Schema. More...
 
TCRef CRef
 
TInt NumRows
 Number of rows in the table (valid and invalid). More...
 
TInt NumValidRows
 Number of valid rows in the table (i.e. rows that were not logically removed). More...
 
TInt FirstValidRow
 Physical index of first valid row. More...
 
TInt LastValidRow
 Physical index of last valid row. More...
 
TIntV Next
 A vector describing the logical order of the rows: Next[i] is the successor of row i, and table iterators follow the order dictated by Next. More...
 
TVec< TIntV > IntCols
 Data columns of integer attributes. More...
 
TVec< TFltV > FltCols
 Data columns of floating point attributes. More...
 
TVec< TIntV > StrColMaps
 Data columns of integer mappings of string attributes. More...
 
THash< TStr, TPair< TAttrType, TInt > > ColTypeMap
 A mapping from column name to column type and column index among columns of the same type. More...
 
TStr IdColName
 Name of column associated with (optional) permanent row identifiers.
 
TIntIntH RowIdMap
 Mapping of permanent row ids to physical id. More...
 
THash< TStr, THash< TInt, TIntV > > IntColIndexes
 Indexes for Int Columns. More...
 
THash< TStr, THash< TInt, TIntV > > StrMapColIndexes
 Indexes for String Columns. More...
 
THash< TStr, THash< TFlt, TIntV > > FltColIndexes
 Indexes for Float Columns. More...
 
THash< TStr, GroupStmt > GroupStmtNames
 Maps user-given grouping statement names to their group-by attributes. More...
 
THash< GroupStmt, THash< TInt, TGroupKey > > GroupIDMapping
 Maps grouping statements to their (group id -> group-by key) mapping. More...
 
THash< GroupStmt, THash< TGroupKey, TIntV > > GroupMapping
 Maps grouping statements to their (group-by key -> group id) mapping. More...
 
TStr SrcCol
 Column (attribute) to serve as src nodes when constructing the graph. More...
 
TStr DstCol
 Column (attribute) to serve as dst nodes when constructing the graph. More...
 
TStrV EdgeAttrV
 List of columns (attributes) to serve as edge attributes. More...
 
TStrV SrcNodeAttrV
 List of columns (attributes) to serve as source node attributes. More...
 
TStrV DstNodeAttrV
 List of columns (attributes) to serve as destination node attributes. More...
 
TStrTrV CommonNodeAttrs
 List of attribute pairs with values common to source and destination and their common given name. More...
 
TVec< TIntVRowIdBuckets
 Partitioning of row ids into buckets corresponding to different graph objects when generating a sequence of graphs. More...
 
TInt CurrBucket
 Current row id bucket - used when generating a sequence of graphs using an iterator. More...
 
TAttrAggr AggrPolicy
 Aggregation policy used for solving conflicts between different values of an attribute of the same node. More...
 
TInt IsNextDirty
 Flag to signify whether the rows are stored in logical sequence or reordered. Used for optimizing GetPartitionRanges. More...
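Together, the attributes above describe a column store. Each column's values live in IntCols, FltCols or StrColMaps. ColTypeMap records, for each column name, the column's type and its index among columns of that type. The sketch below models that layout in standalone C++; MiniTable and its members are hypothetical. It stores strings as integer ids into a local pool. In SNAP, the shared string storage is assumed to be held by the TTableContext. The sketch does not model NormalizeColName or row validity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class AttrType { Int, Flt, Str };  // hypothetical stand-in for TAttrType

// Hypothetical, simplified model of TTable's columnar layout (not SNAP code).
struct MiniTable {
  int numRows;
  std::vector<std::vector<int>> intCols;
  std::vector<std::vector<double>> fltCols;
  std::vector<std::vector<int>> strColMaps;              // per-row ids into strPool
  std::map<std::string, std::pair<AttrType, int>> colTypeMap;
  std::vector<std::string> strPool;                      // id -> string
  std::unordered_map<std::string, int> strIds;           // string -> id

  explicit MiniTable(int rows) : numRows(rows) {}

  // Returns the id of s, adding s to the pool the first time it is seen.
  int Intern(const std::string& s) {
    auto it = strIds.find(s);
    if (it != strIds.end()) return it->second;
    strPool.push_back(s);
    int id = static_cast<int>(strPool.size()) - 1;
    strIds.emplace(s, id);
    return id;
  }

  // Appends a column filled with a default (0, 0.0 or "") and records its
  // index among columns of the same type, i.e. the new length minus one.
  void AddCol(const std::string& name, AttrType type) {
    int idx = 0;
    switch (type) {
      case AttrType::Int:
        intCols.emplace_back(numRows, 0);
        idx = static_cast<int>(intCols.size()) - 1;
        break;
      case AttrType::Flt:
        fltCols.emplace_back(numRows, 0.0);
        idx = static_cast<int>(fltCols.size()) - 1;
        break;
      case AttrType::Str:
        strColMaps.emplace_back(numRows, Intern(""));
        idx = static_cast<int>(strColMaps.size()) - 1;
        break;
    }
    colTypeMap[name] = {type, idx};
  }

  void SetStr(const std::string& col, int row, const std::string& s) {
    const auto& [type, idx] = colTypeMap.at(col);
    assert(type == AttrType::Str);
    strColMaps[idx][row] = Intern(s);
  }

  std::string GetStr(const std::string& col, int row) const {
    const auto& [type, idx] = colTypeMap.at(col);
    assert(type == AttrType::Str);
    return strPool[strColMaps[idx][row]];
  }
};
```

Because a string column stores only integer ids, a value that appears in several rows or columns occupies a single pool entry.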
 

Static Protected Attributes

static const TInt Last = -1
 Special value for Next vector entry - last row in table. More...
 
static const TInt Invalid = -2
 Special value for Next vector entry - logically removed row. More...
 
static TInt UseMP = 1
 Global switch for choosing multi-threaded versions of TTable functions. More...
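Last and Invalid are sentinel values stored in Next, which links the physical rows into a singly linked list in logical order. The member descriptions suggest that removal is logical: a removed row is unlinked from its predecessor and marked Invalid, and its column data stays in place. The standalone sketch below models that scheme; RowList and its members are hypothetical. Passing -1 as prev when removing the first row is a convention of the sketch, not necessarily what TTable::RemoveRow expects.

```cpp
#include <cassert>
#include <vector>

constexpr int kLast = -1;     // plays the role of TTable::Last
constexpr int kInvalid = -2;  // plays the role of TTable::Invalid

// Hypothetical model: next[i] is the physical index of the row after row i.
struct RowList {
  std::vector<int> next;
  int first;     // like FirstValidRow (kLast when no valid rows remain)
  int numValid;  // like NumValidRows

  explicit RowList(int n) : next(n), first(n > 0 ? 0 : kLast), numValid(n) {
    for (int i = 0; i < n; ++i) next[i] = i + 1;
    if (n > 0) next[n - 1] = kLast;
  }

  bool IsRowValid(int row) const { return next[row] != kInvalid; }

  // Logically removes `row`. `prev` is its logical predecessor, or -1 when
  // `row` is the first valid row (a convention of this sketch).
  void RemoveRow(int row, int prev) {
    assert(IsRowValid(row));
    if (prev < 0) {
      assert(first == row);
      first = next[row];
    } else {
      assert(next[prev] == row);
      next[prev] = next[row];  // bypass the removed row
    }
    next[row] = kInvalid;  // the slot stays allocated; only the links change
    --numValid;
  }

  // Physical row indices in logical order.
  std::vector<int> LogicalOrder() const {
    std::vector<int> order;
    for (int r = first; r != kLast; r = next[r]) order.push_back(r);
    return order;
  }
};
```

Because removal only rewrites links, the number of physical rows (NumRows in TTable) stays the same, while the count of valid rows drops.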
 

Private Member Functions

void GenerateColTypeMap (THash< TStr, TPair< TInt, TInt > > &ColTypeIntMap)
 
void LoadTableShM (TShMIn &ShMIn, TTableContext *ContextTable)
 

Friends

class TPt< TTable >
 
class TRowIterator
 
class TRowIteratorWithRemove
 
template<class PGraph >
PGraph TSnap::ToGraph (PTable Table, const TStr &SrcCol, const TStr &DstCol, TAttrAggr AggrPolicy)
 
template<class PGraph >
PGraph TSnap::ToNetwork (PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &SrcAttrs, TStrV &DstAttrs, TStrV &EdgeAttrs, TAttrAggr AggrPolicy)
 
template<class PGraph >
PGraph TSnap::ToNetwork (PTable Table, const TStr &SrcCol, const TStr &DstCol, TAttrAggr AggrPolicy)
 
template<class PGraph >
PGraph TSnap::ToNetwork (PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &EdgeAttrV, TAttrAggr AggrPolicy)
 
template<class PGraph >
PGraph TSnap::ToNetwork (PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &EdgeAttrV, PTable NodeTable, const TStr &NodeCol, TStrV &NodeAttrV, TAttrAggr AggrPolicy)
 
int TSnap::LoadCrossNet (TCrossNet &Graph, PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &EdgeAttrV)
 
int TSnap::LoadMode (TModeNet &Graph, PTable Table, const TStr &NCol, TStrV &NodeAttrV)
 
template<class PGraphMP >
PGraphMP TSnap::ToGraphMP (PTable Table, const TStr &SrcCol, const TStr &DstCol)
 
template<class PGraphMP >
PGraphMP TSnap::ToGraphMP3 (PTable Table, const TStr &SrcCol, const TStr &DstCol)
 
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP (PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &SrcAttrs, TStrV &DstAttrs, TStrV &EdgeAttrs, TAttrAggr AggrPolicy)
 
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP2 (PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &SrcAttrs, TStrV &DstAttrs, TStrV &EdgeAttrs, TAttrAggr AggrPolicy)
 
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP (PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &EdgeAttrV, TAttrAggr AggrPolicy)
 
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP (PTable Table, const TStr &SrcCol, const TStr &DstCol, TAttrAggr AggrPolicy)
 
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP (PTable Table, const TStr &SrcCol, const TStr &DstCol, TStrV &EdgeAttrV, PTable NodeTable, const TStr &NodeCol, TStrV &NodeAttrV, TAttrAggr AggrPolicy)
 

Detailed Description

Table class: Relational table with columnar data storage.

Definition at line 484 of file table.h.

Constructor & Destructor Documentation

TTable::TTable ( )

Definition at line 302 of file table.cpp.

302  : Context(new TTableContext), NumRows(0), NumValidRows(0),
303  FirstValidRow(0), LastValidRow(-1) {}
TTable::TTable ( TTableContext * Context)

Definition at line 305 of file table.cpp.

305  : Context(Context), NumRows(0),
TTable::TTable ( const Schema & S,
TTableContext * Context 
)

Definition at line 308 of file table.cpp.

308  : Context(Context),
310  TInt IntColCnt = 0;
311  TInt FltColCnt = 0;
312  TInt StrColCnt = 0;
313  for (TInt i = 0; i < TableSchema.Len(); i++) {
314  TStr ColName = TableSchema[i].Val1;
315  TAttrType ColType = TableSchema[i].Val2;
316  AddSchemaCol(ColName, ColType);
317  switch (ColType) {
318  case atInt:
319  AddColType(ColName, atInt, IntColCnt);
320  IntColCnt++;
321  break;
322  case atFlt:
323  AddColType(ColName, atFlt, FltColCnt);
324  FltColCnt++;
325  break;
326  case atStr:
327  AddColType(ColName, atStr, StrColCnt);
328  StrColCnt++;
329  break;
330  }
331  }
332  IntCols = TVec<TIntV>(IntColCnt);
333  FltCols = TVec<TFltV>(FltColCnt);
334  StrColMaps = TVec<TIntV>(StrColCnt);
335 }
TTable::TTable ( TSIn & SIn,
TTableContext * Context 
)

Definition at line 378 of file table.cpp.

378  : Context(Context), NumRows(SIn),
379  NumValidRows(SIn), FirstValidRow(SIn), LastValidRow(SIn), Next(SIn), IntCols(SIn),
380  FltCols(SIn), StrColMaps(SIn) {
381  THash<TStr,TPair<TInt,TInt> > ColTypeIntMap(SIn);
382  GenerateColTypeMap(ColTypeIntMap);
383 }
TTable::TTable ( const THash< TInt, TInt > &  H,
const TStr & Col1,
const TStr & Col2,
TTableContext * Context,
const TBool  IsStrKeys = false 
)

Constructor to build table out of a hash table of int->int.

Definition at line 385 of file table.cpp.

386  : Context(Context), NumRows(H.Len()),
387  NumValidRows(H.Len()), FirstValidRow(0), LastValidRow(H.Len()-1) {
388  TAttrType KeyType = IsStrKeys ? atStr : atInt;
389  AddSchemaCol(Col1, KeyType);
390  AddSchemaCol(Col2, atInt);
391  AddColType(Col1, KeyType, 0);
392  AddColType(Col2, atInt, IsStrKeys ? 0 : 1);
393  if (IsStrKeys) {
394  StrColMaps = TVec<TIntV>(1);
395  IntCols = TVec<TIntV>(1);
396  H.GetKeyV(StrColMaps[0]);
397  H.GetDatV(IntCols[0]);
398  } else {
399  IntCols = TVec<TIntV>(2);
400  H.GetKeyV(IntCols[0]);
401  H.GetDatV(IntCols[1]);
402  }
403  Next = TIntV(NumRows);
404  for (TInt i = 0; i < NumRows; i++) {
405  Next[i] = i+1;
406  }
407  Next[NumRows-1] = Last;
408  IsNextDirty = 0;
409  InitIds();
410 }
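The constructor above copies the hash keys and values into two columns and links the rows, in their original order, through Next. A standalone sketch of the same steps follows. It assumes the hash is given as a vector of (key, value) pairs in iteration order; KeyValTable and FromPairs are hypothetical names. The sketch records where the value column lands. With IsStrKeys, the keys go to StrColMaps, so the values form the only integer column and their index among integer columns is 0, not 1. The sketch also skips the final Next assignment when there are no rows, a case the listing above does not check.

```cpp
#include <cassert>
#include <utility>
#include <vector>

constexpr int kLast = -1;  // plays the role of TTable::Last

// Hypothetical result of building a key/value table from an int->int hash.
struct KeyValTable {
  std::vector<std::vector<int>> intCols;
  std::vector<std::vector<int>> strColMaps;
  int valColIdx = 0;      // index of the value column among integer columns
  std::vector<int> next;  // logical successor of each row
};

// pairs holds the hash entries in iteration order. With isStrKeys the keys are
// taken to be string ids already, so they go to strColMaps instead of intCols.
KeyValTable FromPairs(const std::vector<std::pair<int, int>>& pairs, bool isStrKeys) {
  KeyValTable t;
  std::vector<int> keys, vals;
  for (const auto& kv : pairs) {
    keys.push_back(kv.first);
    vals.push_back(kv.second);
  }
  if (isStrKeys) {
    t.strColMaps.push_back(keys);
    t.intCols.push_back(vals);
    t.valColIdx = 0;  // values are the only integer column
  } else {
    t.intCols.push_back(keys);
    t.intCols.push_back(vals);
    t.valColIdx = 1;  // integer column 0 holds the keys
  }
  const int n = static_cast<int>(pairs.size());
  t.next.resize(n);
  for (int i = 0; i < n; ++i) t.next[i] = i + 1;
  if (n > 0) t.next[n - 1] = kLast;  // an empty table has no last row to mark
  assert(t.intCols[t.valColIdx].size() == pairs.size());
  return t;
}
```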
TTable::TTable ( const THash< TInt, TFlt > &  H,
const TStr & Col1,
const TStr & Col2,
TTableContext * Context,
const TBool  IsStrKeys = false 
)

Constructor to build table out of a hash table of int->float.

Definition at line 412 of file table.cpp.

413  : Context(Context),
414  NumRows(H.Len()), NumValidRows(H.Len()), FirstValidRow(0), LastValidRow(H.Len()-1) {
415  TAttrType KeyType = IsStrKeys ? atStr : atInt;
416  AddSchemaCol(Col1, KeyType);
417  AddSchemaCol(Col2, atFlt);
418  AddColType(Col1, KeyType, 0);
419  AddColType(Col2, atFlt, 0);
420  if (IsStrKeys) {
421  StrColMaps = TVec<TIntV>(1);
422  H.GetKeyV(StrColMaps[0]);
423  } else {
424  IntCols = TVec<TIntV>(1);
425  H.GetKeyV(IntCols[0]);
426  }
427  FltCols = TVec<TFltV>(1);
428  H.GetDatV(FltCols[0]);
429  Next = TIntV(NumRows);
430  for (TInt i = 0; i < NumRows; i++) {
431  Next[i] = i+1;
432  }
433  Next[NumRows-1] = Last;
434  IsNextDirty = 0;
435  InitIds();
436 }
TTable::TTable ( const TTable & Table)
inline

Copy constructor.

Definition at line 919 of file table.h.

919  : Context(Table.Context), Sch(Table.Sch),
921  LastValidRow(Table.LastValidRow), Next(Table.Next), IntCols(Table.IntCols),
922  FltCols(Table.FltCols), StrColMaps(Table.StrColMaps), ColTypeMap(Table.ColTypeMap),
925  SrcCol(Table.SrcCol), DstCol(Table.DstCol),
928  IsNextDirty(Table.IsNextDirty) {}
TTable::TTable ( const TTable & Table,
const TIntV & RowIds 
)

Definition at line 438 of file table.cpp.

438  : Context(Table.Context),
439  Sch(Table.Sch), SrcCol(Table.SrcCol), DstCol(Table.DstCol), EdgeAttrV(Table.EdgeAttrV),
442  ColTypeMap = Table.ColTypeMap;
443  IntCols = TVec<TIntV>(Table.IntCols.Len());
444  FltCols = TVec<TFltV>(Table.FltCols.Len());
446  FirstValidRow = 0;
447  LastValidRow = -1;
448  NumRows = 0;
449  NumValidRows = 0;
450  AddSelectedRows(Table, RowIDs);
451  IsNextDirty = 0;
452  InitIds();
453 }

Member Function Documentation

void TTable::AddColType ( const TStr & ColName,
TPair< TAttrType, TInt > ColType 
)
inline protected

Adds column with name ColName and type ColType to the ColTypeMap.

Definition at line 651 of file table.h.

651  {
652  TStr NColName = NormalizeColName(ColName);
653  ColTypeMap.AddDat(NColName, ColType);
654  }
void TTable::AddColType ( const TStr & ColName,
TAttrType  ColType,
TInt  Index 
)
inline protected

Adds column with name ColName and type ColType to the ColTypeMap.

Definition at line 656 of file table.h.

656  {
657  TStr NColName = NormalizeColName(ColName);
658  AddColType(NColName, TPair<TAttrType,TInt>(ColType, Index));
659  }
void TTable::AddDstNodeAttr ( const TStr & Attr)
inline

Adds column to be used as dst node attribute of the graph.

Definition at line 1180 of file table.h.

1180 { AddGraphAttribute(Attr, false, false, true); }
void TTable::AddDstNodeAttr ( TStrV & Attrs)
inline

Adds columns to be used as dst node attributes of the graph.

Definition at line 1182 of file table.h.

1182 { AddGraphAttributeV(Attrs, false, false, true); }
void TTable::AddEdgeAttr ( const TStr & Attr)
inline

Adds column to be used as graph edge attribute.

Definition at line 1172 of file table.h.

1172 { AddGraphAttribute(Attr, true, false, false); }
void TTable::AddEdgeAttr ( TStrV & Attrs)
inline

Adds columns to be used as graph edge attributes.

Definition at line 1174 of file table.h.

1174 { AddGraphAttributeV(Attrs, true, false, false); }
void TTable::AddEdgeAttributes ( PNEANet & Graph,
int  RowId 
)
inline protected

Adds attributes of edge corresponding to RowId to the Graph.

Definition at line 3395 of file table.cpp.

3395  {
3396  for (TInt i = 0; i < EdgeAttrV.Len(); i++) {
3397  TStr ColName = EdgeAttrV[i];
3398  TAttrType T = GetColType(ColName);
3399  TInt Index = GetColIdx(ColName);
3400  switch (T) {
3401  case atInt:
3402  Graph->AddIntAttrDatE(RowId, IntCols[Index][RowId], ColName);
3403  break;
3404  case atFlt:
3405  Graph->AddFltAttrDatE(RowId, FltCols[Index][RowId], ColName);
3406  break;
3407  case atStr:
3408  Graph->AddStrAttrDatE(RowId, GetStrVal(Index, RowId), ColName);
3409  break;
3410  }
3411  }
3412 }
void TTable::AddFltCol ( const TStr & ColName)

Adds a float column with name ColName.

Definition at line 4680 of file table.cpp.

4680  {
4681  AddSchemaCol(ColName, atFlt);
4682  FltCols.Add(TFltV(NumRows));
4683  TInt L = FltCols.Len();
4684  AddColType(ColName, atFlt, L-1);
4685 }
void TTable::AddGraphAttribute ( const TStr & Attr,
TBool  IsEdge,
TBool  IsSrc,
TBool  IsDst 
)
protected

Adds names of columns to be used as graph attributes.

Definition at line 985 of file table.cpp.

985  {
986  if (!IsColName(Attr)) { TExcept::Throw(Attr + ": No such column"); }
987  if (IsEdge) { EdgeAttrV.Add(NormalizeColName(Attr)); }
988  if (IsSrc) { SrcNodeAttrV.Add(NormalizeColName(Attr)); }
989  if (IsDst) { DstNodeAttrV.Add(NormalizeColName(Attr)); }
990 }
void TTable::AddGraphAttributeV ( TStrV & Attrs,
TBool  IsEdge,
TBool  IsSrc,
TBool  IsDst 
)
protected

Adds vector of names of columns to be used as graph attributes.

Definition at line 992 of file table.cpp.

992  {
993  for (TInt i = 0; i < Attrs.Len(); i++) {
994  if (!IsColName(Attrs[i])) {
995  TExcept::Throw(Attrs[i] + ": no such column");
996  }
997  }
998  for (TInt i = 0; i < Attrs.Len(); i++) {
999  if (IsEdge) { EdgeAttrV.Add(NormalizeColName(Attrs[i])); }
1000  if (IsSrc) { SrcNodeAttrV.Add(NormalizeColName(Attrs[i])); }
1001  if (IsDst) { DstNodeAttrV.Add(NormalizeColName(Attrs[i])); }
1002  }
1003 }
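AddGraphAttributeV checks every name before appending any of them. As a result, a single unknown column leaves EdgeAttrV, SrcNodeAttrV and DstNodeAttrV untouched. The sketch below reproduces that two-pass pattern in standalone C++. GraphAttrs is hypothetical; it throws std::invalid_argument in place of TExcept::Throw and does not normalize names with NormalizeColName.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a table's column names and graph-attribute lists.
struct GraphAttrs {
  std::set<std::string> columns;            // names accepted as columns
  std::vector<std::string> edge, src, dst;  // like EdgeAttrV, SrcNodeAttrV, DstNodeAttrV

  void AddGraphAttributeV(const std::vector<std::string>& attrs,
                          bool isEdge, bool isSrc, bool isDst) {
    // Pass 1: reject the whole call if any name is unknown.
    for (const auto& a : attrs) {
      if (columns.count(a) == 0) throw std::invalid_argument(a + ": no such column");
    }
    // Pass 2: every name is valid, so it is now safe to mutate the lists.
    for (const auto& a : attrs) {
      if (isEdge) edge.push_back(a);
      if (isSrc) src.push_back(a);
      if (isDst) dst.push_back(a);
    }
  }
};
```

A single loop that validates and appends at once would leave the lists partially updated when the exception is thrown midway. Validating first avoids that.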
void TTable::AddIdColumn ( const TStr & IdColName)
protected

Adds an integer column of explicit row identifiers, numbered 0, 1, 2, ... in iteration order, and rebuilds RowIdMap (identifier to physical row index).

Definition at line 1900 of file table.cpp.

1900  {
1901  //printf("NumRows: %d\n", NumRows.Val);
1902  TInt IdCol = IntCols.Add();
1903  IntCols[IdCol].Reserve(NumRows, NumRows);
1904  //printf("IdCol Reserved\n");
1905  TInt IdCnt = 0;
1906  RowIdMap.Clr();
1907  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
1908  IntCols[IdCol][RI.GetRowIdx()] = IdCnt;
1909  RowIdMap.AddDat(IdCnt, RI.GetRowIdx());
1910  IdCnt++;
1911  }
1912  AddSchemaCol(ColName, atInt);
1913  AddColType(ColName, atInt, IntCols.Len()-1);
1914 }
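AddIdColumn numbers rows in iteration order, which follows the Next chain rather than physical order. A sketch of that walk follows, assuming (hypothetically) that rows are linked by a plain int vector Next terminated by a Last marker, as the listing suggests. The function name mirrors the SNAP method but is not its API, and marking rows off the chain with -1 is an assumption of this sketch.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Assumed encoding, modeled on TTable::Next: Next[i] is the physical index
// of the row after row i, and Last terminates the chain.
const int Last = -1;

// Assigns ids 0, 1, 2, ... to rows in logical (iteration) order and records
// id -> physical row in RowIdMap. The id column is indexed by physical row.
// Rows not on the chain keep -1 (an assumption of this sketch).
std::vector<int> AddIdColumn(const std::vector<int>& Next, int FirstValidRow,
                             std::unordered_map<int, int>& RowIdMap) {
  std::vector<int> IdCol(Next.size(), -1);
  RowIdMap.clear();
  int IdCnt = 0;
  for (int Row = FirstValidRow; Row != Last; Row = Next[Row]) {
    IdCol[Row] = IdCnt;
    RowIdMap[IdCnt] = Row;
    IdCnt++;
  }
  return IdCol;
}
```

Here a chain 0 -> 2 -> 3 gives row 2 the id 1 even though it is the third physical row.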
void TTable::AddIntCol ( const TStr & ColName)

Adds an integer column with name ColName.

Definition at line 4673 of file table.cpp.

4673  {
4674  AddSchemaCol(ColName, atInt);
4675  IntCols.Add(TIntV(NumRows));
4676  TInt L = IntCols.Len();
4677  AddColType(ColName, atInt, L-1);
4678 }
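AddIntCol (and AddStrCol, which differs only in the target vector) performs three steps: add a schema entry, append a column with one slot per existing row, and record (type, index within that type) in ColTypeMap. The sketch below models that layout with hypothetical names (MiniTable, and an AttrType enum loosely modeled on atInt/atFlt/atStr; float columns are left out). It shows why GetColIdx returns an index among columns of the same type rather than a schema position.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum AttrType { atInt, atFlt, atStr };  // hypothetical, modeled on TAttrType

// Hypothetical miniature of TTable's columnar layout: one vector per column,
// grouped by type, plus column name -> (type, index within that type).
struct MiniTable {
  int NumRows = 0;
  std::vector<std::pair<std::string, AttrType>> Sch;  // schema order
  std::vector<std::vector<int>> IntCols;
  std::vector<std::vector<int>> StrColMaps;           // string ids
  std::map<std::string, std::pair<AttrType, int>> ColTypeMap;

  // Schema entry, zero-filled column with NumRows slots, then lookup entry.
  void AddIntCol(const std::string& ColName) {
    Sch.emplace_back(ColName, atInt);
    IntCols.push_back(std::vector<int>(static_cast<size_t>(NumRows), 0));
    ColTypeMap[ColName] = {atInt, static_cast<int>(IntCols.size()) - 1};
  }
  void AddStrCol(const std::string& ColName) {
    Sch.emplace_back(ColName, atStr);
    StrColMaps.push_back(std::vector<int>(static_cast<size_t>(NumRows), 0));
    ColTypeMap[ColName] = {atStr, static_cast<int>(StrColMaps.size()) - 1};
  }
};
```

A column that is third in the schema can still have index 1, because indices are counted separately for each type.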
void TTable::AddJointRow ( const TTable & T1,
const TTable & T2,
TInt  RowIdx1,
TInt  RowIdx2 
)
protected

Adds joint row T1[RowIdx1]<=>T2[RowIdx2].

Definition at line 1957 of file table.cpp.

1957  {
1958  for (TInt i = 0; i < T1.IntCols.Len(); i++) {
1959  IntCols[i].Add(T1.IntCols[i][RowIdx1]);
1960  }
1961  for (TInt i = 0; i < T1.FltCols.Len(); i++) {
1962  FltCols[i].Add(T1.FltCols[i][RowIdx1]);
1963  }
1964  for (TInt i = 0; i < T1.StrColMaps.Len(); i++) {
1965  StrColMaps[i].Add(T1.StrColMaps[i][RowIdx1]);
1966  }
1967  TInt IntOffset = T1.IntCols.Len();
1968  TInt FltOffset = T1.FltCols.Len();
1969  TInt StrOffset = T1.StrColMaps.Len();
1970  for (TInt i = 0; i < T2.IntCols.Len(); i++) {
1971  IntCols[i+IntOffset].Add(T2.IntCols[i][RowIdx2]);
1972  }
1973  for (TInt i = 0; i < T2.FltCols.Len(); i++) {
1974  FltCols[i+FltOffset].Add(T2.FltCols[i][RowIdx2]);
1975  }
1976  for (TInt i = 0; i < T2.StrColMaps.Len(); i++) {
1977  StrColMaps[i+StrOffset].Add(T2.StrColMaps[i][RowIdx2]);
1978  }
1979  TInt IdOffset = IntOffset + T2.IntCols.Len();
1980  NumRows++;
1981  NumValidRows++;
1982  if (!Next.Empty()) {
1983  Next[Next.Len()-1] = NumValidRows-1;
1984  LastValidRow = NumValidRows-1;
1985  }
1986  Next.Add(Last);
1987  RowIdMap.AddDat(NumRows-1, NumRows-1);
1988  IntCols[IdOffset].Add(NumRows-1);
1989 }
void TTable::AddNJointRowsMP ( const TTable & T1,
const TTable & T2,
const TVec< TIntPrV > &  JointRowIDSet 
)
protected

Adds rows from T1 and T2 to this table in a parallel manner. Used by Join.

Definition at line 4442 of file table.cpp.

4442  {
4443  //double startFn = omp_get_wtime();
4444  int JointTableSize = 0;
4445  TIntV StartOffsets(JointRowIDSet.Len());
4446  for (int i = 0; i < JointRowIDSet.Len(); i++) {
4447  StartOffsets[i] = JointTableSize;
4448  JointTableSize += JointRowIDSet[i].Len();
4449  }
4450  if (JointTableSize == 0) {
4451  TExcept::Throw("Joint table is empty");
4452  }
4453  //double endOffsets = omp_get_wtime();
4454  //printf("Offsets time = %f\n",endOffsets-startFn);
4455  ResizeTable(JointTableSize);
4456  //double endResize = omp_get_wtime();
4457  //printf("Resize time = %f\n",endResize-endOffsets);
4458  NumRows = JointTableSize;
4459  NumValidRows = JointTableSize;
4460  Assert(NumRows <= Next.Len());
4461 
4462  TInt IntOffset = T1.IntCols.Len();
4463  TInt FltOffset = T1.FltCols.Len();
4464  TInt StrOffset = T1.StrColMaps.Len();
4465 
4466  TInt IdOffset = IntOffset + T2.IntCols.Len();
4467  RowIdMap.Clr();
4468  for (TInt IdCnt = 0; IdCnt < JointTableSize; IdCnt++) {
4469  RowIdMap.AddDat(IdCnt, IdCnt);
4470  }
4471 
4472  #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD)
4473  for (int j = 0; j < JointRowIDSet.Len(); j++) {
4474  const TIntPrV& RowIDs = JointRowIDSet[j];
4475  int start = StartOffsets[j];
4476  int NewRows = RowIDs.Len();
4477  if (NewRows == 0) {continue;}
4478  for (TInt r = 0; r < NewRows; r++){
4479  TIntPr CurrRowIdPr = RowIDs[r];
4480  for(TInt i = 0; i < T1.IntCols.Len(); i++){
4481  IntCols[i][start+r] = T1.IntCols[i][CurrRowIdPr.GetVal1()];
4482  }
4483  for(TInt i = 0; i < T1.FltCols.Len(); i++){
4484  FltCols[i][start+r] = T1.FltCols[i][CurrRowIdPr.GetVal1()];
4485  }
4486  for(TInt i = 0; i < T1.StrColMaps.Len(); i++){
4487  StrColMaps[i][start+r] = T1.StrColMaps[i][CurrRowIdPr.GetVal1()];
4488  }
4489  for(TInt i = 0; i < T2.IntCols.Len(); i++){
4490  IntCols[i+IntOffset][start+r] = T2.IntCols[i][CurrRowIdPr.GetVal2()];
4491  }
4492  for(TInt i = 0; i < T2.FltCols.Len(); i++){
4493  FltCols[i+FltOffset][start+r] = T2.FltCols[i][CurrRowIdPr.GetVal2()];
4494  }
4495  for(TInt i = 0; i < T2.StrColMaps.Len(); i++){
4496  StrColMaps[i+StrOffset][start+r] = T2.StrColMaps[i][CurrRowIdPr.GetVal2()];
4497  }
4498  IntCols[IdOffset][start+r] = start+r;
4499  }
4500  for(TInt r = 0; r < NewRows; r++){
4501  Next[start+r] = start+r+1;
4502  }
4503  }
4504  LastValidRow = JointTableSize-1;
4505  Next[LastValidRow] = Last;
4506  //double endIterate = omp_get_wtime();
4507  //printf("Iterate time = %f\n",endIterate-endResize);
4508 }
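AddNJointRowsMP first computes an exclusive prefix sum over the per-batch row counts (StartOffsets) and resizes the table once, so each OpenMP iteration writes a disjoint output range without locking. The sketch below reproduces that scheme for one int column per input table, returning (T1 value, T2 value) pairs. JoinFill, RowPairs and the pair output are hypothetical simplifications, and std::runtime_error stands in for TExcept::Throw.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

using RowPairs = std::vector<std::pair<int, int>>;  // (row in T1, row in T2)

// Phase 1: exclusive prefix sum gives each batch its own output range.
// Phase 2: batches fill their ranges independently (in parallel when
// compiled with -fopenmp; otherwise the pragma is ignored).
std::vector<std::pair<int, int>> JoinFill(const std::vector<int>& Col1,
                                          const std::vector<int>& Col2,
                                          const std::vector<RowPairs>& Batches) {
  std::vector<int> StartOffsets(Batches.size());
  int Total = 0;
  for (size_t i = 0; i < Batches.size(); i++) {
    StartOffsets[i] = Total;
    Total += static_cast<int>(Batches[i].size());
  }
  if (Total == 0) { throw std::runtime_error("Joint table is empty"); }
  std::vector<std::pair<int, int>> Out(Total);  // sized once, like ResizeTable
  #pragma omp parallel for schedule(dynamic)
  for (int j = 0; j < static_cast<int>(Batches.size()); j++) {
    for (size_t r = 0; r < Batches[j].size(); r++) {
      const std::pair<int, int>& P = Batches[j][r];
      Out[StartOffsets[j] + r] = {Col1[P.first], Col2[P.second]};
    }
  }
  return Out;
}
```

The result does not depend on thread scheduling, since each batch writes only the slots its offset reserves.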
void TTable::AddNodeAttr ( const TStr & Attr)
inline

Adds Attr as both a source and a destination node attribute. This handles the common case where src and dst both belong to the same "universe" of entities.

Definition at line 1184 of file table.h.

1184 { AddSrcNodeAttr(Attr); AddDstNodeAttr(Attr); }
void TTable::AddNodeAttr ( TStrV & Attrs)
inline

Adds the columns in Attrs as both source and destination node attributes. This handles the common case where src and dst both belong to the same "universe" of entities.

Definition at line 1186 of file table.h.

1186 { AddSrcNodeAttr(Attrs); AddDstNodeAttr(Attrs); }
void TTable::AddNodeAttributes ( TInt  NId,
TStrV  NodeAttrV,
TInt  RowId,
THash< TInt, TStrIntVH > &  NodeIntAttrs,
THash< TInt, TStrFltVH > &  NodeFltAttrs,
THash< TInt, TStrStrVH > &  NodeStrAttrs 
)
inline protected

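Appends the values of the NodeAttrV columns in row RowId to node NId in the maps NodeIntAttrs, NodeFltAttrs and NodeStrAttrs (node id -> attribute name -> vector of attribute values), which are passed in and updated in place. Columns listed in CommonNodeAttrs are stored under their common name.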

Definition at line 3414 of file table.cpp.

3415  {
3416  for (TInt i = 0; i < NodeAttrV.Len(); i++) {
3417  TStr ColAttr = NodeAttrV[i];
3418  TAttrType CT = GetColType(ColAttr);
3419  int ColId = GetColIdx(ColAttr);
3420  // check if this is a common src-dst attribute
3421  for (TInt i = 0; i < CommonNodeAttrs.Len(); i++) {
3422  if (CommonNodeAttrs[i].Val1 == ColAttr || CommonNodeAttrs[i].Val2 == ColAttr) {
3423  ColAttr = CommonNodeAttrs[i].Val3;
3424  break;
3425  }
3426  }
3427  if (CT == atInt) {
3428  if (!NodeIntAttrs.IsKey(NId)) { NodeIntAttrs.AddKey(NId); }
3429  if (!NodeIntAttrs.GetDat(NId).IsKey(ColAttr)) { NodeIntAttrs.GetDat(NId).AddKey(ColAttr); }
3430  NodeIntAttrs.GetDat(NId).GetDat(ColAttr).Add(IntCols[ColId][RowId]);
3431  } else if (CT == atFlt) {
3432  if (!NodeFltAttrs.IsKey(NId)) { NodeFltAttrs.AddKey(NId); }
3433  if (!NodeFltAttrs.GetDat(NId).IsKey(ColAttr)) { NodeFltAttrs.GetDat(NId).AddKey(ColAttr); }
3434  NodeFltAttrs.GetDat(NId).GetDat(ColAttr).Add(FltCols[ColId][RowId]);
3435  } else {
3436  if (!NodeStrAttrs.IsKey(NId)) { NodeStrAttrs.AddKey(NId); }
3437  if (!NodeStrAttrs.GetDat(NId).IsKey(ColAttr)) { NodeStrAttrs.GetDat(NId).AddKey(ColAttr); }
3438  NodeStrAttrs.GetDat(NId).GetDat(ColAttr).Add(GetStrVal(ColId, RowId));
3439  }
3440  }
3441 }
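AddNodeAttributes accumulates, per node, every value seen for each attribute, and files attributes that appear in CommonNodeAttrs under their shared name. A sketch with standard maps follows. It handles int values only, and the NodeAttrMap and CommonAttr aliases (a std::tuple standing in for a TStrTr entry) are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical int-only version of NodeIntAttrs:
// node id -> (attribute name -> all values seen for that node).
using NodeAttrMap = std::map<int, std::map<std::string, std::vector<int>>>;
// (src column, dst column, shared name), like an entry of CommonNodeAttrs.
using CommonAttr = std::tuple<std::string, std::string, std::string>;

// Appends row RowId's values of the listed columns to node NId. A column in
// a src/dst pair is filed under the pair's shared name.
void AddNodeAttributes(int NId, const std::vector<std::string>& NodeAttrV, int RowId,
                       const std::map<std::string, std::vector<int>>& Cols,
                       const std::vector<CommonAttr>& Common, NodeAttrMap& Out) {
  for (const std::string& Col : NodeAttrV) {
    std::string Name = Col;
    for (const CommonAttr& C : Common) {
      if (std::get<0>(C) == Col || std::get<1>(C) == Col) { Name = std::get<2>(C); break; }
    }
    // operator[] creates the inner map and vector on first use, which
    // replaces the explicit IsKey/AddKey checks in the SNAP listing.
    Out[NId][Name].push_back(Cols.at(Col)[RowId]);
  }
}
```

With ("src_age", "dst_age", "age") registered as common, values from either column end up in one "age" vector per node.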
void TTable::AddNRows ( int  NewRows,
const TVec< TIntV > &  IntColsP,
const TVec< TFltV > &  FltColsP,
const TVec< TIntV > &  StrColMapsP 
)
protected

Adds NewRows rows from the given vectors for each column type.

Definition at line 4421 of file table.cpp.

4421  {
4422  if (NewRows == 0) { return; }
4423  // this call should be thread-safe
4424  int start = GetEmptyRowsStart(NewRows);
4425  for (TInt r = 0; r < NewRows; r++) {
4426  for (TInt i = 0; i < IntColsP.Len(); i++) {
4427  IntCols[i][start+r] = IntColsP[i][r];
4428  }
4429  for (TInt i = 0; i < FltColsP.Len(); i++) {
4430  FltCols[i][start+r] = FltColsP[i][r];
4431  }
4432  for (TInt i = 0; i < StrColMapsP.Len(); i++) {
4433  StrColMaps[i][start+r] = StrColMapsP[i][r];
4434  }
4435  }
4436  for (TInt r = 0; r < NewRows-1; r++) {
4437  Next[start+r] = start+r+1;
4438  }
4439 }
void TTable::AddRow ( const TRowIterator & RI)
protected

Adds row corresponding to RI.

Definition at line 4295 of file table.cpp.

4295  {
4296  for (TInt c = 0; c < Sch.Len(); c++) {
4297  TStr ColName = GetSchemaColName(c);
4298  if (ColName == IdColName) { continue; }
4299 
4300  TInt ColIdx = GetColIdx(ColName);
4301 
4302  switch (GetColType(ColName)) {
4303  case atInt:
4304  IntCols[ColIdx].Add(RI.GetIntAttr(ColName));
4305  break;
4306  case atFlt:
4307  FltCols[ColIdx].Add(RI.GetFltAttr(ColName));
4308  break;
4309  case atStr:
4310  StrColMaps[ColIdx].Add(RI.GetStrMapByName(ColName));
4311  break;
4312  }
4313  }
4314  UpdateTableForNewRow();
4315 }
void TTable::AddRow ( const TIntV & IntVals,
const TFltV & FltVals,
const TStrV & StrVals 
)
protected

Adds row with values corresponding to the given vectors by type.

Definition at line 4317 of file table.cpp.

4317  {
4318  for (TInt c = 0; c < IntVals.Len(); c++) {
4319  IntCols[c].Add(IntVals[c]);
4320  }
4321  for (TInt c = 0; c < FltVals.Len(); c++) {
4322  FltCols[c].Add(FltVals[c]);
4323  }
4324  for (TInt c = 0; c < StrVals.Len(); c++) {
4325  AddStrVal(c, StrVals[c]);
4326  }
4327  UpdateTableForNewRow();
4328 }
void TTable::AddRow ( const TTableRow & Row)
inline

Adds row with values taken from given TTableRow.

Definition at line 1002 of file table.h.

1002 { AddRow(Row.GetIntVals(), Row.GetFltVals(), Row.GetStrVals()); };
void TTable::AddSchemaCol ( const TStr & ColName,
TAttrType  ColType 
)
inline protected

Adds column with name ColName and type ColType to the schema.

Definition at line 642 of file table.h.

642  {
643  TStr NColName = NormalizeColName(ColName);
644  Sch.Add(TPair<TStr,TAttrType>(NColName, ColType));
645  }
void TTable::AddSelectedRows ( const TTable & Table,
const TIntV & RowIDs 
)
protected

Adds rows from Table that correspond to ids in RowIDs.

Definition at line 4399 of file table.cpp.

4399  {
4400  int NewRows = RowIDs.Len();
4401  if (NewRows == 0) { return; }
4402  // this call should be thread-safe
4403  int start = GetEmptyRowsStart(NewRows);
4404  for (TInt r = 0; r < NewRows; r++) {
4405  TInt CurrRowIdx = RowIDs[r];
4406  for (TInt i = 0; i < Table.IntCols.Len(); i++) {
4407  IntCols[i][start+r] = Table.IntCols[i][CurrRowIdx];
4408  }
4409  for (TInt i = 0; i < Table.FltCols.Len(); i++) {
4410  FltCols[i][start+r] = Table.FltCols[i][CurrRowIdx];
4411  }
4412  for (TInt i = 0; i < Table.StrColMaps.Len(); i++) {
4413  StrColMaps[i][start+r] = Table.StrColMaps[i][CurrRowIdx];
4414  }
4415  }
4416  for (TInt r = 0; r < NewRows-1; r++) {
4417  Next[start+r] = start+r+1;
4418  }
4419 }
void TTable::AddSrcNodeAttr ( const TStr & Attr)
inline

Adds a column to be used as a src node attribute of the graph.

Definition at line 1176 of file table.h.

1176 { AddGraphAttribute(Attr, false, true, false); }
void TTable::AddSrcNodeAttr ( TStrV & Attrs)
inline

Adds columns to be used as src node attributes of the graph.

Definition at line 1178 of file table.h.

1178 { AddGraphAttributeV(Attrs, false, true, false); }
void TTable::AddStrCol ( const TStr & ColName)

Adds a string column with name ColName.

Definition at line 4687 of file table.cpp.

4687  {
4688  AddSchemaCol(ColName, atStr);
4689  StrColMaps.Add(TIntV(NumRows));
4690  TInt L = StrColMaps.Len();
4691  AddColType(ColName, atStr, L-1);
4692 }
void TTable::AddStrVal ( const TInt & ColIdx,
const TStr & Val 
)
protected

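Adds Val to the string column with index ColIdx. The string is interned in the context's StringVals pool, and the column stores only its integer id.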

Definition at line 971 of file table.cpp.

971  {
972  TInt KeyId = TInt(Context->StringVals.AddKey(Key));
973  //printf("TTable::AddStrVal2 %d .%s. %d\n", ColIdx.Val, Key.CStr(), KeyId.Val);
974  StrColMaps[ColIdx].Add(KeyId);
975 }
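AddStrVal does not store strings in the column itself: it interns the value in Context->StringVals and appends the returned integer id to StrColMaps[ColIdx]. A minimal interning pool with the same effect follows. StringPool is a hypothetical stand-in for TStrHash<TInt, TBigStrPool>, not its implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for TTableContext::StringVals: each distinct string
// is stored once and identified by a dense integer id.
struct StringPool {
  std::unordered_map<std::string, int> Ids;
  std::vector<std::string> Strs;

  // Returns the existing id of Key, or assigns the next free id.
  int AddKey(const std::string& Key) {
    auto It = Ids.find(Key);
    if (It != Ids.end()) { return It->second; }
    Strs.push_back(Key);
    Ids.emplace(Key, static_cast<int>(Strs.size()) - 1);
    return static_cast<int>(Strs.size()) - 1;
  }
};

// Analogue of AddStrVal(ColIdx, Val): the column stores only the id, so a
// repeated string costs one int per row instead of one string copy.
void AddStrVal(StringPool& Pool, std::vector<std::vector<int>>& StrColMaps,
               int ColIdx, const std::string& Val) {
  StrColMaps[ColIdx].push_back(Pool.AddKey(Val));
}
```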
void TTable::AddStrVal ( const TStr & Col,
const TStr & Val 
)
protected

Adds Val to the column named Col. Throws an exception if Col is not a string column.

Definition at line 977 of file table.cpp.

977  {
978  if (GetColType(Col) != atStr) {
979  TExcept::Throw(Col + " is not a string valued column");
980  }
981  //printf("TTable::AddStrVal1 .%s. .%s.\n", Col.CStr(), Key.CStr());
982  AddStrVal(GetColIdx(Col), Key);
983 }
void TTable::AddTable ( const TTable & T)
protected

Adds all the rows of the input table. Allows duplicate rows (not a union).

Definition at line 3975 of file table.cpp.

3975  {
3976  //for (TInt c = 0; c < S.Len(); c++) {
3977  // if (S[c] != T.S[c]) { printf("(%s,%d) != (%s,%d)\n", S[c].Val1.CStr(), S[c].Val2, T.S[c].Val1.CStr(), T.S[c].Val2); TExcept::Throw("when adding tables, their schemas must match!"); }
3978  //}
3979  for (TInt c = 0; c < Sch.Len(); c++) {
3980  TStr ColName = GetSchemaColName(c);
3981  TInt ColIdx = GetColIdx(ColName);
3982  TInt TColIdx = ColName == IdColName ? T.GetColIdx(T.IdColName) : T.GetColIdx(ColName);
3983  if (TColIdx < 0) { TExcept::Throw("when adding a table, it must contain all columns of source table!"); }
3984  switch (GetColType(ColName)) {
3985  case atInt:
3986  IntCols[ColIdx].AddV(T.IntCols[TColIdx]);
3987  break;
3988  case atFlt:
3989  FltCols[ColIdx].AddV(T.FltCols[TColIdx]);
3990  break;
3991  case atStr:
3992  StrColMaps[ColIdx].AddV(T.StrColMaps[TColIdx]);
3993  break;
3994  }
3995  }
3996 
3997  TIntV TNext(T.Next);
3998  for (TInt i = 0; i < TNext.Len(); i++) {
3999  if (TNext[i] != Last && TNext[i] != Invalid) { TNext[i] += NumRows; }
4000  }
4001 
4002  Next.AddV(TNext);
4003  // checks if table is empty
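4004  if (LastValidRow >= 0) {
4005  Next[LastValidRow] = NumRows + T.FirstValidRow;
4006  }
4007  LastValidRow = NumRows + T.LastValidRow;
4008  NumRows += T.NumRows;
4009  NumValidRows += T.NumValidRows;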
4010 }
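AddTable appends T's columns wholesale, shifts T's Next entries by this table's NumRows (leaving the Last and Invalid markers alone), and links the old last valid row to T's first valid row. A sketch of that splice on a one-column table follows. LinkedTable, its marker values, and its handling of an empty table on either side are assumptions made for this illustration.

```cpp
#include <cassert>
#include <vector>

const int Last = -1;     // hypothetical end-of-chain marker
const int Invalid = -2;  // hypothetical "logically removed" marker

// Hypothetical one-column table with a linked row order, modeled on the
// fields AddTable touches.
struct LinkedTable {
  std::vector<int> Col;
  std::vector<int> Next;
  int FirstValidRow = Last;
  int LastValidRow = -1;  // < 0 means the table has no valid rows
  int NumRows = 0;
  int NumValidRows = 0;

  // Appends all physical rows of T (duplicates allowed, not a union).
  void AddTable(const LinkedTable& T) {
    Col.insert(Col.end(), T.Col.begin(), T.Col.end());
    for (int N : T.Next) {  // shift successor links, keep the markers
      Next.push_back((N == Last || N == Invalid) ? N : N + NumRows);
    }
    if (T.NumValidRows > 0) {  // assumption: an empty T is not linked in
      if (LastValidRow >= 0) { Next[LastValidRow] = T.FirstValidRow + NumRows; }
      else { FirstValidRow = T.FirstValidRow + NumRows; }
      LastValidRow = T.LastValidRow + NumRows;
    }
    NumRows += T.NumRows;
    NumValidRows += T.NumValidRows;
  }

  // Values in logical order, following Next from FirstValidRow.
  std::vector<int> Ordered() const {
    std::vector<int> V;
    if (NumValidRows == 0) { return V; }
    for (int R = FirstValidRow; R != Last; R = Next[R]) { V.push_back(Col[R]); }
    return V;
  }
};
```

The removed row of the first table stays in storage with its Invalid marker; only the logical chain skips it.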
void TTable::Aggregate ( const TStrV & GroupByAttrs,
TAttrAggr  AggOp,
const TStr & ValAttr,
const TStr & ResAttr,
TBool  Ordered = true 
)

Aggregates values of ValAttr after grouping with respect to GroupByAttrs. The result for each group is written to every row of that group, in a new column ResAttr.
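Aggregate writes each group's result to every row in that group rather than producing one row per group. The sketch below shows that group-then-broadcast pattern for an int key and an int value column, serially and without the cached GroupMapping. AggregateInt and the Agg enum are hypothetical; only Count and Mean correspond directly to the aaCount and aaMean cases in the listing, and integer Mean truncates as Res / sz does for int columns.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

enum class Agg { Sum, Count, Min, Max, Mean };  // hypothetical subset

// Groups rows by Key, aggregates Val per group, and writes the group's
// result into a new column at every member row.
std::vector<int> AggregateInt(const std::vector<int>& Key,
                              const std::vector<int>& Val, Agg Op) {
  std::map<int, std::vector<int>> Groups;  // key -> physical row indices
  for (int r = 0; r < static_cast<int>(Key.size()); r++) { Groups[Key[r]].push_back(r); }
  std::vector<int> Res(Key.size(), 0);
  for (const auto& G : Groups) {
    const std::vector<int>& Rows = G.second;  // never empty
    int Acc = 0;
    if (Op == Agg::Count) {
      Acc = static_cast<int>(Rows.size());
    } else {
      Acc = Val[Rows[0]];
      for (size_t i = 1; i < Rows.size(); i++) {
        const int V = Val[Rows[i]];
        if (Op == Agg::Min) { Acc = std::min(Acc, V); }
        else if (Op == Agg::Max) { Acc = std::max(Acc, V); }
        else { Acc += V; }  // Sum and Mean both accumulate the sum
      }
      if (Op == Agg::Mean) { Acc /= static_cast<int>(Rows.size()); }
    }
    for (int r : Rows) { Res[r] = Acc; }  // broadcast to the whole group
  }
  return Res;
}
```

For keys {1, 2, 1, 1, 2} and values {4, 10, 5, 8, 3}, group 1 sums to 17, so its integer mean is 5 on all three of its rows.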

Definition at line 1585 of file table.cpp.

1586  {
1587 
1588  for (TInt c = 0; c < GroupByAttrs.Len(); c++) {
1589  if (!IsColName(GroupByAttrs[c])) {
1590  TExcept::Throw("no such column " + GroupByAttrs[c]);
1591  }
1592  }
1593 
1594  // double startFn = omp_get_wtime();
1595  TStrV NGroupByAttrs = NormalizeColNameV(GroupByAttrs);
1596  TBool UsePhysicalIds = (GetColIdx(IdColName) < 0);
1597 
1598  THash<TInt,TIntV> GroupByIntMapping;
1599  THash<TFlt,TIntV> GroupByFltMapping;
1600  THash<TInt,TIntV> GroupByStrMapping;
1601  THash<TGroupKey,TIntV> Mapping;
1602 #ifdef GCC_ATOMIC
1603  THashMP<TInt,TIntV> GroupByIntMapping_MP(NumValidRows);
1604  TIntV GroupByIntMPKeys(NumValidRows);
1605 #endif
1606  TInt NumOfGroups = 0;
1607  TInt GroupingCase = 0;
1608 
1609  // check if grouping already exists
1610  GroupStmt Stmt(NGroupByAttrs, Ordered, UsePhysicalIds);
1611  if (GroupMapping.IsKey(Stmt)) {
1612  Mapping = GroupMapping.GetDat(Stmt);
1613  } else{
1614  if(NGroupByAttrs.Len() == 1){
1615  switch(GetColType(NGroupByAttrs[0])){
1616  case atInt:
1617 #ifdef GCC_ATOMIC
1618  if(GetMP()){
1619  GroupByIntColMP(NGroupByAttrs[0], GroupByIntMapping_MP, UsePhysicalIds);
1620  int x = 0;
1621  for(THashMP<TInt,TIntV>::TIter it = GroupByIntMapping_MP.BegI(); it < GroupByIntMapping_MP.EndI(); it++){
1622  GroupByIntMPKeys[x] = it.GetKey();
1623  x++;
1624  /*
1625  printf("%d --> ", it.GetKey().Val);
1626  TIntV& V = it.GetDat();
1627  for(int i = 0; i < V.Len(); i++){
1628  printf(" %d", V[i].Val);
1629  }
1630  printf("\n");
1631  */
1632  }
1633  NumOfGroups = x;
1634  GroupingCase = 4;
1635  //printf("Number of groups: %d\n", NumOfGroups.Val);
1636  break;
1637  }
1638 #endif // GCC_ATOMIC
1639  GroupByIntCol(NGroupByAttrs[0], GroupByIntMapping, TIntV(), true, UsePhysicalIds);
1640  NumOfGroups = GroupByIntMapping.Len();
1641  GroupingCase = 1;
1642  break;
1643  case atFlt:
1644  GroupByFltCol(NGroupByAttrs[0], GroupByFltMapping, TIntV(), true, UsePhysicalIds);
1645  NumOfGroups = GroupByFltMapping.Len();
1646  GroupingCase = 2;
1647  break;
1648  case atStr:
1649  GroupByStrCol(NGroupByAttrs[0], GroupByStrMapping, TIntV(), true, UsePhysicalIds);
1650  NumOfGroups = GroupByStrMapping.Len();
1651  GroupingCase = 3;
1652  break;
1653  }
1654  }
1655  else{
1656  TIntV UniqueVector;
1657  THash<TGroupKey, TPair<TInt, TIntV> > Mapping_aux;
1658  GroupAux(NGroupByAttrs, Mapping_aux, Ordered, "", false, UniqueVector, UsePhysicalIds);
1659  for(THash<TGroupKey, TPair<TInt, TIntV> >::TIter it = Mapping_aux.BegI(); it < Mapping_aux.EndI(); it++){
1660  Mapping.AddDat(it.GetKey(), it.GetDat().Val2);
1661  }
1662  NumOfGroups = Mapping.Len();
1663  }
1664  }
1665 
1666  // double endGroup = omp_get_wtime();
1667  // printf("Group time = %f\n", endGroup-startFn);
1668 
1669  TAttrType T = GetColType(ValAttr);
1670 
1671  // add column corresponding to result attribute type
1672  if (AggOp == aaCount) { AddIntCol(ResAttr); }
1673  else {
1674  if (T == atInt) { AddIntCol(ResAttr); }
1675  else if (T == atFlt) { AddFltCol(ResAttr); }
1676  else {
1677  // Count is the only aggregation operation handled for Str
1678  TExcept::Throw("Invalid aggregation for Str type!");
1679  }
1680  }
1681  TInt ColIdx = GetColIdx(ResAttr);
1682  TInt AggrColIdx = GetColIdx(ValAttr);
1683 
1684  // double endAdd = omp_get_wtime();
1685  // printf("AddCol time = %f\n", endAdd-endGroup);
1686 
1687 #ifdef USE_OPENMP
1688  #pragma omp parallel for schedule(dynamic)
1689 #endif
1690  for (int g = 0; g < NumOfGroups; g++) {
1691  TIntV* GroupRows = NULL;
1692  switch(GroupingCase){
1693  case 0:
1694  GroupRows = & Mapping.GetDat(Mapping.GetKey(g));
1695  break;
1696  case 1:
1697  GroupRows = & GroupByIntMapping.GetDat(GroupByIntMapping.GetKey(g));
1698  break;
1699  case 2:
1700  GroupRows = & GroupByFltMapping.GetDat(GroupByFltMapping.GetKey(g));
1701  break;
1702  case 3:
1703  GroupRows = & GroupByStrMapping.GetDat(GroupByStrMapping.GetKey(g));
1704  break;
1705  case 4:
1706 #ifdef GCC_ATOMIC
1707  GroupRows = & GroupByIntMapping_MP.GetDat(GroupByIntMPKeys[g]);
1708 #endif
1709  break;
1710  }
1711 
1712  // find valid rows of group
1713  /*
1714  TIntV ValidRows;
1715  for (TInt i = 0; i < GroupRows.Len(); i++) {
1716  // TODO: This should not be necessary
1717  if (!RowIdMap.IsKey(GroupRows[i])) { continue; }
1718  TInt RowId = RowIdMap.GetDat(GroupRows[i]);
1719  // GroupRows has physical row indices
1720  if (RowId != Invalid) { ValidRows.Add(RowId); }
1721  }
1722  */
1723  TIntV& ValidRows = *GroupRows;
1724  TInt sz = ValidRows.Len();
1725  if (sz <= 0) continue;
1726  // Count is handled separately (other operations have aggregation policies defined in a template)
1727  if (AggOp == aaCount) {
1728  for (TInt i = 0; i < sz; i++) { IntCols[ColIdx][ValidRows[i]] = sz; }
1729  } else {
1730  // aggregate based on column type
1731  if (T == atInt) {
1732  TIntV V;
1733  for (TInt i = 0; i < sz; i++) { V.Add(IntCols[AggrColIdx][ValidRows[i]]); }
1734  TInt Res = AggregateVector<TInt>(V, AggOp);
1735  if (AggOp == aaMean) { Res = Res / sz; }
1736  for (TInt i = 0; i < sz; i++) { IntCols[ColIdx][ValidRows[i]] = Res; }
1737  } else {
1738  TFltV V;
1739  for (TInt i = 0; i < sz; i++) { V.Add(FltCols[AggrColIdx][ValidRows[i]]); }
1740  TFlt Res = AggregateVector<TFlt>(V, AggOp);
1741  if (AggOp == aaMean) { Res /= sz; }
1742  for (TInt i = 0; i < sz; i++) { FltCols[ColIdx][ValidRows[i]] = Res; }
1743  }
1744  }
1745  }
1746  // double endIter = omp_get_wtime();
1747  // printf("Iter time = %f\n", endIter-endAdd);
1748 }
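The loop at listing lines 1690-1745 computes one aggregate per group and writes it into every row of that group. The C++ sketch below shows the same pattern with standard containers only; `GroupAggregate` and `Agg` are hypothetical names, a plain `std::vector<int>` stands in for the table's integer columns, and only count, sum and integer mean are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical aggregation kinds (a subset of SNAP's TAttrAggr values).
enum class Agg { Count, Sum, Mean };

// Groups row indices by the value in `key`, aggregates `val` within each
// group, and writes the group's result into every row of that group.
std::vector<int> GroupAggregate(const std::vector<int>& key,
                                const std::vector<int>& val, Agg op) {
  assert(key.size() == val.size());
  std::map<int, std::vector<std::size_t>> groups;  // key -> row indices
  for (std::size_t r = 0; r < key.size(); ++r) groups[key[r]].push_back(r);

  std::vector<int> res(key.size(), 0);
  for (const auto& g : groups) {
    const std::vector<std::size_t>& rows = g.second;
    const int sz = static_cast<int>(rows.size());
    int out = 0;
    if (op == Agg::Count) {
      out = sz;
    } else {
      for (std::size_t r : rows) out += val[r];
      if (op == Agg::Mean) out /= sz;  // integer division, as for Int columns
    }
    for (std::size_t r : rows) res[r] = out;  // broadcast to the whole group
  }
  return res;
}
```

Because every row belongs to exactly one group, the per-group iterations write disjoint rows, which is what lets the SNAP listing run the group loop under `#pragma omp parallel for`.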
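void TTable::AggregateCols ( const TStrV & AggrAttrs,
TAttrAggr  AggOp,
const TStr & ResAttr 
)

Aggregates attributes in AggrAttrs across columns: for each valid row, the values of the columns listed in AggrAttrs are combined with AggOp and stored in a new column ResAttr of the same type. All listed columns must have the same type, either Int or Flt. With aaMean the stored value is the row sum, since AggregateVector returns the sum for that policy and this function does not divide it.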

Definition at line 1750 of file table.cpp.

1750  {
1751  TVec<TPair<TAttrType, TInt> > Info;
1752  for (TInt i = 0; i < AggrAttrs.Len(); i++) {
1753  Info.Add(GetColTypeMap(AggrAttrs[i]));
1754  if (Info[i].Val1 != Info[0].Val1) {
1755  TExcept::Throw("AggregateCols: Aggregation attributes must have the same type");
1756  }
1757  }
1758 
1759  if (Info[0].Val1 == atInt) {
1760  AddIntCol(ResAttr);
1761  TInt ResIdx = GetColIdx(ResAttr);
1762 
1763  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
1764  TInt RowIdx = RI.GetRowIdx();
1765  TIntV V;
1766  for (TInt i = 0; i < AggrAttrs.Len(); i++) {
1767  V.Add(IntCols[Info[i].Val2][RowIdx]);
1768  }
1769  IntCols[ResIdx][RowIdx] = AggregateVector<TInt>(V, AggOp);
1770  }
1771  } else if (Info[0].Val1 == atFlt) {
1772  AddFltCol(ResAttr);
1773  TInt ResIdx = GetColIdx(ResAttr);
1774 
1775  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
1776  TInt RowIdx = RI.GetRowIdx();
1777  TFltV V;
1778  for (TInt i = 0; i < AggrAttrs.Len(); i++) {
1779  V.Add(FltCols[Info[i].Val2][RowIdx]);
1780  }
1781  FltCols[ResIdx][RowIdx] = AggregateVector<TFlt>(V, AggOp);
1782  }
1783  } else {
1784  TExcept::Throw("AggregateCols: Only Int and Flt aggregation supported right now");
1785  }
1786 }
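AggregateCols works row by row across columns rather than down a column. A minimal sketch of that shape, assuming column-major storage as in `IntCols`/`FltCols`; `AggregateAcrossColumns` and `ColAgg` are hypothetical names and only three policies are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy subset for the sketch.
enum class ColAgg { Min, Max, Sum };

// cols[c][r] is the value of column c at row r. For each row, combines the
// values of all columns into one result.
std::vector<double> AggregateAcrossColumns(
    const std::vector<std::vector<double>>& cols, ColAgg op) {
  assert(!cols.empty());
  const std::size_t rows = cols[0].size();
  std::vector<double> res(rows);
  for (std::size_t r = 0; r < rows; ++r) {
    double acc = cols[0][r];
    for (std::size_t c = 1; c < cols.size(); ++c) {
      assert(cols[c].size() == rows);
      const double x = cols[c][r];
      switch (op) {
        case ColAgg::Min: acc = std::min(acc, x); break;
        case ColAgg::Max: acc = std::max(acc, x); break;
        case ColAgg::Sum: acc += x; break;
      }
    }
    res[r] = acc;
  }
  return res;
}
```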
template<class T >
T TTable::AggregateVector ( TVec< T > &  V,
TAttrAggr  Policy 
)
protected

Aggregates a vector into a single scalar value according to a policy.

Used to choose one attribute value for a node that appears in several records with conflicting attribute values. For aaMean the function returns the sum of the elements and leaves the division by the element count to the caller; for aaMedian it sorts V in place and returns the element at index V.Len()/2.

Definition at line 1544 of file table.h.

1544  {
1545  switch (Policy) {
1546  case aaMin: {
1547  T Res = V[0];
1548  for (TInt i = 1; i < V.Len(); i++) {
1549  if (V[i] < Res) { Res = V[i]; }
1550  }
1551  return Res;
1552  }
1553  case aaMax: {
1554  T Res = V[0];
1555  for (TInt i = 1; i < V.Len(); i++) {
1556  if (V[i] > Res) { Res = V[i]; }
1557  }
1558  return Res;
1559  }
1560  case aaFirst: {
1561  return V[0];
1562  }
1563  case aaLast:{
1564  return V[V.Len()-1];
1565  }
1566  case aaSum: {
1567  T Res = V[0];
1568  for (TInt i = 1; i < V.Len(); i++) {
1569  Res = Res + V[i];
1570  }
1571  return Res;
1572  }
1573  case aaMean: {
1574  T Res = V[0];
1575  for (TInt i = 1; i < V.Len(); i++) {
1576  Res = Res + V[i];
1577  }
1578  //Res = Res / V.Len(); // TODO: Handle Str case separately?
1579  return Res;
1580  }
1581  case aaMedian: {
1582  V.Sort();
1583  return V[V.Len()/2];
1584  }
1585  case aaCount: {
1586  // NOTE: Code should never reach here
1587  // I had to put this here to avoid a compiler warning.
1588  // Is there a better way to do this?
1589  return V[0];
1590  }
1591  }
1592  // Added to remove a compiler warning.
1593  T ShouldNotComeHere;
1594  return ShouldNotComeHere;
1595 }
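The policy switch above can be sketched generically in standard C++ as follows. `Policy` and `Aggregate` are hypothetical names; unlike the listing, this version divides by the element count for `Mean` and takes the vector by value so that `Median` does not reorder the caller's data. It is meant for numeric `T`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the TAttrAggr policies handled by AggregateVector.
enum class Policy { Min, Max, First, Last, Sum, Mean, Median };

// Reduces a non-empty vector to one value.
template <class T>
T Aggregate(std::vector<T> v, Policy p) {
  assert(!v.empty());
  switch (p) {
    case Policy::Min: return *std::min_element(v.begin(), v.end());
    case Policy::Max: return *std::max_element(v.begin(), v.end());
    case Policy::First: return v.front();
    case Policy::Last: return v.back();
    case Policy::Sum:
    case Policy::Mean: {
      T res = v[0];
      for (std::size_t i = 1; i < v.size(); ++i) res = res + v[i];
      if (p == Policy::Mean) res = res / static_cast<T>(v.size());
      return res;
    }
    case Policy::Median:
      std::sort(v.begin(), v.end());  // sorts the local copy only
      return v[v.size() / 2];         // upper median for even sizes
  }
  return v[0];  // unreachable; avoids a missing-return warning
}
```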
TRowIterator TTable::BegRI ( ) const
inline

Gets iterator to the first valid row of the table.

Definition at line 1241 of file table.h.

1241 { return TRowIterator(FirstValidRow, this);}
TRowIteratorWithRemove TTable::BegRIWR ( )
inline

Gets an iterator with remove support to the first valid row.

Definition at line 1245 of file table.h.

1245 { return TRowIteratorWithRemove(FirstValidRow, this);}
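The row iterators start at `FirstValidRow` and, per the class documentation, follow a successor array in which `Next[i]` is the successor of row `i`. The toy model below is a sketch under that assumption, not SNAP's iterator code: `RowChain`, `RemoveIf` and the end marker `kLast` are hypothetical. It shows why an iterator that removes rows must rewire the predecessor's link.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sentinel marking the end of the chain of valid rows.
constexpr int kLast = -1;

// Toy row chain: next[i] is the successor of valid row i.
struct RowChain {
  int first = kLast;
  std::vector<int> next;

  // Visits valid rows in chain order, starting from `first`.
  std::vector<int> Rows() const {
    std::vector<int> out;
    for (int r = first; r != kLast; r = next[r]) {
      assert(r >= 0 && r < static_cast<int>(next.size()));
      out.push_back(r);
    }
    return out;
  }

  // Unlinks every row for which drop(row) is true by pointing the
  // predecessor (or `first`) past it.
  template <class Pred>
  void RemoveIf(Pred drop) {
    while (first != kLast && drop(first)) first = next[first];
    for (int r = first; r != kLast;) {
      const int n = next[r];
      if (n != kLast && drop(n)) {
        next[r] = next[n];  // skip n, stay on r to test the new successor
      } else {
        r = n;
      }
    }
  }
};
```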
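PNEANet TTable::BuildGraph ( const TIntV & RowIds,
TAttrAggr  AggrPolicy 
)
protected

Makes a single pass over the rows in RowIds, creating the source and destination nodes, one edge per row (the edge ID is the row index), and the edge attributes. Node attribute values collected along the way are then combined per node with AggregateVector according to AggrPolicy. For string node columns, rows whose source or destination value is empty are skipped.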

Definition at line 3445 of file table.cpp.

3445  {
3446  PNEANet Graph = TNEANet::New();
3447 
3448  const TAttrType NodeType = GetColType(SrcCol);
3449  Assert(NodeType == GetColType(DstCol));
3450  const TInt SrcColIdx = GetColIdx(SrcCol);
3451  const TInt DstColIdx = GetColIdx(DstCol);
3452 
3453  // node values - i.e. the unique values of src/dst col
3454  //THashSet<TInt> IntNodeVals; // for both int and string node attr types.
3455  THash<TFlt, TInt> FltNodeVals;
3456 
3457  // node attributes
3458  THash<TInt, TStrIntVH> NodeIntAttrs;
3459  THash<TInt, TStrFltVH> NodeFltAttrs;
3460  THash<TInt, TStrStrVH> NodeStrAttrs;
3461 
3462  // make single pass over all rows in given row id set
3463  for (TVec<TInt>::TIter it = RowIds.BegI(); it < RowIds.EndI(); it++) {
3464  TInt CurrRowIdx = *it;
3465 
3466  // add src and dst nodes to graph if they are not seen earlier
3467  TInt SVal, DVal;
3468  if (NodeType == atFlt) {
3469  TFlt FSVal = FltCols[SrcColIdx][CurrRowIdx];
3470  SVal = CheckAndAddFltNode(Graph, FltNodeVals, FSVal);
3471  TFlt FDVal = FltCols[DstColIdx][CurrRowIdx];
3472  DVal = CheckAndAddFltNode(Graph, FltNodeVals, FDVal);
3473  } else if (NodeType == atInt || NodeType == atStr) {
3474  if (NodeType == atInt) {
3475  SVal = IntCols[SrcColIdx][CurrRowIdx];
3476  DVal = IntCols[DstColIdx][CurrRowIdx];
3477  } else {
3478  SVal = StrColMaps[SrcColIdx][CurrRowIdx];
3479  if (strlen(Context->StringVals.GetKey(SVal)) == 0) { continue; } //illegal value
3480  DVal = StrColMaps[DstColIdx][CurrRowIdx];
3481  if (strlen(Context->StringVals.GetKey(DVal)) == 0) { continue; } //illegal value
3482  }
3483  if (!Graph->IsNode(SVal)) { Graph->AddNode(SVal); }
3484  if (!Graph->IsNode(DVal)) { Graph->AddNode(DVal); }
3485  //CheckAndAddIntNode(Graph, IntNodeVals, SVal);
3486  //CheckAndAddIntNode(Graph, IntNodeVals, DVal);
3487  }
3488 
3489  // add edge and edge attributes
3490  Graph->AddEdge(SVal, DVal, CurrRowIdx);
3491  if (EdgeAttrV.Len() > 0) { AddEdgeAttributes(Graph, CurrRowIdx); }
3492 
3493  // get src and dst node attributes into hashmaps
3494  if (SrcNodeAttrV.Len() > 0) {
3495  AddNodeAttributes(SVal, SrcNodeAttrV, CurrRowIdx, NodeIntAttrs, NodeFltAttrs, NodeStrAttrs);
3496  }
3497  if (DstNodeAttrV.Len() > 0) {
3498  AddNodeAttributes(DVal, DstNodeAttrV, CurrRowIdx, NodeIntAttrs, NodeFltAttrs, NodeStrAttrs);
3499  }
3500  }
3501 
3502  // aggregate node attributes and add to graph
3503  if (SrcNodeAttrV.Len() > 0 || DstNodeAttrV.Len() > 0) {
3504  for (TNEANet::TNodeI NodeI = Graph->BegNI(); NodeI < Graph->EndNI(); NodeI++) {
3505  TInt NId = NodeI.GetId();
3506  if (NodeIntAttrs.IsKey(NId)) {
3507  TStrIntVH IntAttrVals = NodeIntAttrs.GetDat(NId);
3508  for (TStrIntVH::TIter it = IntAttrVals.BegI(); it < IntAttrVals.EndI(); it++) {
3509  TInt AttrVal = AggregateVector<TInt>(it.GetDat(), AggrPolicy);
3510  Graph->AddIntAttrDatN(NId, AttrVal, it.GetKey());
3511  }
3512  }
3513  if (NodeFltAttrs.IsKey(NId)) {
3514  TStrFltVH FltAttrVals = NodeFltAttrs.GetDat(NId);
3515  for (TStrFltVH::TIter it = FltAttrVals.BegI(); it < FltAttrVals.EndI(); it++) {
3516  TFlt AttrVal = AggregateVector<TFlt>(it.GetDat(), AggrPolicy);
3517  Graph->AddFltAttrDatN(NId, AttrVal, it.GetKey());
3518  }
3519  }
3520  if (NodeStrAttrs.IsKey(NId)) {
3521  TStrStrVH StrAttrVals = NodeStrAttrs.GetDat(NId);
3522  for (TStrStrVH::TIter it = StrAttrVals.BegI(); it < StrAttrVals.EndI(); it++) {
3523  TStr AttrVal = AggregateVector<TStr>(it.GetDat(), AggrPolicy);
3524  Graph->AddStrAttrDatN(NId, AttrVal, it.GetKey());
3525  }
3526  }
3527  }
3528  }
3529 
3530  return Graph;
3531 }
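The single pass in BuildGraph can be sketched with standard containers as below. `ToyGraph` and `BuildToyGraph` are hypothetical; integer node values are assumed, only source-node attributes are collected, and taking the maximum stands in for an arbitrary `AggrPolicy`. As in the listing, the row index doubles as the edge ID.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Toy directed multigraph with one integer attribute per node.
struct ToyGraph {
  std::set<int> nodes;
  std::map<int, std::pair<int, int>> edges;  // edge id (row) -> (src, dst)
  std::map<int, int> nodeAttr;               // node id -> aggregated value
};

ToyGraph BuildToyGraph(const std::vector<int>& src, const std::vector<int>& dst,
                       const std::vector<int>& srcAttr,
                       const std::vector<int>& rowIds) {
  ToyGraph g;
  std::map<int, std::vector<int>> attrVals;  // node id -> observed values
  for (int row : rowIds) {
    assert(row >= 0 && static_cast<std::size_t>(row) < src.size());
    g.nodes.insert(src[row]);  // no-op if the node already exists
    g.nodes.insert(dst[row]);
    g.edges[row] = {src[row], dst[row]};
    attrVals[src[row]].push_back(srcAttr[row]);
  }
  // Resolve conflicting values per node (max here).
  for (const auto& kv : attrVals) {
    int best = kv.second[0];
    for (int x : kv.second) if (x > best) best = x;
    g.nodeAttr[kv.first] = best;
  }
  return g;
}
```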
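TTableContext * TTable::ChangeContext ( TTableContext * Context)

Changes the current context and moves the table's string values into it: each value of every string column is added to the string pool of the new context, the stored key is replaced by the key returned there, and the new context is returned. In the definition below the parameter is named NewContext.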

Definition at line 921 of file table.cpp.

921  {
922  TInt L = Sch.Len();
923 
924 #if 0
925  // print table on the input, iterate over all columns
926  for (TInt i = 0; i < L; i++) {
927  // skip non-string columns
928  if (GetSchemaColType(i) != atStr) {
929  continue;
930  }
931 
932  TInt ColIdx = GetColIdx(GetSchemaColName(i));
933 
934  // iterate over all rows
935  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
936  TInt RowIdx = RowI.GetRowIdx();
937  TInt KeyId = StrColMaps[ColIdx][RowIdx];
938  printf("ChangeContext in %d %d %d .%s.\n",
939  ColIdx.Val, RowIdx.Val, KeyId.Val, GetStrVal(ColIdx, RowIdx).CStr());
940  }
941  }
942 #endif
943 
944  // add strings to the new context, change values
945  // iterate over all columns
946  for (TInt i = 0; i < L; i++) {
947  // skip non-string columns
948  if (GetSchemaColType(i) != atStr) {
949  continue;
950  }
951 
952  TInt ColIdx = GetColIdx(GetSchemaColName(i));
953 
954  // iterate over all rows
955  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
956  TInt RowIdx = RowI.GetRowIdx();
957  // get the string
958  TStr Key = GetStrVal(ColIdx, RowIdx);
959  // add the string to the new context
960  TInt KeyId = TInt(NewContext->StringVals.AddKey(Key));
961  // change the value in the table
962  StrColMaps[ColIdx][RowIdx] = KeyId;
963  }
964  }
965 
966  // set the new context
967  Context = NewContext;
968  return Context;
969 }
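ChangeContext re-interns every string of every string column into the new context's pool. A minimal sketch of that re-keying, with a hypothetical `StringPool` in place of `TStrHash<TInt, TBigStrPool>` and a hypothetical `MoveColumn` that handles one column at a time:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy string pool: maps each distinct string to a dense integer key.
struct StringPool {
  std::unordered_map<std::string, int> ids;
  std::vector<std::string> strs;

  int AddKey(const std::string& s) {
    auto it = ids.find(s);
    if (it != ids.end()) return it->second;  // already interned
    const int id = static_cast<int>(strs.size());
    ids.emplace(s, id);
    strs.push_back(s);
    return id;
  }
  const std::string& GetKey(int id) const {
    assert(id >= 0 && id < static_cast<int>(strs.size()));
    return strs[id];
  }
};

// Replaces each key in `col` (valid in oldPool) by the key of the same
// string in newPool, interning the string there if needed.
void MoveColumn(std::vector<int>& col, const StringPool& oldPool,
                StringPool& newPool) {
  for (int& key : col) key = newPool.AddKey(oldPool.GetKey(key));
}
```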
template<class T >
TInt TTable::CheckAndAddFltNode ( T  Graph,
THash< TFlt, TInt > &  NodeVals,
TFlt  FNodeVal 
)
protected

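Checks whether the float value FNodeVal has been seen before. If not, assigns it the next integer node ID (the current size of NodeVals), adds that node to Graph and records the mapping in NodeVals. Returns the node ID for FNodeVal.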

Definition at line 1533 of file table.h.

1533  {
1534  if (!NodeVals.IsKey(FNodeVal)) {
1535  TInt NodeVal = NodeVals.Len();
1536  Graph->AddNode(NodeVal);
1537  NodeVals.AddKey(FNodeVal);
1538  NodeVals.AddDat(FNodeVal, NodeVal);
1539  return NodeVal;
1540  } else { return NodeVals.GetDat(FNodeVal); }
1541 }
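A sketch of the get-or-assign step performed by CheckAndAddFltNode, assuming a `std::unordered_map<double, int>` in place of `THash<TFlt, TInt>` and a vector of node IDs in place of the graph (`GetOrAddFltNode` is a hypothetical name). New values receive dense IDs 0, 1, 2, ... in order of first appearance.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Returns the node ID for float value v, creating a new node the first time
// v is seen. graphNodes stands in for Graph->AddNode.
int GetOrAddFltNode(std::unordered_map<double, int>& nodeVals,
                    std::vector<int>& graphNodes, double v) {
  auto it = nodeVals.find(v);
  if (it != nodeVals.end()) return it->second;
  const int id = static_cast<int>(nodeVals.size());  // next dense ID
  nodeVals.emplace(v, id);
  graphNodes.push_back(id);
  assert(graphNodes.size() == nodeVals.size());
  return id;
}
```

In this sketch keys are compared by exact floating-point equality, so values that differ only by rounding error become distinct nodes.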
void TTable::CheckAndAddIntNode ( PNEANet  Graph,
THashSet< TInt > &  NodeVals,
TInt  NodeId 
)
inline protected

Checks whether NodeId has been seen before; if not, adds it to Graph and to the hash set NodeVals.

Definition at line 3388 of file table.cpp.

3388  {
3389  if (!NodeVals.IsKey(NodeId)) {
3390  Graph->AddNode(NodeId);
3391  NodeVals.AddKey(NodeId);
3392  }
3393 }
TInt TTable::CheckSortedKeyVal ( TIntV Key,
TIntV Val,
TInt  Start,
TInt  End 
)
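static protected

Checks whether the (Key, Val) pairs at positions Start through End are in non-decreasing order according to CompareKeyVal. Returns 0 if they are sorted and 1 otherwise.
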
Definition at line 5310 of file table.cpp.

5310  {
5311  TInt j;
5312  for (j = Start; j < End; j++) {
5313  if (CompareKeyVal(Key[j], Val[j], Key[j+1], Val[j+1]) > 0) {
5314  break;
5315  }
5316  }
5317  if (j >= End) { return 0; }
5318  else { return 1; }
5319 }
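A standalone sketch of the same check, assuming CompareKeyVal orders pairs lexicographically (first by key, then by value) and returns a negative, zero or positive result; `CompareKV` and `CheckSortedKV` are hypothetical names. The loop compares positions `j` and `j+1`, so position `End` is itself inspected.

```cpp
#include <cassert>
#include <vector>

// Assumed lexicographic comparison of (k1, v1) and (k2, v2).
int CompareKV(int k1, int v1, int k2, int v2) {
  if (k1 != k2) return k1 < k2 ? -1 : 1;
  if (v1 != v2) return v1 < v2 ? -1 : 1;
  return 0;
}

// Returns 0 if the pairs at positions start..end (inclusive) are
// non-decreasing, 1 otherwise.
int CheckSortedKV(const std::vector<int>& key, const std::vector<int>& val,
                  int start, int end) {
  assert(key.size() == val.size());
  assert(start >= 0 && end < static_cast<int>(key.size()));
  for (int j = start; j < end; ++j) {
    if (CompareKV(key[j], val[j], key[j + 1], val[j + 1]) > 0) return 1;
  }
  return 0;
}
```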
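void TTable::Classify ( TPredicate & Predicate,
const TStr & LabelName,
const TInt & PositiveLabel = 1,
const TInt & NegativeLabel = 0 
)

Adds an integer column LabelName that holds PositiveLabel for rows satisfying Predicate and NegativeLabel for all other rows. Unlike Select, no rows are removed.
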
Definition at line 2805 of file table.cpp.

2805  {
2806  TIntV SelectedRows;
2807  Select(Predicate, SelectedRows, false);
2808  ClassifyAux(SelectedRows, LabelName, PositiveLabel, NegativeLabel);
2809 }
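void TTable::ClassifyAtomic ( const TStr & Col1,
const TStr & Col2,
TPredComp  Cmp,
const TStr & LabelName,
const TInt & PositiveLabel = 1,
const TInt & NegativeLabel = 0 
)

Adds an integer column LabelName that holds PositiveLabel for rows where comparing the values of Col1 and Col2 with Cmp succeeds, and NegativeLabel for all other rows. No rows are removed.
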
Definition at line 2866 of file table.cpp.

2867  {
2868  TIntV SelectedRows;
2869  SelectAtomic(Col1, Col2, Cmp, SelectedRows, false);
2870  ClassifyAux(SelectedRows, LabelName, PositiveLabel, NegativeLabel);
2871 }
template<class T >
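void TTable::ClassifyAtomicConst ( const TStr & Col,
const T &  Val,
TPredComp  Cmp,
const TStr & LabelName,
const TInt & PositiveLabel = 1,
const TInt & NegativeLabel = 0 
)
inline

Adds an integer column LabelName that holds PositiveLabel for rows where comparing the value of Col with the constant Val using Cmp succeeds, and NegativeLabel for all other rows. No rows are removed.
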
Definition at line 1301 of file table.h.

1302  {
1303  TIntV SelectedRows;
1304  PTable SelectedTable;
1305  SelectAtomicConst(Col, TPrimitive(Val), Cmp, SelectedRows, SelectedTable, false, false);
1306  ClassifyAux(SelectedRows, LabelName, PositiveLabel, NegativeLabel);
1307  }
void TTable::ClassifyAux ( const TIntV & SelectedRows,
const TStr & LabelName,
const TInt & PositiveLabel = 1,
const TInt & NegativeLabel = 0 
)
protected

Adds a label attribute with positive labels on selected rows and negative labels on the rest.

Definition at line 4694 of file table.cpp.

4694  {
4695  AddSchemaCol(LabelName, atInt);
4696  TInt LabelColIdx = IntCols.Len();
4697  AddColType(LabelName, atInt, LabelColIdx);
4698  IntCols.Add(TIntV(NumRows));
4699  for (TInt i = 0; i < NumRows; i++) {
4700  IntCols[LabelColIdx][i] = NegativeLabel;
4701  }
4702  for (TInt i = 0; i < SelectedRows.Len(); i++) {
4703  IntCols[LabelColIdx][SelectedRows[i]] = PositiveLabel;
4704  }
4705 }
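Classify, ClassifyAtomic and ClassifyAtomicConst all select rows without removing them and then hand the row list to ClassifyAux, which first writes NegativeLabel to every row and then overwrites the selected rows with PositiveLabel. A minimal sketch of that two-pass labeling, with a hypothetical `ClassifyRows` and a `std::function` predicate standing in for `TPredicate`:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Selects rows with `pred`, then builds a label column in two passes.
std::vector<int> ClassifyRows(int numRows, const std::function<bool(int)>& pred,
                              int posLabel = 1, int negLabel = 0) {
  assert(numRows >= 0);
  std::vector<int> selected;  // plays the role of SelectedRows
  for (int r = 0; r < numRows; ++r) {
    if (pred(r)) selected.push_back(r);
  }
  std::vector<int> label(numRows, negLabel);  // pass 1: default label
  for (int r : selected) label[r] = posLabel; // pass 2: selected rows
  return label;
}
```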
void TTable::ColAdd ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResultAttrName = "" 
)

Performs columnwise addition. See TTable::ColGenericOp.

Definition at line 4816 of file table.cpp.

4816  {
4817  ColGenericOp(Attr1, Attr2, ResultAttrName, aoAdd);
4818 }
void TTable::ColAdd ( const TStr & Attr1,
TTable & Table,
const TStr & Attr2,
const TStr & ResAttr = "",
TBool  AddToFirstTable = true 
)

Performs columnwise addition of column Attr1 of this table and column Attr2 of Table. See TTable::ColGenericOp.

Definition at line 4949 of file table.cpp.

4950  {
4951  ColGenericOp(Attr1, Table, Attr2, ResultAttrName, aoAdd, AddToFirstTable);
4952 }
void TTable::ColAdd ( const TStr & Attr1,
const TFlt & Num,
const TStr & ResultAttrName = "",
const TBool  floatCast = false 
)

Adds the given Num to the values of column Attr1. See TTable::ColGenericOp.

Definition at line 5063 of file table.cpp.

5063  {
5064  ColGenericOp(Attr1, Num, ResultAttrName, aoAdd, floatCast);
5065 }
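All three ColAdd overloads delegate to ColGenericOp with `aoAdd`. The element-wise arithmetic they request can be sketched as follows; `AddColumns` and `AddConstant` are hypothetical helpers that return new vectors instead of writing into table columns, and the `floatCast` option is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column + column: element-wise sum of two equally long columns.
std::vector<double> AddColumns(const std::vector<double>& a,
                               const std::vector<double>& b) {
  assert(a.size() == b.size());
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

// Column + constant: adds num to every value of the column.
std::vector<double> AddConstant(const std::vector<double>& a, double num) {
  std::vector<double> out(a);
  for (double& x : out) x += num;
  return out;
}
```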
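void TTable::ColConcat ( const TStr & Attr1,
const TStr & Attr2,
const TStr & Sep = "",
const TStr & ResAttr = "" 
)

Concatenates two string columns row by row, inserting Sep between the two values. The result is stored in a new string column ResAttr, or overwrites column Attr1 if ResAttr is empty.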

Definition at line 5083 of file table.cpp.

5083  {
5084  // check if attributes are valid
5085  if (!IsAttr(Attr1)) TExcept::Throw("No attribute present: " + Attr1);
5086  if (!IsAttr(Attr2)) TExcept::Throw("No attribute present: " + Attr2);
5087 
5088  TPair<TAttrType, TInt> Info1 = GetColTypeMap(Attr1);
5089  TPair<TAttrType, TInt> Info2 = GetColTypeMap(Attr2);
5090 
5091  if (Info1.Val1 != atStr || Info2.Val1 != atStr) {
5092  TExcept::Throw("Only string columns supported in concat.");
5093  }
5094 
5095  // source column indices
5096  TInt ColIdx1 = Info1.Val2;
5097  TInt ColIdx2 = Info2.Val2;
5098 
5099  // destination column index
5100  TInt ColIdx3 = ColIdx1;
5101 
5102  // Create empty result column with type that of first attribute
5103  if (ResAttr != "") {
5104  AddStrCol(ResAttr);
5105  ColIdx3 = GetColIdx(ResAttr);
5106  }
5107 
5108  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
5109  TStr CurVal1 = RowI.GetStrAttr(ColIdx1);
5110  TStr CurVal2 = RowI.GetStrAttr(ColIdx2);
5111  TStr NewVal = CurVal1 + Sep + CurVal2;
5112  TInt Key = TInt(Context->StringVals.AddKey(NewVal));
5113  StrColMaps[ColIdx3][RowI.GetRowIdx()] = Key;
5114  }
5115 }
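A sketch of the row-wise concatenation, assuming plain `std::string` columns; `ConcatColumns` is a hypothetical name. In the listing each concatenated value is additionally interned with `Context->StringVals.AddKey`, and only the returned integer key is stored in `StrColMaps`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns col1[i] + sep + col2[i] for every row i.
std::vector<std::string> ConcatColumns(const std::vector<std::string>& col1,
                                       const std::vector<std::string>& col2,
                                       const std::string& sep = "") {
  assert(col1.size() == col2.size());
  std::vector<std::string> out;
  out.reserve(col1.size());
  for (std::size_t i = 0; i < col1.size(); ++i) {
    out.push_back(col1[i] + sep + col2[i]);
  }
  return out;
}
```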
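void TTable::ColConcat ( const TStr & Attr1,
TTable & Table,
const TStr & Attr2,
const TStr & Sep = "",
const TStr & ResAttr = "",
TBool  AddToFirstTable = true 
)

Concatenates string column Attr1 of this table with string column Attr2 of Table, pairing the i-th valid rows of the two tables and inserting Sep between the values. Both tables must have the same number of valid rows. If AddToFirstTable is true, the result goes to this table (a new column ResAttr, or column Attr1 if ResAttr is empty); otherwise it goes to Table (a new column ResAttr, or column Attr2 if ResAttr is empty).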

Definition at line 5117 of file table.cpp.

{
  // check if attributes are valid
  if (!IsAttr(Attr1)) { TExcept::Throw("No attribute present: " + Attr1); }
  if (!Table.IsAttr(Attr2)) { TExcept::Throw("No attribute present: " + Attr2); }

  if (NumValidRows != Table.NumValidRows) {
    TExcept::Throw("Tables do not have equal number of rows");
  }

  TPair<TAttrType, TInt> Info1 = GetColTypeMap(Attr1);
  TPair<TAttrType, TInt> Info2 = Table.GetColTypeMap(Attr2);

  if (Info1.Val1 != atStr || Info2.Val1 != atStr) {
    TExcept::Throw("Only string columns supported in concat.");
  }

  // source column indices
  TInt ColIdx1 = Info1.Val2;
  TInt ColIdx2 = Info2.Val2;

  // destination column index
  TInt ColIdx3 = ColIdx1;

  if (!AddToFirstTable) {
    ColIdx3 = ColIdx2;
  }

  // Create empty string result column in the appropriate table
  if (ResAttr != "") {
    if (AddToFirstTable) {
      AddStrCol(ResAttr);
      ColIdx3 = GetColIdx(ResAttr);
    }
    else {
      Table.AddStrCol(ResAttr);
      ColIdx3 = Table.GetColIdx(ResAttr);
    }
  }

  TRowIterator RI1, RI2;

  RI1 = BegRI();
  RI2 = Table.BegRI();

  // advance both tables in lockstep, storing the string-pool key of each new value
  while (RI1 < EndRI() && RI2 < Table.EndRI()) {
    TStr CurVal1 = RI1.GetStrAttr(ColIdx1);
    TStr CurVal2 = RI2.GetStrAttr(ColIdx2);
    TStr NewVal = CurVal1 + Sep + CurVal2;
    TInt Key = TInt(Context->StringVals.AddKey(NewVal));
    if (AddToFirstTable) {
      StrColMaps[ColIdx3][RI1.GetRowIdx()] = Key;
    }
    else {
      Table.StrColMaps[ColIdx3][RI2.GetRowIdx()] = Key;
    }
    RI1++;
    RI2++;
  }

  if (RI1 != EndRI() || RI2 != Table.EndRI()) {
    TExcept::Throw("ColConcat: Iteration error");
  }
}
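The following sketch shows how ColConcat pairs rows, outside SNAP. It is a minimal C++ analogue that assumes each column is a std::vector<std::string> already holding only valid rows. ConcatColumns is a hypothetical name, not part of the SNAP API, and the string-pool step (AddKey) is left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ColConcat: element i of Col1 is paired with element i
// of Col2, the way the listing advances RI1 and RI2 in lockstep over valid rows.
std::vector<std::string> ConcatColumns(const std::vector<std::string>& Col1,
                                       const std::vector<std::string>& Col2,
                                       const std::string& Sep = "") {
  // ColConcat rejects tables whose numbers of valid rows differ.
  if (Col1.size() != Col2.size()) {
    throw std::invalid_argument("Tables do not have equal number of rows");
  }
  std::vector<std::string> Result;
  Result.reserve(Col1.size());
  for (std::size_t i = 0; i < Col1.size(); ++i) {
    // Same expression as the listing: CurVal1 + Sep + CurVal2.
    Result.push_back(Col1[i] + Sep + Col2[i]);
  }
  return Result;
}
```

Here std::invalid_argument stands in for TExcept::Throw. Returning a fresh vector corresponds to passing a non-empty ResAttr; the in-place variant would overwrite Col1 instead.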
void TTable::ColConcatConst ( const TStr & Attr1,
const TStr & Val,
const TStr & Sep = "",
const TStr & ResAttr = "" 
)

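Appends Sep followed by the constant string Val to every value of the string column Attr1. If ResAttr is empty, the result overwrites Attr1; otherwise it is stored in a new string column ResAttr.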

Definition at line 5182 of file table.cpp.

{
  // check if attribute is valid
  if (!IsAttr(Attr1)) { TExcept::Throw("No attribute present: " + Attr1); }

  TPair<TAttrType, TInt> Info1 = GetColTypeMap(Attr1);

  if (Info1.Val1 != atStr) {
    TExcept::Throw("Only string columns supported in concat.");
  }

  // source column index
  TInt ColIdx1 = Info1.Val2;

  // destination column index
  TInt ColIdx2 = ColIdx1;

  // Create empty result column with type that of first attribute
  if (ResAttr != "") {
    AddStrCol(ResAttr);
    ColIdx2 = GetColIdx(ResAttr);
  }

  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
    TStr CurVal = RowI.GetStrAttr(ColIdx1);
    TStr NewVal = CurVal + Sep + Val;
    TInt Key = TInt(Context->StringVals.AddKey(NewVal));
    StrColMaps[ColIdx2][RowI.GetRowIdx()] = Key;
  }
}
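String columns in TTable store integer keys (StrColMaps) into the shared string pool Context->StringVals. ColConcatConst writes the key that AddKey returns for each new string. The sketch below models this interning under one assumption: AddKey returns the existing key for a string already in the pool and a new dense key otherwise. StringPool and ConcatConst are hypothetical names, and a std::unordered_map replaces SNAP's TStrHash.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal string pool: each distinct string is stored once and identified by a
// dense integer key, assumed to mirror the AddKey call in the listing.
class StringPool {
 public:
  int AddKey(const std::string& S) {
    auto It = Ids_.find(S);
    if (It != Ids_.end()) return It->second;  // already interned
    const int Id = static_cast<int>(Strs_.size());
    Strs_.push_back(S);
    Ids_.emplace(S, Id);
    return Id;
  }
  const std::string& GetStr(int Id) const { return Strs_[Id]; }
  std::size_t Size() const { return Strs_.size(); }

 private:
  std::vector<std::string> Strs_;
  std::unordered_map<std::string, int> Ids_;
};

// Hypothetical analogue of ColConcatConst: the column holds pool keys, not strings.
void ConcatConst(std::vector<int>& KeyCol, StringPool& Pool,
                 const std::string& Val, const std::string& Sep = "") {
  for (int& Key : KeyCol) {
    // Decode, build CurVal + Sep + Val, then store the key of the new string.
    Key = Pool.AddKey(Pool.GetStr(Key) + Sep + Val);
  }
}
```

Repeated values share one key, so the column remains a vector of integers. In this sketch the original strings stay in the pool after the update.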
void TTable::ColDiv ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResultAttrName = "" 
)

Performs columnwise division. See TTable::ColGenericOp.

Definition at line 4828 of file table.cpp.

{
  ColGenericOp(Attr1, Attr2, ResultAttrName, aoDiv);
}
void TTable::ColDiv ( const TStr & Attr1,
TTable & Table,
const TStr & Attr2,
const TStr & ResAttr = "",
TBool  AddToFirstTable = true 
)

Performs columnwise division with column of given table.

Definition at line 4964 of file table.cpp.

{
  ColGenericOp(Attr1, Table, Attr2, ResAttr, aoDiv, AddToFirstTable);
}
void TTable::ColDiv ( const TStr & Attr1,
const TFlt & Num,
const TStr & ResultAttrName = "",
const TBool  floatCast = false 
)

Divides every value of column Attr1 by Num. See TTable::ColGenericOp.

Definition at line 5075 of file table.cpp.

{
  ColGenericOp(Attr1, Num, ResultAttrName, aoDiv, floatCast);
}
void TTable::ColGenericOp ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResAttr,
TArithOp  op 
)

Performs columnwise arithmetic operation.

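Computes Attr1 op Attr2 for every valid row. If ResAttr is empty, the result overwrites Attr1. Otherwise it is stored in a new column ResAttr, which is an integer column when both arguments are integer columns and a float column otherwise. The method rejects string columns. It also rejects writing a float result into an existing integer column (Attr1 integer, Attr2 float, ResAttr empty). The modulus operation is supported only when both columns are integer columns.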

Definition at line 4752 of file table.cpp.

{
  // check if attributes are valid
  if (!IsAttr(Attr1)) TExcept::Throw("No attribute present: " + Attr1);
  if (!IsAttr(Attr2)) TExcept::Throw("No attribute present: " + Attr2);
  TPair<TAttrType, TInt> Info1 = GetColTypeMap(Attr1);
  TPair<TAttrType, TInt> Info2 = GetColTypeMap(Attr2);
  TAttrType Arg1Type = Info1.Val1;
  TAttrType Arg2Type = Info2.Val1;
  if (Arg1Type == atStr || Arg2Type == atStr) {
    TExcept::Throw("Only numeric columns supported in arithmetic operations.");
  }
  if (Arg1Type == atInt && Arg2Type == atFlt && ResAttr == "") {
    TExcept::Throw("Trying to write float values to an existing int-typed column");
  }
  // source column indices
  TInt ColIdx1 = Info1.Val2;
  TInt ColIdx2 = Info2.Val2;

  // destination column index
  TInt ColIdx3 = ColIdx1;
  // Create empty result column: integer if both arguments are integer, float otherwise
  if (ResAttr != "") {
    if (Arg1Type == atInt && Arg2Type == atInt) {
      AddIntCol(ResAttr);
    }
    else {
      AddFltCol(ResAttr);
    }
    ColIdx3 = GetColIdx(ResAttr);
  }
#ifdef USE_OPENMP
  if (GetMP()) {
    ColGenericOpMP(ColIdx1, ColIdx2, Arg1Type, Arg2Type, ColIdx3, op);
    return;
  }
#endif //USE_OPENMP
  // integer arithmetic is used only when both arguments are integer columns
  TAttrType ResType = atFlt;
  if (Arg1Type == atInt && Arg2Type == atInt) { ResType = atInt; }
  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
    if (ResType == atInt) {
      TInt V1 = RowI.GetIntAttr(ColIdx1);
      TInt V2 = RowI.GetIntAttr(ColIdx2);
      if (op == aoAdd) { IntCols[ColIdx3][RowI.GetRowIdx()] = V1 + V2; }
      if (op == aoSub) { IntCols[ColIdx3][RowI.GetRowIdx()] = V1 - V2; }
      if (op == aoMul) { IntCols[ColIdx3][RowI.GetRowIdx()] = V1 * V2; }
      if (op == aoDiv) { IntCols[ColIdx3][RowI.GetRowIdx()] = V1 / V2; }
      if (op == aoMod) { IntCols[ColIdx3][RowI.GetRowIdx()] = V1 % V2; }
      if (op == aoMin) { IntCols[ColIdx3][RowI.GetRowIdx()] = (V1 < V2) ? V1 : V2; }
      if (op == aoMax) { IntCols[ColIdx3][RowI.GetRowIdx()] = (V1 > V2) ? V1 : V2; }
    } else {
      TFlt V1 = (Arg1Type == atInt) ? (TFlt)RowI.GetIntAttr(ColIdx1) : RowI.GetFltAttr(ColIdx1);
      TFlt V2 = (Arg2Type == atInt) ? (TFlt)RowI.GetIntAttr(ColIdx2) : RowI.GetFltAttr(ColIdx2);
      if (op == aoAdd) { FltCols[ColIdx3][RowI.GetRowIdx()] = V1 + V2; }
      if (op == aoSub) { FltCols[ColIdx3][RowI.GetRowIdx()] = V1 - V2; }
      if (op == aoMul) { FltCols[ColIdx3][RowI.GetRowIdx()] = V1 * V2; }
      if (op == aoDiv) { FltCols[ColIdx3][RowI.GetRowIdx()] = V1 / V2; }
      if (op == aoMod) { TExcept::Throw("Cannot find modulo for float columns"); }
      if (op == aoMin) { FltCols[ColIdx3][RowI.GetRowIdx()] = (V1 < V2) ? V1 : V2; }
      if (op == aoMax) { FltCols[ColIdx3][RowI.GetRowIdx()] = (V1 > V2) ? V1 : V2; }
    }
  }
}
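The checks at the top of ColGenericOp determine which column type receives the result. They can be summarized as a small function. This is a sketch, not SNAP code: AttrType is a hypothetical enum standing in for atInt, atFlt and atStr, and std::invalid_argument stands in for TExcept::Throw.

```cpp
#include <stdexcept>
#include <string>

enum class AttrType { Int, Flt, Str };  // hypothetical stand-in for atInt/atFlt/atStr

// Returns the type of the column that ColGenericOp(Attr1, Attr2, ResAttr, op)
// writes to, following the checks in the listing.
AttrType ResultType(AttrType T1, AttrType T2, const std::string& ResAttr) {
  if (T1 == AttrType::Str || T2 == AttrType::Str) {
    throw std::invalid_argument("Only numeric columns supported in arithmetic operations.");
  }
  if (T1 == AttrType::Int && T2 == AttrType::Flt && ResAttr.empty()) {
    throw std::invalid_argument("Trying to write float values to an existing int-typed column");
  }
  if (ResAttr.empty()) return T1;  // result overwrites Attr1 and keeps its type
  // A new column is integer only when both arguments are integer columns.
  return (T1 == AttrType::Int && T2 == AttrType::Int) ? AttrType::Int : AttrType::Flt;
}
```

An integer result column always implies integer arithmetic (truncating division, `%` allowed). Any float operand switches the whole computation to floating point.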
void TTable::ColGenericOp ( const TStr & Attr1,
TTable & Table,
const TStr & Attr2,
const TStr & ResAttr,
TArithOp  op,
TBool  AddToFirstTable 
)

Performs columnwise arithmetic operation with column of given table.

Definition at line 4844 of file table.cpp.

{
  // check if attributes are valid
  if (!IsAttr(Attr1)) { TExcept::Throw("No attribute present: " + Attr1); }
  if (!Table.IsAttr(Attr2)) { TExcept::Throw("No attribute present: " + Attr2); }

  if (NumValidRows != Table.NumValidRows) {
    TExcept::Throw("Tables do not have equal number of rows");
  }

  TPair<TAttrType, TInt> Info1 = GetColTypeMap(Attr1);
  TPair<TAttrType, TInt> Info2 = Table.GetColTypeMap(Attr2);
  TAttrType Arg1Type = Info1.Val1;
  TAttrType Arg2Type = Info2.Val1;
  if (Arg1Type == atStr || Arg2Type == atStr) {
    TExcept::Throw("Only numeric columns supported in arithmetic operations.");
  }
  // without ResAttr the result overwrites Attr1 (or Attr2 in Table if !AddToFirstTable),
  // so that column must not be an int column when the result is float
  TAttrType DestType = AddToFirstTable ? Arg1Type : Arg2Type;
  if (DestType == atInt && (Arg1Type == atFlt || Arg2Type == atFlt) && ResAttr == "") {
    TExcept::Throw("Trying to write float values to an existing int-typed column");
  }
  // source column indices
  TInt ColIdx1 = Info1.Val2;
  TInt ColIdx2 = Info2.Val2;

  // destination column index
  TInt ColIdx3 = AddToFirstTable ? ColIdx1 : ColIdx2;

  // Create empty result column in the appropriate table: integer if both arguments are integer, float otherwise
  if (ResAttr != "") {
    if (AddToFirstTable) {
      if (Arg1Type == atInt && Arg2Type == atInt) {
        AddIntCol(ResAttr);
      } else {
        AddFltCol(ResAttr);
      }
      ColIdx3 = GetColIdx(ResAttr);
    }
    else {
      if (Arg1Type == atInt && Arg2Type == atInt) {
        Table.AddIntCol(ResAttr);
      } else {
        Table.AddFltCol(ResAttr);
      }
      ColIdx3 = Table.GetColIdx(ResAttr);
    }
  }

  /*
  #ifdef USE_OPENMP
  if (GetMP()) {
    ColGenericOpMP(Table, AddToFirstTable, ColIdx1, ColIdx2, Arg1Type, Arg2Type, ColIdx3, op);
    return;
  }
  #endif //USE_OPENMP
  */

  TRowIterator RI1, RI2;
  RI1 = BegRI();
  RI2 = Table.BegRI();
  TAttrType ResType = atFlt;
  if (Arg1Type == atInt && Arg2Type == atInt) { ResType = atInt; }
  while (RI1 < EndRI() && RI2 < Table.EndRI()) {
    if (ResType == atInt) {
      TInt V1 = RI1.GetIntAttr(ColIdx1);
      TInt V2 = RI2.GetIntAttr(ColIdx2);
      if (AddToFirstTable) {
        if (op == aoAdd) { IntCols[ColIdx3][RI1.GetRowIdx()] = V1 + V2; }
        if (op == aoSub) { IntCols[ColIdx3][RI1.GetRowIdx()] = V1 - V2; }
        if (op == aoMul) { IntCols[ColIdx3][RI1.GetRowIdx()] = V1 * V2; }
        if (op == aoDiv) { IntCols[ColIdx3][RI1.GetRowIdx()] = V1 / V2; }
        if (op == aoMod) { IntCols[ColIdx3][RI1.GetRowIdx()] = V1 % V2; }
      }
      else {
        if (op == aoAdd) { Table.IntCols[ColIdx3][RI2.GetRowIdx()] = V1 + V2; }
        if (op == aoSub) { Table.IntCols[ColIdx3][RI2.GetRowIdx()] = V1 - V2; }
        if (op == aoMul) { Table.IntCols[ColIdx3][RI2.GetRowIdx()] = V1 * V2; }
        if (op == aoDiv) { Table.IntCols[ColIdx3][RI2.GetRowIdx()] = V1 / V2; }
        if (op == aoMod) { Table.IntCols[ColIdx3][RI2.GetRowIdx()] = V1 % V2; }
      }
    } else {
      // Attr1 is read through RI1 (this table), Attr2 through RI2 (Table)
      TFlt V1 = (Arg1Type == atInt) ? (TFlt)RI1.GetIntAttr(ColIdx1) : RI1.GetFltAttr(ColIdx1);
      TFlt V2 = (Arg2Type == atInt) ? (TFlt)RI2.GetIntAttr(ColIdx2) : RI2.GetFltAttr(ColIdx2);
      if (AddToFirstTable) {
        if (op == aoAdd) { FltCols[ColIdx3][RI1.GetRowIdx()] = V1 + V2; }
        if (op == aoSub) { FltCols[ColIdx3][RI1.GetRowIdx()] = V1 - V2; }
        if (op == aoMul) { FltCols[ColIdx3][RI1.GetRowIdx()] = V1 * V2; }
        if (op == aoDiv) { FltCols[ColIdx3][RI1.GetRowIdx()] = V1 / V2; }
        if (op == aoMod) { TExcept::Throw("Cannot find modulo for float columns"); }
      } else {
        if (op == aoAdd) { Table.FltCols[ColIdx3][RI2.GetRowIdx()] = V1 + V2; }
        if (op == aoSub) { Table.FltCols[ColIdx3][RI2.GetRowIdx()] = V1 - V2; }
        if (op == aoMul) { Table.FltCols[ColIdx3][RI2.GetRowIdx()] = V1 * V2; }
        if (op == aoDiv) { Table.FltCols[ColIdx3][RI2.GetRowIdx()] = V1 / V2; }
        if (op == aoMod) { TExcept::Throw("Cannot find modulo for float columns"); }
      }
    }
    RI1++;
    RI2++;
  }

  if (RI1 != EndRI() || RI2 != Table.EndRI()) {
    TExcept::Throw("ColGenericOp: Iteration error");
  }
}
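The two-table overload advances RI1 and RI2 together, so it pairs rows by their position among valid rows, not by row id. The sketch below illustrates this with a hypothetical Column type that marks logically removed rows with a flag. It assumes both columns hold doubles and models only aoAdd, writing the result into the first column (AddToFirstTable = true, ResAttr empty).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical column with logical deletion: Valid[i] == false marks a removed
// row that iteration skips.
struct Column {
  std::vector<double> Vals;
  std::vector<bool> Valid;
};

// Adds B into A, pairing the k-th valid row of A with the k-th valid row of B.
void AddColumns(Column& A, const Column& B) {
  auto CountValid = [](const Column& C) {
    std::size_t N = 0;
    for (bool V : C.Valid) N += V;
    return N;
  };
  if (CountValid(A) != CountValid(B)) {
    throw std::invalid_argument("Tables do not have equal number of rows");
  }
  std::size_t j = 0;
  for (std::size_t i = 0; i < A.Vals.size(); ++i) {
    if (!A.Valid[i]) continue;  // skip removed rows in A
    while (!B.Valid[j]) ++j;    // advance to the next valid row in B
    A.Vals[i] += B.Vals[j];
    ++j;
  }
}
```

The equal-count check guarantees that the inner `while` always finds a valid row in B. It plays the same role as the NumValidRows comparison in the listing.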
void TTable::ColGenericOp ( const TStr & Attr1,
const TFlt & Num,
const TStr & ResAttr,
TArithOp  op,
const TBool  floatCast 
)

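Applies op to every value of column Attr1 and the scalar Num. If ResAttr is empty, the result overwrites Attr1 and floatCast is ignored. Otherwise the result is stored in a new column ResAttr. That column is an integer column if Attr1 is an integer column and floatCast is false, and a float column otherwise. When the result is an integer, Num is first truncated to an int. Only aoAdd, aoSub, aoMul, aoDiv and aoMod are handled, and aoMod only for integer results.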

Definition at line 4975 of file table.cpp.

{
  // check if attribute is valid
  if (!IsAttr(Attr1)) { TExcept::Throw("No attribute present: " + Attr1); }

  TPair<TAttrType, TInt> Info1 = GetColTypeMap(Attr1);
  TAttrType ArgType = Info1.Val1;
  if (ArgType == atStr) {
    TExcept::Throw("Only numeric columns supported in arithmetic operations.");
  }
  // source column index
  TInt ColIdx1 = Info1.Val2;
  // destination column index
  TInt ColIdx2 = ColIdx1;

  // Create empty result column: integer if Attr1 is integer and no float cast is requested, float otherwise
  TBool shouldCast = floatCast;
  if (ResAttr != "") {
    if ((ArgType == atInt) && !shouldCast) {
      AddIntCol(ResAttr);
    } else {
      AddFltCol(ResAttr);
    }
    ColIdx2 = GetColIdx(ResAttr);
  } else {
    // Cannot change type of existing attribute
    shouldCast = false;
  }

  #ifdef USE_OPENMP
  if (GetMP()) {
    ColGenericOpMP(ColIdx1, ColIdx2, ArgType, Num, op, shouldCast);
    return;
  }
  #endif //USE_OPENMP

  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
    if ((ArgType == atInt) && !shouldCast) {
      TInt CurVal = RowI.GetIntAttr(ColIdx1);
      TInt Val = static_cast<int>(Num);
      if (op == aoAdd) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal + Val; }
      if (op == aoSub) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal - Val; }
      if (op == aoMul) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal * Val; }
      if (op == aoDiv) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal / Val; }
      if (op == aoMod) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal % Val; }
    }
    else {
      TFlt CurVal = (ArgType == atFlt) ? RowI.GetFltAttr(ColIdx1) : (TFlt) RowI.GetIntAttr(ColIdx1);
      if (op == aoAdd) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal + Num; }
      if (op == aoSub) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal - Num; }
      if (op == aoMul) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal * Num; }
      if (op == aoDiv) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal / Num; }
      if (op == aoMod) { TExcept::Throw("Cannot find modulo for float columns"); }
    }
  }
}
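The interaction between ResAttr, floatCast and the truncation of Num is easiest to see on a concrete integer column. The sketch below is a hypothetical model, not SNAP code. ArithOp covers only the operations this overload handles, and HasResAttr stands for "ResAttr is non-empty".

```cpp
#include <stdexcept>
#include <vector>

enum class ArithOp { Add, Sub, Mul, Div, Mod };  // subset of TArithOp handled here

// Hypothetical model of ColGenericOp(Attr1, Num, ResAttr, op, floatCast) for an
// integer source column. Exactly one of IntOut/FltOut is filled (when Col is non-empty).
struct ScalarOpResult {
  std::vector<int> IntOut;
  std::vector<double> FltOut;
};

ScalarOpResult ApplyScalar(const std::vector<int>& Col, double Num, ArithOp Op,
                           bool HasResAttr, bool FloatCast) {
  // floatCast only matters when a new column is created; an existing int
  // column cannot change type.
  const bool ShouldCast = HasResAttr && FloatCast;
  ScalarOpResult R;
  if (!ShouldCast) {
    const int Val = static_cast<int>(Num);  // Num is truncated on the integer path
    for (int V : Col) {
      switch (Op) {
        case ArithOp::Add: R.IntOut.push_back(V + Val); break;
        case ArithOp::Sub: R.IntOut.push_back(V - Val); break;
        case ArithOp::Mul: R.IntOut.push_back(V * Val); break;
        case ArithOp::Div: R.IntOut.push_back(V / Val); break;
        case ArithOp::Mod: R.IntOut.push_back(V % Val); break;
      }
    }
  } else {
    for (int V : Col) {
      const double D = V;
      switch (Op) {
        case ArithOp::Add: R.FltOut.push_back(D + Num); break;
        case ArithOp::Sub: R.FltOut.push_back(D - Num); break;
        case ArithOp::Mul: R.FltOut.push_back(D * Num); break;
        case ArithOp::Div: R.FltOut.push_back(D / Num); break;
        case ArithOp::Mod: throw std::invalid_argument("Cannot find modulo for float columns");
      }
    }
  }
  return R;
}
```

Multiplying an integer column by 1.5 in place therefore multiplies by 1. To get a fractional result, pass a new ResAttr together with floatCast = true.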
void TTable::ColGenericOpMP ( TInt  ArgColIdx1,
TInt  ArgColIdx2,
TAttrType  ArgType1,
TAttrType  ArgType2,
TInt  ResColIdx,
TArithOp  op 
)

Definition at line 4708 of file table.cpp.

{
  TAttrType ResType = atFlt;
  if (ArgType1 == atInt && ArgType2 == atInt) { ResType = atInt; }
  TIntPrV Partitions;
  GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
  #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD)
  for (int i = 0; i < Partitions.Len(); i++) {
    TRowIterator RowI(Partitions[i].GetVal1(), this);
    TRowIterator EndI(Partitions[i].GetVal2(), this);
    while (RowI < EndI) {
      if (ResType == atInt) {
        TInt V1 = RowI.GetIntAttr(ArgColIdx1);
        TInt V2 = RowI.GetIntAttr(ArgColIdx2);
        if (op == aoAdd) { IntCols[ResColIdx][RowI.GetRowIdx()] = V1 + V2; }
        if (op == aoSub) { IntCols[ResColIdx][RowI.GetRowIdx()] = V1 - V2; }
        if (op == aoMul) { IntCols[ResColIdx][RowI.GetRowIdx()] = V1 * V2; }
        if (op == aoDiv) { IntCols[ResColIdx][RowI.GetRowIdx()] = V1 / V2; }
        if (op == aoMod) { IntCols[ResColIdx][RowI.GetRowIdx()] = V1 % V2; }
        if (op == aoMin) { IntCols[ResColIdx][RowI.GetRowIdx()] = (V1 < V2) ? V1 : V2; }
        if (op == aoMax) { IntCols[ResColIdx][RowI.GetRowIdx()] = (V1 > V2) ? V1 : V2; }
      } else {
        TFlt V1 = (ArgType1 == atInt) ? (TFlt)RowI.GetIntAttr(ArgColIdx1) : RowI.GetFltAttr(ArgColIdx1);
        TFlt V2 = (ArgType2 == atInt) ? (TFlt)RowI.GetIntAttr(ArgColIdx2) : RowI.GetFltAttr(ArgColIdx2);
        if (op == aoAdd) { FltCols[ResColIdx][RowI.GetRowIdx()] = V1 + V2; }
        if (op == aoSub) { FltCols[ResColIdx][RowI.GetRowIdx()] = V1 - V2; }
        if (op == aoMul) { FltCols[ResColIdx][RowI.GetRowIdx()] = V1 * V2; }
        if (op == aoDiv) { FltCols[ResColIdx][RowI.GetRowIdx()] = V1 / V2; }
        if (op == aoMod) { TExcept::Throw("Cannot find modulo for float columns"); }
        if (op == aoMin) { FltCols[ResColIdx][RowI.GetRowIdx()] = (V1 < V2) ? V1 : V2; }
        if (op == aoMax) { FltCols[ResColIdx][RowI.GetRowIdx()] = (V1 > V2) ? V1 : V2; }
      }
      RowI++;
    }
  }
}
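ColGenericOpMP splits the rows into omp_get_max_threads()*CHUNKS_PER_THREAD ranges and processes them with a dynamically scheduled OpenMP loop. The sketch below shows the same pattern on plain vectors, with a few differences from the listing:

- PartitionRanges and AddColumnsParallel are hypothetical helpers.
- The number of partitions is passed in rather than derived from the thread count.
- Ranges are half-open, which may differ from the convention of GetPartitionRanges.

The listing can also throw TExcept from inside the parallel loop (modulus on float columns). OpenMP requires an exception thrown in a parallel region to be caught inside that region by the same thread, so the sketch keeps the loop body non-throwing.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Splits [0, N) into at most Parts contiguous half-open ranges of near-equal size.
std::vector<std::pair<std::size_t, std::size_t>> PartitionRanges(std::size_t N, std::size_t Parts) {
  std::vector<std::pair<std::size_t, std::size_t>> R;
  if (N == 0 || Parts == 0) return R;
  const std::size_t Size = (N + Parts - 1) / Parts;  // ceiling division
  for (std::size_t Beg = 0; Beg < N; Beg += Size) {
    R.emplace_back(Beg, std::min(Beg + Size, N));
  }
  return R;
}

// Element-wise A + B (B assumed at least as long as A). Each partition writes a
// disjoint slice of Out, so iterations are independent. Built with -fopenmp, the
// pragma spreads partitions across threads; without it the loop runs serially.
void AddColumnsParallel(const std::vector<int>& A, const std::vector<int>& B,
                        std::vector<int>& Out, std::size_t Parts) {
  Out.assign(A.size(), 0);
  const auto Ranges = PartitionRanges(A.size(), Parts);
  const long long NumRanges = static_cast<long long>(Ranges.size());
#pragma omp parallel for schedule(dynamic, 1)
  for (long long p = 0; p < NumRanges; ++p) {
    const auto& Rg = Ranges[static_cast<std::size_t>(p)];
    for (std::size_t i = Rg.first; i < Rg.second; ++i) {
      Out[i] = A[i] + B[i];
    }
  }
}
```

Dynamic scheduling hands out one partition at a time, so threads that finish early pick up remaining partitions. This is why the listing creates several chunks per thread instead of exactly one.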
void TTable::ColGenericOpMP ( const TInt & ColIdx1,
const TInt & ColIdx2,
TAttrType  ArgType,
const TFlt & Num,
TArithOp  op,
TBool  ShouldCast 
)

Definition at line 5032 of file table.cpp.

{
  TIntPrV Partitions;
  GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
  #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD)
  for (int i = 0; i < Partitions.Len(); i++) {
    TRowIterator RowI(Partitions[i].GetVal1(), this);
    TRowIterator EndI(Partitions[i].GetVal2(), this);
    while (RowI < EndI) {
      if ((ArgType == atInt) && !ShouldCast) {
        TInt CurVal = RowI.GetIntAttr(ColIdx1);
        TInt Val = static_cast<int>(Num);
        if (op == aoAdd) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal + Val; }
        if (op == aoSub) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal - Val; }
        if (op == aoMul) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal * Val; }
        if (op == aoDiv) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal / Val; }
        if (op == aoMod) { IntCols[ColIdx2][RowI.GetRowIdx()] = CurVal % Val; }
      } else {
        TFlt CurVal = (ArgType == atFlt) ? RowI.GetFltAttr(ColIdx1) : (TFlt) RowI.GetIntAttr(ColIdx1);
        if (op == aoAdd) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal + Num; }
        if (op == aoSub) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal - Num; }
        if (op == aoMul) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal * Num; }
        if (op == aoDiv) { FltCols[ColIdx2][RowI.GetRowIdx()] = CurVal / Num; }
        if (op == aoMod) { TExcept::Throw("Cannot find modulo for float columns"); }
      }
      RowI++;
    }
  }
}
void TTable::ColMax ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResultAttrName = "" 
)

Performs max of two columns. See TTable::ColGenericOp.

Definition at line 4840 of file table.cpp.

{
  ColGenericOp(Attr1, Attr2, ResultAttrName, aoMax);
}
void TTable::ColMin ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResultAttrName = "" 
)

Performs min of two columns. See TTable::ColGenericOp.

Definition at line 4836 of file table.cpp.

{
  ColGenericOp(Attr1, Attr2, ResultAttrName, aoMin);
}
void TTable::ColMod ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResultAttrName = "" 
)

Performs columnwise modulus. See TTable::ColGenericOp.

Definition at line 4832 of file table.cpp.

{
  ColGenericOp(Attr1, Attr2, ResultAttrName, aoMod);
}
void TTable::ColMod ( const TStr & Attr1,
TTable & Table,
const TStr & Attr2,
const TStr & ResAttr = "",
TBool  AddToFirstTable = true 
)

Performs columnwise modulus with column of given table.

Definition at line 4969 of file table.cpp.

{
  ColGenericOp(Attr1, Table, Attr2, ResAttr, aoMod, AddToFirstTable);
}
void TTable::ColMod ( const TStr & Attr1,
const TFlt & Num,
const TStr & ResultAttrName = "",
const TBool  floatCast = false 
)

Computes every value of column Attr1 modulo Num. See TTable::ColGenericOp.

Definition at line 5079 of file table.cpp.

{
  ColGenericOp(Attr1, Num, ResultAttrName, aoMod, floatCast);
}
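On integer columns, ColMod and ColDiv use C++'s built-in `%` and `/` operators, so negative values follow C++ rules. Since C++11, `/` truncates toward zero and `%` takes the sign of the dividend, keeping `(a / b) * b + a % b == a`. The helper below is hypothetical and mirrors the row-wise `V1 % V2` loop. As in the listing, a zero divisor is undefined behavior.

```cpp
#include <cstddef>
#include <vector>

// Row-wise remainder of two integer columns, paired by position.
std::vector<int> ModColumns(const std::vector<int>& A, const std::vector<int>& B) {
  std::vector<int> Out;
  Out.reserve(A.size());
  for (std::size_t i = 0; i < A.size() && i < B.size(); ++i) {
    Out.push_back(A[i] % B[i]);  // sign follows A[i]; B[i] must be non-zero
  }
  return Out;
}
```

For example, -7 modulo 3 yields -1 here, not 2 as in languages with floored modulus. Code that expects a non-negative remainder must adjust the result itself.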
void TTable::ColMul ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResultAttrName = "" 
)

Performs columnwise multiplication. See TTable::ColGenericOp.

Definition at line 4824 of file table.cpp.

{
  ColGenericOp(Attr1, Attr2, ResultAttrName, aoMul);
}
void TTable::ColMul ( const TStr & Attr1,
TTable & Table,
const TStr & Attr2,
const TStr & ResAttr = "",
TBool  AddToFirstTable = true 
)

Performs columnwise multiplication with column of given table.

Definition at line 4959 of file table.cpp.

{
  ColGenericOp(Attr1, Table, Attr2, ResAttr, aoMul, AddToFirstTable);
}
void TTable::ColMul ( const TStr & Attr1,
const TFlt & Num,
const TStr & ResultAttrName = "",
const TBool  floatCast = false 
)

Multiplies every value of column Attr1 by Num. See TTable::ColGenericOp.

Definition at line 5071 of file table.cpp.

{
  ColGenericOp(Attr1, Num, ResultAttrName, aoMul, floatCast);
}
void TTable::ColSub ( const TStr & Attr1,
const TStr & Attr2,
const TStr & ResultAttrName = "" 
)

Performs columnwise subtraction. See TTable::ColGenericOp.

Definition at line 4820 of file table.cpp.

4820  {
4821  ColGenericOp(Attr1, Attr2, ResultAttrName, aoSub);
4822 }
void TTable::ColSub ( const TStr & Attr1,
TTable & Table,
const TStr & Attr2,
const TStr & ResAttr = "",
TBool  AddToFirstTable = true 
)

Performs columnwise subtraction with a column of the given table.

Definition at line 4954 of file table.cpp.

4955  {
4956  ColGenericOp(Attr1, Table, Attr2, ResultAttrName, aoSub, AddToFirstTable);
4957 }
void TTable::ColSub ( const TStr & Attr1,
const TFlt & Num,
const TStr & ResultAttrName = "",
const TBool  floatCast = false 
)

Performs subtraction of column values and given Num.

Definition at line 5067 of file table.cpp.

5067  {
5068  ColGenericOp(Attr1, Num, ResultAttrName, aoSub, floatCast);
5069 }
TInt TTable::CompareKeyVal ( const TInt & K1,
const TInt & V1,
const TInt & K2,
const TInt & V2
)
static protected

Definition at line 5297 of file table.cpp.

5297  {
5298  // if (K1 == K2) {
5299  // if (V1 < V2) { return -1; }
5300  // else if (V1 > V2) { return 1; }
5301  // else return 0;
5302  // }
5303  // if (K1 < K2) { return -1; }
5304  // else { return 1; }
5305 
5306  if (K1 == K2) { return V1 - V2; }
5307  else { return K1 - K2; }
5308 }
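The active code returns K1 - K2 or V1 - V2. With int operands, that subtraction can overflow when the difference does not fit in an int (for example K1 = INT_MIN, K2 = 1), which gives the wrong sign. The sketch below is a minimal overflow-free variant in plain C++ rather than SNAP's TInt. It follows the logic of the commented-out lines, and the name CompareKeyValSafe is hypothetical.

```cpp
#include <cassert>
#include <climits>

// Three-way comparison of (K, V) pairs: first by key, then by value.
// Uses explicit comparisons instead of K1 - K2 / V1 - V2, so extreme
// values cannot overflow. Returns -1, 0 or 1.
int CompareKeyValSafe(int K1, int V1, int K2, int V2) {
  if (K1 != K2) { return K1 < K2 ? -1 : 1; }
  if (V1 != V2) { return V1 < V2 ? -1 : 1; }
  return 0;
}
```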
TInt TTable::CompareRows ( TInt  R1,
TInt  R2,
const TAttrType & CompareByType,
const TInt & CompareByIndex,
TBool  Asc = true
)
inline protected

Compares rows R1 and R2 on a single column. Returns a positive value if R1 is bigger, a negative value if R2 is bigger, and 0 if they are equal (strcmp semantics). When Asc is false, the sign is reversed.

Definition at line 3064 of file table.cpp.

3064  {
3065  //printf("comparing rows %d %d by %s\n", R1.Val, R2.Val, CompareBy.CStr());
3066  switch (CompareByType) {
3067  case atInt:{
3068  if (IntCols[CompareByIndex][R1] > IntCols[CompareByIndex][R2]) { return (Asc ? 1 : -1); }
3069  if (IntCols[CompareByIndex][R1] < IntCols[CompareByIndex][R2]) { return (Asc ? -1 : 1); }
3070  return 0;
3071  }
3072  case atFlt:{
3073  if (FltCols[CompareByIndex][R1] > FltCols[CompareByIndex][R2]) { return (Asc ? 1 : -1); }
3074  if (FltCols[CompareByIndex][R1] < FltCols[CompareByIndex][R2]) { return (Asc ? -1 : 1); }
3075  return 0;
3076  }
3077  case atStr:{
3078  TStr S1 = GetStrVal(CompareByIndex, R1);
3079  TStr S2 = GetStrVal(CompareByIndex, R2);
3080  int CmpRes = strcmp(S1.CStr(), S2.CStr());
3081  return (Asc ? CmpRes : -CmpRes);
3082  }
3083  }
3084  // code should not come here, added to remove a compiler warning
3085  return 0;
3086 }
TInt TTable::CompareRows ( TInt  R1,
TInt  R2,
const TVec< TAttrType > &  CompareByTypes,
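const TIntV & CompareByIndices,
TBool  Asc = true
)
inline protected

Compares rows R1 and R2 on several columns in turn and returns the first nonzero single-column result, or 0 if the rows are equal on all of them (lexicographic order, strcmp semantics). When Asc is false, the sign is reversed.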

Definition at line 3088 of file table.cpp.

3088  {
3089  for (TInt i = 0; i < CompareByTypes.Len(); i++) {
3090  TInt res = CompareRows(R1, R2, CompareByTypes[i], CompareByIndices[i], Asc);
3091  if (res != 0) { return res; }
3092  }
3093  return 0;
3094 }
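The multi-column overload compares column by column and returns at the first difference, which yields a lexicographic order. The sketch below shows the same idea on a hypothetical two-key Row struct (an int and a string). It includes the sign flip for descending order and shows how such a three-way comparator plugs into std::sort. It is a model, not SNAP code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical row: key 0 is an int, key 1 is a string.
struct Row {
  int Num;
  std::string Name;
};

// strcmp-style comparison on one key, negated when Asc is false.
int CompareOne(const Row& R1, const Row& R2, int Key, bool Asc) {
  int Res = 0;
  if (Key == 0) {
    Res = (R1.Num > R2.Num) - (R1.Num < R2.Num);
  } else {
    Res = std::strcmp(R1.Name.c_str(), R2.Name.c_str());
  }
  return Asc ? Res : -Res;
}

// Lexicographic comparison: the first key that differs decides.
int CompareRows(const Row& R1, const Row& R2, const std::vector<int>& Keys, bool Asc) {
  for (int Key : Keys) {
    int Res = CompareOne(R1, R2, Key, Asc);
    if (Res != 0) { return Res; }
  }
  return 0;
}

// Sorts rows by the given keys using the three-way comparator.
void SortRows(std::vector<Row>& Rows, const std::vector<int>& Keys, bool Asc) {
  std::sort(Rows.begin(), Rows.end(), [&](const Row& A, const Row& B) {
    return CompareRows(A, B, Keys, Asc) < 0;
  });
}
```

As in the listing, a single Asc flag applies to every key.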
void TTable::ConcatTable ( const PTable & T)
inline protected

Appends all rows of T to this table and reinitializes the row ids.

Definition at line 683 of file table.h.

683 {AddTable(*T); Reindex(); }
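void TTable::Count ( const TStr & CountColName,
const TStr & Col
)

Counts the occurrences of each distinct value.

Counts how many times each distinct value appears and records the counts in a new column. As the listing below shows, the rows are grouped by the values of CountColName, and the per-group counts are written to Col (via Aggregate with aaCount). The roles of the two parameters therefore appear swapped relative to their names.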

Definition at line 1802 of file table.cpp.

1802  {
1803  TStrV GroupByAttrs;
1804  GroupByAttrs.Add(CountColName);
1805  Aggregate(GroupByAttrs, aaCount, "", Col);
1806 }
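The following is a minimal model of the counting step. It assumes, as the Aggregate description suggests, that every row receives the count of the group it belongs to. CountPerRow is a hypothetical helper that works on a plain string column; it is not part of SNAP.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// For every row, returns the number of rows that share that row's value in
// the grouping column (one count per row, i.e. the new column's contents).
std::vector<int> CountPerRow(const std::vector<std::string>& GroupCol) {
  std::unordered_map<std::string, int> Counts;
  for (const std::string& Val : GroupCol) { ++Counts[Val]; }
  std::vector<int> Result;
  Result.reserve(GroupCol.size());
  for (const std::string& Val : GroupCol) { Result.push_back(Counts[Val]); }
  return Result;
}
```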
void TTable::Defrag ( )

Releases the memory of deleted rows and defragments the table.

Also updates the metadata, since row indices change. (Source note: this still needs some liveness analysis of columns.)

Definition at line 3311 of file table.cpp.

3311  {
3312  TInt FreeIndex = 0;
3313  TIntV Mapping; // Mapping[old_index] = new_index/invalid
3314 
3315  TInt IdColIdx = GetColIdx(IdColName);
3316 
3317  for (TInt i = 0; i < Next.Len(); i++) {
3318  if (Next[i] != TTable::Invalid) {
3319  // "first row" properly set beforehand
3320  if (FreeIndex == 0) {
3321  Assert (i == FirstValidRow);
3322  FirstValidRow = 0;
3323  }
3324 
3325  if (Next[i] != Last) {
3326  Next[FreeIndex] = FreeIndex + 1;
3327  Mapping.Add(FreeIndex);
3328  } else {
3329  Next[FreeIndex] = Last;
3330  LastValidRow = FreeIndex;
3331  Mapping.Add(Last);
3332  }
3333 
3334  RowIdMap.AddDat(IntCols[IdColIdx][i], FreeIndex);
3335 
3336  for (TInt j = 0; j < IntCols.Len(); j++) {
3337  IntCols[j][FreeIndex] = IntCols[j][i];
3338  }
3339  for (TInt j = 0; j < FltCols.Len(); j++) {
3340  FltCols[j][FreeIndex] = FltCols[j][i];
3341  }
3342  for (TInt j = 0; j < StrColMaps.Len(); j++) {
3343  StrColMaps[j][FreeIndex] = StrColMaps[j][i];
3344  }
3345 
3346  FreeIndex++;
3347  } else {
3348  NumRows--;
3349  Mapping.Add(TTable::Invalid);
3350  }
3351  }
3352 
3353  // should match, or bug somewhere
3354  Assert (NumValidRows == NumRows);
3355 }
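The compaction loop can be sketched on plain vectors. This is a simplified model, not SNAP code. The sentinels kLast and kInvalid are hypothetical stand-ins for TTable::Last and TTable::Invalid, only one integer column is moved, and RowIdMap updates are omitted. Like the listing, it keeps surviving rows in physical order. It also shrinks the vectors at the end to release the freed slots, which the listing does not visibly do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel values; SNAP defines its own TTable::Last / TTable::Invalid.
const int kLast = -1;
const int kInvalid = -2;

// Moves rows whose Next entry is not kInvalid to the front of Col and Next,
// relinks them in physical order, and returns the old-index -> new-index
// mapping (kInvalid for removed rows).
std::vector<int> Defrag(std::vector<int>& Next, std::vector<int>& Col) {
  std::vector<int> Mapping;
  int FreeIndex = 0;
  for (std::size_t i = 0; i < Next.size(); ++i) {
    if (Next[i] == kInvalid) { Mapping.push_back(kInvalid); continue; }
    Col[FreeIndex] = Col[i];          // FreeIndex <= i, so no unread data is lost
    Next[FreeIndex] = FreeIndex + 1;  // provisional link to the following row
    Mapping.push_back(FreeIndex);
    ++FreeIndex;
  }
  if (FreeIndex > 0) { Next[FreeIndex - 1] = kLast; }  // terminate the chain
  Next.resize(FreeIndex);  // release slots of deleted rows
  Col.resize(FreeIndex);
  return Mapping;
}
```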
void TTable::DelColType ( const TStr & ColName)
inline protected

Removes the column with name ColName from the ColTypeMap.

Definition at line 661 of file table.h.

661  {
662  TStr NColName = NormalizeColName(ColName);
663  ColTypeMap.DelKey(NColName);
664  }
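TStr TTable::DenormalizeColName ( const TStr & ColName) const
protected

Removes the suffix from a column name, if it has one. Names that start with '_' are returned unchanged. The original name is also kept if, after stripping, it would match more than one schema column.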

Definition at line 4648 of file table.cpp.

4648  {
4649  TStr DColName = ColName;
4650  if (DColName.Len() == 0) { return DColName; }
4651  if (DColName.GetCh(0) == '_') { return DColName; }
4652  if (DColName.GetCh(DColName.Len()-2) == '-') {
4653  DColName = DColName.GetSubStr(0,DColName.Len()-3);
4654  }
4655  TInt Conflicts = 0;
4656  for (TInt i = 0; i < Sch.Len(); i++) {
4657  if (DColName == Sch[i].Val1.GetSubStr(0, Sch[i].Val1.Len()-3)) {
4658  Conflicts++;
4659  }
4660  }
4661  if (Conflicts > 1) { return ColName; }
4662  else { return DColName; }
4663 }
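The following is a standalone model of the suffix-stripping logic. It assumes that TStr::GetSubStr treats its end index as inclusive, so GetSubStr(0, Len()-3) drops the last two characters (such as a "-1" suffix). Unlike the listing, the sketch checks the string length before reading index Len()-2. The function works on std::string and a plain list of schema names; it is illustrative, not SNAP's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A name counts as normalized when its second-to-last character is '-'.
// The two-character suffix is stripped unless more than one schema column
// would then share the stripped name.
std::string DenormalizeColName(const std::string& ColName,
                               const std::vector<std::string>& SchemaNames) {
  if (ColName.empty() || ColName[0] == '_') { return ColName; }
  std::string DColName = ColName;
  if (DColName.size() >= 2 && DColName[DColName.size() - 2] == '-') {
    DColName = DColName.substr(0, DColName.size() - 2);
  }
  int Conflicts = 0;
  for (const std::string& Name : SchemaNames) {
    if (Name.size() >= 2 && Name.substr(0, Name.size() - 2) == DColName) { ++Conflicts; }
  }
  return Conflicts > 1 ? ColName : DColName;
}
```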
Schema TTable::DenormalizeSchema ( ) const
protected

Removes the suffixes from the column names in the schema.

Definition at line 4665 of file table.cpp.

4665  {
4666  Schema DSch;
4667  for (TInt i = 0; i < Sch.Len(); i++) {
4668  DSch.Add(TPair<TStr, TAttrType>(DenormalizeColName(Sch[i].Val1), Sch[i].Val2));
4669  }
4670  return DSch;
4671 }
void TTable::Dump ( FILE *  OutF = stdout) const

Prints table contents to a text file.

Definition at line 887 of file table.cpp.

887  {
888  TInt L = Sch.Len();
889  Schema DSch = DenormalizeSchema();
890 
891  // LoadSS() will not throw away lines with #
892  //fprintf(OutF, "# Table: rows: %d, columns: %d\n", GetNumValidRows(), GetNodes());
893  // print title (schema), LoadSS() will take first line as (optional) schema
894  fprintf(OutF, "# ");
895  for (TInt i = 0; i < L-1; i++) {
896  fprintf(OutF, "%s\t", DSch[i].Val1.CStr());
897  }
898  fprintf(OutF, "%s\n", DSch[L-1].Val1.CStr());
899  // print table contents
900  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
901  for (TInt i = 0; i < L; i++) {
902  char C = (i == L-1) ? '\n' : '\t';
903  switch (GetSchemaColType(i)) {
904  case atInt: {
905  fprintf(OutF, "%d%c", RowI.GetIntAttr(GetSchemaColName(i)).Val, C);
906  break;
907  }
908  case atFlt: {
909  fprintf(OutF, "%f%c", RowI.GetFltAttr(GetSchemaColName(i)).Val, C);
910  break;
911  }
912  case atStr: {
913  fprintf(OutF, "%s%c", RowI.GetStrAttr(GetSchemaColName(i)).CStr(), C);
914  break;
915  }
916  }
917  }
918  }
919 }
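Dump writes a header line that starts with "# " and lists the denormalized column names separated by tabs. It then writes one tab-separated line per valid row, using %d for ints, %f for floats and %s for strings. The sketch below reproduces that layout for a hypothetical row type with one column of each kind. DumpRow and DumpTsv are illustrative names, not SNAP API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row with one int, one float and one string column.
struct DumpRow {
  int Id;
  double Score;
  std::string Label;
};

// "# " header with tab-separated names, then one tab-separated line per row.
// Floats are formatted with printf's %f, as in the listing.
std::string DumpTsv(const std::vector<std::string>& ColNames,
                    const std::vector<DumpRow>& Rows) {
  std::ostringstream Out;
  Out << "# ";
  for (std::size_t i = 0; i < ColNames.size(); ++i) {
    Out << ColNames[i] << (i + 1 == ColNames.size() ? '\n' : '\t');
  }
  for (const DumpRow& R : Rows) {
    char FltBuf[64];
    std::snprintf(FltBuf, sizeof(FltBuf), "%f", R.Score);
    Out << R.Id << '\t' << FltBuf << '\t' << R.Label << '\n';
  }
  return Out.str();
}
```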
TRowIterator TTable::EndRI ( ) const
inline

Gets an iterator positioned past the last valid row of the table (it wraps the end sentinel TTable::Last).

Definition at line 1243 of file table.h.

1243 { return TRowIterator(TTable::Last, this);}
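EndRI() does not point at a row. It wraps the TTable::Last sentinel, so a loop such as the one in Dump (RowI < EndRI()) stops after the last valid row. The following sketch models that traversal on a plain Next vector. kLastRow = -1 is an assumed stand-in for TTable::Last, whose actual value is not shown here.

```cpp
#include <cassert>
#include <vector>

const int kLastRow = -1;  // hypothetical stand-in for TTable::Last

// Visits rows in the logical order given by Next, starting at First and
// stopping when the sentinel is reached. The sentinel plays the role of
// EndRI(): a position past the final valid row, not the final row itself.
std::vector<int> LogicalOrder(const std::vector<int>& Next, int First) {
  std::vector<int> Order;
  for (int Row = First; Row != kLastRow; Row = Next[Row]) {
    Order.push_back(Row);
  }
  return Order;
}
```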
TRowIteratorWithRemove TTable::EndRIWR ( )
inline

Gets an iterator with remove, positioned past the last valid row (it wraps the end sentinel TTable::Last).

Definition at line 1247 of file table.h.

1247 { return TRowIteratorWithRemove(TTable::Last, this);}
void TTable::FillBucketsByInterval ( TStr  SplitAttr,
TIntPrV  SplitIntervals 
)
protected

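Fills RowIdBuckets with sets of row ids.

Fills RowIdBuckets with sets of row ids, partitioned on the value of the integer column SplitAttr according to the intervals in SplitIntervals. Each interval (Val1, Val2) is half-open, [Val1, Val2). A row is added to every interval that contains its value, so overlapping intervals can place a row in several buckets. Deleted rows are skipped. Called by ToVarGraphSequence and ToVarGraphSequenceIterator.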

Definition at line 3599 of file table.cpp.

3599  {
3600  TInt SplitColId = GetColIdx(SplitAttr);
3601  int NumBuckets = SplitIntervals.Len();
3602  InitRowIdBuckets(NumBuckets);
3603 
3604  // populate RowIdSets by computing the range of buckets for each row
3605  for (TInt i = 0; i < Next.Len(); i++) {
3606  if (Next[i] == Invalid) { continue; }
3607  int SplitVal = IntCols[SplitColId][i];
3608  for (TInt j = 0; j < SplitIntervals.Len(); j++) {
3609  if (SplitVal >= SplitIntervals[j].Val1 && SplitVal < SplitIntervals[j].Val2) {
3610  RowIdBuckets[j].Add(i);
3611  }
3612  }
3613  }
3614 }
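The bucketing rule above can be modeled with standard containers. BucketsByInterval is a hypothetical helper, and the Valid flags stand in for the Next[i] == Invalid check. As in the listing, the intervals are half-open and may overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assigns each valid row to every interval [Lo, Hi) that contains its value.
// Rows outside all intervals land in no bucket.
std::vector<std::vector<int>> BucketsByInterval(
    const std::vector<int>& Vals, const std::vector<bool>& Valid,
    const std::vector<std::pair<int, int>>& Intervals) {
  std::vector<std::vector<int>> Buckets(Intervals.size());
  for (std::size_t i = 0; i < Vals.size(); ++i) {
    if (!Valid[i]) { continue; }  // skip logically removed rows
    for (std::size_t j = 0; j < Intervals.size(); ++j) {
      if (Vals[i] >= Intervals[j].first && Vals[i] < Intervals[j].second) {
        Buckets[j].push_back(static_cast<int>(i));
      }
    }
  }
  return Buckets;
}
```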
void TTable::FillBucketsByWindow ( TStr  SplitAttr,
TInt  JumpSize,
TInt  WindowSize,
TInt  StartVal,
TInt  EndVal 
)
protected

Fills RowIdBuckets with sets of row ids.

Fill RowIdBuckets with sets of row ids partitioned on the value of the column SplitAttr, according to the windows specified by JumpSize and WindowSize. Called by ToGraphSequence and ToGraphSequenceIterator.

Definition at line 3547 of file table.cpp.

3547  {
3548  Assert (JumpSize <= WindowSize);
3549  int NumBuckets, MinBucket, MaxBucket;
3550  TInt SplitColId = GetColIdx(SplitAttr);
3551 
3552  if (StartVal == TInt::Mn || EndVal == TInt::Mx) {
3553  // calculate min and max value of the column 'SplitAttr'
3554  TInt MinValue = TInt::Mx;
3555  TInt MaxValue = TInt::Mn;
3556  for (TInt i = 0; i < Next.Len(); i++) {
3557  if (Next[i] != Invalid) {
3558  if (MinValue > IntCols[SplitColId][i]) {
3559  MinValue = IntCols[SplitColId][i];
3560  }
3561  if (MaxValue < IntCols[SplitColId][i]) {
3562  MaxValue = IntCols[SplitColId][i];
3563  }
3564  }
3565  }
3566 
3567  if (StartVal == TInt::Mn) StartVal = MinValue;
3568  if (EndVal == TInt::Mx) EndVal = MaxValue;
3569  }
3570 
3571  // initialize buckets
3572  NumBuckets = 1;
3573  if (JumpSize > 0) {
3574  NumBuckets = (EndVal - StartVal)/JumpSize + 1;
3575  }
3576 
3577  InitRowIdBuckets(NumBuckets);
3578 
3579  // populate RowIdSets by computing the range of buckets for each row
3580  for (TInt i = 0; i < Next.Len(); i++) {
3581  if (Next[i] == Invalid) { continue; }
3582  int SplitVal = IntCols[SplitColId][i];
3583  if (SplitVal < StartVal || SplitVal > EndVal) { continue; }
3584  int RowVal = SplitVal - StartVal;
3585  if (JumpSize == 0) { // expanding windows
3586  MinBucket = RowVal/WindowSize;
3587  MaxBucket = NumBuckets-1;
3588  } else if (JumpSize == WindowSize) { // disjoint windows
3589  MinBucket = MaxBucket = RowVal/JumpSize;
3590  } else { // sliding windows
3591  if (RowVal < WindowSize) { MinBucket = 0; }
3592  else { MinBucket = (RowVal-WindowSize)/JumpSize + 1; }
3593  MaxBucket = RowVal/JumpSize;
3594  }
3595  for (TInt j = MinBucket; j <= MaxBucket; j++) { RowIdBuckets[j].Add(i); }
3596  }
3597 }
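For each row, the listing computes the range of window indices that contain the row's value. The sketch below isolates that arithmetic in a hypothetical BucketRange helper so the three cases can be checked by hand. Under this reading, in the sliding case bucket k covers the values [StartVal + k*JumpSize, StartVal + k*JumpSize + WindowSize). With JumpSize == 0, the listing creates a single bucket, so only rows whose offset is below WindowSize are placed in it. The sketch keeps that behavior and returns an empty range (MinBucket > MaxBucket) otherwise.

```cpp
#include <cassert>
#include <utility>

// Returns the inclusive range [MinBucket, MaxBucket] of windows that contain
// a row with split value SplitVal, following the three cases of the listing.
// Assumes StartVal <= SplitVal <= EndVal, 0 <= JumpSize <= WindowSize and
// WindowSize > 0.
std::pair<int, int> BucketRange(int SplitVal, int StartVal, int EndVal,
                                int JumpSize, int WindowSize) {
  int NumBuckets = JumpSize > 0 ? (EndVal - StartVal) / JumpSize + 1 : 1;
  int RowVal = SplitVal - StartVal;
  int MinBucket, MaxBucket;
  if (JumpSize == 0) {                  // expanding windows
    MinBucket = RowVal / WindowSize;
    MaxBucket = NumBuckets - 1;
  } else if (JumpSize == WindowSize) {  // disjoint windows
    MinBucket = MaxBucket = RowVal / JumpSize;
  } else {                              // sliding windows
    MinBucket = RowVal < WindowSize ? 0 : (RowVal - WindowSize) / JumpSize + 1;
    MaxBucket = RowVal / JumpSize;
  }
  return {MinBucket, MaxBucket};
}
```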
void TTable::GenerateColTypeMap ( THash< TStr, TPair< TInt, TInt > > &  ColTypeIntMap)
private

Definition at line 337 of file table.cpp.

337  {
338  ColTypeMap.Clr();
339  Sch.Clr();
340  for (THash<TStr,TPair<TInt,TInt> >::TIter it = ColTypeIntMap.BegI(); it < ColTypeIntMap.EndI(); it++) {
341  TPair<TInt,TInt> dat = it.GetDat();
342  switch (dat.GetVal1()) {
343  case 0:
344  AddColType(it.GetKey(), atInt, dat.GetVal2());
345  AddSchemaCol(it.GetKey(), atInt);
346  break;
347  case 1:
348  AddColType(it.GetKey(), atFlt, dat.GetVal2());
349  AddSchemaCol(it.GetKey(), atFlt);
350  break;
351  case 2:
352  AddColType(it.GetKey(), atStr, dat.GetVal2());
353  AddSchemaCol(it.GetKey(), atStr);
354  break;
355  }
356  }
357  IsNextDirty = 0;
358 }
TInt TTable::GetColIdx ( const TStr & ColName) const
inline

Gets index of column ColName among columns of the same type in the schema. Returns -1 if there is no such column.

Definition at line 1013 of file table.h.

1013  {
1014  TStr NColName = NormalizeColName(ColName);
1015  return ColTypeMap.IsKey(NColName) ? ColTypeMap.GetDat(NColName).Val2 : TInt(-1);
1016  }
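GetColIdx normalizes the name before looking it up, so callers can pass either the plain or the suffixed form. The sketch below assumes that NormalizeColName appends "-1" unless the name starts with '_' or already ends in a "-x" suffix. That rule is inferred from DenormalizeColName and is an assumption, as are the helper names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class AttrType { Int, Flt, Str };  // stand-in for atInt / atFlt / atStr

// Hypothetical normalization, assumed to match NormalizeColName.
std::string NormalizeName(const std::string& Name) {
  if (Name.empty() || Name[0] == '_') { return Name; }
  if (Name.size() >= 2 && Name[Name.size() - 2] == '-') { return Name; }
  return Name + "-1";
}

// Returns the index of the column among columns of its type, or -1 if absent.
int GetColIdx(const std::map<std::string, std::pair<AttrType, int>>& ColTypeMap,
              const std::string& ColName) {
  auto It = ColTypeMap.find(NormalizeName(ColName));
  return It != ColTypeMap.end() ? It->second.second : -1;
}
```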
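void TTable::GetCollidingRows ( const TTable & T,
THashSet< TInt > &  Collisions
)
protected

Gets the set of row ids of rows in T that also occur in this table. Throws an exception if the schemas do not match. The id column is not compared.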

Definition at line 4014 of file table.cpp.

4014  {
4015  TIntV UniqueVec;
4016  THash<TGroupKey, TPair<TInt, TIntV> > Grouping;
4017  TStrV GroupBy;
4018 
4019  // indices of columns of each type
4020  TIntV IntGroupByCols;
4021  TIntV FltGroupByCols;
4022  TIntV StrGroupByCols;
4023 
4024  TInt IKLen, FKLen, SKLen;
4025 
4026  // check that schemas match
4027  for (TInt c = 0; c < Sch.Len(); c++) {
4028  if (Sch[c].Val1 == IdColName) {
4029  if (Table.Sch[c].Val1 != Table.GetIdColName()) {
4030  TExcept::Throw("GetCollidingRows: schemas do not match!");
4031  }
4032  continue;
4033  }
4034  if (Sch[c] != Table.Sch[c]) {
4035  printf("(%s,%d) != (%s,%d)\n", Sch[c].Val1.CStr(), Sch[c].Val2, Table.Sch[c].Val1.CStr(), Table.Sch[c].Val2);
4036  TExcept::Throw("GetCollidingRows: schemas do not match!");
4037  }
4038  GroupBy.Add(NormalizeColName(Sch[c].Val1));
4039  TPair<TAttrType, TInt> ColType = Table.GetColTypeMap(Sch[c].Val1);
4040  switch (ColType.Val1) {
4041  case atInt:
4042  IntGroupByCols.Add(ColType.Val2);
4043  break;
4044  case atFlt:
4045  FltGroupByCols.Add(ColType.Val2);
4046  break;
4047  case atStr:
4048  StrGroupByCols.Add(ColType.Val2);
4049  break;
4050  }
4051  }
4052 
4053  IKLen = IntGroupByCols.Len();
4054  FKLen = FltGroupByCols.Len();
4055  SKLen = StrGroupByCols.Len();
4056 
4057  // group rows of first table
4058  GroupAux(GroupBy, Grouping, true, "", false, UniqueVec, true);
4059 
4060  // find colliding rows of second table
4061  for (TRowIterator it = Table.BegRI(); it < Table.EndRI(); it++) {
4062  // read keys from row
4063  TIntV IKey(IKLen + SKLen, 0);
4064  TFltV FKey(FKLen, 0);
4065 
4066  // find group key
4067  for (TInt c = 0; c < IKLen; c++) {
4068  IKey.Add(it.GetIntAttr(IntGroupByCols[c]));
4069  }
4070  for (TInt c = 0; c < FKLen; c++) {
4071  FKey.Add(it.GetFltAttr(FltGroupByCols[c]));
4072  }
4073  for (TInt c = 0; c < SKLen; c++) {
4074  IKey.Add(it.GetStrMapById(StrGroupByCols[c]));
4075  }
4076  // look for group matching the key
4077  TGroupKey GroupKey = TGroupKey(IKey, FKey);
4078 
4079  TInt RowIdx = it.GetRowIdx();
4080  if (Grouping.IsKey(GroupKey)) {
4081  // row exists in first table
4082  Collisions.AddKey(RowIdx);
4083  }
4084  }
4085 }
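GetCollidingRows groups the rows of this table by all non-id columns, then probes that grouping with every row of T. The sketch below models the same two-phase idea, using a std::set of (int, string) keys in place of SNAP's THash of TGroupKey. CollidingRows and RowKey are hypothetical names, and each row is reduced to one int and one string attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row key of a two-column table (one int, one string attribute).
using RowKey = std::pair<int, std::string>;

// Returns the indices of rows in Other whose full key also occurs in This:
// first index the rows of This, then probe with each row of Other.
std::set<int> CollidingRows(const std::vector<RowKey>& This,
                            const std::vector<RowKey>& Other) {
  std::set<RowKey> Seen(This.begin(), This.end());
  std::set<int> Collisions;
  for (std::size_t i = 0; i < Other.size(); ++i) {
    if (Seen.count(Other[i]) > 0) { Collisions.insert(static_cast<int>(i)); }
  }
  return Collisions;
}
```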
TAttrType TTable::GetColType ( const TStr & ColName) const
inline

Gets type of column ColName.

Definition at line 1227 of file table.h.

1227  {
1228  TStr NColName = NormalizeColName(ColName);
1229  return ColTypeMap.GetDat(NColName).Val1;
1230  }
TPair<TAttrType, TInt> TTable::GetColTypeMap ( const TStr & ColName) const
inline protected

Gets column type and index of ColName.

Definition at line 666 of file table.h.

666  {
667  TStr NColName = NormalizeColName(ColName);
668  return ColTypeMap.GetDat(NColName);
669  }
TTableContext* TTable::GetContext ( )
inline

Returns the context.

Definition at line 1005 of file table.h.

1005  {
1006  return Context;
1007  }
const char* TTable::GetContextKey ( TInt  Val) const
inline protected

Returns the string stored under id Val in the context's StringVals pool. Used by the ToGraph methods in conv.cpp.

Definition at line 622 of file table.h.

622  {
623  return Context->StringVals.GetKey(Val);
624  }
TSize TTable::GetContextMemUsedKB ( )

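Returns the approximate memory used by the table context. Despite the KB in the name, the listing below returns Context->StringVals.GetMemUsed() without any unit conversion.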

Definition at line 3969 of file table.cpp.

3969  {
3970  TSize ApproxSize = 0;
3971  ApproxSize += Context->StringVals.GetMemUsed();
3972  return ApproxSize;
3973 }
TStr TTable::GetDstCol ( ) const
inline

Gets the name of the column to be used as dst nodes in the graph.

Definition at line 1165 of file table.h.

1165 { return DstCol; }
TStrV TTable::GetDstNodeFltAttrV ( ) const

Gets the names of the destination-node attributes of float type.

Definition at line 1049 of file table.cpp.

1049  {
1050  TStrV FltNA = TStrV(FltCols.Len(),0);
1051  for (TInt i = 0; i < DstNodeAttrV.Len(); i++) {
1052  TStr Attr = DstNodeAttrV[i];
1053  if (GetColType(Attr) == atFlt) {
1054  FltNA.Add(Attr);
1055  }
1056  }
1057  return FltNA;
1058 }
TStrV TTable::GetDstNodeIntAttrV ( ) const

Gets the names of the destination-node attributes of integer type.

Definition at line 1016 of file table.cpp.

1016  {
1017  TStrV IntNA = TStrV(IntCols.Len(),0);
1018  for (TInt i = 0; i < DstNodeAttrV.Len(); i++) {
1019  TStr Attr = DstNodeAttrV[i];
1020  if (GetColType(Attr) == atInt) {
1021  IntNA.Add(Attr);
1022  }
1023  }
1024  return IntNA;
1025 }
TStrV TTable::GetDstNodeStrAttrV ( ) const

Gets the names of the destination-node attributes of string type.

Definition at line 1082 of file table.cpp.

1082  {
1083  TStrV StrNA = TStrV(StrColMaps.Len(),0);
1084  for (TInt i = 0; i < DstNodeAttrV.Len(); i++) {
1085  TStr Attr = DstNodeAttrV[i];
1086  if (GetColType(Attr) == atStr) {
1087  StrNA.Add(Attr);
1088  }
1089  }
1090  return StrNA;
1091 }
TStrV TTable::GetEdgeFltAttrV ( ) const

Gets the names of the edge attributes of float type.

Definition at line 1060 of file table.cpp.

1060  {
1061  TStrV FltEA = TStrV(FltCols.Len(),0);;
1062  for (TInt i = 0; i < EdgeAttrV.Len(); i++) {
1063  TStr Attr = EdgeAttrV[i];
1064  if (GetColType(Attr) == atFlt) {
1065  FltEA.Add(Attr);
1066  }
1067  }
1068  return FltEA;
1069 }
TStrV TTable::GetEdgeIntAttrV ( ) const

Gets the names of the edge attributes of integer type.

Definition at line 1027 of file table.cpp.

1027  {
1028  TStrV IntEA = TStrV(IntCols.Len(),0);
1029  for (TInt i = 0; i < EdgeAttrV.Len(); i++) {
1030  TStr Attr = EdgeAttrV[i];
1031  if (GetColType(Attr) == atInt) {
1032  IntEA.Add(Attr);
1033  }
1034  }
1035  return IntEA;
1036 }
TStrV TTable::GetEdgeStrAttrV ( ) const

Gets edge str attribute name vector.

Definition at line 1094 of file table.cpp.

{
  TStrV StrEA = TStrV(StrColMaps.Len(),0);
  for (TInt i = 0; i < EdgeAttrV.Len(); i++) {
    TStr Attr = EdgeAttrV[i];
    if (GetColType(Attr) == atStr) {
      StrEA.Add(Attr);
    }
  }
  return StrEA;
}
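GetEdgeFltAttrV, GetEdgeIntAttrV and GetEdgeStrAttrV all follow one pattern. Each walks EdgeAttrV and keeps the names whose column type matches. In SNAP, `TStrV(FltCols.Len(),0)` only reserves capacity, so each result starts empty.

Below is a minimal sketch of that filter in standard C++. `AttrType`, `FilterAttrsByType` and the `std::map` of column types are hypothetical stand-ins for TAttrType, the method bodies and GetColType. They are not SNAP API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for SNAP's TAttrType values atInt, atFlt, atStr.
enum class AttrType { Int, Flt, Str };

// Keeps the names in EdgeAttrs whose column type equals Wanted, in their
// original order, mirroring the loop in GetEdge{Int,Flt,Str}AttrV.
std::vector<std::string> FilterAttrsByType(
    const std::vector<std::string>& EdgeAttrs,
    const std::map<std::string, AttrType>& ColTypes,
    AttrType Wanted) {
  std::vector<std::string> Out;
  // Capacity hint only (the listings reserve FltCols.Len() etc.); size starts at 0.
  Out.reserve(EdgeAttrs.size());
  for (const std::string& Attr : EdgeAttrs) {
    auto It = ColTypes.find(Attr);
    assert(It != ColTypes.end() && "every edge attribute must be a column");
    if (It->second == Wanted) {
      Out.push_back(Attr);
    }
  }
  return Out;
}
```

For example, with columns `w` (float), `ts` (int) and `p` (float), asking for floats yields `w` and `p` in that order.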
PTable TTable::GetEdgeTable ( const PNEANet Network,
TTableContext *  Context 
)
static

Extracts edge TTable from PNEANet.

Definition at line 3741 of file table.cpp.

{
  Schema SR;
  SR.Add(TPair<TStr,TAttrType>("edg_id",atInt));
  SR.Add(TPair<TStr,TAttrType>("src_id",atInt));
  SR.Add(TPair<TStr,TAttrType>("dst_id",atInt));

  TStrV IntAttrNames;
  TStrV FltAttrNames;
  TStrV StrAttrNames;

  TNEANet::TEdgeI EdgeI = Network->BegEI();
  EdgeI.GetIntAttrNames(IntAttrNames);
  EdgeI.GetFltAttrNames(FltAttrNames);
  EdgeI.GetStrAttrNames(StrAttrNames);
  for (TInt i = 0; i < IntAttrNames.Len(); i++) {
    SR.Add(TPair<TStr,TAttrType>(IntAttrNames[i],atInt));
  }
  for (TInt i = 0; i < FltAttrNames.Len(); i++) {
    SR.Add(TPair<TStr,TAttrType>(FltAttrNames[i],atFlt));
  }
  for (TInt i = 0; i < StrAttrNames.Len(); i++) {
    //printf("%s\n",StrAttrNames[i].CStr());
    SR.Add(TPair<TStr,TAttrType>(StrAttrNames[i],atStr));
  }

  PTable T = New(SR, Context);

  TInt Cnt = 0;
  // populate table columns
  while (EdgeI < Network->EndEI()) {
    T->IntCols[0].Add(EdgeI.GetId());
    T->IntCols[1].Add(EdgeI.GetSrcNId());
    T->IntCols[2].Add(EdgeI.GetDstNId());
    for (TInt i = 0; i < IntAttrNames.Len(); i++) {
      T->IntCols[i+3].Add(Network->GetIntAttrDatE(EdgeI,IntAttrNames[i]));
    }
    for (TInt i = 0; i < FltAttrNames.Len(); i++) {
      T->FltCols[i].Add(Network->GetFltAttrDatE(EdgeI,FltAttrNames[i]));
    }
    for (TInt i = 0; i < StrAttrNames.Len(); i++) {
      T->AddStrVal(i, Network->GetStrAttrDatE(EdgeI,StrAttrNames[i]));
    }
    Cnt++;
    EdgeI++;
  }
  // set number of rows and "Next" vector
  T->NumRows = Cnt;
  T->NumValidRows = T->NumRows;
  T->Next = TIntV(T->NumRows,0);
  for (TInt i = 0; i < T->NumRows-1; i++) {
    T->Next.Add(i+1);
  }
  T->LastValidRow = T->NumRows-1;
  T->Next.Add(Last);
  return T;
}
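After filling the columns, GetEdgeTable stores the rows in physical order and threads them into a singly linked list through Next. Row i points to row i+1, and the last row points to the sentinel Last.

The sketch below models only that bookkeeping with `std::vector<int>`. It makes two assumptions and one simplification:
- The sentinel value is -1 (check the constant's definition in the SNAP sources).
- `BuildSequentialNext` and `LogicalOrder` are hypothetical helpers.
- Unlike the listing, it requires at least one row.

```cpp
#include <cassert>
#include <vector>

// Assumed value of TTable::Last (hypothetical here; verify in table.cpp).
constexpr int kLast = -1;

// Builds the successor vector for NumRows rows stored in physical order:
// row i links to row i + 1, and the final row links to the sentinel.
std::vector<int> BuildSequentialNext(int NumRows) {
  assert(NumRows > 0 && "this sketch requires at least one row");
  std::vector<int> Next;
  Next.reserve(NumRows);
  for (int i = 0; i < NumRows - 1; i++) {
    Next.push_back(i + 1);
  }
  Next.push_back(kLast);
  return Next;
}

// Follows the Next chain from row First, which is how table iterators
// visit rows in logical order.
std::vector<int> LogicalOrder(const std::vector<int>& Next, int First) {
  std::vector<int> Order;
  for (int Row = First; Row != kLast; Row = Next[Row]) {
    Order.push_back(Row);
  }
  return Order;
}
```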
PTable TTable::GetEdgeTablePN ( const PNGraphMP Network,
TTableContext *  Context 
)
static

Extracts edge TTable from parallel graph PNGraphMP.

Definition at line 3799 of file table.cpp.

{
  Schema SR;
  SR.Add(TPair<TStr,TAttrType>("src_id",atInt));
  SR.Add(TPair<TStr,TAttrType>("dst_id",atInt));

  TNGraphMP::TEdgeI FirstEI = Network->BegEI();
  PTable T = New(SR, Context);
  TInt NumEdges = Network->GetEdges();
  TInt NumPartitions = omp_get_max_threads()*CHUNKS_PER_THREAD;
  TInt PartitionSize = NumEdges/NumPartitions;
  if (PartitionSize*NumPartitions < NumEdges) { NumPartitions++;}

  TVec<TEIPr> Partitions;
  TIntV PartitionSizes;
  TNGraphMP::TEdgeI currStart = FirstEI;
  TInt currCount = 0;
  while (FirstEI < Network->EndEI()){
    if (currCount == PartitionSize) {
      Partitions.Add(TEIPr(currStart, FirstEI));
      currStart = FirstEI;
      PartitionSizes.Add(currCount);
      //printf("added: %d\n", currCount.Val);
      currCount = 0;
    }
    //printf("%d\n", currCount.Val);
    FirstEI++;
    currCount++;
  }
  Partitions.Add(TEIPr(currStart, FirstEI));
  PartitionSizes.Add(currCount);

  T->ResizeTable(NumEdges);
  #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD)
  for (int p = 0; p < Partitions.Len(); p++) {
    TNGraphMP::TEdgeI EdgeI = Partitions[p].GetVal1();
    TNGraphMP::TEdgeI EndI = Partitions[p].GetVal2();
    //printf("Thread = %d, p = %d, size = %d\n", omp_get_thread_num(), p, PartitionSizes[p].Val);
    int start = T->GetEmptyRowsStart(PartitionSizes[p]);
    while (EdgeI < EndI) {
      T->IntCols[0][start] = EdgeI.GetSrcNId();
      T->IntCols[1][start] = EdgeI.GetDstNId();
      EdgeI++;
      if (EdgeI < EndI) { T->Next[start] = start+1;}
      start++;
    }
  }

  Assert(T->NumRows == NumEdges);
  return T;
}
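GetEdgeTablePN first splits the edge sequence into ranges of `NumEdges/NumPartitions` edges, where NumPartitions starts at `omp_get_max_threads()*CHUNKS_PER_THREAD`. Any remainder becomes one extra, shorter range. The edge iterator is forward-only, so the listing presumably records each range's size separately in PartitionSizes.

The sketch below reproduces the splitting loop with integer positions standing in for `TNGraphMP::TEdgeI` iterators. `PartitionRanges` is a hypothetical name. The sketch assumes at least one edge per partition.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits positions [0, NumItems) into half-open ranges [Begin, End) of
// PartitionSize = NumItems / NumPartitions items; a remainder forms one
// extra, shorter final range (the NumPartitions++ case in the listing).
std::vector<std::pair<int, int>> PartitionRanges(int NumItems, int NumPartitions) {
  assert(NumPartitions > 0);
  const int PartitionSize = NumItems / NumPartitions;
  assert(PartitionSize > 0 && "sketch assumes at least one item per partition");
  std::vector<std::pair<int, int>> Parts;
  int CurrStart = 0;
  int CurrCount = 0;
  for (int Pos = 0; Pos < NumItems; Pos++) {
    if (CurrCount == PartitionSize) {  // current range is full: close it at Pos
      Parts.emplace_back(CurrStart, Pos);
      CurrStart = Pos;
      CurrCount = 0;
    }
    CurrCount++;
  }
  Parts.emplace_back(CurrStart, NumItems);  // final (possibly shorter) range
  return Parts;
}
```

For example, 10 items with 3 partitions give the ranges [0,3), [3,6), [6,9) and [9,10). Each range's size is simply End minus Begin.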
int TTable::GetEmptyRowsStart ( int  NewRows)
protected

Gets the start index to a chunk of empty rows of size NewRows.

Definition at line 4376 of file table.cpp.

{
  int start = -1;
#ifdef USE_OPENMP
  #pragma omp critical
  {
#endif
    start = NumRows;
    NumRows += NewRows;
    NumValidRows += NewRows;
    // To make this function thread-safe, the following call must be done before the
    // code enters parallel region.
    // ResizeTable(NumRows);
    Assert(NumRows <= Next.Len());
    if (LastValidRow >= 0) {Next[LastValidRow] = start;}
    LastValidRow = start+NewRows-1;
    Next[LastValidRow] = Last;
#ifdef USE_OPENMP
  }
#endif
  Assert (start >= 0);
  return start;
}
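GetEmptyRowsStart hands out disjoint blocks of physical rows. Inside the critical section it does three things:
- advances NumRows;
- links the previous tail row to the new block;
- marks the block's last row as the new tail.

The caller then fills in Next within its own block. GetEdgeTablePN does this with `T->Next[start] = start+1`.

The sketch below shows the same protocol with `std::mutex` in place of `#pragma omp critical`. `RowAllocator` is a hypothetical struct, the sentinel is assumed to be -1, and the sketch assumes a non-empty block.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

constexpr int kLast = -1;  // assumed value of TTable::Last

// Hypothetical model of the row-reservation bookkeeping in GetEmptyRowsStart.
struct RowAllocator {
  std::vector<int> Next;  // sized before any concurrent use (cf. ResizeTable)
  int NumRows = 0;
  int NumValidRows = 0;
  int LastValidRow = -1;
  std::mutex Mu;

  explicit RowAllocator(int Capacity) : Next(Capacity, kLast) {}

  // Reserves NewRows consecutive physical rows and returns the first index.
  int GetEmptyRowsStart(int NewRows) {
    assert(NewRows > 0 && "sketch assumes a non-empty block");
    std::lock_guard<std::mutex> Lock(Mu);  // plays the role of "omp critical"
    const int Start = NumRows;
    NumRows += NewRows;
    NumValidRows += NewRows;
    assert(NumRows <= static_cast<int>(Next.size()));
    if (LastValidRow >= 0) { Next[LastValidRow] = Start; }  // link previous block
    LastValidRow = Start + NewRows - 1;
    Next[LastValidRow] = kLast;  // new block's last row becomes the tail
    return Start;
  }
};
```

Next is sized up front, which corresponds to calling ResizeTable before the parallel region. As a result, the lock guards only a few integer updates. Each thread can then write its own block without further synchronization.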
PNEANet TTable::GetFirstGraphFromSequence ( TAttrAggr  AggrPolicy)
protected

Returns the first graph of the sequence.

Returns the first graph of the sequence corresponding to the sets of row ids in RowIdBuckets. It resets CurrBucket, stores AggrPolicy, and delegates to GetNextGraphFromSequence. This is used by the ToGraph*Iterator functions.

Definition at line 3628 of file table.cpp.

{
  CurrBucket = -1;
  this->AggrPolicy = AggrPolicy;
  return GetNextGraphFromSequence();
}
PTable TTable::GetFltNodePropertyTable ( const PNEANet Network,
const TIntFltH Property,
const TStr NodeAttrName,
const TAttrType NodeAttrType,
const TStr PropertyAttrName,
TTableContext *  Context 
)
static

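Extracts a node property TTable from a THash. The table has one row per node of Network: column NodeAttrName (of type NodeAttrType) holds the node's attribute value, and float column PropertyAttrName holds Property's value for that node.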

Definition at line 3852 of file table.cpp.

{
  Schema SR;
  // Determine type of node id
  SR.Add(TPair<TStr,TAttrType>(NodeAttrName,NodeAttrType));
  SR.Add(TPair<TStr,TAttrType>(PropertyAttrName,atFlt));
  PTable T = New(SR, Context);
  TInt NodeColIdx = T->GetColIdx(NodeAttrName);
  TInt Cnt = 0;
  // populate table columns
  for (TNEANet::TNodeI NodeI = Network->BegNI(); NodeI < Network->EndNI(); NodeI++) {
    switch (NodeAttrType) {
      case atInt:
        T->IntCols[NodeColIdx].Add(Network->GetIntAttrDatN(NodeI,NodeAttrName));
        break;
      case atFlt:
        T->FltCols[NodeColIdx].Add(Network->GetFltAttrDatN(NodeI,NodeAttrName));
        break;
      case atStr:
        T->AddStrVal(TInt(0), Network->GetStrAttrDatN(NodeI,NodeAttrName));
        break;
    }
    T->FltCols[0].Add(Property.GetDat(NodeI.GetId()));
    Cnt++;
  }
  // set number of rows and "Next" vector
  T->NumRows = Cnt;
  T->NumValidRows = T->NumRows;
  T->Next = TIntV(T->NumRows,0);
  for (TInt i = 0; i < T->NumRows-1; i++) {
    T->Next.Add(i+1);
  }
  T->LastValidRow = T->NumRows-1;
  T->Next.Add(Last);
  return T;
}
TIntV TTable::GetFltRowIdxByVal ( const TStr ColName,
const TFlt Val 
) const

Gets the rows containing Val in flt column ColName.

Returns, as a vector, the indices of the rows whose float column ColName equals Val, or an empty vector if no row matches. If an index for ColName was created with the RequestIndex method, the lookup uses it. Otherwise the method scans every valid row, which can be slow, so request an index when making multiple queries. Values are compared with exact equality.

Definition at line 5453 of file table.cpp.

{
  if (FltColIndexes.IsKey(ColName)) {
    THash<TFlt, TIntV> ColIndex = FltColIndexes.GetDat(ColName);
    if (ColIndex.IsKey(Val)) {
      return ColIndex.GetDat(Val);
    }
    else {
      TIntV Empty;
      return Empty;
    }
  }

  TIntV ToReturn;
  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
    TFlt ValAtRow = RowI.GetFltAttr(ColName);
    if ( Val == ValAtRow) {
      ToReturn.Add(RowI.GetRowIdx());
    }
  }
  return ToReturn;
}
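GetFltRowIdxByVal and GetIntRowIdxByVal share the same two-path lookup. If the column has an index (value to row indices), they consult it; otherwise they scan the column.

The listing copies the whole per-column index by value (`THash<TFlt, TIntV> ColIndex = FltColIndexes.GetDat(ColName);`) on every call. The sketch below binds the index by pointer instead, which avoids that copy. `ColIndex`, `RowIdxByVal` and `BuildIndex` are hypothetical names. The scan here visits rows in physical order, whereas TTable follows the logical Next order and skips removed rows.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Column index: value -> physical row indices holding that value.
template <typename T>
using ColIndex = std::unordered_map<T, std::vector<int>>;

// Returns rows whose value equals Val: uses Index when available (no copy of
// the index is made), otherwise scans the whole column.
template <typename T>
std::vector<int> RowIdxByVal(const std::vector<T>& Column,
                             const ColIndex<T>* Index, const T& Val) {
  if (Index != nullptr) {
    auto It = Index->find(Val);
    return It != Index->end() ? It->second : std::vector<int>{};
  }
  std::vector<int> Out;
  for (int Row = 0; Row < static_cast<int>(Column.size()); Row++) {
    if (Column[Row] == Val) { Out.push_back(Row); }  // exact equality, also for floats
  }
  return Out;
}

// Builds an index over Column, analogous in spirit to RequestIndex.
template <typename T>
ColIndex<T> BuildIndex(const std::vector<T>& Column) {
  ColIndex<T> Index;
  for (int Row = 0; Row < static_cast<int>(Column.size()); Row++) {
    Index[Column[Row]].push_back(Row);
  }
  return Index;
}
```

Both paths return the same rows. The index pays off once several values are queried against the same column.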
TFlt TTable::GetFltVal ( const TStr ColName,
const TInt RowIdx 
)
inline

Gets the value of float attribute ColName at row RowIdx.

Definition at line 1024 of file table.h.

{
  return FltCols[GetColIdx(ColName)][RowIdx];
}
TFlt TTable::GetFltValAtRowIdx ( const TInt ColIdx,
const TInt RowIdx 
)
inline

Gets the float value at column ColIdx and row RowIdx.

Definition at line 1120 of file table.h.

{
  return FltCols[ColIdx][RowIdx];
}
TVec< PNEANet > TTable::GetGraphsFromSequence ( TAttrAggr  AggrPolicy)
protected

Returns a sequence of graphs.

Returns a sequence of graphs, each constructed with BuildGraph from the row ids in one bucket of RowIdBuckets. Empty buckets are skipped.

Definition at line 3616 of file table.cpp.

{
  //call BuildGraph on each row id set - parallelizable!
  TVec<PNEANet> GraphSequence;
  for (TInt i = 0; i < RowIdBuckets.Len(); i++) {
    if (RowIdBuckets[i].Len() == 0) { continue; }
    PNEANet PNet = BuildGraph(RowIdBuckets[i], AggrPolicy);
    GraphSequence.Add(PNet);
  }

  return GraphSequence;
}
TStr TTable::GetIdColName ( ) const
inline protected

Gets name of the id column of this table.

Definition at line 636 of file table.h.

{ return IdColName; }
TIntV TTable::GetIntRowIdxByVal ( const TStr ColName,
const TInt Val 
) const

Gets the rows containing Val in int column ColName.

Returns, as a vector, the indices of the rows whose integer column ColName equals Val, or an empty vector if no row matches. If an index for ColName was created with the RequestIndex method, the lookup uses it. Otherwise the method scans every valid row, which can be slow, so request an index when making multiple queries.

Definition at line 5410 of file table.cpp.

{
  if (IntColIndexes.IsKey(ColName)) {
    THash<TInt, TIntV> ColIndex = IntColIndexes.GetDat(ColName);
    if (ColIndex.IsKey(Val)) {
      return ColIndex.GetDat(Val);
    }
    else {
      TIntV Empty;
      return Empty;
    }
  }
  TIntV ToReturn;
  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
    TInt ValAtRow = RowI.GetIntAttr(ColName);
    if ( Val == ValAtRow) {
      ToReturn.Add(RowI.GetRowIdx());
    }
  }
  return ToReturn;
}
TInt TTable::GetIntVal ( const TStr ColName,
const TInt RowIdx 
)
inline

Gets the value of integer attribute ColName at row RowIdx.

Definition at line 1020 of file table.h.

{
  return IntCols[GetColIdx(ColName)][RowIdx];
}
TInt TTable::GetIntValAtRowIdx ( const TInt ColIdx,
const TInt RowIdx 
)
inline

Gets the integer value at column ColIdx and row RowIdx.

Definition at line 1116 of file table.h.

{
  return IntCols[ColIdx][RowIdx];
}
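The accessors above reflect TTable's columnar layout. Columns are grouped by type (IntCols, FltCols, ...), and GetColIdx returns a column's index among columns of the same type. A name-based lookup therefore costs one name resolution plus two vector indexings. The `*ValAtRowIdx` variants skip the name resolution.

Below is a toy model of that layout. `ColumnarTable` and its members are hypothetical and are not SNAP's TTable. Unlike GetIntVal and GetFltVal as listed, the sketch asserts that the column has the requested type.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class AttrType { Int, Flt };

// Toy columnar table: one vector per column, grouped by type, plus a map from
// column name to (type, index among columns of that type).
struct ColumnarTable {
  std::vector<std::vector<int>> IntCols;
  std::vector<std::vector<double>> FltCols;
  std::map<std::string, std::pair<AttrType, int>> ColTypeMap;

  void AddIntCol(const std::string& Name) {
    ColTypeMap[Name] = std::make_pair(AttrType::Int, static_cast<int>(IntCols.size()));
    IntCols.emplace_back();
  }
  void AddFltCol(const std::string& Name) {
    ColTypeMap[Name] = std::make_pair(AttrType::Flt, static_cast<int>(FltCols.size()));
    FltCols.emplace_back();
  }
  // Index among columns of the same type, as GetColIdx is documented.
  int GetColIdx(const std::string& Name) const { return ColTypeMap.at(Name).second; }

  // Name-based access: resolve the column, then index the per-type vector.
  int GetIntVal(const std::string& Name, int RowIdx) const {
    assert(ColTypeMap.at(Name).first == AttrType::Int);
    return IntCols[GetColIdx(Name)][RowIdx];
  }
  double GetFltVal(const std::string& Name, int RowIdx) const {
    assert(ColTypeMap.at(Name).first == AttrType::Flt);
    return FltCols[GetColIdx(Name)][RowIdx];
  }
  // Index-based access skips name resolution, like GetIntValAtRowIdx.
  int GetIntValAtRowIdx(int ColIdx, int RowIdx) const { return IntCols[ColIdx][RowIdx]; }
};
```

With columns added as `src` (int), `w` (float), `dst` (int), the index of `dst` is 1 and the index of `w` is 0. Indices count only columns of the same type.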
TInt TTable::GetLastValidRowIdx ( )
protected

Gets the id of the last valid row of the table.

TSize TTable::GetMemUsedKB ( )

Returns the approximate memory used by the table, in KB (computed as bytes / 1000).

Definition at line 3940 of file table.cpp.

{
  TSize ApproxSize = 0;
  ApproxSize += Next.GetMemUsed()/1000; // Next vector
  for(int i = 0; i < IntCols.Len(); i++){
    ApproxSize += IntCols[i].GetMemUsed()/1000;
  }
  for(int i = 0; i < FltCols.Len(); i++){
    ApproxSize += FltCols[i].GetMemUsed()/1000;
  }
  for(int i = 0; i < StrColMaps.Len(); i++){
    ApproxSize += StrColMaps[i].GetMemUsed()/1000;
  }
  ApproxSize += RowIdMap.GetMemUsed()/1000;
  ApproxSize += GroupIDMapping.GetMemUsed()/1000;
  ApproxSize += GroupMapping.GetMemUsed()/1000;
  ApproxSize += RowIdBuckets.GetMemUsed() / 1000;
  return ApproxSize;
}
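GetMemUsedKB divides each component's byte count by 1000 before summing. Every small component therefore rounds down to 0 KB, and the total can undercount. The listing also does not include the column index hashes such as IntColIndexes and FltColIndexes.

The sketch below, with the hypothetical helper `ApproxKB`, shows the truncation effect in isolation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums per-component byte counts after dividing each by 1000, the way
// GetMemUsedKB does; each division truncates independently.
std::size_t ApproxKB(const std::vector<std::size_t>& ComponentBytes) {
  std::size_t Kb = 0;
  for (std::size_t Bytes : ComponentBytes) {
    Kb += Bytes / 1000;
  }
  return Kb;
}
```

For components of 999, 999 and 2500 bytes, this yields 2 KB. Summing first and dividing once would give 4 KB.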
static TInt TTable::GetMP ( )
inline static

Returns UseMP, the global switch for choosing multi-threaded versions of TTable functions.

Definition at line 527 of file table.h.

{ return UseMP; }
PNEANet TTable::GetNextGraphFromSequence ( )
protected

Returns the next graph in sequence corresponding to RowIdBuckets.

Advances CurrBucket to the next non-empty bucket of RowIdBuckets and returns the graph built from it, or NULL once all buckets have been consumed. This is used to iterate over the graph sequence by constructing one graph at a time. Called by NextGraphIterator().

Definition at line 3634 of file table.cpp.

{
  CurrBucket++;
  while (CurrBucket < RowIdBuckets.Len() && RowIdBuckets[CurrBucket].Len() == 0) {
    CurrBucket++;
  }
  if (CurrBucket >= RowIdBuckets.Len()) { return NULL; }
  return BuildGraph(RowIdBuckets[CurrBucket], AggrPolicy);
}
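GetFirstGraphFromSequence and GetNextGraphFromSequence form a simple cursor over RowIdBuckets. CurrBucket starts at -1, and each call advances past empty buckets to the next non-empty one. When none remain, the call returns NULL.

The sketch below models only the cursor logic. `BucketCursor` is hypothetical. It returns the bucket index (or `std::nullopt`) where the real methods build and return a PNEANet.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of the CurrBucket cursor: First() resets and advances,
// Next() skips empty buckets and returns the index of the next non-empty one.
struct BucketCursor {
  const std::vector<std::vector<int>>& RowIdBuckets;
  int CurrBucket = -1;

  explicit BucketCursor(const std::vector<std::vector<int>>& Buckets)
      : RowIdBuckets(Buckets) {}

  std::optional<int> First() {
    CurrBucket = -1;  // same reset as GetFirstGraphFromSequence
    return Next();
  }

  std::optional<int> Next() {
    const int N = static_cast<int>(RowIdBuckets.size());
    CurrBucket++;
    while (CurrBucket < N && RowIdBuckets[CurrBucket].empty()) {
      CurrBucket++;  // skip buckets with no rows
    }
    if (CurrBucket >= N) { return std::nullopt; }  // the real method returns NULL
    return CurrBucket;  // the real method builds a graph from this bucket
  }
};
```

Building one graph per call keeps only the current graph alive. GetGraphsFromSequence, by contrast, materializes the whole sequence at once.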
PTable TTable::GetNodeTable ( const PNEANet Network,
TTableContext *  Context 
)
static

Extracts node TTable from PNEANet.

Definition at line 3689 of file table.cpp.

{
  Schema SR;
  SR.Add(TPair<TStr,TAttrType>("node_id",atInt));

  TStrV IntAttrNames;
  TStrV FltAttrNames;
  TStrV StrAttrNames;

  TNEANet::TNodeI NodeI = Network->BegNI();
  NodeI.GetIntAttrNames(IntAttrNames);
  NodeI.GetFltAttrNames(FltAttrNames);
  NodeI.GetStrAttrNames(StrAttrNames);
  for (TInt i = 0; i < IntAttrNames.Len(); i++) {
    SR.Add(TPair<TStr,TAttrType>(IntAttrNames[i],atInt));
  }
  for (TInt i = 0; i < FltAttrNames.Len(); i++) {
    SR.Add(TPair<TStr,TAttrType>(FltAttrNames[i],atFlt));
  }
  for (TInt i = 0; i < StrAttrNames.Len(); i++) {
    SR.Add(TPair<TStr,TAttrType>(StrAttrNames[i],atStr));
  }

  PTable T = New(SR, Context);

  TInt Cnt = 0;
  // populate table columns
  while (NodeI < Network->EndNI()) {
    T->IntCols[0].Add(NodeI.GetId());
    for (TInt i = 0; i < IntAttrNames.Len(); i++) {
      T->IntCols[i+1].Add(Network->GetIntAttrDatN(NodeI,IntAttrNames[i]));
    }
    for (TInt i = 0; i < FltAttrNames.Len(); i++) {
      T->FltCols[i].Add(Network->GetFltAttrDatN(NodeI,FltAttrNames[i]));
    }
    for (TInt i = 0; i < StrAttrNames.Len(); i++) {
      T->AddStrVal(i, Network->GetStrAttrDatN(NodeI,StrAttrNames[i]));
    }
    Cnt++;
    NodeI++;
  }
  // set number of rows and "Next" vector
  T->NumRows = Cnt;
  T->NumValidRows = T->NumRows;
  T->Next = TIntV(T->NumRows,0);
  for (TInt i = 0; i < T->NumRows-1; i++) {
    T->Next.Add(i+1);
  }
  T->LastValidRow = T->NumRows-1;
  T->Next.Add(Last);
  return T;
}
TInt TTable::GetNumRows ( ) const
inline

Gets total number of rows in this table.

Definition at line 1232 of file table.h.

{ return NumRows;}
TInt TTable::GetNumValidRows ( ) const
inline

Gets number of valid, i.e. not deleted, rows in this table.

Definition at line 1234 of file table.h.

{ return NumValidRows;}
void TTable::GetPartitionRanges ( TIntPrV &  Partitions,
TInt  NumPartitions 
) const

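Partitions the table into roughly NumPartitions ranges of rows, in logical order, and stores them in Partitions as (start row, end row) pairs. The end row of each range is the start row of the next, and the last range ends at TTable::Last. Each range holds ceil(NumValidRows / NumPartitions) rows, but never fewer than 10.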

Definition at line 1177 of file table.cpp.

{
  TInt PartitionSize = NumValidRows / (NumPartitions);
  if (NumValidRows % NumPartitions != 0) PartitionSize++;
  if (PartitionSize < 10) {
    PartitionSize = 10;
    NumPartitions = NumValidRows / PartitionSize;
  }
  Partitions.Reserve(NumPartitions+1);

  TInt currRow = FirstValidRow;
  TInt currStart = currRow;
  if (IsNextDirty) {
    TInt currCount = PartitionSize;
    while (currRow != TTable::Last) {
      if (currCount == 0) {
        Partitions.Add(TIntPr(currStart, currRow));
        currStart = currRow;
        currCount = PartitionSize;
      }
      currRow = Next[currRow];
      currCount--;
    }
    Partitions.Add(TIntPr(currStart, currRow));
  } else {
    // Optimize for the case when rows are logically in sequence.
    currRow += PartitionSize;
    while (currRow != TTable::Last && currRow < Next.Len()) {
      if (Next[currRow] == TTable::Invalid) { currRow++; continue; }
      Partitions.Add(TIntPr(currStart, currRow));
      currStart = currRow;
      currRow += PartitionSize;
    }
    Partitions.Add(TIntPr(currStart, TTable::Last));
  }
  //printf("Num partitions: %d\n", Partitions.Len());
}
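When IsNextDirty is false, GetPartitionRanges can jump PartitionSize rows at a time instead of following Next. It computes PartitionSize as a ceiling division with a floor of 10 rows, and the final range always ends at the Last sentinel.

The sketch below covers that in-sequence branch only. It assumes all rows are valid (so no Invalid rows are skipped), FirstValidRow is 0, and the sentinel is -1. `PartitionRows` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

constexpr int kLast = -1;  // assumed value of TTable::Last

// Splits rows [0, NumRows) into (start, end) pairs; end is the next range's
// start, and the final range ends at kLast.
std::vector<std::pair<int, int>> PartitionRows(int NumRows, int NumPartitions) {
  assert(NumPartitions > 0);
  int PartitionSize = NumRows / NumPartitions;
  if (NumRows % NumPartitions != 0) { PartitionSize++; }  // ceiling division
  if (PartitionSize < 10) { PartitionSize = 10; }         // minimum range size
  std::vector<std::pair<int, int>> Parts;
  int Start = 0;
  for (int Row = PartitionSize; Row < NumRows; Row += PartitionSize) {
    Parts.emplace_back(Start, Row);
    Start = Row;
  }
  Parts.emplace_back(Start, kLast);  // last range runs to the end of the table
  return Parts;
}
```

For example, 25 rows in 10 requested partitions gives a size of 10 after the floor is applied. That produces three ranges: (0,10), (10,20) and (20, Last).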
TInt TTable::GetPivot ( TIntV V,
TInt  StartIdx,
TInt  EndIdx,
const TVec< TAttrType > &  SortByTypes,
const TIntV SortByIndices,
TBool  Asc 
)
protected

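Gets the pivot element for QSort. It picks three random positions in [StartIdx, EndIdx] and returns the one whose row is the median of the three under CompareRows.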

Definition at line 3110 of file table.cpp.

{
  TInt L = EndIdx - StartIdx + 1;
  const TInt Idx1 = StartIdx + TInt::GetRnd(L);
  const TInt Idx2 = StartIdx + TInt::GetRnd(L);
  const TInt Idx3 = StartIdx + TInt::GetRnd(L);
  if (CompareRows(V[Idx1], V[Idx2], SortByTypes, SortByIndices, Asc) < 0) {
    if (CompareRows(V[Idx2], V[Idx3], SortByTypes, SortByIndices, Asc) < 0) { return Idx2; }
    if (CompareRows(V[Idx1], V[Idx3], SortByTypes, SortByIndices, Asc) < 0) { return Idx3; }
    return Idx1;
  } else {
    if (CompareRows(V[Idx3], V[Idx2], SortByTypes, SortByIndices, Asc) < 0) { return Idx2; }
    if (CompareRows(V[Idx3], V[Idx1], SortByTypes, SortByIndices, Asc) < 0) { return Idx3; }
    return Idx1;
  }
}
TInt TTable::GetPivotKeyVal ( TIntV Key,
TIntV Val,
TInt  Start,
TInt  End 
)
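static protected

Gets the pivot for sorting the parallel vectors Key and Val. It picks three random positions in [Start, End] and returns the one whose (key, value) pair is the median of the three under CompareKeyVal.

Definition at line 5338 of file table.cpp.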

{
  TInt L = End - Start + 1;
  const TInt Idx1 = Start + TInt::GetRnd(L);
  const TInt Idx2 = Start + TInt::GetRnd(L);
  const TInt Idx3 = Start + TInt::GetRnd(L);
  if (CompareKeyVal(Key[Idx1], Val[Idx1], Key[Idx2], Val[Idx2]) < 0) {
    if (CompareKeyVal(Key[Idx2], Val[Idx2], Key[Idx3], Val[Idx3]) < 0) { return Idx2; }
    if (CompareKeyVal(Key[Idx1], Val[Idx1], Key[Idx3], Val[Idx3]) < 0) { return Idx3; }
    return Idx1;
  } else {
    if (CompareKeyVal(Key[Idx3], Val[Idx3], Key[Idx2], Val[Idx2]) < 0) { return Idx2; }
    if (CompareKeyVal(Key[Idx3], Val[Idx3], Key[Idx1], Val[Idx1]) < 0) { return Idx3; }
    return Idx1;
  }
}
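GetPivot and GetPivotKeyVal both use a randomized median-of-three rule. They draw three random positions in the range and return whichever holds the median value. This makes very unbalanced partitions unlikely even on already-sorted input.

The sketch below applies the same branch structure to a plain `std::vector<int>`. It uses `<` in place of CompareRows or CompareKeyVal, and `std::uniform_int_distribution` in place of `TInt::GetRnd`. It assumes that `Start + TInt::GetRnd(L)` draws uniformly from [Start, End]. `MedianOfThreePivot` is a hypothetical name.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Picks three random positions in [Start, End] and returns the one whose
// value is the median of the three, using the same comparisons as GetPivot.
int MedianOfThreePivot(const std::vector<int>& V, int Start, int End, std::mt19937& Rng) {
  assert(Start <= End);
  std::uniform_int_distribution<int> Pick(Start, End);  // inclusive bounds
  const int I1 = Pick(Rng), I2 = Pick(Rng), I3 = Pick(Rng);
  if (V[I1] < V[I2]) {
    if (V[I2] < V[I3]) { return I2; }  // V1 < V2 < V3
    if (V[I1] < V[I3]) { return I3; }  // V1 < V3 <= V2
    return I1;                         // V3 <= V1 < V2
  } else {
    if (V[I3] < V[I2]) { return I2; }  // V3 < V2 <= V1
    if (V[I3] < V[I1]) { return I3; }  // V2 <= V3 < V1
    return I1;                         // V2 <= V1 <= V3
  }
}
```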
THash<TInt, TInt> TTable::GetRowIdMap ( ) const
inline

Gets a map of logical to physical row ids.

Definition at line 1237 of file table.h.

{ return RowIdMap;}
void TTable::GetSchema ( const TStr InFNm,
Schema &  S,
const char &  Separator = '\t' 
)
static

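Automatically detects the schema of input file InFNm and appends one (name, type) pair per column to S.

Column types are inferred from roughly the first 1000 data rows, parsed as tab-separated regardless of Separator. Every column starts as integer and switches to float or string when a value of that kind is seen. Column names are read, using Separator, from the last of the leading comment lines (lines starting with '#'), with the leading '#' removed. An exception is thrown if no data is found.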

Definition at line 455 of file table.cpp.

{
  // Determine Attr Type
  // Assume that the data is tab separated
  TSsParser Ss(InFNm, '\t', false, false, false);
  TInt rowsToPeek = 1000;
  TInt currRow = 0;
  TInt lastComment = 0;
  while (Ss.Next()) {
    if (Ss.IsCmt()) {
      lastComment += 1;
    }
    else break;
  }
  if (Ss.Eof()) {TExcept::Throw("No Data to determine attribute types!");}
  TInt numCols = Ss.GetFlds();
  TVec<TAttrType> colAttrV(numCols);
  colAttrV.PutAll(atInt);
  while (true) {
    for (TInt i = 0; i < numCols; i++) {
      if (Ss.IsInt(i)) {
      }
      else if (Ss.IsFlt(i)) {
        colAttrV[i] = atFlt;
      }
      else {
        colAttrV[i] = atStr;
      }
    }
    currRow++;
    if (currRow > rowsToPeek || Ss.Eof()) break;
    Ss.Next();
  }
  // Default Separator is tab
  TSsParser SsNames(InFNm, Separator, false, false, false);
  for (int i = 0; i < lastComment; i++) { SsNames.Next();}
  TVec<TStr> attrV;
  TStr first(SsNames[0]);
  int begin = 0;
  TStr comment('#');
  if (first != comment) {
    for (int i = 1; i < first.Len(); i++){
      if (first[i] != ' ') { begin = i; break;}
    }
    attrV.Add(first.GetSubStr(begin));
  }
  for (int i = 1; i < SsNames.GetFlds(); i++) {attrV.Add(SsNames[i]);}
  for (TInt i = 0; i < numCols; i++) {
    S.Add(TPair<TStr,TAttrType>(attrV[i],colAttrV[i]));
  }
}
Definition: ss.h:72
Definition: gbase.h:23
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
Vector is a sequence of TVal objects representing an array that can change in size.
Definition: ds.h:430
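The type-inference loop can be illustrated with the standalone C++ sketch below. It is not SNAP code: AttrType, ParsesAsInt, ParsesAsFloat and InferColumnTypes are hypothetical names, and rows are assumed to be already split into fields, with every row at least as wide as the first. One deliberate difference from the listing: here a column's type only widens (int, then float, then string), whereas the listing assigns atFlt whenever a non-integer value parses as a float, so a column already marked atStr can revert to atFlt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical attribute types, ordered from narrowest to widest.
enum class AttrType { Int = 0, Flt = 1, Str = 2 };

// True if the whole field parses as a base-10 integer.
bool ParsesAsInt(const std::string& s) {
  if (s.empty()) return false;
  char* end = nullptr;
  std::strtol(s.c_str(), &end, 10);
  return *end == '\0';
}

// True if the whole field parses as a floating-point number.
bool ParsesAsFloat(const std::string& s) {
  if (s.empty()) return false;
  char* end = nullptr;
  std::strtod(s.c_str(), &end);
  return *end == '\0';
}

// Infers one type per column from at most maxRows sample rows.
// Every column starts as Int and is only ever widened.
std::vector<AttrType> InferColumnTypes(
    const std::vector<std::vector<std::string>>& rows, std::size_t maxRows) {
  std::vector<AttrType> types;
  if (rows.empty()) return types;
  types.assign(rows[0].size(), AttrType::Int);
  for (std::size_t r = 0; r < rows.size() && r < maxRows; ++r) {
    assert(rows[r].size() >= types.size());  // assumption: no short rows
    for (std::size_t c = 0; c < types.size(); ++c) {
      const std::string& field = rows[r][c];
      AttrType seen = ParsesAsInt(field)     ? AttrType::Int
                      : ParsesAsFloat(field) ? AttrType::Flt
                                             : AttrType::Str;
      if (seen > types[c]) types[c] = seen;  // widen, never narrow
    }
  }
  return types;
}
```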
Schema TTable::GetSchema ( )
inline

Gets the schema of this table.

Definition at line 1125 of file table.h.

1125 { return DenormalizeSchema(); }
Schema DenormalizeSchema() const
Removes the suffix from column names in the Schema.
Definition: table.cpp:4665
TStr TTable::GetSchemaColName ( TInt  Idx) const
inline protected

Gets name of the column with index Idx in the schema.

Definition at line 638 of file table.h.

638 { return Sch[Idx].Val1; }
Schema Sch
Table Schema.
Definition: table.h:549
TAttrType TTable::GetSchemaColType ( TInt  Idx) const
inline protected

Gets type of the column with index Idx in the schema.

Definition at line 640 of file table.h.

640 { return Sch[Idx].Val2; }
Schema Sch
Table Schema.
Definition: table.h:549
TStr TTable::GetSrcCol ( ) const
inline

Gets the name of the column whose values serve as source nodes when a graph is constructed from the table.

Definition at line 1158 of file table.h.

1158 { return SrcCol; }
TStr SrcCol
Column (attribute) to serve as src nodes when constructing the graph.
Definition: table.h:589
TStrV TTable::GetSrcNodeFltAttrV ( ) const

Gets the names of the source node attributes (from SrcNodeAttrV) whose columns have float type.

Definition at line 1038 of file table.cpp.

1038  {
1039  TStrV FltNA = TStrV(FltCols.Len(),0);
1040  for (TInt i = 0; i < SrcNodeAttrV.Len(); i++) {
1041  TStr Attr = SrcNodeAttrV[i];
1042  if (GetColType(Attr) == atFlt) {
1043  FltNA.Add(Attr);
1044  }
1045  }
1046  return FltNA;
1047 }
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TStrV SrcNodeAttrV
List of columns (attributes) to serve as source node attributes.
Definition: table.h:592
TVec< TFltV > FltCols
Data columns of floating point attributes.
Definition: table.h:559
TVec< TStr > TStrV
Definition: ds.h:1599
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TStrV TTable::GetSrcNodeIntAttrV ( ) const

Gets the names of the source node attributes (from SrcNodeAttrV) whose columns have integer type.

Definition at line 1005 of file table.cpp.

1005  {
1006  TStrV IntNA = TStrV(IntCols.Len(),0);
1007  for (TInt i = 0; i < SrcNodeAttrV.Len(); i++) {
1008  TStr Attr = SrcNodeAttrV[i];
1009  if (GetColType(Attr) == atInt) {
1010  IntNA.Add(Attr);
1011  }
1012  }
1013  return IntNA;
1014 }
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TStrV SrcNodeAttrV
List of columns (attributes) to serve as source node attributes.
Definition: table.h:592
TVec< TStr > TStrV
Definition: ds.h:1599
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TStrV TTable::GetSrcNodeStrAttrV ( ) const

Gets the names of the source node attributes (from SrcNodeAttrV) whose columns have string type.

Definition at line 1071 of file table.cpp.

1071  {
1072  TStrV StrNA = TStrV(StrColMaps.Len(),0);
1073  for (TInt i = 0; i < SrcNodeAttrV.Len(); i++) {
1074  TStr Attr = SrcNodeAttrV[i];
1075  if (GetColType(Attr) == atStr) {
1076  StrNA.Add(Attr);
1077  }
1078  }
1079  return StrNA;
1080 }
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
TStrV SrcNodeAttrV
List of columns (attributes) to serve as source node attributes.
Definition: table.h:592
TVec< TStr > TStrV
Definition: ds.h:1599
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
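GetSrcNodeFltAttrV, GetSrcNodeIntAttrV and GetSrcNodeStrAttrV all filter SrcNodeAttrV by column type. A single type-parameterized version of that filter might look like the standalone sketch below; ColType, FilterAttrsByType and the colTypes map are hypothetical stand-ins for TAttrType and GetColType, and names with no entry in colTypes are simply skipped.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

enum class ColType { Int, Flt, Str };

// Returns the names in attrs whose column type equals wanted,
// preserving their original order.
std::vector<std::string> FilterAttrsByType(
    const std::vector<std::string>& attrs,
    const std::unordered_map<std::string, ColType>& colTypes,
    ColType wanted) {
  std::vector<std::string> result;
  result.reserve(attrs.size());
  for (const std::string& name : attrs) {
    auto it = colTypes.find(name);
    if (it != colTypes.end() && it->second == wanted) {
      result.push_back(name);
    }
  }
  return result;
}
```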
TStr TTable::GetStr ( const TInt & KeyId) const
inline

Gets the string stored under KeyId in the context's string pool (StringVals).

Definition at line 1109 of file table.h.

1109  {
1110  return Context->StringVals.GetKey(KeyId);
1111  }
TTableContext * Context
Execution Context.
Definition: table.h:545
const char * GetKey(const int &KeyId) const
Definition: hash.h:893
TStrHash< TInt, TBigStrPool > StringVals
StringPool - stores string data values and maps them to integers.
Definition: table.h:182
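StringVals lets string columns store integer keys instead of strings: each distinct string is stored once, and GetKey maps a key back to its text. The sketch below models that interning idea with the standard library only; StringPool, GetOrAdd and GetKey are hypothetical names, and this is not how TStrHash or TBigStrPool is implemented.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning pool: each distinct string gets a dense integer id.
class StringPool {
 public:
  // Returns the id of s, adding s to the pool the first time it is seen.
  int GetOrAdd(const std::string& s) {
    auto it = ids_.find(s);
    if (it != ids_.end()) return it->second;
    int id = static_cast<int>(strings_.size());
    strings_.push_back(s);
    ids_.emplace(s, id);
    return id;
  }

  // Inverse lookup from id to string; throws std::out_of_range for bad ids.
  const std::string& GetKey(int id) const { return strings_.at(id); }

  int Size() const { return static_cast<int>(strings_.size()); }

 private:
  std::vector<std::string> strings_;          // id -> string
  std::unordered_map<std::string, int> ids_;  // string -> id
};
```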
TInt TTable::GetStrMapById ( TInt  ColIdx,
TInt  RowIdx 
) const
inline

Gets the integer mapping of the string at column ColIdx at row RowIdx.

Definition at line 1033 of file table.h.

1033  {
1034  return StrColMaps[ColIdx][RowIdx];
1035  }
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
TInt TTable::GetStrMapByName ( const TStr & ColName,
TInt  RowIdx 
) const
inline

Gets the integer mapping of the string at column ColName at row RowIdx.

Definition at line 1038 of file table.h.

1038  {
1039  return StrColMaps[GetColIdx(ColName)][RowIdx];
1040  }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
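Both accessors above are two-level lookups: GetColIdx turns a column name into its position among columns of the same type, and that position selects a vector in StrColMaps. A minimal model of this layout is sketched below with hypothetical names (ColumnarTable, AddIntColumn, AddStringColumn, ColIdx, StrMapByName). Unlike GetStrMapByName in the listing, StrMapByName here also checks that the named column really is a string column.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class ColType { Int, Flt, Str };

// Hypothetical columnar table with integer columns and string columns,
// the latter stored as integer ids (as in StrColMaps).
struct ColumnarTable {
  std::vector<std::vector<long>> intCols;
  std::vector<std::vector<int>> strCols;
  // name -> (type, index among columns of the same type)
  std::unordered_map<std::string, std::pair<ColType, int>> colTypeMap;

  void AddIntColumn(const std::string& name, std::vector<long> vals) {
    colTypeMap[name] = std::make_pair(ColType::Int, static_cast<int>(intCols.size()));
    intCols.push_back(std::move(vals));
  }

  void AddStringColumn(const std::string& name, std::vector<int> ids) {
    colTypeMap[name] = std::make_pair(ColType::Str, static_cast<int>(strCols.size()));
    strCols.push_back(std::move(ids));
  }

  // Position of name among columns of its own type (cf. GetColIdx).
  int ColIdx(const std::string& name) const {
    auto it = colTypeMap.find(name);
    if (it == colTypeMap.end()) throw std::out_of_range("no such column " + name);
    return it->second.second;
  }

  // Integer id stored in string column name at row (cf. GetStrMapByName).
  int StrMapByName(const std::string& name, int row) const {
    auto it = colTypeMap.find(name);
    if (it == colTypeMap.end() || it->second.first != ColType::Str) {
      throw std::invalid_argument("not a string column: " + name);
    }
    return strCols.at(it->second.second).at(row);
  }
};
```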
TIntV TTable::GetStrRowIdxByMap ( const TStr & ColName,
const TInt & Map 
) const

Gets the rows containing int mapping Map in str column ColName.

Returns, as a vector, the indices of the rows whose value in string column ColName has integer mapping Map; if no row matches, the vector is empty. If an index for ColName was created with the RequestIndex method, the lookup uses it; otherwise it loops over every valid row of the table, which can be slow, so requesting an index is recommended when many queries will be made.

Definition at line 5431 of file table.cpp.

5431  {
5432 
5433  if (StrMapColIndexes.IsKey(ColName)) {
5434  THash<TInt, TIntV> ColIndex = StrMapColIndexes.GetDat(ColName);
5435  if (ColIndex.IsKey(Map)) {
5436  return ColIndex.GetDat(Map);
5437  }
5438  else {
5439  TIntV Empty;
5440  return Empty;
5441  }
5442  }
5443  TIntV ToReturn;
5444  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
5445  TInt MapAtRow = RowI.GetStrMapByName(ColName);
5446  if ( Map == MapAtRow) {
5447  ToReturn.Add(RowI.GetRowIdx());
5448  }
5449  }
5450  return ToReturn;
5451 }
THash< TStr, THash< TInt, TIntV > > StrMapColIndexes
Indexes for String Columns.
Definition: table.h:569
const TDat & GetDat(const TKey &Key) const
Definition: hash.h:262
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
TRowIterator EndRI() const
Gets an iterator positioned past the last valid row of the table (the end sentinel).
Definition: table.h:1243
bool IsKey(const TKey &Key) const
Definition: hash.h:258
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
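The index-or-scan strategy of GetStrRowIdxByMap can be sketched as follows, assuming a string column is already represented by its integer mappings. BuildIndex, RowsWithMap and RowIndex are hypothetical, std::optional stands in for "an index may or may not exist", and deleted rows are ignored. The sketch reads the index through a const reference, whereas the listing copies the whole per-column index into a local THash (line 5434) on every call.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical index: string-map id -> rows holding that id.
using RowIndex = std::unordered_map<int, std::vector<int>>;

// Builds an index over one column of string-map ids.
RowIndex BuildIndex(const std::vector<int>& column) {
  RowIndex index;
  for (int row = 0; row < static_cast<int>(column.size()); ++row) {
    index[column[row]].push_back(row);
  }
  return index;
}

// Returns the rows whose id equals map. Uses the index when present,
// otherwise scans the whole column; returns an empty vector if none match.
std::vector<int> RowsWithMap(const std::vector<int>& column,
                             const std::optional<RowIndex>& index, int map) {
  if (index) {
    auto it = index->find(map);
    return it == index->end() ? std::vector<int>{} : it->second;
  }
  std::vector<int> rows;
  for (int row = 0; row < static_cast<int>(column.size()); ++row) {
    if (column[row] == map) rows.push_back(row);
  }
  return rows;
}
```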
TStr TTable::GetStrVal ( TInt  ColIdx,
TInt  RowIdx 
) const
inline protected

Gets the value in column with id ColIdx at row RowIdx.

Definition at line 626 of file table.h.

626  {
627  return TStr(Context->StringVals.GetKey(StrColMaps[ColIdx][RowIdx]));
628  }
TTableContext * Context
Execution Context.
Definition: table.h:545
const char * GetKey(const int &KeyId) const
Definition: hash.h:893
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
TStrHash< TInt, TBigStrPool > StringVals
StringPool - stores string data values and maps them to integers.
Definition: table.h:182
TStr TTable::GetStrVal ( const TStr & ColName,
const TInt & RowIdx 
) const
inline

Gets the value of string attribute ColName at row RowIdx.

Definition at line 1028 of file table.h.

1028  {
1029  return GetStrVal(GetColIdx(ColName), RowIdx);
1030  }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TStr GetStrVal(TInt ColIdx, TInt RowIdx) const
Gets the value in column with id ColIdx at row RowIdx.
Definition: table.h:626
TStr TTable::GetStrValById ( TInt  ColIdx,
TInt  RowIdx 
) const
inline

Gets the value of the string attribute at column ColIdx at row RowIdx.

Definition at line 1043 of file table.h.

1043  {
1044  return GetStrVal(ColIdx, RowIdx);
1045  }
TStr GetStrVal(TInt ColIdx, TInt RowIdx) const
Gets the value in column with id ColIdx at row RowIdx.
Definition: table.h:626
TStr TTable::GetStrValByName ( const TStr & ColName,
const TInt & RowIdx 
) const
inline

Gets the value of the string attribute at column ColName at row RowIdx.

Definition at line 1048 of file table.h.

1048  {
1049  return GetStrVal(ColName, RowIdx);
1050  }
TStr GetStrVal(TInt ColIdx, TInt RowIdx) const
Gets the value in column with id ColIdx at row RowIdx.
Definition: table.h:626
void TTable::Group ( const TStrV & GroupBy,
const TStr & GroupColName,
TBool  Ordered = true,
TBool  UsePhysicalIds = true 
)

Groups rows depending on values of GroupBy columns.

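GroupBy lists the columns to group by. Each row's group id is stored in a new integer column of this table named GroupColName; if GroupColName is an empty string, no column is created. If Ordered is false, the key values of each type are sorted before rows are compared, so the order of the GroupBy values does not matter. UsePhysicalIds selects whether groups record physical row ids or the values of the id column.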

Definition at line 1569 of file table.cpp.

1569  {
1570  TStrV NGroupBy = NormalizeColNameV(GroupBy);
1571  TStr NGroupColName = NormalizeColName(GroupColName);
1572  TIntV UniqueVec;
1573  THash<TGroupKey, TPair<TInt, TIntV> > Grouping;
1574  GroupAux(NGroupBy, Grouping, Ordered, NGroupColName, false, UniqueVec, UsePhysicalIds);
1575 }
static TStrV NormalizeColNameV(const TStrV &Cols)
Adds a suffix to each column name that does not already have one.
Definition: table.h:539
void GroupAux(const TStrV &GroupBy, THash< TGroupKey, TPair< TInt, TIntV > > &Grouping, TBool Ordered, const TStr &GroupColName, TBool KeepUnique, TIntV &UniqueVec, TBool UsePhysicalIds=true)
Helper function for grouping.
Definition: table.cpp:1322
static TStr NormalizeColName(const TStr &ColName)
Adds a suffix to the column name if it does not already have one.
Definition: table.h:530
void TTable::GroupAux ( const TStrV & GroupBy,
THash< TGroupKey, TPair< TInt, TIntV > > &  Grouping,
TBool  Ordered,
const TStr & GroupColName,
TBool  KeepUnique,
TIntV & UniqueVec,
TBool  UsePhysicalIds = true 
)
protected

Helper function for grouping.

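If KeepUnique is true, UniqueVec is filled with one row id per group (the first row encountered) and the cached group mappings are not updated. If KeepUnique is false, normal grouping is done and the grouping statement and its mappings are cached. A group-id column named GroupColName is added whenever GroupColName is non-empty.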

Definition at line 1322 of file table.cpp.

1323  {
1324  TInt IdColIdx = GetColIdx(IdColName);
1325  if(!UsePhysicalIds && IdColIdx < 0){
1326  TExcept::Throw("Grouping: Either use physical row ids, or have an id column");
1327  }
1328  TIntV IntGroupByCols;
1329  TIntV FltGroupByCols;
1330  TIntV StrGroupByCols;
1331  // get indices for each column type
1332  for (TInt c = 0; c < GroupBy.Len(); c++) {
1333  //printf("GroupBy col %d: %s\n", c.Val, GroupBy[c].CStr());
1334  if (!IsColName(GroupBy[c])) {
1335  TExcept::Throw("no such column " + GroupBy[c]);
1336  }
1337 
1338  TPair<TAttrType, TInt> ColType = GetColTypeMap(GroupBy[c]);
1339  switch (ColType.Val1) {
1340  case atInt:
1341  IntGroupByCols.Add(ColType.Val2);
1342  break;
1343  case atFlt:
1344  FltGroupByCols.Add(ColType.Val2);
1345  break;
1346  case atStr:
1347  StrGroupByCols.Add(ColType.Val2);
1348  break;
1349  }
1350  }
1351 
1352  TInt IKLen = IntGroupByCols.Len();
1353  TInt FKLen = FltGroupByCols.Len();
1354  TInt SKLen = StrGroupByCols.Len();
1355 
1356  TInt GroupNum = 0;
1357  TVec<TPair<TInt, TInt> > GroupAndRowIds;
1358  //printf("done GroupAux initialization\n");
1359 
1360  // iterate over rows
1361  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
1362  TIntV IKey(IKLen + SKLen, 0);
1363  TFltV FKey(FKLen, 0);
1364  TIntV SKey(SKLen, 0);
1365 
1366  // find group key
1367  for (TInt c = 0; c < IKLen; c++) {
1368  IKey.Add(it.GetIntAttr(IntGroupByCols[c]));
1369  }
1370  for (TInt c = 0; c < FKLen; c++) {
1371  FKey.Add(it.GetFltAttr(FltGroupByCols[c]));
1372  }
1373  for (TInt c = 0; c < SKLen; c++) {
1374  SKey.Add(it.GetStrMapById(StrGroupByCols[c]));
1375  }
1376  if (!Ordered) {
1377  if (IKLen > 0) { IKey.ISort(0, IKey.Len()-1, true); }
1378  if (FKLen > 0) { FKey.ISort(0, FKey.Len()-1, true); }
1379  if (SKLen > 0) { SKey.ISort(0, SKey.Len()-1, true); }
1380  }
1381  for (TInt c = 0; c < SKLen; c++) {
1382  IKey.Add(SKey[c]);
1383  }
1384 
1385  // look for group matching the key
1386  TGroupKey GroupKey = TGroupKey(IKey, FKey);
1387 
1388  TInt RowIdx = it.GetRowIdx();
1389  TInt idx = UsePhysicalIds ? it.GetRowIdx() : IntCols[IdColIdx][it.GetRowIdx()];
1390  if (!Grouping.IsKey(GroupKey)) {
1391  // Grouping key hasn't been seen before, create a new group
1392  TPair<TInt, TIntV> NewGroup;
1393  NewGroup.Val1 = GroupNum;
1394  NewGroup.Val2.Add(idx);
1395  Grouping.AddDat(GroupKey, NewGroup);
1396  if (GroupColName != "") {
1397  GroupAndRowIds.Add(TPair<TInt, TInt>(GroupNum, RowIdx));
1398  }
1399  if (KeepUnique) {
1400  UniqueVec.Add(idx);
1401  }
1402  GroupNum++;
1403  } else {
1404  // Grouping key has been seen before, update corresponding group
1405  if (!KeepUnique) {
1406  TPair<TInt, TIntV>& NewGroup = Grouping.GetDat(GroupKey);
1407  NewGroup.Val2.Add(idx);
1408  if (GroupColName != "") {
1409  GroupAndRowIds.Add(TPair<TInt, TInt>(NewGroup.Val1, RowIdx));
1410  }
1411  }
1412  }
1413  }
1414  // printf("KeepUnique: %d\n", KeepUnique.Val);
1415  // update group mapping
1416  if (!KeepUnique) {
1417  GroupStmt Stmt(NormalizeColNameV(GroupBy), Ordered, UsePhysicalIds);
1418  GroupStmtNames.AddDat(GroupColName, Stmt);
1419  GroupIDMapping.AddKey(Stmt);
1420  GroupMapping.AddKey(Stmt);
1421  //printf("Adding statement: ");
1422  //Stmt.Print();
1423  for (THash<TGroupKey, TPair<TInt, TIntV> >::TIter it = Grouping.BegI(); it < Grouping.EndI(); it++) {
1424  TGroupKey key = it.GetKey();
1425  TPair<TInt, TIntV> group = it.GetDat();
1426  GroupIDMapping.GetDat(Stmt).AddDat(group.Val1, TGroupKey(key));
1427  GroupMapping.GetDat(Stmt).AddDat(TGroupKey(key), TIntV(group.Val2));
1428  }
1429  }
1430 
1431  // add a column to the table
1432  if (GroupColName != "") {
1433  StoreGroupCol(GroupColName, GroupAndRowIds);
1434  AddSchemaCol(GroupColName, atInt); // update schema
1435  }
1436 }
void AddSchemaCol(const TStr &ColName, TAttrType ColType)
Adds column with name ColName and type ColType to the schema.
Definition: table.h:642
THash< GroupStmt, THash< TGroupKey, TIntV > > GroupMapping
Maps grouping statements to their (group-by key -> row ids) mapping.
Definition: table.h:581
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
void StoreGroupCol(const TStr &GroupColName, const TVec< TPair< TInt, TInt > > &GroupAndRowIds)
Helper function for grouping: stores the (group id, row id) pairs in GroupAndRowIds as a group-id column named GroupColName.
Definition: table.cpp:1310
TIter BegI() const
Definition: hash.h:213
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr IdColName
Name of the column holding explicit (permanent) row ids, if any.
Definition: table.h:565
static TStrV NormalizeColNameV(const TStrV &Cols)
Adds a suffix to each column name that does not already have one.
Definition: table.h:539
const TDat & GetDat(const TKey &Key) const
Definition: hash.h:262
TIter EndI() const
Definition: hash.h:218
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
TPair< TIntV, TFltV > TGroupKey
Represents grouping key with IntV for integer and string attributes and FltV for float attributes...
Definition: table.h:145
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
A class representing a cached grouping statement identifier.
Definition: table.h:266
TPair< TAttrType, TInt > GetColTypeMap(const TStr &ColName) const
Gets column type and index of ColName.
Definition: table.h:666
THash< TStr, GroupStmt > GroupStmtNames
Maps user-given grouping statement names to their group-by attributes.
Definition: table.h:573
TRowIterator EndRI() const
Gets an iterator positioned past the last valid row of the table (the end sentinel).
Definition: table.h:1243
THash< GroupStmt, THash< TInt, TGroupKey > > GroupIDMapping
Maps grouping statements to their (group id -> group-by key) mapping.
Definition: table.h:577
TVal1 Val1
Definition: ds.h:34
TVec< TInt > TIntV
Definition: ds.h:1594
TVal2 Val2
Definition: ds.h:35
bool IsKey(const TKey &Key) const
Definition: hash.h:258
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TBool IsColName(const TStr &ColName) const
Definition: table.h:646
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
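The core bookkeeping in GroupAux is a hash from a composite key to a (group id, row list) pair, with group ids handed out in first-seen order. The standalone sketch below reproduces that for integer keys only (in the listing, string columns contribute their integer mappings to the same integer key, and float columns form a separate float key). GroupRows and GroupResult are hypothetical, and std::map replaces THash only so that std::vector<long> can serve as a key without a custom hash. With ordered set to false, each key is sorted first, so rows (1, 2) and (2, 1) share a group, similar in spirit to the ISort calls in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical grouping result: key -> (group id, row ids), a per-row
// group id column, and the first row seen for each group.
struct GroupResult {
  std::map<std::vector<long>, std::pair<int, std::vector<int>>> groups;
  std::vector<int> groupCol;
  std::vector<int> uniqueRows;
};

// rows[r] holds the group-by values of row r, one per group-by column.
GroupResult GroupRows(const std::vector<std::vector<long>>& rows, bool ordered) {
  GroupResult res;
  int nextGroup = 0;
  for (int r = 0; r < static_cast<int>(rows.size()); ++r) {
    std::vector<long> key = rows[r];
    if (!ordered) std::sort(key.begin(), key.end());  // ignore value order
    auto it = res.groups.find(key);
    if (it == res.groups.end()) {
      // Unseen key: open a new group with the next id.
      it = res.groups.emplace(key, std::make_pair(nextGroup++, std::vector<int>{})).first;
      res.uniqueRows.push_back(r);
    }
    it->second.second.push_back(r);
    res.groupCol.push_back(it->second.first);
  }
  return res;
}
```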
template<class T >
void TTable::GroupByFltCol ( const TStr & GroupBy,
T &  Grouping,
const TIntV & IndexSet,
TBool  All,
TBool  UsePhysicalIds = true 
) const
protected

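Groups/hashes by a single column with float values and stores the result in Grouping. If All is true, every valid row is considered; otherwise only the valid rows whose physical ids are listed in IndexSet are considered.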

Definition at line 1626 of file table.h.

1627  {
1628  TInt IdColIdx = GetColIdx(IdColName);
1629  if(!UsePhysicalIds && IdColIdx < 0){
1630  TExcept::Throw("Grouping: Either use physical row ids, or have an id column");
1631  }
1632  GroupingSanityCheck(GroupBy, atFlt);
1633  if (All) {
1634  // Optimize for the common and most expensive case - iterate over only valid rows.
1635  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
1636  TInt idx = UsePhysicalIds ? it.GetRowIdx() : it.GetIntAttr(IdColIdx);
1637  UpdateGrouping<TFlt>(Grouping, it.GetFltAttr(GroupBy), idx);
1638  }
1639  } else {
1640  // Consider only rows in IndexSet.
1641  for (TInt i = 0; i < IndexSet.Len(); i++) {
1642  if (IsRowValid(IndexSet[i])) {
1643  TInt RowIdx = IndexSet[i];
1644  const TFltV& Col = FltCols[GetColIdx(GroupBy)];
1645  TInt idx = UsePhysicalIds ? RowIdx : IntCols[IdColIdx][RowIdx];
1646  UpdateGrouping<TFlt>(Grouping, Col[RowIdx], idx);
1647  }
1648  }
1649  }
1650 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr IdColName
Name of the column holding explicit (permanent) row ids, if any.
Definition: table.h:565
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
void GroupingSanityCheck(const TStr &GroupBy, const TAttrType &AttrType) const
Checks if grouping key exists and matches given attr type.
Definition: table.cpp:1215
TVec< TFltV > FltCols
Data columns of floating point attributes.
Definition: table.h:559
TRowIterator EndRI() const
Gets an iterator positioned past the last valid row of the table (the end sentinel).
Definition: table.h:1243
bool IsRowValid(TInt RowIdx) const
Checks if RowIdx corresponds to a valid (i.e. not deleted) row.
Definition: table.h:801
template<class T >
void TTable::GroupByIntCol ( const TStr & GroupBy,
T &  Grouping,
const TIntV & IndexSet,
TBool  All,
TBool  UsePhysicalIds = true 
) const
protected

Groups/hashes by a single column with integer values.

Stores the grouping in Grouping. IndexSet tells which rows to consider (a vector of physical row ids); it is used only if All == false, otherwise every valid row is considered. Note that the IndexSet option is currently not used anywhere.

Definition at line 1598 of file table.h.

1599  {
1600  TInt IdColIdx = GetColIdx(IdColName);
1601  if(!UsePhysicalIds && IdColIdx < 0){
1602  TExcept::Throw("Grouping: Either use physical row ids, or have an id column");
1603  }
1604  // TO do: add a check if grouping already exists and is valid
1605  GroupingSanityCheck(GroupBy, atInt);
1606  if (All) {
1607  // Optimize for the common and most expensive case - iterate over only valid rows.
1608  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
1609  TInt idx = UsePhysicalIds ? it.GetRowIdx() : it.GetIntAttr(IdColIdx);
1610  UpdateGrouping<TInt>(Grouping, it.GetIntAttr(GroupBy), idx);
1611  }
1612  } else {
1613  // Consider only rows in IndexSet.
1614  for (TInt i = 0; i < IndexSet.Len(); i++) {
1615  if (IsRowValid(IndexSet[i])) {
1616  TInt RowIdx = IndexSet[i];
1617  const TIntV& Col = IntCols[GetColIdx(GroupBy)];
1618  TInt idx = UsePhysicalIds ? RowIdx : IntCols[IdColIdx][RowIdx];
1619  UpdateGrouping<TInt>(Grouping, Col[RowIdx], idx);
1620  }
1621  }
1622  }
1623 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr IdColName
Name of the column holding explicit (permanent) row ids, if any.
Definition: table.h:565
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
void GroupingSanityCheck(const TStr &GroupBy, const TAttrType &AttrType) const
Checks if grouping key exists and matches given attr type.
Definition: table.cpp:1215
TRowIterator EndRI() const
Gets an iterator positioned past the last valid row of the table (the end sentinel).
Definition: table.h:1243
bool IsRowValid(TInt RowIdx) const
Checks if RowIdx corresponds to a valid (i.e. not deleted) row.
Definition: table.h:801
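GroupByFltCol, GroupByIntCol and GroupByStrCol share one pattern: visit either every valid row (All true) or only the valid rows named in IndexSet (All false), and append each row id to the bucket for its column value. A type-generic sketch is shown below, in which the hypothetical GroupBySingleColumn stands in for the three members and a valid flag vector stands in for IsRowValid. It walks rows in physical order rather than through a row iterator, and always records physical row ids (there is no UsePhysicalIds option).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Groups row ids by the value in column. valid[i] is false for deleted rows.
// If all is true, every valid row is visited; otherwise only the valid rows
// listed in indexSet are considered.
template <class T>
std::unordered_map<T, std::vector<int>> GroupBySingleColumn(
    const std::vector<T>& column, const std::vector<bool>& valid,
    const std::vector<int>& indexSet, bool all) {
  assert(valid.size() == column.size());
  std::unordered_map<T, std::vector<int>> grouping;
  if (all) {
    for (std::size_t row = 0; row < column.size(); ++row) {
      if (valid[row]) grouping[column[row]].push_back(static_cast<int>(row));
    }
  } else {
    for (int row : indexSet) {
      bool inRange = row >= 0 && static_cast<std::size_t>(row) < column.size();
      if (inRange && valid[row]) grouping[column[row]].push_back(row);
    }
  }
  return grouping;
}
```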
void TTable::GroupByIntColMP ( const TStr & GroupBy,
THashMP< TInt, TIntV > &  Grouping,
TBool  UsePhysicalIds = true 
) const

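Groups/hashes by a single column with integer values, using OpenMP multi-threading.

The rows are split into 8*CHUNKS_PER_THREAD partitions, which OpenMP threads process in parallel, each inserting into the shared THashMP Grouping. This implementation also prints the elapsed grouping time to stdout.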

Definition at line 1225 of file table.cpp.

1225  {
1226  timeval timer0;
1227  gettimeofday(&timer0, NULL);
1228  double t1 = timer0.tv_sec + (timer0.tv_usec/1000000.0);
1229  //printf("X\n");
1230  TInt IdColIdx = GetColIdx(IdColName);
1231  TInt GroupByColIdx = GetColIdx(GroupBy);
1232  if(!UsePhysicalIds && IdColIdx < 0){
1233  TExcept::Throw("Grouping: Either use physical row ids, or have an id column");
1234  }
1235  //double startFn = omp_get_wtime();
1236  GroupingSanityCheck(GroupBy, atInt);
1237  TIntPrV Partitions;
1238  GetPartitionRanges(Partitions, 8*CHUNKS_PER_THREAD);
1239  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
1240  //double endPart = omp_get_wtime();
1241  //printf("Partition time = %f\n", endPart-startFn);
1242 
1243  Grouping.Gen(NumValidRows);
1244  //double endGen = omp_get_wtime();
1245  //printf("Gen time = %f\n", endGen-endPart);
1246  //printf("S\n");
1247  #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD) //num_threads(1)
1248  for (int i = 0; i < Partitions.Len(); i++){
1249  TRowIterator RowI(Partitions[i].GetVal1(), this);
1250  TRowIterator EndI(Partitions[i].GetVal2(), this);
1251  while (RowI < EndI) {
1252  TInt idx = UsePhysicalIds ? RowI.GetRowIdx() : RowI.GetIntAttr(IdColIdx);
1253  // printf("updating grouping with key = %d, row_id = %d\n", RowI.GetIntAttr(GroupBy).Val, idx.Val);
1254  UpdateGrouping<TInt>(Grouping, RowI.GetIntAttr(GroupByColIdx), idx);
1255  RowI++;
1256  }
1257  }
1258  gettimeofday(&timer0, NULL);
1259  double t2 = timer0.tv_sec + (timer0.tv_usec/1000000.0);
1260  printf("Grouping time: %f\n", t2 - t1);
1261  //double endAdd = omp_get_wtime();
1262  //printf("Add time = %f\n", endAdd-endGen);
1263 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
void GetPartitionRanges(TIntPrV &Partitions, TInt NumPartitions) const
Partitions the table into NumPartitions and populate Partitions with the ranges.
Definition: table.cpp:1177
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr IdColName
Name of the column holding explicit (permanent) row ids, if any.
Definition: table.h:565
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
void GroupingSanityCheck(const TStr &GroupBy, const TAttrType &AttrType) const
Checks if grouping key exists and matches given attr type.
Definition: table.cpp:1215
void Gen(const int &ExpectVals)
Definition: hashmp.h:160
TInt NumValidRows
Number of valid rows in the table (i.e. rows that were not logically removed).
Definition: table.h:552
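GroupByIntColMP splits the rows into partitions that OpenMP threads process concurrently, all inserting into one shared THashMP. The sketch below keeps the partitioning idea but swaps in a different technique: std::thread workers each fill a private std::unordered_map over a contiguous row range, and the partial maps are merged afterwards, so no concurrent hash table is needed. ParallelGroup and Grouping are hypothetical names; some toolchains need -pthread to build it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <unordered_map>
#include <vector>

using Grouping = std::unordered_map<long, std::vector<int>>;

// Groups row ids by key using numThreads workers over contiguous row ranges.
Grouping ParallelGroup(const std::vector<long>& keys, unsigned numThreads) {
  numThreads = std::max(1u, numThreads);
  const std::size_t n = keys.size();
  const std::size_t chunk = (n + numThreads - 1) / numThreads;
  std::vector<Grouping> partial(numThreads);  // one private map per worker
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&, t] {
      const std::size_t begin = std::min(n, t * chunk);
      const std::size_t end = std::min(n, begin + chunk);
      for (std::size_t row = begin; row < end; ++row) {
        partial[t][keys[row]].push_back(static_cast<int>(row));
      }
    });
  }
  for (std::thread& w : workers) w.join();
  // Merge in worker order; ranges ascend, so each group's ids stay sorted.
  Grouping result;
  for (const Grouping& part : partial) {
    for (const auto& [key, rows] : part) {
      std::vector<int>& dst = result[key];
      dst.insert(dst.end(), rows.begin(), rows.end());
    }
  }
  return result;
}
```

Per-thread maps cost extra memory and a merge pass, but each worker writes only to its own map, so no locking is required.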
template<class T >
void TTable::GroupByStrCol ( const TStr & GroupBy,
T &  Grouping,
const TIntV & IndexSet,
TBool  All,
TBool  UsePhysicalIds = true 
) const
protected

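Groups/hashes by a single column with string values (using their integer mappings) and stores the result in Grouping. If All is true, every valid row is considered; otherwise only the valid rows whose physical ids are listed in IndexSet are considered.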

Definition at line 1653 of file table.h.

1654  {
1655  TInt IdColIdx = GetColIdx(IdColName);
1656  if(!UsePhysicalIds && IdColIdx < 0){
1657  TExcept::Throw("Grouping: Either use physical row ids, or have an id column");
1658  }
1659  GroupingSanityCheck(GroupBy, atStr);
1660  if (All) {
1661  // Optimize for the common and most expensive case - iterate over all valid rows.
1662  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
1663  TInt idx = UsePhysicalIds ? it.GetRowIdx() : it.GetIntAttr(IdColIdx);
1664  UpdateGrouping<TInt>(Grouping, it.GetStrMapByName(GroupBy), idx);
1665  }
1666  } else {
1667  // Consider only rows in IndexSet.
1668  for (TInt i = 0; i < IndexSet.Len(); i++) {
1669  if (IsRowValid(IndexSet[i])) {
1670  TInt RowIdx = IndexSet[i];
1671  TInt ColIdx = GetColIdx(GroupBy);
1672  TInt idx = UsePhysicalIds ? RowIdx : IntCols[IdColIdx][RowIdx];
1673  UpdateGrouping<TInt>(Grouping, StrColMaps[ColIdx][RowIdx], idx);
1674  }
1675  }
1676  }
1677 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr IdColName
Name of the column holding explicit (permanent) row ids, if any.
Definition: table.h:565
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
void GroupingSanityCheck(const TStr &GroupBy, const TAttrType &AttrType) const
Checks if grouping key exists and matches given attr type.
Definition: table.cpp:1215
TRowIterator EndRI() const
Gets an iterator positioned past the last valid row of the table (the end sentinel).
Definition: table.h:1243
bool IsRowValid(TInt RowIdx) const
Checks if RowIdx corresponds to a valid (i.e. not deleted) row.
Definition: table.h:801
void TTable::GroupingSanityCheck ( const TStr & GroupBy,
const TAttrType & AttrType 
) const
protected

Checks that the grouping column GroupBy exists and has type AttrType; throws an exception otherwise.

Definition at line 1215 of file table.cpp.

1215  {
1216  if (!IsColName(GroupBy)) {
1217  TExcept::Throw("no such column " + GroupBy);
1218  }
1219  if (GetColType(GroupBy) != AttrType) {
1220  TExcept::Throw(GroupBy + " values are not of expected type");
1221  }
1222 }
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TBool IsColName(const TStr &ColName) const
Definition: table.h:646
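The check reduces to a column-existence test followed by a type comparison, each failing with an exception. A standalone equivalent is sketched below, using std::runtime_error in place of TExcept::Throw and a hypothetical name-to-type map in place of IsColName and GetColType; CheckGroupingColumn is likewise a hypothetical name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

enum class AttrType { Int, Flt, Str };

// Throws if groupBy is not a column, or if its type differs from expected.
void CheckGroupingColumn(const std::unordered_map<std::string, AttrType>& schema,
                         const std::string& groupBy, AttrType expected) {
  auto it = schema.find(groupBy);
  if (it == schema.end()) {
    throw std::runtime_error("no such column " + groupBy);
  }
  if (it->second != expected) {
    throw std::runtime_error(groupBy + " values are not of expected type");
  }
}
```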
void TTable::IncrementNext ( )
protected

Appends an entry for a new row to the Next vector and updates NumRows, NumValidRows and LastValidRow.

Definition at line 2255 of file table.cpp.

2256 {
2257  // Advance the Next vector
2258  NumRows++;
2259  NumValidRows++;
2260  if (!Next.Empty()) {
2261  Next[Next.Len()-1] = NumValidRows-1;
2263  }
2264  Next.Add(Last);
2265 }
static const TInt Last
Special value for Next vector entry - last row in table.
Definition: table.h:486
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TInt LastValidRow
Physical index of last valid row.
Definition: table.h:554
bool Empty() const
Tests whether the vector is empty.
Definition: ds.h:570
TIntV Next
A vector describing the logical order of the rows.
Definition: table.h:555
TInt NumRows
Number of rows in the table (valid and invalid).
Definition: table.h:551
TInt NumValidRows
Number of valid rows in the table (i.e. rows that were not logically removed).
Definition: table.h:552
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
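Next threads the rows into a singly linked list in logical order, with the special value Last marking the end. The following model handles appends only (no deleted rows), so a new row's physical index equals NumValidRows-1, the value the listing writes into the previous tail entry. RowList, kLast, AppendRow and LogicalOrder are hypothetical, and kLast = -1 is an arbitrary choice for this sketch, not necessarily the value of TTable::Last.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the Next vector: next[i] is the row after row i.
struct RowList {
  static constexpr int kLast = -1;  // end-of-list marker (assumed value)
  std::vector<int> next;
  int numRows = 0;
  int numValidRows = 0;
  int lastValidRow = kLast;

  // Appends one row and links the previous tail to it.
  void AppendRow() {
    ++numRows;
    ++numValidRows;
    if (!next.empty()) {
      next.back() = numValidRows - 1;  // old tail now points at the new row
    }
    lastValidRow = numValidRows - 1;
    next.push_back(kLast);  // the new row becomes the tail
  }

  // Collects row indices by following next from the first row.
  std::vector<int> LogicalOrder() const {
    std::vector<int> order;
    for (int r = next.empty() ? kLast : 0; r != kLast; r = next[r]) {
      order.push_back(r);
    }
    return order;
  }
};
```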
PTable TTable::InitializeJointTable ( const TTable & Table)
protected

Initializes an empty table for the join of this table with the given table.

Definition at line 1916 of file table.cpp.

1916  {
1917  PTable JointTable = New(Context);
1918  JointTable->IntCols = TVec<TIntV>(IntCols.Len() + Table.IntCols.Len() + 1);
1919  JointTable->FltCols = TVec<TFltV>(FltCols.Len() + Table.FltCols.Len());
1920  JointTable->StrColMaps = TVec<TIntV>(StrColMaps.Len() + Table.StrColMaps.Len());
1921  for (TInt i = 0; i < Sch.Len(); i++) {
1922  TStr ColName = GetSchemaColName(i);
1923  TAttrType ColType = GetSchemaColType(i);
1924  TStr CName = JointTable->RenumberColName(ColName);
1925  TPair<TAttrType, TInt> TypeMap = GetColTypeMap(ColName);
1926  JointTable->AddColType(CName, TypeMap);
1927  //JointTable->AddLabel(CName, ColName);
1928  JointTable->AddSchemaCol(CName, ColType);
1929  }
1930  for (TInt i = 0; i < Table.Sch.Len(); i++) {
1931  TStr ColName = Table.GetSchemaColName(i);
1932  TAttrType ColType = Table.GetSchemaColType(i);
1933  TStr CName = JointTable->RenumberColName(ColName);
1934  TPair<TAttrType, TInt> NewDat = Table.GetColTypeMap(ColName);
1935  Assert(ColType == NewDat.Val1);
1936  // add offsets
1937  switch (NewDat.Val1) {
1938  case atInt:
1939  NewDat.Val2 += IntCols.Len();
1940  break;
1941  case atFlt:
1942  NewDat.Val2 += FltCols.Len();
1943  break;
1944  case atStr:
1945  NewDat.Val2 += StrColMaps.Len();
1946  break;
1947  }
1948  JointTable->AddColType(CName, NewDat);
1949  JointTable->AddSchemaCol(CName, ColType);
1950  }
1951  TStr IdColName = "_id";
1952  JointTable->AddColType(IdColName, atInt, IntCols.Len() + Table.IntCols.Len());
1953  JointTable->AddSchemaCol(IdColName, atInt);
1954  return JointTable;
1955 }
enum TAttrType_ TAttrType
Types for tables, sparse and dense attributes.
Schema Sch
Table Schema.
Definition: table.h:549
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr IdColName
Name of the column holding explicit (permanent) row ids, if any.
Definition: table.h:565
TTableContext * Context
Execution Context.
Definition: table.h:545
TVec< TIntV > IntCols
Next[i] is the successor of row i. Table iterators follow the order dictated by Next ...
Definition: table.h:558
Definition: gbase.h:23
TAttrType GetSchemaColType(TInt Idx) const
Gets type of the column with index Idx in the schema.
Definition: table.h:640
#define Assert(Cond)
Definition: bd.h:251
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
TStr GetSchemaColName(TInt Idx) const
Gets name of the column with index Idx in the schema.
Definition: table.h:638
TPair< TAttrType, TInt > GetColTypeMap(const TStr &ColName) const
Gets column type and index of ColName.
Definition: table.h:666
Definition: dt.h:1134
TVec< TFltV > FltCols
Data columns of floating point attributes.
Definition: table.h:559
Definition: ds.h:32
Definition: dt.h:412
Definition: gbase.h:23
TVal1 Val1
Definition: ds.h:34
TVal2 Val2
Definition: ds.h:35
Definition: bd.h:196
Definition: gbase.h:23
static PTable New()
Definition: table.h:932
void TTable::InitIds ( )

Adds explicit row ids and initializes the hash set that maps ids to physical rows.

Definition at line 1883 of file table.cpp.

1883  {
1884  IdColName = "_id";
1885  //Assert(NumRows == NumValidRows);
1886  AddIdColumn(IdColName);
1887 }
void TTable::InitRowIdBuckets ( int  NumBuckets)
protected

Initializes the RowIdBuckets vector, which is used when creating a graph sequence: clears any existing buckets and allocates NumBuckets empty buckets.

Definition at line 3535 of file table.cpp.

3535  {
3536  for (TInt i = 0; i < RowIdBuckets.Len(); i++) {
3537  RowIdBuckets[i].Clr();
3538  }
3539  RowIdBuckets.Clr();
3540 
3541  RowIdBuckets.Gen(NumBuckets);
3542  for (TInt i = 0; i < NumBuckets; i++) {
3543  RowIdBuckets[i].Gen(10, 0);
3544  }
3545 }
PTable TTable::Intersection ( const TTable &  Table)

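Returns the intersection of this table with the given Table, i.e., the rows of Table that also occur in this table. The id column is excluded from the result schema, and fresh row ids are assigned with InitIds().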

Definition at line 4567 of file table.cpp.

4567  {
4568  Schema NewSchema;
4569  THashSet<TInt> Collisions;
4570 
4571  for (TInt c = 0; c < Sch.Len(); c++) {
4572  if (Sch[c].Val1 != GetIdColName()) {
4573  NewSchema.Add(TPair<TStr,TAttrType>(Sch[c].Val1, Sch[c].Val2));
4574  }
4575  }
4576  PTable result = TTable::New(NewSchema, Context);
4577 
4578  GetCollidingRows(Table, Collisions);
4579 
4580  // this part should be made faster by adding all the rows in one go
4581  for (TRowIterator it = Table.BegRI(); it < Table.EndRI(); it++) {
4582  if (Collisions.IsKey(it.GetRowIdx())) {
4583  result->AddRow(it);
4584  }
4585  }
4586  result->InitIds();
4587  return result;
4588 }
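A minimal standalone sketch of the same idea follows. Since GetCollidingRows is not shown here, it assumes that two rows collide when all their non-id values are equal. `Row` and `IntersectRows` are hypothetical names, a `std::set` stands in for the hash set, and rows are modelled as vectors of strings rather than SNAP's typed columns. As in the listing, the result keeps the rows of the argument table, in its order.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A row is modelled as a vector of string-encoded values (hypothetical;
// SNAP stores columns separately by type). The id column is assumed to be
// already excluded, as Intersection excludes it from the result schema.
using Row = std::vector<std::string>;

// Returns the rows of `Other` that also occur in `This`, in Other's order.
std::vector<Row> IntersectRows(const std::vector<Row>& This, const std::vector<Row>& Other) {
  std::set<Row> Seen(This.begin(), This.end());  // build side
  std::vector<Row> Result;
  for (const Row& R : Other) {                   // probe side
    assert(This.empty() || R.size() == This[0].size());  // same schema width
    if (Seen.count(R) > 0) { Result.push_back(R); }
  }
  return Result;
}
```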
PTable TTable::Intersection ( const PTable &  Table)
inline

Definition at line 1422 of file table.h.

1422 { return Intersection(*Table); };
void TTable::InvalidateAffectedGroupings ( const TStr &  Attr)
protected

Definition at line 1581 of file table.cpp.

1581  {
1582  //TODO
1583 }
void TTable::InvalidatePhysicalGroupings ( )
protected

A mapping from a grouping statement (the group-by attribute names and the 'Ordered' flag) to a hash map from group-by keys to the ids of the records that share that key. Can be used as a hash index for the table.

Definition at line 1577 of file table.cpp.

1577  {
1578  //TODO
1579 }
TBool TTable::IsAttr ( const TStr &  Attr)
protected

Checks if Attr is an attribute of this table schema.

Definition at line 4628 of file table.cpp.

4628  {
4629  return IsColName(Attr);
4630 }
TBool TTable::IsColName ( const TStr &  ColName) const
inline protected

Definition at line 646 of file table.h.

646  {
647  TStr NColName = NormalizeColName(ColName);
648  return ColTypeMap.IsKey(NColName);
649  }
TBool TTable::IsLastGraphOfSequence ( )

Checks whether the end of the graph sequence has been reached, i.e., whether the current bucket (CurrBucket) is the last row id bucket.

Definition at line 3685 of file table.cpp.

3685  {
3686  return CurrBucket >= RowIdBuckets.Len() - 1;
3687 }
PTable TTable::IsNextK ( const TStr &  OrderCol,
TInt  K,
const TStr &  GroupBy,
const TStr &  RankColName = "" 
)

Distance-based filter.

Creates a table T' whose rows are joint rows (T[r1], T[r2]) such that r2 is one of the K rows that follow r1 when this table is ordered by GroupBy and then OrderCol, and r1 and r2 have the same value in the GroupBy column. As a side effect, the logical row order of this table is changed by the call to Order().

Definition at line 3891 of file table.cpp.

3891  {
3892  TStrV OrderBy;
3893  if (GroupBy.Empty()) {
3894  OrderBy.Add(OrderCol);
3895  } else {
3896  OrderBy.Add(GroupBy);
3897  OrderBy.Add(OrderCol);
3898  }
3899  if (RankColName.Empty()) {
3900  Order(OrderBy);
3901  } else {
3902  Order(OrderBy, RankColName, true);
3903  }
3904  TAttrType GroupByAttrType = GetColType(GroupBy);
3905  PTable T = InitializeJointTable(*this);
3906  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
3907  TInt Succ = RI.GetRowIdx();
3908  TBool OutOfGroup = false;
3909  for (TInt i = 0; i < K; i++) {
3910  Succ = Next[Succ];
3911  if (Succ == Last) { break; }
3912  switch (GroupByAttrType) {
3913  case atInt:
3914  if (GetIntVal(GroupBy, Succ) != RI.GetIntAttr(GroupBy)) { OutOfGroup = true; }
3915  break;
3916  case atFlt:
3917  if (GetFltVal(GroupBy, Succ) != RI.GetFltAttr(GroupBy)) { OutOfGroup = true; }
3918  break;
3919  case atStr:
3920  if (GetStrVal(GroupBy, Succ) != RI.GetStrAttr(GroupBy)) { OutOfGroup = true; }
3921  break;
3922  }
3923  if (OutOfGroup) { break; } // break out of inner for loop
3924  T->AddJointRow(*this, *this, RI.GetRowIdx(), Succ);
3925  }
3926  }
3927  return T;
3928 }
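The successor walk in the inner loop can be sketched without SNAP as follows. The rows are assumed to be already ordered by (GroupBy, OrderCol), which IsNextK achieves by calling Order(). `NextKPairs` is a hypothetical helper that returns (r1, r2) physical row pairs instead of building a joint table.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Group[p] is the GroupBy value of the row at logical position p, and
// RowId[p] is its physical row index. For every row, emit pairs with up
// to K successors, stopping at the end of the table or of the group.
std::vector<std::pair<int, int>> NextKPairs(const std::vector<int>& Group,
                                            const std::vector<int>& RowId, int K) {
  assert(Group.size() == RowId.size() && K >= 0);
  std::vector<std::pair<int, int>> Pairs;
  const std::size_t N = Group.size();
  const std::size_t KK = static_cast<std::size_t>(K);
  for (std::size_t p = 0; p < N; p++) {
    for (std::size_t q = p + 1; q < N && q <= p + KK; q++) {
      if (Group[q] != Group[p]) { break; }  // left the group: stop walking
      Pairs.emplace_back(RowId[p], RowId[q]);
    }
  }
  return Pairs;
}
```

With groups {1, 1, 1, 2, 2} and K = 1, each row is paired only with its immediate successor in the same group.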
void TTable::ISort ( TIntV &  V,
TInt  StartIdx,
TInt  EndIdx,
const TVec< TAttrType > &  SortByTypes,
const TIntV &  SortByIndices,
TBool  Asc = true 
)
protected

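Performs an insertion sort on the row-index vector V, restricted to positions StartIdx through EndIdx (inclusive), comparing rows on the columns given by SortByTypes and SortByIndices. Asc is passed through to CompareRows to select the sort direction.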

Definition at line 3096 of file table.cpp.

3096  {
3097  if (StartIdx < EndIdx) {
3098  for (TInt i = StartIdx+1; i <= EndIdx; i++) {
3099  TInt Val = V[i];
3100  TInt j = i;
3101  while ((StartIdx < j) && (CompareRows(V[j-1], Val, SortByTypes, SortByIndices, Asc) > 0)) {
3102  V[j] = V[j-1];
3103  j--;
3104  }
3105  V[j] = Val;
3106  }
3107  }
3108 }
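Here is a standalone C++ sketch of the same insertion sort over row indices. Only integer columns are modelled. `CompareRowsSketch` assumes that CompareRows compares the sort columns lexicographically and that Asc = false reverses the order. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Compares rows R1 and R2 lexicographically on the SortBy columns of Cols.
// Returns >0 if R1 sorts after R2, <0 if before, 0 if equal.
int CompareRowsSketch(const std::vector<std::vector<int>>& Cols, const std::vector<int>& SortBy,
                      int R1, int R2, bool Asc) {
  for (int c : SortBy) {
    int A = Cols[c][R1], B = Cols[c][R2];
    if (A != B) { return (A > B) == Asc ? 1 : -1; }
  }
  return 0;
}

// Insertion sort of the row indices V[StartIdx..EndIdx] (inclusive bounds).
void ISortSketch(std::vector<int>& V, int StartIdx, int EndIdx,
                 const std::vector<std::vector<int>>& Cols, const std::vector<int>& SortBy,
                 bool Asc = true) {
  assert(0 <= StartIdx && EndIdx < static_cast<int>(V.size()));
  for (int i = StartIdx + 1; i <= EndIdx; i++) {
    int Val = V[i];
    int j = i;
    // Shift rows that sort after Val one slot to the right.
    while (StartIdx < j && CompareRowsSketch(Cols, SortBy, V[j - 1], Val, Asc) > 0) {
      V[j] = V[j - 1];
      j--;
    }
    V[j] = Val;
  }
}
```

Insertion sort is quadratic in the worst case. Routines like this are therefore typically reserved for short ranges, for example as the base case of a quicksort. Whether TTable uses it that way is not shown here.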
void TTable::ISortKeyVal ( TIntV &  Key,
TIntV &  Val,
TInt  Start,
TInt  End 
)
static protected

Definition at line 5321 of file table.cpp.

5321  {
5322  if (Start < End) {
5323  for (TInt i = Start+1; i <= End; i++) {
5324  TInt K = Key[i];
5325  TInt V = Val[i];
5326  TInt j = i;
5327  while ((Start < j) && (CompareKeyVal(Key[j-1], Val[j-1], K, V) > 0)) {
5328  Key[j] = Key[j-1];
5329  Val[j] = Val[j-1];
5330  j--;
5331  }
5332  Key[j] = K;
5333  Val[j] = V;
5334  }
5335  }
5336 }
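ISortKeyVal applies the same insertion sort to two parallel vectors, moving each key together with its value. The body of CompareKeyVal is not shown above. In this sketch, `CompareKeyValSketch` assumes it orders by key and then by value. Both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Assumed ordering: by key, then by value.
int CompareKeyValSketch(int K1, int V1, int K2, int V2) {
  if (K1 != K2) { return K1 < K2 ? -1 : 1; }
  if (V1 != V2) { return V1 < V2 ? -1 : 1; }
  return 0;
}

// Insertion sort of the parallel arrays Key/Val on positions Start..End
// (inclusive); both arrays are permuted together.
void ISortKeyValSketch(std::vector<int>& Key, std::vector<int>& Val, int Start, int End) {
  assert(Key.size() == Val.size());
  for (int i = Start + 1; i <= End; i++) {
    int K = Key[i], V = Val[i];
    int j = i;
    while (Start < j && CompareKeyValSketch(Key[j - 1], Val[j - 1], K, V) > 0) {
      Key[j] = Key[j - 1];  // shift the (key, value) pair right
      Val[j] = Val[j - 1];
      j--;
    }
    Key[j] = K;
    Val[j] = V;
  }
}
```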
bool TTable::IsRowValid ( TInt  RowIdx) const
inline protected

Checks if RowIdx corresponds to a valid (i.e. not deleted) row.

Definition at line 801 of file table.h.

801 { return Next[RowIdx] != Invalid;}
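The Next vector works like a singly linked list over physical row indices. The sketch below models it in standalone C++. `RowOrder`, `kLast` and `kInvalid` are hypothetical stand-ins for TTable::Next, TTable::Last and TTable::Invalid, whose numeric values are not shown here. Removing the successor of a row relinks the list and marks the removed row invalid, which is what IsRowValid then detects.

```cpp
#include <cassert>
#include <vector>

const int kLast = -1;     // no successor: last row in logical order
const int kInvalid = -2;  // row has been logically removed

struct RowOrder {
  std::vector<int> Next;  // Next[i] = successor of row i
  int First = 0;          // physical index of the first valid row

  bool IsRowValid(int RowIdx) const { return Next[RowIdx] != kInvalid; }

  // Logically removes the successor of row Prev (cf. RemoveNext).
  void RemoveNext(int Prev) {
    assert(IsRowValid(Prev));
    int Victim = Next[Prev];
    assert(Victim != kLast && IsRowValid(Victim));
    Next[Prev] = Next[Victim];  // unlink the victim
    Next[Victim] = kInvalid;    // mark it as removed
  }

  // Collects the valid rows in logical order.
  std::vector<int> ValidRows() const {
    std::vector<int> Out;
    for (int r = Next.empty() ? kLast : First; r != kLast; r = Next[r]) { Out.push_back(r); }
    return Out;
  }
};
```

Removing the first row itself would also require updating `First`, which this sketch does not handle.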
PTable TTable::Join ( const TStr &  Col1,
const TTable &  Table,
const TStr &  Col2 
)

Performs equijoin.

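Performs an equi-join on the given columns, i.e., keeps the pairs of rows where this->Col1 == Table->Col2. Both columns must exist and have the same type; otherwise an exception is thrown. Implementation: hash join. The smaller table is grouped (hashed) by its join column. The rows of the larger table are then scanned, and each row's key is looked up in that hash; a "collision" is a match. When multi-threading is enabled (GetMP() in GCC_ATOMIC builds), the larger table is split into partitions that are probed in parallel.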

Definition at line 2272 of file table.cpp.

2272  {
2273  // double startFn = omp_get_wtime();
2274  if (!IsColName(Col1)) {
2275  TExcept::Throw("no such column " + Col1);
2276  printf("no such column %s\n", Col1.CStr());
2277  }
2278  if (!Table.IsColName(Col2)) {
2279  TExcept::Throw("no such column " + Col2);
2280  printf("no such column %s\n", Col2.CStr());
2281  }
2282  if (GetColType(Col1) != Table.GetColType(Col2)) {
2283  TExcept::Throw("Trying to Join on columns of different type");
2284  printf("Trying to Join on columns of different type\n");
2285  }
2286  //printf("passed initial checks\n");
2287  // initialize result table
2288  PTable JointTable = InitializeJointTable(Table);
2289  //printf("initialized joint table\n");
2290  // hash smaller table (group by column)
2291  TAttrType ColType = GetColType(Col1);
2292  TBool ThisIsSmaller = (NumValidRows <= Table.NumValidRows);
2293  const TTable& TS = ThisIsSmaller ? *this : Table;
2294  const TTable& TB = ThisIsSmaller ? Table : *this;
2295  TStr ColS = ThisIsSmaller ? Col1 : Col2;
2296  TStr ColB = ThisIsSmaller ? Col2 : Col1;
2297  TInt ColBId = ThisIsSmaller ? Table.GetColIdx(ColB) : GetColIdx(ColB);
2298  // double endInit = omp_get_wtime();
2299  // printf("Init time = %f\n", endInit-startFn);
2300  // iterate over the rows of the bigger table and check for "collisions"
2301  // with the group keys for the small table.
2302 #ifdef GCC_ATOMIC
2303  if (GetMP()) {
2304  switch(ColType){
2305  case atInt:{
2307  TS.GroupByIntColMP(ColS, T, true);
2308  // double endGroup = omp_get_wtime();
2309  // printf("Group time = %f\n", endGroup-endInit);
2310 
2311  TIntPrV Partitions;
2312  TB.GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
2313  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
2314  TVec<TIntPrV> JointRowIDSet(Partitions.Len());
2315  // double endPart = omp_get_wtime();
2316  // printf("Partition time = %f\n", endPart-endGroup);
2317 
2318  #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD)
2319  for (int i = 0; i < Partitions.Len(); i++){
2320  //double start = omp_get_wtime();
2321  JointRowIDSet[i].Reserve(PartitionSize);
2322  TRowIterator RowI(Partitions[i].GetVal1(), &TB);
2323  TRowIterator EndI(Partitions[i].GetVal2(), &TB);
2324  while (RowI < EndI) {
2325  TInt K = RowI.GetIntAttr(ColBId);
2326  if(T.IsKey(K)){
2327  TIntV& Group = T.GetDat(K);
2328  for(TInt j = 0; j < Group.Len(); j++){
2329  if(ThisIsSmaller){
2330  JointRowIDSet[i].Add(TIntPr(Group[j], RowI.GetRowIdx()));
2331  } else{
2332  JointRowIDSet[i].Add(TIntPr(RowI.GetRowIdx(), Group[j]));
2333  }
2334  }
2335  }
2336  RowI++;
2337  }
2338  //double end = omp_get_wtime();
2339  //printf("END: Thread %d: i = %d, start = %d, end = %d, num = %d, time = %f\n", omp_get_thread_num(), i,
2340  // Partitions[i].GetVal1().Val, Partitions[i].GetVal2().Val, JointRowIDSet[i].Len(), end-start);
2341  }
2342  // double endJoin = omp_get_wtime();
2343  // printf("Iterate time = %f\n", endJoin-endPart);
2344  JointTable->AddNJointRowsMP(*this, Table, JointRowIDSet);
2345  // double endAdd = omp_get_wtime();
2346  // printf("Add time = %f\n", endAdd-endJoin);
2347  break;
2348  }
2349  case atFlt:{
2351  TS.GroupByFltCol(ColS, T, TIntV(), true);
2352 
2353  TIntPrV Partitions;
2354  TB.GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
2355  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
2356  TVec<TIntPrV> JointRowIDSet(Partitions.Len());
2357 
2358  #pragma omp parallel for schedule(dynamic)
2359  for (int i = 0; i < Partitions.Len(); i++){
2360  JointRowIDSet[i].Reserve(PartitionSize);
2361  TRowIterator RowI(Partitions[i].GetVal1(), &TB);
2362  TRowIterator EndI(Partitions[i].GetVal2(), &TB);
2363  while (RowI < EndI) {
2364  TFlt K = RowI.GetFltAttr(ColBId);
2365  if(T.IsKey(K)){
2366  TIntV& Group = T.GetDat(K);
2367  for(TInt j = 0; j < Group.Len(); j++){
2368  if(ThisIsSmaller){
2369  JointRowIDSet[i].Add(TIntPr(Group[j], RowI.GetRowIdx()));
2370  } else{
2371  JointRowIDSet[i].Add(TIntPr(RowI.GetRowIdx(), Group[j]));
2372  }
2373  }
2374  }
2375  RowI++;
2376  }
2377  }
2378  JointTable->AddNJointRowsMP(*this, Table, JointRowIDSet);
2379  break;
2380  }
2381  case atStr:{
2383  TS.GroupByStrCol(ColS, T, TIntV(), true);
2384 
2385  TIntPrV Partitions;
2386  TB.GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
2387  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
2388  TVec<TIntPrV> JointRowIDSet(Partitions.Len());
2389 
2390  #pragma omp parallel for schedule(dynamic)
2391  for (int i = 0; i < Partitions.Len(); i++){
2392  JointRowIDSet[i].Reserve(PartitionSize);
2393  TRowIterator RowI(Partitions[i].GetVal1(), &TB);
2394  TRowIterator EndI(Partitions[i].GetVal2(), &TB);
2395  while (RowI < EndI) {
2396  TInt K = RowI.GetStrMapById(ColBId);
2397  if(T.IsKey(K)){
2398  TIntV& Group = T.GetDat(K);
2399  for(TInt j = 0; j < Group.Len(); j++){
2400  if(ThisIsSmaller){
2401  JointRowIDSet[i].Add(TIntPr(Group[j], RowI.GetRowIdx()));
2402  } else{
2403  JointRowIDSet[i].Add(TIntPr(RowI.GetRowIdx(), Group[j]));
2404  }
2405  }
2406  }
2407  RowI++;
2408  }
2409  }
2410  JointTable->AddNJointRowsMP(*this, Table, JointRowIDSet);
2411  }
2412  break;
2413  }
2414  } else {
2415 #endif // GCC_ATOMIC
2416  switch (ColType) {
2417  case atInt:{
2418  TIntIntVH T;
2419  TS.GroupByIntCol(ColS, T, TIntV(), true);
2420  for (TRowIterator RowI = TB.BegRI(); RowI < TB.EndRI(); RowI++) {
2421  TInt K = RowI.GetIntAttr(ColBId);
2422  if (T.IsKey(K)) {
2423  TIntV& Group = T.GetDat(K);
2424  for (TInt i = 0; i < Group.Len(); i++) {
2425  if (ThisIsSmaller) {
2426  JointTable->AddJointRow(*this, Table, Group[i], RowI.GetRowIdx());
2427  } else {
2428  JointTable->AddJointRow(*this, Table, RowI.GetRowIdx(), Group[i]);
2429  }
2430  }
2431  }
2432  }
2433  break;
2434  }
2435  case atFlt:{
2437  TS.GroupByFltCol(ColS, T, TIntV(), true);
2438  for (TRowIterator RowI = TB.BegRI(); RowI < TB.EndRI(); RowI++) {
2439  TFlt K = RowI.GetFltAttr(ColBId);
2440  if (T.IsKey(K)) {
2441  TIntV& Group = T.GetDat(K);
2442  for (TInt i = 0; i < Group.Len(); i++) {
2443  if (ThisIsSmaller) {
2444  JointTable->AddJointRow(*this, Table, Group[i], RowI.GetRowIdx());
2445  } else {
2446  JointTable->AddJointRow(*this, Table, RowI.GetRowIdx(), Group[i]);
2447  }
2448  }
2449  }
2450  }
2451  break;
2452  }
2453  case atStr:{
2454  TIntIntVH T;
2455  TS.GroupByStrCol(ColS, T, TIntV(), true);
2456  for (TRowIterator RowI = TB.BegRI(); RowI < TB.EndRI(); RowI++) {
2457  TInt K = RowI.GetStrMapById(ColBId);
2458  if (T.IsKey(K)) {
2459  TIntV& Group = T.GetDat(K);
2460  for (TInt i = 0; i < Group.Len(); i++) {
2461  if (ThisIsSmaller) {
2462  JointTable->AddJointRow(*this, Table, Group[i], RowI.GetRowIdx());
2463  } else {
2464  JointTable->AddJointRow(*this, Table, RowI.GetRowIdx(), Group[i]);
2465  }
2466  }
2467  }
2468  }
2469  }
2470  break;
2471  }
2472 #ifdef GCC_ATOMIC
2473  }
2474 #endif
2475  return JointTable;
2476 }
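The sequential (non-MP) branch is a textbook hash join. Below is a minimal standalone sketch for integer keys that uses `std::unordered_map` in place of SNAP's grouping hash. `HashJoin` is a hypothetical name. It returns (left row, right row) index pairs rather than filling a joint table. As in the listing, the first element of each pair always refers to the left input, whichever side was hashed.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Equi-join of two integer key columns: group (hash) the smaller side by
// key, then scan the larger side and probe.
std::vector<std::pair<int, int>> HashJoin(const std::vector<int>& LeftKeys,
                                          const std::vector<int>& RightKeys) {
  const bool LeftIsSmaller = LeftKeys.size() <= RightKeys.size();
  const std::vector<int>& Small = LeftIsSmaller ? LeftKeys : RightKeys;
  const std::vector<int>& Big = LeftIsSmaller ? RightKeys : LeftKeys;

  // Build phase: key -> row indices of the smaller input.
  std::unordered_map<int, std::vector<int>> Groups;
  for (int r = 0; r < static_cast<int>(Small.size()); r++) { Groups[Small[r]].push_back(r); }

  // Probe phase: every row of the larger input looks up its key.
  std::vector<std::pair<int, int>> Out;
  for (int r = 0; r < static_cast<int>(Big.size()); r++) {
    auto It = Groups.find(Big[r]);
    if (It == Groups.end()) { continue; }
    for (int s : It->second) {
      // Keep (left, right) order regardless of which side was hashed.
      Out.push_back(LeftIsSmaller ? std::make_pair(s, r) : std::make_pair(r, s));
    }
  }
  return Out;
}
```

Hashing the smaller input keeps the build-side table small. The probe then costs one lookup per row of the larger input, plus one output pair per match.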
PTable TTable::Join ( const TStr &  Col1,
const PTable &  Table,
const TStr &  Col2 
)
inline

Definition at line 1360 of file table.h.

1360  {
1361  return Join(Col1, *Table, Col2);
1362  }
void TTable::KeepSortedRows ( const TIntV &  KeepV)
protected

Removes all rows whose indices are not in KeepV. KeepV must be sorted (its last element becomes LastValidRow) and non-empty.

Definition at line 1152 of file table.cpp.

1152  {
1153  TIntIntH KeepH(KeepV.Len());
1154  for (TInt i = 0; i < KeepV.Len(); i++) {
1155  KeepH.AddKey(KeepV[i]);
1156  }
1157 
1158  TRowIteratorWithRemove RowI = BegRIWR();
1159  TInt KeepSize = 0;
1160  while (RowI.GetNextRowIdx() != Last) {
1161  if (KeepSize < KeepV.Len()) {
1162  if (KeepH.IsKey(RowI.GetNextRowIdx())) {
1163  KeepSize++;
1164  RowI++;
1165  } else {
1166  RowI.RemoveNext();
1167  }
1168  } else {
1169  // Covered all of KeepV. Remove the rest of the rows.
1170  // Current RowI.CurrRowIdx is the last element of KeepV.
1171  RowI.RemoveNext();
1172  }
1173  }
1174  LastValidRow = KeepV[KeepV.Len()-1];
1175 }
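The same pass can be sketched in standalone C++ over a Next-style linked list. `KeepSortedRowsSketch`, `kEnd` and `kRemoved` are hypothetical names. Instead of using TRowIteratorWithRemove, the sketch walks from the first row and relinks kept rows directly. Like the listing, it assumes KeepV is non-empty and stops keeping rows once KeepV.size() rows have been kept.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

const int kEnd = -1;      // hypothetical stand-in for TTable::Last
const int kRemoved = -2;  // hypothetical stand-in for TTable::Invalid

// Keeps only the rows listed in KeepV in the linked row order Next/First.
// Returns the new last valid row, like LastValidRow in the listing.
int KeepSortedRowsSketch(std::vector<int>& Next, int& First, const std::vector<int>& KeepV) {
  assert(!KeepV.empty());
  std::unordered_set<int> KeepH(KeepV.begin(), KeepV.end());
  int Prev = kEnd;  // last kept row so far (kEnd = none yet)
  std::size_t Kept = 0;
  int r = First;
  while (r != kEnd) {
    const int Succ = Next[r];
    if (Kept < KeepV.size() && KeepH.count(r) > 0) {
      Kept++;
      if (Prev == kEnd) { First = r; } else { Next[Prev] = r; }  // relink
      Prev = r;
    } else {
      Next[r] = kRemoved;  // logically remove the row
    }
    r = Succ;
  }
  assert(Prev != kEnd);  // at least one row of KeepV must exist
  Next[Prev] = kEnd;
  return Prev;
}
```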
static PTable TTable::Load ( TSIn &  SIn,
TTableContext *  Context 
)
inline static

Loads table from a binary format.

The TTableContext Context must be passed as a parameter and loaded separately from the table, since it can be shared among multiple tables. The context can be loaded either before or after the table, but it must be available for operations that require string values (as opposed to string references).

Definition at line 971 of file table.h.

971 { return new TTable(SIn, Context);}
static PTable TTable::LoadShM ( TShMIn &  ShMIn,
TTableContext *  Context 
)
inline static

Static constructor that loads a table from shared memory (TShMIn).

A table loaded this way cannot perform operations that edit the edge vectors of nodes or perform illegal operations on any internal hashes (deleting or swapping keys).

Definition at line 975 of file table.h.

975  {
976  TTable* Table = new TTable();
977  Table->LoadTableShM(ShMIn, Context);
978  return PTable(Table);
979  }
PTable TTable::LoadSS ( const Schema &  S,
const TStr &  InFNm,
TTableContext *  Context,
const char &  Separator = '\t',
TBool  HasTitleLine = false 
)
static

Loads a table from a spreadsheet file (TSV, CSV, etc.). Note: HasTitleLine = true is not supported; comment out title lines instead.

Definition at line 795 of file table.cpp.

796  {
797  return LoadSS(S, InFNm, Context, TIntV(), Separator, HasTitleLine);
798 }
PTable TTable::LoadSS ( const Schema &  S,
const TStr &  InFNm,
TTableContext *  Context,
const TIntV &  RelevantCols,
const char &  Separator = '\t',
TBool  HasTitleLine = false 
)
static

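Loads a table from a spreadsheet file, but only the columns specified by RelevantCols (all columns if RelevantCols is empty). Note: HasTitleLine = true is not supported; comment out title lines instead. The file is loaded in parallel only when multi-threading is enabled (GetMP()), the build targets Linux (GLib_LINUX, needed for mmap), and the selected columns include no string column; otherwise it is loaded sequentially.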

Definition at line 757 of file table.cpp.

758  {
759  TVec<uint64> IntGroupByCols;
760  bool NoStringCols = true;
761 
762  // find the schema for the new table which contains only relevant columns
763  Schema SR;
764  if (RelevantCols.Len() == 0) {
765  SR = S;
766  } else {
767  for (int i = 0; i < RelevantCols.Len(); i++) {
768  SR.Add(S[RelevantCols[i]]);
769  }
770  }
771  PTable T = New(SR, Context);
772 
773  // find col types and check for string cols
774  for (int i = 0; i < SR.Len(); i++) {
775  if (T->GetSchemaColType(i) == atStr) {
776  NoStringCols = false;
777  break;
778  }
779  }
780 
781  if (GetMP() && NoStringCols) {
782  // Right now, can load in parallel only in Linux (for mmap) and if
783  // there are no string columns
784 #ifdef GLib_LINUX
785  LoadSSPar(T, S, InFNm, RelevantCols, Separator, HasTitleLine);
786 #else
787  LoadSSSeq(T, S, InFNm, RelevantCols, Separator, HasTitleLine);
788 #endif
789  } else {
790  LoadSSSeq(T, S, InFNm, RelevantCols, Separator, HasTitleLine);
791  }
792  return T;
793 }
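The schema projection and loader selection above can be sketched in standalone C++. `AttrType`, `SchemaSketch`, `ProjectSchema` and `UseParallelLoader` are hypothetical names. The two booleans stand in for GetMP() and the GLib_LINUX build flag.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class AttrType { Int, Flt, Str };
using SchemaSketch = std::vector<std::pair<std::string, AttrType>>;

// Projects the input schema onto RelevantCols (empty = keep all columns).
SchemaSketch ProjectSchema(const SchemaSketch& S, const std::vector<int>& RelevantCols) {
  if (RelevantCols.empty()) { return S; }
  SchemaSketch SR;
  for (int c : RelevantCols) {
    assert(0 <= c && c < static_cast<int>(S.size()));
    SR.push_back(S[c]);
  }
  return SR;
}

// The parallel loader is chosen only when multi-threading is enabled,
// the platform is Linux, and no selected column holds strings.
bool UseParallelLoader(const SchemaSketch& SR, bool MultiThreaded, bool IsLinux) {
  bool NoStringCols = true;
  for (const auto& Col : SR) {
    if (Col.second == AttrType::Str) { NoStringCols = false; break; }
  }
  return MultiThreaded && NoStringCols && IsLinux;
}
```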
void TTable::LoadSSPar ( PTable &  NewTable,
const Schema &  S,
const TStr &  InFNm,
const TIntV &  RelevantCols,
const char &  Separator,
TBool  HasTitleLine 
)
static protected

Loads data from the input file InFNm into NewTable in parallel. Only works when NewTable has no string columns.

Definition at line 507 of file table.cpp.

508  {
509  // preloaded necessary variables
510  TInt RowLen = T->Sch.Len();
511  TVec<TAttrType> ColTypes = TVec<TAttrType>(RowLen);
512  for (TInt i = 0; i < RowLen; i++) {
513  ColTypes[i] = T->GetSchemaColType(i);
514  }
515 
516  TSsParserMP Ss(InFNm, Separator);
517  Ss.SkipCommentLines();
518 
519  // if title line (i.e. names of the columns) is included as first row in the
520  // input file - use it to validate schema
521  if (HasTitleLine) {
522  Ss.Next();
523  if (S.Len() != Ss.GetFlds()) {
524  printf("%s\n", Ss[0]); TExcept::Throw("Table Schema Mismatch!");
525  }
526  for (TInt i = 0; i < Ss.GetFlds(); i++) {
527  // remove carriage return char
528  TInt L = strlen(Ss[i]);
529  if (Ss[i][L-1] < ' ') { Ss[i][L-1] = 0; }
530  if (NormalizeColName(S[i].Val1) != NormalizeColName(Ss[i])) { TExcept::Throw("Table Schema Mismatch!"); }
531  }
532  }
533 
534  // Divide remaining part of stream into equal sized chunks
535  // Find starting position in stream for each thread
536  int64 Cnt = 0;
537  uint64 Pos = Ss.GetStreamPos();
538  uint64 Len = Ss.GetStreamLen();
539  uint64 Rem = Len - Pos;
540  int NumThreads = omp_get_max_threads();
541 
542  uint64 Delta = Rem / NumThreads;
543  if (Delta < 1) Delta = 1;
544 
545  TVec<uint64> StartIntV(NumThreads);
546  TVec<uint64> LineCountV(NumThreads);
547  TVec<uint64> PrefixSumV(NumThreads);
548 
549  StartIntV[0] = Pos;
550  for (int i = 1; i < NumThreads; i++) {
551  StartIntV[i] = StartIntV[i-1] + Delta;
552  }
553  StartIntV.Add(Len);
554 
555  // Find number of lines handled by each thread
556  omp_set_num_threads(NumThreads);
557  #pragma omp parallel for schedule(dynamic) reduction(+:Cnt)
558  for (int i = 0; i < NumThreads; i++) {
559  LineCountV[i] = Ss.CountNewLinesInRange(StartIntV[i], StartIntV[i+1]);
560  Cnt += LineCountV[i];
561  }
562 
563  // Calculate row index offsets for each thread
564  PrefixSumV[0] = 0;
565  for (int i = 1; i < NumThreads; i++) {
566  PrefixSumV[i] = PrefixSumV[i-1] + LineCountV[i-1];
567  }
568  Ss.SetStreamPos(Pos);
569 
570  // allocate memory for columns
571  TInt IntColIdx = 0;
572  TInt FltColIdx = 0;
573  for (TInt i = 0; i < RowLen; i++) {
574  switch (ColTypes[i]) {
575  case atInt:
576  T->IntCols[IntColIdx].Gen(Cnt);
577  IntColIdx++;
578  break;
579  case atFlt:
580  T->FltCols[FltColIdx].Gen(Cnt);
581  FltColIdx++;
582  break;
583  case atStr:
584  break;
585  }
586  }
587 
588  Cnt = 0;
589  omp_set_num_threads(NumThreads);
590  #pragma omp parallel for schedule(dynamic) reduction(+:Cnt)
591  for (int i = 0; i < NumThreads; i++) {
592  // calculate beginning of each line handled by thread
593  TVec<uint64> LineStartPosV = Ss.GetStartPosV(StartIntV[i], StartIntV[i+1]);
594 
595  // parse line and fill rows
596  for (uint64 k = 0; k < (uint64) LineStartPosV.Len(); k++) {
597  TVec<char*> FieldsV;
598  Ss.NextFromIndex(LineStartPosV[k], FieldsV);
599  if (FieldsV.Len() != S.Len()) {
600  TExcept::Throw("Error reading tsv file");
601  }
602  TInt IntColIdx = 0;
603  TInt FltColIdx = 0;
604  TInt RowIdx = PrefixSumV[i] + k;
605 
606  for (TInt j = 0; j < RowLen; j++) {
607  switch (ColTypes[j]) {
608  case atInt:
609  if (RelevantCols.Len() == 0) {
610  T->IntCols[IntColIdx][RowIdx] = \
611  (Ss.GetIntFromFldV(FieldsV, j));
612  } else {
613  T->IntCols[IntColIdx][RowIdx] = \
614  (Ss.GetIntFromFldV(FieldsV, RelevantCols[j]));
615  }
616  IntColIdx++;
617  break;
618  case atFlt:
619  if (RelevantCols.Len() == 0) {
620  T->FltCols[FltColIdx][RowIdx] = \
621  (Ss.GetFltFromFldV(FieldsV, j));
622  } else {
623  T->FltCols[FltColIdx][RowIdx] = \
624  (Ss.GetFltFromFldV(FieldsV, RelevantCols[j]));
625  }
626  FltColIdx++;
627  break;
628  case atStr:
629  TExcept::Throw("TTable::LoadSS:: Str Col found\n");
630  break;
631  }
632  }
633  Cnt++;
634  }
635  }
636 
637  // set number of rows and "Next" vector
638  T->NumRows = Cnt;
639  T->NumValidRows = T->NumRows;
640 
641  T->Next.Clr();
642  T->Next.Gen(Cnt);
643 
644  omp_set_num_threads(NumThreads);
645  #pragma omp parallel for schedule(dynamic, 10000)
646  for (int64 i = 0; i < Cnt-1; i++) {
647  T->Next[i] = i+1;
648  }
649  T->IsNextDirty = 0;
650  T->Next[Cnt-1] = Last;
651  T->LastValidRow = T->NumRows - 1;
652 
653  T->IdColName = "_id";
654  TInt IdCol = T->IntCols.Add();
655  T->IntCols[IdCol].Gen(Cnt);
656 
657  // initialize ID column
658  omp_set_num_threads(NumThreads);
659  #pragma omp parallel for schedule(dynamic, 10000)
660  for (int64 i = 0; i < Cnt; i++) {
661  T->IntCols[IdCol][i] = i;
662  }
663 
664  T->AddSchemaCol(T->IdColName, atInt);
665  T->AddColType(T->IdColName, atInt, T->IntCols.Len()-1);
666 }
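The parallel loader counts the lines in each thread's byte range and turns those counts into starting row indices with an exclusive prefix sum (PrefixSumV). Only then does each thread write its rows into the pre-allocated columns, so no two threads touch the same row. Below is a minimal standalone sketch of that offset step. It assumes std::vector in place of SNAP's TIntV, and RowOffsets and FillChunk are hypothetical names, not SNAP functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the number of lines found in each thread's
// chunk, return the first row index each thread writes to
// (an exclusive prefix sum, mirroring PrefixSumV in the listing).
std::vector<std::int64_t> RowOffsets(const std::vector<std::int64_t>& LineCounts) {
  std::vector<std::int64_t> Offsets(LineCounts.size(), 0);
  for (std::size_t i = 1; i < LineCounts.size(); i++) {
    Offsets[i] = Offsets[i - 1] + LineCounts[i - 1];
  }
  return Offsets;
}

// Hypothetical helper: thread i fills rows [Offset, Offset + ChunkValues.size())
// of a column that was already sized to the total row count.
void FillChunk(std::vector<std::int64_t>& Col, std::int64_t Offset,
               const std::vector<std::int64_t>& ChunkValues) {
  const std::size_t Base = static_cast<std::size_t>(Offset);
  assert(Base + ChunkValues.size() <= Col.size());
  for (std::size_t k = 0; k < ChunkValues.size(); k++) {
    Col[Base + k] = ChunkValues[k];
  }
}
```

In the listing, the fill loop runs under `#pragma omp parallel for`. The sketch keeps only the index arithmetic, which is what makes that loop free of write conflicts.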
void TTable::LoadSSSeq ( PTable & NewTable,
const Schema & S,
const TStr & InFNm,
const TIntV & RelevantCols,
const char &  Separator,
TBool  HasTitleLine 
)
static protected

Sequentially loads data from input file at InFNm into NewTable.

Definition at line 669 of file table.cpp.

671  {
672  // preloaded necessary variables
673  int RowLen = T->Sch.Len();
674  TVec<TAttrType> ColTypes = TVec<TAttrType>(RowLen);
675  for (int i = 0; i < RowLen; i++) {
676  ColTypes[i] = T->GetSchemaColType(i);
677  }
678 
679  // Sequential load
680  TSsParser Ss(InFNm, Separator);
681  // if title line (i.e. names of the columns) is included as first row in the
682  // input file - use it to validate schema
683  if (HasTitleLine) {
684  Ss.Next();
685  if (S.Len() != Ss.GetFlds()) {
686  printf("%s\n", Ss[0]); TExcept::Throw("Table Schema Mismatch!");
687  }
688  for (int i = 0; i < Ss.GetFlds(); i++) {
689  // remove carriage return char
690  int L = strlen(Ss[i]);
691  if (Ss[i][L-1] < ' ') { Ss[i][L-1] = 0; }
692  if (NormalizeColName(S[i].Val1) != NormalizeColName(Ss[i])) { TExcept::Throw("Table Schema Mismatch!"); }
693  }
694  }
695 
696  // populate table columns
697  //printf("starting to populate table\n");
698  uint64 Cnt = 0;
699  while (Ss.Next()) {
700  int IntColIdx = 0;
701  int FltColIdx = 0;
702  int StrColIdx = 0;
703  Assert(Ss.GetFlds() == S.Len()); // compiled only in debug
704  if (Ss.GetFlds() != S.Len()) {
705  printf("%s\n", Ss[S.Len()]); TExcept::Throw("Error reading tsv file");
706  }
707  for (int i = 0; i < RowLen; i++) {
708  switch (ColTypes[i]) {
709  case atInt:
710  if (RelevantCols.Len() == 0) {
711  T->IntCols[IntColIdx].Add(Ss.GetInt(i));
712  } else {
713  T->IntCols[IntColIdx].Add(Ss.GetInt(RelevantCols[i]));
714  }
715  IntColIdx++;
716  break;
717  case atFlt:
718  if (RelevantCols.Len() == 0) {
719  T->FltCols[FltColIdx].Add(Ss.GetFlt(i));
720  } else {
721  T->FltCols[FltColIdx].Add(Ss.GetFlt(RelevantCols[i]));
722  }
723  FltColIdx++;
724  break;
725  case atStr:
726  int ColIdx;
727  if (RelevantCols.Len() == 0) {
728  ColIdx = i;
729  } else {
730  ColIdx = RelevantCols[i];
731  }
732  TStr Sval = TStr(Ss[ColIdx]);
733  T->AddStrVal(StrColIdx, Sval);
734  StrColIdx++;
735  break;
736  }
737  }
738  Cnt += 1;
739  }
740  //printf("finished populating table\n");
741  // set number of rows and "Next" vector
742  T->NumRows = static_cast<int>(Cnt);
743  T->NumValidRows = T->NumRows;
744 
745  T->Next.Clr();
746  T->Next.Gen(static_cast<int>(Cnt));
747  for (uint64 i = 0; i < Cnt-1; i++) {
748  T->Next[static_cast<int>(i)] = static_cast<int>(i+1);
749  }
750  T->IsNextDirty = 0;
751  T->Next[static_cast<int>(Cnt-1)] = Last;
752  T->LastValidRow = T->NumRows - 1;
753 
754  T->InitIds();
755 }
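Both loaders keep one vector per column, grouped by type. They walk the schema with a separate counter per type, so the k-th integer column of the schema is IntCols[k]. A non-empty RelevantCols maps schema column i to field RelevantCols[i] of the input line. Below is a simplified sketch of that dispatch. It assumes strings are stored directly, whereas the table interns them through its context via AddStrVal. ColType, Columns and AppendRow are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ColType { Int, Flt, Str };

// Hypothetical columnar storage: one vector per column, grouped by type.
struct Columns {
  std::vector<std::vector<long long>> IntCols;
  std::vector<std::vector<double>> FltCols;
  std::vector<std::vector<std::string>> StrCols;  // SNAP interns strings instead
};

// Appends one parsed line. Schema[i] is the type of schema column i;
// RelevantCols (may be empty) maps schema column i to a field index.
void AppendRow(Columns& C, const std::vector<ColType>& Schema,
               const std::vector<std::string>& Fields,
               const std::vector<int>& RelevantCols) {
  std::size_t IntIdx = 0, FltIdx = 0, StrIdx = 0;  // one counter per type
  for (std::size_t i = 0; i < Schema.size(); i++) {
    const std::size_t F =
        RelevantCols.empty() ? i : static_cast<std::size_t>(RelevantCols[i]);
    assert(F < Fields.size());
    switch (Schema[i]) {
      case ColType::Int: C.IntCols[IntIdx++].push_back(std::stoll(Fields[F])); break;
      case ColType::Flt: C.FltCols[FltIdx++].push_back(std::stod(Fields[F])); break;
      case ColType::Str: C.StrCols[StrIdx++].push_back(Fields[F]); break;
    }
  }
}
```

As in the listing, the per-type column vectors must already exist before rows are appended. The sketch does not check that the schema and the number of column vectors agree.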
void TTable::LoadTableShM ( TShMIn & ShMIn,
TTableContext * ContextTable 
)
private

Definition at line 360 of file table.cpp.

360  {
361  Context = ContextTable;
362  NumRows = TInt(ShMIn);
363  NumValidRows = TInt(ShMIn);
364  FirstValidRow = TInt(ShMIn);
365  LastValidRow = TInt(ShMIn);
366  Next.LoadShM(ShMIn);
367 
368  TLoadVecInit Fn;
369  IntCols.LoadShM(ShMIn, Fn);
370  FltCols.Load(ShMIn);
371  StrColMaps.LoadShM(ShMIn, Fn);
372  THash<TStr,TPair<TInt,TInt> > ColTypeIntMap;
373  ColTypeIntMap.LoadShM(ShMIn);
374 
375  GenerateColTypeMap(ColTypeIntMap);
376 }
void TTable::Merge ( TIntV & V,
TInt  Idx1,
TInt  Idx2,
TInt  Idx3,
const TVec< TAttrType > &  SortByTypes,
const TIntV & SortByIndices,
TBool  Asc = true 
)
protected

Helper function for parallel QSort: merges the adjacent sorted ranges V[Idx1, Idx2) and V[Idx2, Idx3) into one sorted range in place.

Definition at line 3178 of file table.cpp.

3178  {
3179  TInt i = Idx1, j = Idx2;
3180  TIntV SortedV;
3181  while (i < Idx2 && j < Idx3) {
3182  if (CompareRows(V[i], V[j], SortByTypes, SortByIndices, Asc) <= 0) {
3183  SortedV.Add(V[i]);
3184  i++;
3185  }
3186  else {
3187  SortedV.Add(V[j]);
3188  j++;
3189  }
3190  }
3191  while (i < Idx2) {
3192  SortedV.Add(V[i]);
3193  i++;
3194  }
3195  while (j < Idx3) {
3196  SortedV.Add(V[j]);
3197  j++;
3198  }
3199 
3200  for (TInt sz = 0; sz < Idx3 - Idx1; sz++) {
3201  V[Idx1 + sz] = SortedV[sz];
3202  }
3203 }
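Merge combines two adjacent sorted runs, V[Idx1, Idx2) and V[Idx2, Idx3), through a temporary buffer and copies the result back. Because it takes from the left run on ties (`<= 0`), the merge is stable. Below is a standalone sketch of the same step over plain integers. A caller-supplied three-way comparator stands in for CompareRows, and MergeRuns is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Merges sorted runs V[Lo, Mid) and V[Mid, Hi) in place via a buffer.
// Cmp(a, b) returns <0, 0 or >0 like CompareRows; ties take the left run.
void MergeRuns(std::vector<int>& V, std::size_t Lo, std::size_t Mid, std::size_t Hi,
               const std::function<int(int, int)>& Cmp) {
  assert(Lo <= Mid && Mid <= Hi && Hi <= V.size());
  std::vector<int> Sorted;
  Sorted.reserve(Hi - Lo);
  std::size_t i = Lo, j = Mid;
  while (i < Mid && j < Hi) {
    if (Cmp(V[i], V[j]) <= 0) Sorted.push_back(V[i++]);
    else Sorted.push_back(V[j++]);
  }
  while (i < Mid) Sorted.push_back(V[i++]);  // leftover from the left run
  while (j < Hi) Sorted.push_back(V[j++]);   // leftover from the right run
  for (std::size_t k = 0; k < Sorted.size(); k++) V[Lo + k] = Sorted[k];
}
```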
PTable TTable::Minus ( TTable & Table)

Returns table with rows that are present in this table but not in given Table.

Definition at line 4592 of file table.cpp.

4592  {
4593  Schema NewSchema;
4594  THashSet<TInt> Collisions;
4595 
4596  for (TInt c = 0; c < Sch.Len(); c++) {
4597  if (Sch[c].Val1 != GetIdColName()) {
4598  NewSchema.Add(TPair<TStr,TAttrType>(Sch[c].Val1, Sch[c].Val2));
4599  }
4600  }
4601  PTable result = TTable::New(NewSchema, Context);
4602 
4603  Table.GetCollidingRows(*this, Collisions);
4604 
4605  // this part should be made faster by adding all the rows in one go
4606  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
4607  if (!Collisions.IsKey(it.GetRowIdx())) {
4608  result->AddRow(it);
4609  }
4610  }
4611  result->InitIds();
4612  return result;
4613 }
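Minus copies each valid row of this table whose row index is not in the Collisions set returned by GetCollidingRows, then renumbers the id column with InitIds. The effect is a set difference on row values. Below is a minimal sketch. It assumes rows compare equal when all their non-id values match, and it uses a hypothetical three-column Row type and a hypothetical RowsMinus function.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row type: (int column, float column, string column).
using Row = std::tuple<long long, double, std::string>;

// Returns the rows of A, in A's order, that do not appear anywhere in B.
std::vector<Row> RowsMinus(const std::vector<Row>& A, const std::vector<Row>& B) {
  const std::set<Row> InB(B.begin(), B.end());  // plays the role of Collisions
  std::vector<Row> Result;
  for (const Row& R : A) {
    if (InB.count(R) == 0) Result.push_back(R);
  }
  return Result;
}
```

Unlike the library, which records the indices of colliding rows, the sketch looks rows up by value in a std::set. Duplicate rows in A that are absent from B are all kept.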
PTable TTable::Minus ( const PTable Table)
inline

Definition at line 1425 of file table.h.

1425 { return Minus(*Table); };
static PTable TTable::New ( )
inline static

Definition at line 932 of file table.h.

932 { return new TTable(); }
static PTable TTable::New ( TTableContext * Context)
inline static

Definition at line 933 of file table.h.

933 { return new TTable(Context); }
static PTable TTable::New ( const Schema & S,
TTableContext * Context 
)
inline static

Definition at line 934 of file table.h.

934  {
935  return new TTable(S, Context);
936  }
static PTable TTable::New ( const THash< TInt, TInt > &  H,
const TStr & Col1,
const TStr & Col2,
TTableContext * Context,
const TBool  IsStrKeys = false 
)
inline static

Returns pointer to a table constructed from given int->int hash.

Definition at line 938 of file table.h.

939  {
940  return new TTable(H, Col1, Col2, Context, IsStrKeys);
941  }
static PTable TTable::New ( const THash< TInt, TFlt > &  H,
const TStr & Col1,
const TStr & Col2,
TTableContext * Context,
const TBool  IsStrKeys = false 
)
inline static

Returns pointer to a table constructed from given int->float hash.

Definition at line 943 of file table.h.

944  {
945  return new TTable(H, Col1, Col2, Context, IsStrKeys);
946  }
static PTable TTable::New ( const PTable  Table)
inline static

Returns pointer to a new table created from given Table.

Definition at line 948 of file table.h.

948 { return new TTable(*Table); }
PNEANet TTable::NextGraphIterator ( )

Returns the next graph in the sequence. Calls to this must be preceded by a call to one of the ToGraph*Iterator functions.

Definition at line 3681 of file table.cpp.

3681  {
3682  return GetNextGraphFromSequence();
3683 }
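static TStr TTable::NormalizeColName ( const TStr & ColName)
inline static

Appends the suffix "-1" to ColName unless the name is empty, starts with '_', or already ends in a suffix (its second-to-last character is '-').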

Definition at line 530 of file table.h.

530  {
531  TStr Result = ColName;
532  int RLen = Result.Len();
533  if (RLen == 0) { return Result; }
534  if (Result.GetCh(0) == '_') { return Result; }
535  if (RLen >= 2 && Result.GetCh(RLen-2) == '-') { return Result; }
536  return Result + "-1";
537  }
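The rule has four cases. Empty names are returned unchanged, and so are names starting with '_' (such as the "_id" column). A name whose second-to-last character is '-' is treated as already suffixed. Anything else gets "-1" appended. Below is a direct std::string translation, for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// std::string version of the suffix rule in TTable::NormalizeColName.
std::string NormalizeColName(const std::string& ColName) {
  const std::size_t Len = ColName.size();
  if (Len == 0) return ColName;                             // nothing to normalize
  if (ColName[0] == '_') return ColName;                    // internal names, e.g. "_id"
  if (Len >= 2 && ColName[Len - 2] == '-') return ColName;  // already "-<digit>"
  return ColName + "-1";
}
```

The check only looks at the second-to-last character, so a name such as "col-12" is not recognized as suffixed and becomes "col-12-1".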
static TStrV TTable::NormalizeColNameV ( const TStrV & Cols)
inline static

Applies NormalizeColName to every name in Cols and returns the resulting vector.

Definition at line 539 of file table.h.

539  {
540  TStrV NCols;
541  for (TInt i = 0; i < Cols.Len(); i++) { NCols.Add(NormalizeColName(Cols[i])); }
542  return NCols;
543  }
void TTable::Order ( const TStrV & OrderBy,
TStr  OrderColName = "",
TBool  ResetRankByMSC = false,
TBool  Asc = true 
)

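Orders the rows by the values in the columns of OrderBy, compared lexicographically (ascending if Asc is true, descending otherwise). If OrderColName is non-empty, an integer column with that name holding each row's rank is also added.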

Definition at line 3240 of file table.cpp.

3240  {
3241  // get a vector of all valid row indices
3242  TIntV ValidRows = TIntV(NumValidRows);
3243  if (NumRows == NumValidRows) {
3244  for (TInt i = 0; i < NumValidRows; i++) {
3245  ValidRows[i] = i;
3246  }
3247  } else {
3248  TInt i = 0;
3249  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
3250  ValidRows[i] = RI.GetRowIdx();
3251  i++;
3252  }
3253  }
3254  TVec<TAttrType> OrderByTypes(OrderBy.Len());
3255  TIntV OrderByIndices(OrderBy.Len());
3256  for (TInt i = 0; i < OrderBy.Len(); i++) {
3257  OrderByTypes[i] = GetColType(OrderBy[i]);
3258  OrderByIndices[i] = GetColIdx(OrderBy[i]);
3259  }
3260 
3261  // sort that vector according to the attributes given in "OrderBy" in lexicographic order
3262 #ifdef USE_OPENMP
3263  if (GetMP()) {
3264  QSortPar(ValidRows, OrderByTypes, OrderByIndices, Asc);
3265  } else {
3266 #endif
3267  QSort(ValidRows, 0, NumValidRows-1, OrderByTypes, OrderByIndices, Asc);
3268 #ifdef USE_OPENMP
3269  }
3270 #endif
3271 
3272  // rewire Next vector
3273  IsNextDirty = 1;
3274  if (NumValidRows > 0) {
3275  FirstValidRow = ValidRows[0];
3276  } else {
3277  FirstValidRow = Last;
3278  }
3279  for (TInt i = 0; i < NumValidRows-1; i++) {
3280  Next[ValidRows[i]] = ValidRows[i+1];
3281  }
3282  if (NumValidRows > 0) {
3283  Next[ValidRows[NumValidRows-1]] = Last;
3284  LastValidRow = ValidRows[NumValidRows-1];
3285  } else {
3286  LastValidRow = Last;
3287  }
3288 
3289  // add rank column
3290  if (!OrderColName.Empty()) {
3291  TIntV RankCol = TIntV(NumRows);
3292  for (TInt i = 0; i < NumValidRows; i++) {
3293  RankCol[ValidRows[i]] = i;
3294  }
3295  if (ResetRankByMSC) {
3296  for (TInt i = 1; i < NumValidRows; i++) {
3297  TStr GroupName = OrderBy[0];
3298  if (GetStrVal(GroupName, ValidRows[i]) != GetStrVal(GroupName, ValidRows[i-1])) {
3299  RankCol[ValidRows[i]] = 0;
3300  } else {
3301  RankCol[ValidRows[i]] = RankCol[ValidRows[i-1]] + 1;
3302  }
3303  }
3304  }
3305  IntCols.Add(RankCol);
3306  AddSchemaCol(OrderColName, atInt);
3307  AddColType(OrderColName, atInt, IntCols.Len()-1);
3308  }
3309 }
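After sorting, Order rewires the Next vector so that iteration visits rows in sorted order without moving any column data. It can also add a rank column. With ResetRankByMSC, the rank restarts at 0 whenever the value of the first (most significant) OrderBy column changes. The listing reads that value with GetStrVal, so it assumes the first OrderBy column is a string column. Below is a sketch of both steps on plain vectors. It assumes the sorted row ids are already known and uses -1 as an assumed stand-in for the Last sentinel. RewireNext and RankColumn are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const int kLast = -1;  // stand-in for TTable::Last (assumed sentinel value)

// Links rows in the order given by Sorted: Next[Sorted[i]] = Sorted[i + 1].
// Returns the first valid row, or kLast if there are none.
int RewireNext(std::vector<int>& Next, const std::vector<int>& Sorted) {
  assert(Sorted.size() <= Next.size());
  if (Sorted.empty()) return kLast;
  for (std::size_t i = 0; i + 1 < Sorted.size(); i++) Next[Sorted[i]] = Sorted[i + 1];
  Next[Sorted.back()] = kLast;
  return Sorted.front();
}

// Rank of each row in sorted order. If Group is non-empty, the rank restarts
// at 0 whenever Group (the first OrderBy column) changes between neighbours.
std::vector<int> RankColumn(std::size_t NumRows, const std::vector<int>& Sorted,
                            const std::vector<std::string>& Group) {
  std::vector<int> Rank(NumRows, 0);
  for (std::size_t i = 0; i < Sorted.size(); i++) {
    if (Group.empty() || i == 0) {
      Rank[Sorted[i]] = static_cast<int>(i);
    } else if (Group[Sorted[i]] != Group[Sorted[i - 1]]) {
      Rank[Sorted[i]] = 0;
    } else {
      Rank[Sorted[i]] = Rank[Sorted[i - 1]] + 1;
    }
  }
  return Rank;
}
```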
TInt TTable::Partition ( TIntV & V,
TInt  StartIdx,
TInt  EndIdx,
const TVec< TAttrType > &  SortByTypes,
const TIntV & SortByIndices,
TBool  Asc 
)
protected

Partitions vector for QSort.

Definition at line 3126 of file table.cpp.

3126  {
3127 
3128  // test if the elements are already sorted
3129  TInt j;
3130  for (j = StartIdx; j < EndIdx; j++) {
3131  if (CompareRows(V[j], V[j+1], SortByTypes, SortByIndices, Asc) > 0) {
3132  break;
3133  }
3134  }
3135  if (j >= EndIdx) {
3136  return EndIdx+1;
3137  }
3138 
3139  TInt PivotIdx = GetPivot(V, StartIdx, EndIdx, SortByTypes, SortByIndices, Asc);
3140  TInt Pivot = V[PivotIdx];
3141  V.Swap(PivotIdx, EndIdx);
3142  TInt StoreIdx = StartIdx;
3143  for (TInt i = StartIdx; i < EndIdx; i++) {
3144  if (CompareRows(V[i], Pivot, SortByTypes, SortByIndices, Asc) <= 0) {
3145  V.Swap(i, StoreIdx);
3146  StoreIdx++;
3147  }
3148  }
3149  // move pivot value to its place
3150  V.Swap(StoreIdx, EndIdx);
3151  return StoreIdx;
3152 }
TInt TTable::PartitionKeyVal ( TIntV & Key,
TIntV & Val,
TInt  Start,
TInt  End 
)
static protected

Definition at line 5355 of file table.cpp.

5355  {
5356  TInt Pivot = GetPivotKeyVal(Key, Val, Start, End);
5357  //printf("Pivot=%d\n", Pivot.Val);
5358  TInt PivotKey = Key[Pivot];
5359  TInt PivotVal = Val[Pivot];
5360  Key.Swap(Pivot, End);
5361  Val.Swap(Pivot, End);
5362  TInt StoreIdx = Start;
5363  for (TInt i = Start; i < End; i++) {
5364  //printf("%d %d %d %d\n", Key[i].Val, Val[i].Val, PivotKey.Val, PivotVal.Val);
5365  if (CompareKeyVal(Key[i], Val[i], PivotKey, PivotVal) <= 0) {
5366  Key.Swap(i, StoreIdx);
5367  Val.Swap(i, StoreIdx);
5368  StoreIdx++;
5369  }
5370  }
5371  //printf("StoreIdx=%d\n", StoreIdx.Val);
5372  // move pivot value to its place
5373  Key.Swap(StoreIdx, End);
5374  Val.Swap(StoreIdx, End);
5375  return StoreIdx;
5376 }
void TTable::PrintContextSize ( )

Definition at line 3959 of file table.cpp.

3959  {
3960  printf("Number of strings in pool: ");
3961  printf("%d\n", Context->StringVals.Len());
3962  printf("Number of entries in hash table: ");
3963  printf("%d\n", Context->StringVals.Reserved());
3964  TSize MemUsed = GetContextMemUsedKB();
3965  printf("Approximate context size is %s KB\n",
3966  TUInt64::GetStr(MemUsed).CStr());
3967 }
void TTable::PrintGrouping ( const THash< TGroupKey, TIntV > &  Grouping) const
protected

Definition at line 1788 of file table.cpp.

1788  {
1789  for(THash<TGroupKey, TIntV>::TIter it = Mapping.BegI(); it < Mapping.EndI(); it++){
1790  TGroupKey gk = it.GetKey();
1791  TIntV ik = gk.Val1;
1792  TFltV fk = gk.Val2;
1793  for(int i = 0; i < ik.Len(); i++){ printf("%d ",ik[i].Val);}
1794  for(int i = 0; i < fk.Len(); i++){ printf("%f ",fk[i].Val);}
1795  printf("-->");
1796  TIntV v = it.GetDat();
1797  for(int i = 0; i < v.Len(); i++){ printf("%d ",v[i].Val);}
1798  printf("\n");
1799  }
1800 }
void TTable::PrintSize ( )

Definition at line 3930 of file table.cpp.

3930  {
3931  printf("Total number of rows: %d\n", NumRows.Val);
3932  printf("Number of valid rows: %d\n", NumValidRows.Val);
3933  printf("Number of Int columns: %d\n", IntCols.Len());
3934  printf("Number of Flt columns: %d\n", FltCols.Len());
3935  printf("Number of Str columns: %d\n", StrColMaps.Len());
3936  TSize MemUsed = GetMemUsedKB();
3937  printf("Approximate table size is %s KB\n", TUInt64::GetStr(MemUsed).CStr());
3938 }
PTable TTable::Project ( const TStrV & ProjectCols)

Returns table with only the columns in ProjectCols.

Definition at line 4615 of file table.cpp.

4615  {
4616  Schema NewSchema;
4617  for (TInt c = 0; c < ProjectCols.Len(); c++) {
4618  if (!IsColName(ProjectCols[c])) { TExcept::Throw("no such column " + ProjectCols[c]); }
4619  NewSchema.Add(TPair<TStr,TAttrType>(ProjectCols[c], GetColType(ProjectCols[c])));
4620  }
4621 
4622  PTable result = TTable::New(NewSchema, Context);
4623  result->AddTable(*this);
4624  result->InitIds();
4625  return result;
4626 }
void TTable::ProjectInPlace ( const TStrV & ProjectCols)

Keeps only the columns specified in ProjectCols.

Definition at line 5239 of file table.cpp.

5239  {
5240  TStrV NProjectCols = NormalizeColNameV(ProjectCols);
5241  for (TInt c = 0; c < NProjectCols.Len(); c++) {
5242  if (!IsColName(NProjectCols[c])) { TExcept::Throw("no such column " + NProjectCols[c]); }
5243  }
5244  THashSet<TStr> ProjectColsSet = THashSet<TStr>(NProjectCols);
5245  // Delete the column vectors
5246  for (TInt i = Sch.Len() - 1; i >= 0; i--) {
5247  TStr ColName = GetSchemaColName(i);
5248  if (ProjectColsSet.IsKey(ColName) || ColName == IdColName) { continue; }
5249  TAttrType ColType = GetSchemaColType(i);
5250  TInt ColId = GetColIdx(ColName);
5251  switch (ColType) {
5252  case atInt:
5253  IntCols.Del(ColId);
5254  break;
5255  case atFlt:
5256  FltCols.Del(ColId);
5257  break;
5258  case atStr:
5259  StrColMaps.Del(ColId);
5260  break;
5261  }
5262  }
5263 
5264  // Rebuild the ColTypeMap with new indexes of the column vectors
5265  TInt IntColCnt = 0;
5266  TInt FltColCnt = 0;
5267  TInt StrColCnt = 0;
5268  ColTypeMap.Clr();
5269  for (TInt i = 0; i < Sch.Len(); i++) {
5270  TStr ColName = GetSchemaColName(i);
5271  if (!ProjectColsSet.IsKey(ColName) && ColName != IdColName) { continue; }
5272  TAttrType ColType = GetSchemaColType(i);
5273  switch (ColType) {
5274  case atInt:
5275  AddColType(ColName, atInt, IntColCnt);
5276  IntColCnt++;
5277  break;
5278  case atFlt:
5279  AddColType(ColName, atFlt, FltColCnt);
5280  FltColCnt++;
5281  break;
5282  case atStr:
5283  AddColType(ColName, atStr, StrColCnt);
5284  StrColCnt++;
5285  break;
5286  }
5287  }
5288 
5289  // Update schema
5290  for (TInt i = Sch.Len() - 1; i >= 0; i--) {
5291  TStr ColName = GetSchemaColName(i);
5292  if (ProjectColsSet.IsKey(ColName) || ColName == IdColName) { continue; }
5293  Sch.Del(i);
5294  }
5295 }
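Column vectors are grouped by type, so deleting one shifts the per-type index of every later column of the same type. ProjectInPlace deletes from the end of the schema, presumably so that indices looked up in the stale ColTypeMap remain valid for the columns not yet deleted. It then rebuilds ColTypeMap by walking the surviving columns in schema order with one counter per type. Below is a sketch of that rebuild step. It assumes a schema of (name, type) pairs, and ColType and ReindexColumns are hypothetical names. The caller is expected to include the id column (for example "_id") in Keep, because the listing always retains IdColName.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class ColType { Int, Flt, Str };

// For each kept column, returns its type and new index among kept columns
// of the same type, in schema order (the role of the rebuilt ColTypeMap).
std::map<std::string, std::pair<ColType, int>> ReindexColumns(
    const std::vector<std::pair<std::string, ColType>>& Schema,
    const std::set<std::string>& Keep) {
  std::map<std::string, std::pair<ColType, int>> ColTypeMap;
  int Counts[3] = {0, 0, 0};  // next free index for Int, Flt, Str
  for (const auto& Col : Schema) {
    if (Keep.count(Col.first) == 0) continue;  // column was dropped
    const int T = static_cast<int>(Col.second);
    ColTypeMap[Col.first] = {Col.second, Counts[T]++};
  }
  return ColTypeMap;
}
```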
void TTable::QSort ( TIntV & V,
TInt  StartIdx,
TInt  EndIdx,
const TVec< TAttrType > &  SortByTypes,
const TIntV & SortByIndices,
TBool  Asc = true 
)
protected

Performs QSort on given vector V.

Definition at line 3154 of file table.cpp.

3154  {
3155  if (StartIdx < EndIdx) {
3156  if (EndIdx - StartIdx < 20) {
3157  ISort(V, StartIdx, EndIdx, SortByTypes, SortByIndices, Asc);
3158  } else {
3159  TInt Pivot = Partition(V, StartIdx, EndIdx, SortByTypes, SortByIndices, Asc);
3160  if (Pivot > EndIdx) {
3161  return;
3162  }
3163  // Everything <= Pivot will be in StartIdx, Pivot-1. Shrink this
3164  // range to ignore elements equal to the pivot in the first
3165  // recursive call, to optimize for the case when a lot of
3166  // rows are equal.
3167  int Ub = Pivot - 1;
3168  while (Ub >= StartIdx && CompareRows(
3169  V[Ub], V[Pivot], SortByTypes, SortByIndices, Asc) == 0) {
3170  Ub -= 1;
3171  }
3172  QSort(V, StartIdx, Ub, SortByTypes, SortByIndices, Asc);
3173  QSort(V, Pivot+1, EndIdx, SortByTypes, SortByIndices, Asc);
3174  }
3175  }
3176 }
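QSort is a hybrid sort that works in three ways:

- Ranges shorter than 20 elements go to insertion sort (ISort).
- Partition returns EndIdx+1 when the range is already sorted.
- After partitioning, the left recursion skips the trailing run of elements equal to the pivot, so inputs with many duplicate rows do less work.

Below is a standalone sketch of the same structure over integers. It assumes a middle-element pivot in place of GetPivot, whose selection rule is not shown here. InsertionSort, PartitionRange and HybridQSort are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort on V[Lo..Hi] (inclusive bounds, like StartIdx/EndIdx).
void InsertionSort(std::vector<int>& V, int Lo, int Hi) {
  for (int i = Lo + 1; i <= Hi; i++) {
    const int X = V[i];
    int j = i - 1;
    while (j >= Lo && V[j] > X) { V[j + 1] = V[j]; j--; }
    V[j + 1] = X;
  }
}

// Lomuto partition of V[Lo..Hi]; returns Hi + 1 if the range is already sorted.
int PartitionRange(std::vector<int>& V, int Lo, int Hi) {
  int j = Lo;
  while (j < Hi && V[j] <= V[j + 1]) j++;
  if (j >= Hi) return Hi + 1;
  std::swap(V[Lo + (Hi - Lo) / 2], V[Hi]);  // assumed pivot rule: middle element
  const int Pivot = V[Hi];
  int Store = Lo;
  for (int i = Lo; i < Hi; i++) {
    if (V[i] <= Pivot) std::swap(V[i], V[Store++]);
  }
  std::swap(V[Store], V[Hi]);  // move the pivot to its final position
  return Store;
}

void HybridQSort(std::vector<int>& V, int Lo, int Hi) {
  if (Lo >= Hi) return;
  assert(Lo >= 0 && Hi < static_cast<int>(V.size()));
  if (Hi - Lo < 20) { InsertionSort(V, Lo, Hi); return; }
  const int P = PartitionRange(V, Lo, Hi);
  if (P > Hi) return;  // range was already sorted
  int Ub = P - 1;      // skip trailing copies of the pivot value
  while (Ub >= Lo && V[Ub] == V[P]) Ub--;
  HybridQSort(V, Lo, Ub);
  HybridQSort(V, P + 1, Hi);
}
```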
void TTable::QSortKeyVal ( TIntV & Key,
TIntV & Val,
TInt  Start,
TInt  End 
)
static protected

Definition at line 5378 of file table.cpp.

5378  {
5379  //printf("Thread=%d, Start=%d, End=%d\n", omp_get_thread_num(), Start.Val, End.Val);
5380  TInt L = End-Start;
5381  if (L <= 0) { return; }
5382  if (CheckSortedKeyVal(Key, Val, Start, End) == 0) { return; }
5383 
5384  if (L <= 20) { ISortKeyVal(Key, Val, Start, End); }
5385  else {
5386  TInt Pivot = PartitionKeyVal(Key, Val, Start, End);
5387 
5388  if (Pivot > End) { return; }
5389  if (L <= 500000) {
5390  QSortKeyVal(Key, Val, Start, Pivot-1);
5391  QSortKeyVal(Key, Val, Pivot+1, End);
5392  } else {
5393 #ifdef USE_OPENMP
5394 #ifndef GLib_WIN32
5395  #pragma omp task untied shared(Key, Val)
5396 #endif
5397 #endif
5398  { QSortKeyVal(Key, Val, Start, Pivot-1); }
5399 
5400 #ifdef USE_OPENMP
5401 #ifndef GLib_WIN32
5402  #pragma omp task untied shared(Key, Val)
5403 #endif
5404 #endif
5405  { QSortKeyVal(Key, Val, Pivot+1, End); }
5406  }
5407  }
5408 }
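QSortKeyVal sorts the parallel Key and Val vectors together. For ranges longer than 500000 elements on non-Windows builds, it spawns the two recursive calls as untied OpenMP tasks. Those tasks only run concurrently when the function is entered from inside a parallel region, typically through a `parallel` plus `single` construct. The listing has no `taskwait`, so its tasks are only guaranteed to finish at the next barrier, such as the end of that region. Below is a sketch of the calling pattern over raw arrays, with the following assumptions and differences:

- It assumes CompareKeyVal orders (key, value) pairs lexicographically.
- It uses a middle-element pivot.
- It omits the already-sorted check and the ISortKeyVal cutoff.
- It adds `taskwait` so each call returns with its range sorted.

CmpKV, SwapKV, QSortKV and ParallelSortKV are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Lexicographic comparison of (key, value) pairs (assumed CompareKeyVal order).
static int CmpKV(int K1, int V1, int K2, int V2) {
  if (K1 != K2) return K1 < K2 ? -1 : 1;
  if (V1 != V2) return V1 < V2 ? -1 : 1;
  return 0;
}

// Swaps positions a and b in both arrays, keeping pairs together.
static void SwapKV(int* Key, int* Val, int a, int b) {
  std::swap(Key[a], Key[b]);
  std::swap(Val[a], Val[b]);
}

// Sorts Key[Lo..Hi] and Val[Lo..Hi] together; large ranges become tasks.
void QSortKV(int* Key, int* Val, int Lo, int Hi) {
  if (Lo >= Hi) return;
  SwapKV(Key, Val, Lo + (Hi - Lo) / 2, Hi);  // assumed pivot: middle element
  const int PK = Key[Hi], PV = Val[Hi];
  int Store = Lo;
  for (int i = Lo; i < Hi; i++) {
    if (CmpKV(Key[i], Val[i], PK, PV) <= 0) SwapKV(Key, Val, i, Store++);
  }
  SwapKV(Key, Val, Store, Hi);  // pivot pair to its final position
  if (Hi - Lo <= 500000) {      // small range: plain recursion
    QSortKV(Key, Val, Lo, Store - 1);
    QSortKV(Key, Val, Store + 1, Hi);
  } else {
    #pragma omp task untied
    QSortKV(Key, Val, Lo, Store - 1);
    #pragma omp task untied
    QSortKV(Key, Val, Store + 1, Hi);
    #pragma omp taskwait
  }
}

// One thread starts the recursion; the rest of the team executes tasks.
void ParallelSortKV(std::vector<int>& Key, std::vector<int>& Val) {
  assert(Key.size() == Val.size());
  int* K = Key.data();
  int* V = Val.data();
  const int N = static_cast<int>(Key.size());
  #pragma omp parallel
  #pragma omp single
  QSortKV(K, V, 0, N - 1);
}
```

Compiled without `-fopenmp`, the pragmas are ignored and the sort runs sequentially with the same result.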
static TInt PartitionKeyVal(TIntV &Key, TIntV &Val, TInt Start, TInt End)
Definition: table.cpp:5355
static void QSortKeyVal(TIntV &Key, TIntV &Val, TInt Start, TInt End)
Definition: table.cpp:5378
static void ISortKeyVal(TIntV &Key, TIntV &Val, TInt Start, TInt End)
Definition: table.cpp:5321
static TInt CheckSortedKeyVal(TIntV &Key, TIntV &Val, TInt Start, TInt End)
Definition: table.cpp:5310
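The recursion in QSortKeyVal can be illustrated with a standard-library-only sketch. It keeps the control flow visible in the listing (early exit on a sorted range, insertion sort when End - Start <= 20, partition, two recursive calls) but runs on one thread, sorts ascending on `std::vector<int>`, and uses a Lomuto partition around the last element. The `*Sketch` names, the pivot rule and the sort direction are assumptions, since the bodies of PartitionKeyVal and CheckSortedKeyVal are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Insertion sort on the inclusive range [Start, End]; Key/Val pairs move together.
void ISortKeyValSketch(std::vector<int>& Key, std::vector<int>& Val, long Start, long End) {
  for (long i = Start + 1; i <= End; i++) {
    const int K = Key[i];
    const int V = Val[i];
    long j = i - 1;
    while (j >= Start && Key[j] > K) {
      Key[j + 1] = Key[j];
      Val[j + 1] = Val[j];
      j--;
    }
    Key[j + 1] = K;
    Val[j + 1] = V;
  }
}

// Lomuto partition around Key[End] (an assumed pivot rule); every swap on Key
// is mirrored on Val. Returns the pivot's final position.
long PartitionKeyValSketch(std::vector<int>& Key, std::vector<int>& Val, long Start, long End) {
  const int Pivot = Key[End];
  long Store = Start;
  for (long i = Start; i < End; i++) {
    if (Key[i] < Pivot) {
      std::swap(Key[i], Key[Store]);
      std::swap(Val[i], Val[Store]);
      Store++;
    }
  }
  std::swap(Key[Store], Key[End]);
  std::swap(Val[Store], Val[End]);
  return Store;
}

void QSortKeyValSketch(std::vector<int>& Key, std::vector<int>& Val, long Start, long End) {
  const long L = End - Start;
  if (L <= 0) { return; }
  // Stand-in for the CheckSortedKeyVal early exit: a sorted range needs no work.
  if (std::is_sorted(Key.begin() + Start, Key.begin() + End + 1)) { return; }
  if (L <= 20) { ISortKeyValSketch(Key, Val, Start, End); return; }
  const long Pivot = PartitionKeyValSketch(Key, Val, Start, End);
  // The listing turns these two calls into OpenMP tasks when L > 500000.
  QSortKeyValSketch(Key, Val, Start, Pivot - 1);
  QSortKeyValSketch(Key, Val, Pivot + 1, End);
}
```

The sortedness check matters for a last-element pivot: without it, already sorted input would drive the recursion depth to the length of the range.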
void TTable::QSortPar(TIntV &V, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc = true)
protected

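Performs QSort in parallel on given vector V. V is split into 8 chunks of Sz / 8 elements (the last chunk also takes the remainder), each chunk is sorted by QSort in its own OpenMP thread, and neighbouring sorted chunks are then merged pairwise with Merge, halving the number of runs each round until one sorted run remains.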

Definition at line 3206 of file table.cpp.

3206  {
3207  TInt NumThreads = 8; // Setting this to 8 because that results in the fastest sorting on Madmax.
3208  TInt Sz = V.Len();
3209  TIntV IndV, NextV;
3210  for (TInt i = 0; i < NumThreads; i++) {
3211  IndV.Add(i * (Sz / NumThreads));
3212  }
3213  IndV.Add(Sz);
3214 
3215  omp_set_num_threads(NumThreads);
3216  #pragma omp parallel for
3217  for (int i = 0; i < NumThreads; i++) {
3218  QSort(V, IndV[i], IndV[i+1] - 1, SortByTypes, SortByIndices, Asc);
3219  }
3220 
3221  while (NumThreads > 1) {
3222  omp_set_num_threads(NumThreads / 2);
3223  #pragma omp parallel for
3224  for (int i = 0; i < NumThreads; i += 2) {
3225  Merge(V, IndV[i], IndV[i+1], IndV[i+2], SortByTypes, SortByIndices, Asc);
3226  }
3227 
3228  NextV.Clr();
3229  for (TInt i = 0; i < NumThreads; i+=2) {
3230  NextV.Add(IndV[i]);
3231  }
3232  NextV.Add(Sz);
3233  IndV = NextV;
3234 
3235  NumThreads = NumThreads / 2;
3236  }
3237 }
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
void QSort(TIntV &V, TInt StartIdx, TInt EndIdx, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc=true)
Performs QSort on given vector V.
Definition: table.cpp:3154
void Clr(const bool &DoDel=true, const TSizeTy &NoDelLim=-1)
Clears the contents of the vector.
Definition: ds.h:1022
void Merge(TIntV &V, TInt Idx1, TInt Idx2, TInt Idx3, const TVec< TAttrType > &SortByTypes, const TIntV &SortByIndices, TBool Asc=true)
Helper function for parallel QSort.
Definition: table.cpp:3178
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
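QSortPar's chunk-then-merge scheme can be reduced to the C++ standard library for illustration. In this sketch `std::sort` and `std::inplace_merge` stand in for QSort and Merge, the rows are plain ints instead of multi-column keys, and everything runs on one thread. Within each round the ranges are disjoint, which is what lets the listing hand them to separate OpenMP threads. `ChunkSortMergeSketch` is a hypothetical name, and NumChunks is assumed to be a power of two, as the hard-coded 8 is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

void ChunkSortMergeSketch(std::vector<int>& V, int NumChunks) {
  const std::size_t Sz = V.size();
  // Chunk boundaries, like IndV: i * (Sz / NumChunks), then Sz.
  std::vector<std::size_t> Bounds;
  for (int i = 0; i < NumChunks; i++) {
    Bounds.push_back(i * (Sz / NumChunks));
  }
  Bounds.push_back(Sz);
  // Phase 1: sort every chunk independently (parallel in the listing).
  for (int i = 0; i < NumChunks; i++) {
    std::sort(V.begin() + Bounds[i], V.begin() + Bounds[i + 1]);
  }
  // Phase 2: merge neighbouring pairs of runs until one run remains.
  while (NumChunks > 1) {
    std::vector<std::size_t> NextBounds;
    for (int i = 0; i < NumChunks; i += 2) {
      std::inplace_merge(V.begin() + Bounds[i], V.begin() + Bounds[i + 1],
                         V.begin() + Bounds[i + 2]);
      NextBounds.push_back(Bounds[i]);
    }
    NextBounds.push_back(Sz);
    Bounds = NextBounds;
    NumChunks /= 2;
  }
}
```

With fewer elements than chunks, Sz / NumChunks is 0 and all but the last chunk are empty, so the result is still sorted; only the parallelism is lost.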
void TTable::ReadFltCol(const TStr &ColName, TFltV &Result) const

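Appends the values of the entire float column ColName to Result, in logical row order (Result is not cleared first). Throws an exception if ColName does not exist or is not a float column.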

Definition at line 5221 of file table.cpp.

5221  {
5222  if (!IsColName(ColName)) { TExcept::Throw("no such column " + ColName); }
5223  if (GetColType(ColName) != atFlt) { TExcept::Throw("not a floating point column " + ColName); }
5224  TInt ColId = GetColIdx(ColName);
5225  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
5226  Result.Add(it.GetFltAttr(ColId));
5227  }
5228 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TBool IsColName(const TStr &ColName) const
Definition: table.h:646
void TTable::ReadIntCol(const TStr &ColName, TIntV &Result) const

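Appends the values of the entire integer column ColName to Result, in logical row order (Result is not cleared first). Throws an exception if ColName does not exist or is not an integer column.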

Definition at line 5212 of file table.cpp.

5212  {
5213  if (!IsColName(ColName)) { TExcept::Throw("no such column " + ColName); }
5214  if (GetColType(ColName) != atInt) { TExcept::Throw("not an integer column " + ColName); }
5215  TInt ColId = GetColIdx(ColName);
5216  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
5217  Result.Add(it.GetIntAttr(ColId));
5218  }
5219 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TBool IsColName(const TStr &ColName) const
Definition: table.h:646
void TTable::ReadStrCol(const TStr &ColName, TStrV &Result) const

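Appends the values of the entire string column ColName to Result, in logical row order (Result is not cleared first). Throws an exception if ColName does not exist or is not a string column.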

Definition at line 5230 of file table.cpp.

5230  {
5231  if (!IsColName(ColName)) { TExcept::Throw("no such column " + ColName); }
5232  if (GetColType(ColName) != atStr) { TExcept::Throw("not a string column " + ColName); }
5233  TInt ColId = GetColIdx(ColName);
5234  for (TRowIterator it = BegRI(); it < EndRI(); it++) {
5235  Result.Add(it.GetStrAttr(ColId));
5236  }
5237 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TBool IsColName(const TStr &ColName) const
Definition: table.h:646
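ReadIntCol, ReadFltCol and ReadStrCol all walk the valid rows in logical order and append one value per row. The sketch below models this on a hypothetical `MiniTable` holding one `std::vector` per integer column plus a `Next` vector that links the valid rows; the sentinel values `kLast` and `kInvalid` are illustrative and are not SNAP's TTable::Last and TTable::Invalid.

```cpp
#include <cassert>
#include <vector>

constexpr int kLast = -1;     // Next[i] == kLast: row i is the last valid row
constexpr int kInvalid = -2;  // Next[i] == kInvalid: row i was logically removed

struct MiniTable {
  std::vector<std::vector<int>> IntCols;  // IntCols[c][row]
  std::vector<int> Next;                  // logical successor of each row
  int FirstValidRow = kLast;
};

// Appends integer column ColIdx to Result in logical row order, skipping
// removed rows because they are no longer linked into the Next chain.
// Like ReadIntCol, it does not clear Result first.
void ReadIntColSketch(const MiniTable& T, int ColIdx, std::vector<int>& Result) {
  for (int Row = T.FirstValidRow; Row != kLast; Row = T.Next[Row]) {
    Result.push_back(T.IntCols[ColIdx][Row]);
  }
}
```

Because Result is only appended to, callers that want just the column should pass an empty vector.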
void TTable::Reindex ( )
protected

Reinitializes row ids: walks the valid rows in logical order, writes consecutive ids 0, 1, 2, ... into the id column (IdColName), and rebuilds RowIdMap from scratch.

Definition at line 1889 of file table.cpp.

1889  {
1890  RowIdMap.Clr();
1891  TInt IdColIdx = GetColIdx(IdColName);
1892  TInt IdCnt = 0;
1893  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
1894  IntCols[IdColIdx][RI.GetRowIdx()] = IdCnt;
1895  RowIdMap.AddDat(RI.GetRowIdx(), IdCnt);
1896  IdCnt++;
1897  }
1898 }
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TStr IdColName
Name of the column that stores permanent row ids.
Definition: table.h:565
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
Iterator class for TTable rows.
Definition: table.h:330
TIntIntH RowIdMap
Mapping of permanent row ids to physical id.
Definition: table.h:566
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
void Clr(const bool &DoDel=true, const int &NoDelLim=-1, const bool &ResetDat=true)
Definition: hash.h:361
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
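The renumbering in Reindex amounts to one pass over the Next chain. This sketch uses a hypothetical linked layout (a `Next` vector and an id column as `std::vector<int>`) and keys the map by physical row index, as the listing's `RowIdMap.AddDat(RI.GetRowIdx(), IdCnt)` does; `ReindexSketch` and `kLast` are illustrative names.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

constexpr int kLast = -1;  // illustrative end-of-chain sentinel

// Writes ids 0, 1, 2, ... into IdCol for the valid rows in logical order and
// rebuilds RowIdMap (physical row index -> new id). Removed rows keep
// whatever id they had, since they are not on the chain.
void ReindexSketch(const std::vector<int>& Next, int FirstValidRow,
                   std::vector<int>& IdCol, std::unordered_map<int, int>& RowIdMap) {
  RowIdMap.clear();
  int IdCnt = 0;
  for (int Row = FirstValidRow; Row != kLast; Row = Next[Row]) {
    IdCol[Row] = IdCnt;
    RowIdMap[Row] = IdCnt;
    IdCnt++;
  }
}
```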
void TTable::RemoveFirstRow ( )
protected

Removes first valid row of the table.

Definition at line 1122 of file table.cpp.

1122  {
1123  if (FirstValidRow == LastValidRow) {
1124  LastValidRow = -1;
1125  }
1126 
1127  TInt Old = FirstValidRow;
1128  FirstValidRow = Next[Old];
1129  Next[Old] = TTable::Invalid;
1130  NumValidRows--;
1131  TInt IdColIdx = GetColIdx(GetIdColName());
1132  RowIdMap.AddDat(IntCols[IdColIdx][Old], Invalid);
1133 }
TInt FirstValidRow
Physical index of first valid row.
Definition: table.h:553
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
TInt LastValidRow
Physical index of last valid row.
Definition: table.h:554
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
TStr GetIdColName() const
Gets name of the id column of this table.
Definition: table.h:636
TIntIntH RowIdMap
Mapping of permanent row ids to physical id.
Definition: table.h:566
TIntV Next
A vector describing the logical order of the rows.
Definition: table.h:555
static const TInt Invalid
Special value for Next vector entry - logically removed row.
Definition: table.h:487
TInt NumValidRows
Number of valid rows in the table (i.e. rows that were not logically removed).
Definition: table.h:552
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
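void TTable::RemoveRow(TInt RowIdx, TInt PrevRowIdx)
protected

Logically removes the row with physical index RowIdx. PrevRowIdx must be the physical index of the row that logically precedes it; it is not used when RowIdx is the first valid row.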

Definition at line 1135 of file table.cpp.

1135  {
1136  if (RowIdx == FirstValidRow) {
1137  RemoveFirstRow();
1138  return;
1139  }
1140  Assert(RowIdx != TTable::Invalid);
1141  if (RowIdx == TTable::Last) { return; }
1142  Next[PrevRowIdx] = Next[RowIdx];
1143  if (LastValidRow == RowIdx) {
1144  LastValidRow = PrevRowIdx;  // the predecessor becomes the last valid row
1145  }
1146  Next[RowIdx] = TTable::Invalid;
1147  NumValidRows--;
1148  TInt IdColIdx = GetColIdx(GetIdColName());
1149  RowIdMap.AddDat(IntCols[IdColIdx][RowIdx], Invalid);
1150 }
TInt FirstValidRow
Physical index of first valid row.
Definition: table.h:553
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
static const TInt Last
Special value for Next vector entry - last row in table.
Definition: table.h:486
TInt LastValidRow
Physical index of last valid row.
Definition: table.h:554
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
TStr GetIdColName() const
Gets name of the id column of this table.
Definition: table.h:636
#define Assert(Cond)
Definition: bd.h:251
TIntIntH RowIdMap
Mapping of permanent row ids to physical id.
Definition: table.h:566
TIntV Next
A vector describing the logical order of the rows.
Definition: table.h:555
static const TInt Invalid
Special value for Next vector entry - logically removed row.
Definition: table.h:487
TInt NumValidRows
Number of valid rows in the table (i.e. rows that were not logically removed).
Definition: table.h:552
void RemoveFirstRow()
Removes first valid row of the table.
Definition: table.cpp:1122
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
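Row removal is logical: the row's storage stays in place, it is unlinked from the Next chain and marked invalid. The sketch below covers RemoveFirstRow and RemoveRow together on a hypothetical `RowList` that mirrors FirstValidRow, LastValidRow, Next and NumValidRows. When the removed row was the last valid one, its predecessor becomes LastValidRow. The RowIdMap bookkeeping is left out, and the sentinel values are illustrative.

```cpp
#include <cassert>
#include <vector>

constexpr int kLast = -1;     // end of the logical chain
constexpr int kInvalid = -2;  // marks a logically removed row

struct RowList {
  std::vector<int> Next;
  int FirstValidRow;
  int LastValidRow;
  int NumValidRows;
};

// Logically removes RowIdx; PrevRowIdx is its logical predecessor and is
// ignored when RowIdx is the first valid row.
void RemoveRowSketch(RowList& L, int RowIdx, int PrevRowIdx) {
  if (RowIdx == L.FirstValidRow) {
    if (L.FirstValidRow == L.LastValidRow) { L.LastValidRow = kLast; }  // table becomes empty
    L.FirstValidRow = L.Next[RowIdx];
  } else {
    L.Next[PrevRowIdx] = L.Next[RowIdx];  // bypass the removed row
    if (L.LastValidRow == RowIdx) { L.LastValidRow = PrevRowIdx; }
  }
  L.Next[RowIdx] = kInvalid;
  L.NumValidRows--;
}
```

The caller has to supply PrevRowIdx because Next is singly linked, which is why iterators such as TRowIteratorWithRemove remove the *next* row rather than the current one.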
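void TTable::Rename(const TStr &Column, const TStr &NewLabel)

Renames column Column to NewLabel. Only the column-type map and the schema entry change; the column data stays where it is. Throws an exception if Column does not exist.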

Definition at line 1105 of file table.cpp.

1105  {
1106  // This function is necessary, for example to take the union of two tables
1107  // where the attribute names don't match.
1108  if (!IsColName(column)) { TExcept::Throw("no such column " + column); }
1109  TPair<TAttrType,TInt> ColVal = GetColTypeMap(column);
1110  DelColType(column);
1111  AddColType(NewLabel, ColVal);
1112  TStr NColName = NormalizeColName(column);
1113  TStr NLabel = NormalizeColName(NewLabel);
1114  for (TInt c = 0; c < Sch.Len(); c++) {
1115  if (Sch[c].Val1 == NColName) {
1116  Sch.SetVal(c, TPair<TStr, TAttrType>(NLabel, Sch[c].Val2));
1117  break;
1118  }
1119  }
1120 }
Schema Sch
Table Schema.
Definition: table.h:549
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
static void Throw(const TStr &MsgStr)
Definition: ut.h:187
TPair< TAttrType, TInt > GetColTypeMap(const TStr &ColName) const
Gets column type and index of ColName.
Definition: table.h:666
void SetVal(const TSizeTy &ValN, const TVal &Val)
Sets the value of element at position ValN to Val.
Definition: ds.h:653
void DelColType(const TStr &ColName)
Removes column ColName from the ColTypeMap.
Definition: table.h:661
static TStr NormalizeColName(const TStr &ColName)
Adds a suffix to the column name if it does not already have one.
Definition: table.h:530
void AddColType(const TStr &ColName, TPair< TAttrType, TInt > ColType)
Adds column with name ColName and type ColType to the ColTypeMap.
Definition: table.h:651
TBool IsColName(const TStr &ColName) const
Definition: table.h:646
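Rename touches two structures, the name-keyed type map and the ordered schema, and leaves the column data alone. The sketch models that with a hypothetical `SchemaSketch` holding a `std::unordered_map` (standing in for ColTypeMap) and a `std::vector` of name/type pairs (standing in for Sch); `AttrType` is a local stand-in for TAttrType, and name normalization is left out.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class AttrType { Int, Flt, Str };

struct SchemaSketch {
  // name -> (type, index among columns of that type), like ColTypeMap
  std::unordered_map<std::string, std::pair<AttrType, int>> ColTypeMap;
  // ordered (name, type) list, like Sch
  std::vector<std::pair<std::string, AttrType>> Sch;
};

void RenameSketch(SchemaSketch& S, const std::string& Column, const std::string& NewLabel) {
  auto It = S.ColTypeMap.find(Column);
  if (It == S.ColTypeMap.end()) { throw std::invalid_argument("no such column " + Column); }
  // Move the (type, index) entry under the new key; the data index is unchanged.
  const std::pair<AttrType, int> ColVal = It->second;
  S.ColTypeMap.erase(It);
  S.ColTypeMap[NewLabel] = ColVal;
  // Update the first matching schema entry, keeping its position.
  for (auto& Entry : S.Sch) {
    if (Entry.first == Column) { Entry.first = NewLabel; break; }
  }
}
```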
TStr TTable::RenumberColName(const TStr &ColName) const
protected

Returns a re-numbered column name based on number of existing columns with conflicting names.

Definition at line 4632 of file table.cpp.

4632  {
4633  TStr NColName = ColName;
4634  if (NColName.GetCh(NColName.Len()-2) == '-') {
4635  NColName = NColName.GetSubStr(0,NColName.Len()-3);
4636  }
4637  TInt Conflicts = 0;
4638  for (TInt i = 0; i < Sch.Len(); i++) {
4639  if (NColName == Sch[i].Val1.GetSubStr(0, Sch[i].Val1.Len()-3)) {
4640  Conflicts++;
4641  }
4642  }
4643  Conflicts++;
4644  NColName = NColName + "-" + Conflicts.GetStr();
4645  return NColName;
4646 }
TStr GetStr() const
Definition: dt.h:1197
int Len() const
Definition: dt.h:487
Schema Sch
Table Schema.
Definition: table.h:549
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr GetSubStr(const int &BChN, const int &EChN) const
Definition: dt.cpp:811
char GetCh(const int &ChN) const
Definition: dt.h:483
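The renumbering rule in RenumberColName can be restated on `std::string`. The rule: strip a trailing "-N" suffix, count schema names whose base (the name minus its last two characters) equals the stripped name, and append "-" followed by that count plus one. TStr::GetSubStr takes an inclusive end index, so `GetSubStr(0, Len()-3)` keeps all but the last two characters, which `substr(0, size() - 2)` reproduces. Like the listing, this sketch assumes single-digit suffixes; the length guards are an addition, and `RenumberColNameSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::string RenumberColNameSketch(const std::string& ColName,
                                  const std::vector<std::string>& SchNames) {
  std::string Base = ColName;
  // Drop an existing "-N" suffix (single digit assumed).
  if (Base.size() >= 2 && Base[Base.size() - 2] == '-') {
    Base = Base.substr(0, Base.size() - 2);
  }
  // Count schema columns that share the same base name.
  int Conflicts = 0;
  for (const std::string& Name : SchNames) {
    if (Name.size() >= 2 && Name.substr(0, Name.size() - 2) == Base) { Conflicts++; }
  }
  return Base + "-" + std::to_string(Conflicts + 1);
}
```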
TInt TTable::RequestIndexFlt(const TStr &ColName)

Creates an index for float column ColName.

Creates an index on float column ColName. The index is hash-based, mapping each column value to the vector of RowIdxs (physical row indices) that hold that value. If the index exists, the Get*RowIdxByVal functions use it; otherwise they loop over the entire table. The index is NOT updated automatically when the table is modified; if the index is still needed after a modification, the user must call RequestIndexFlt again. Always returns 0.

Definition at line 5495 of file table.cpp.

5495  {
5496 
5497  THash<TFlt, TIntV> NewIndex;
5498  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
5499  TFlt ValAtRow = RowI.GetFltAttr(ColName);
5500  TInt RowIdx = RowI.GetRowIdx();
5501  if (NewIndex.IsKey(ValAtRow)) {
5502  TIntV &Curr_V = NewIndex.GetDat(ValAtRow);  // reference, so the append below updates the stored vector
5503  Curr_V.Add(RowIdx);
5504  }
5505  else {
5506  TIntV New_V;
5507  New_V.Add(RowIdx);
5508  NewIndex.AddDat(ValAtRow, New_V);
5509  }
5510  }
5511  FltColIndexes.AddDat(ColName, NewIndex);
5512  return 0;
5513 }
const TDat & GetDat(const TKey &Key) const
Definition: hash.h:262
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
THash< TStr, THash< TFlt, TIntV > > FltColIndexes
Indexes for Float Columns.
Definition: table.h:570
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
bool IsKey(const TKey &Key) const
Definition: hash.h:258
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
TInt TTable::RequestIndexInt(const TStr &ColName)

Creates an index for integer column ColName.

Creates an index on integer column ColName. The index is hash-based, mapping each column value to the vector of RowIdxs (physical row indices) that hold that value. If the index exists, the Get*RowIdxByVal functions use it; otherwise they loop over the entire table. The index is NOT updated automatically when the table is modified; if the index is still needed after a modification, the user must call RequestIndexInt again. Always returns 0.

Definition at line 5476 of file table.cpp.

5476  {
5477 
5478  THash<TInt, TIntV> NewIndex;
5479  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
5480  TInt ValAtRow = RowI.GetIntAttr(ColName);
5481  TInt RowIdx = RowI.GetRowIdx();
5482  if (NewIndex.IsKey(ValAtRow)) {
5483  TIntV &Curr_V = NewIndex.GetDat(ValAtRow);  // reference, so the append below updates the stored vector
5484  Curr_V.Add(RowIdx);
5485  }
5486  else {
5487  TIntV New_V;
5488  New_V.Add(RowIdx);
5489  NewIndex.AddDat(ValAtRow, New_V);
5490  }
5491  }
5492  IntColIndexes.AddDat(ColName, NewIndex);
5493  return 0;
5494 }
THash< TStr, THash< TInt, TIntV > > IntColIndexes
Indexes for Int Columns.
Definition: table.h:568
const TDat & GetDat(const TKey &Key) const
Definition: hash.h:262
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
bool IsKey(const TKey &Key) const
Definition: hash.h:258
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
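The three RequestIndex* functions build the same structure: a hash map from a column value to the list of row indices holding that value. The sketch uses `std::unordered_map<int, std::vector<int>>` in place of `THash<TInt, TIntV>` and takes the rows as hypothetical (RowIdx, Value) pairs in iteration order. Appending through `operator[]` modifies the vector stored in the map; appending to a by-value copy of the bucket would silently drop every row after the first for each value.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

std::unordered_map<int, std::vector<int>> BuildIntIndexSketch(
    const std::vector<std::pair<int, int>>& Rows) {
  std::unordered_map<int, std::vector<int>> Index;
  for (const auto& R : Rows) {
    // operator[] creates an empty bucket on first use and returns a reference.
    Index[R.second].push_back(R.first);
  }
  return Index;
}
```

A lookup by value is then a single `find` on the map instead of a full table scan, which is the trade-off the Get*RowIdxByVal functions rely on.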
TInt TTable::RequestIndexStrMap(const TStr &ColName)

Creates an index for string column ColName.

Creates an index on string column ColName. The index is hash-based, mapping each column value (that is, the integer mapping of the string value) to the vector of RowIdxs (physical row indices) that hold that value. If the index exists, the Get*RowIdxByVal functions use it; otherwise they loop over the entire table. The index is NOT updated automatically when the table is modified; if the index is still needed after a modification, the user must call RequestIndexStrMap again. Always returns 0.

Definition at line 5514 of file table.cpp.

5514  {
5515  THash<TInt, TIntV> NewIndex;
5516  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
5517  TInt MapAtRow = RowI.GetStrMapByName(ColName);
5518  TInt RowIdx = RowI.GetRowIdx();
5519  if (NewIndex.IsKey(MapAtRow)) {
5520  TIntV &Curr_V = NewIndex.GetDat(MapAtRow);  // reference, so the append below updates the stored vector
5521  Curr_V.Add(RowIdx);
5522  }
5523  else {
5524  TIntV New_V;
5525  New_V.Add(RowIdx);
5526  NewIndex.AddDat(MapAtRow, New_V);
5527  }
5528  }
5529  StrMapColIndexes.AddDat(ColName, NewIndex);
5530  return 0;
5531 }
THash< TStr, THash< TInt, TIntV > > StrMapColIndexes
Indexes for String Columns.
Definition: table.h:569
const TDat & GetDat(const TKey &Key) const
Definition: hash.h:262
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
bool IsKey(const TKey &Key) const
Definition: hash.h:258
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
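void TTable::ResizeTable(int RowCount)
protected

Resizes the table to hold RowCount rows. If Next has fewer than RowCount entries, every int, float and string-map column and Next itself are grown to RowCount entries; if it has more, all of them are truncated to RowCount. The columns are processed in one OpenMP parallel loop when USE_OPENMP is defined. For RowCount == 0, NumValidRows is also reset to 0.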

Definition at line 4330 of file table.cpp.

4330  {
4331  if (RowCount == 0) {
4332  // initialize empty table
4333  NumValidRows = 0;
4336  }
4337  if (Next.Len() < RowCount) {
4338  TInt FltOffset = IntCols.Len();
4339  TInt StrOffset = FltOffset + FltCols.Len();
4340  TInt TotalCols = StrOffset + StrColMaps.Len();
4341 #ifdef USE_OPENMP
4342  #pragma omp parallel for schedule(static)
4343 #endif
4344  for (int i = 0; i < TotalCols+1; i++) {
4345  if (i < FltOffset) {
4346  IntCols[i].Reserve(RowCount, RowCount);
4347  } else if (i < StrOffset) {
4348  FltCols[i-FltOffset].Reserve(RowCount, RowCount);
4349  } else if (i < TotalCols) {
4350  StrColMaps[i-StrOffset].Reserve(RowCount, RowCount);
4351  } else {
4352  Next.Reserve(RowCount, RowCount);
4353  }
4354  }
4355  } else if (Next.Len() > RowCount) {
4356  TInt FltOffset = IntCols.Len();
4357  TInt StrOffset = FltOffset + FltCols.Len();
4358  TInt TotalCols = StrOffset + StrColMaps.Len();
4359 #ifdef USE_OPENMP
4360  #pragma omp parallel for schedule(static)
4361 #endif
4362  for (int i = 0; i < TotalCols+1; i++) {
4363  if (i < FltOffset) {
4364  IntCols[i].Trunc(RowCount);
4365  } else if (i < StrOffset) {
4366  FltCols[i-FltOffset].Trunc(RowCount);
4367  } else if (i < TotalCols) {
4368  StrColMaps[i-StrOffset].Trunc(RowCount);
4369  } else {
4370  Next.Trunc(RowCount);
4371  }
4372  }
4373  }
4374 }
TInt FirstValidRow
Physical index of first valid row.
Definition: table.h:553
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TInt LastValidRow
Physical index of last valid row.
Definition: table.h:554
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
TVec< TFltV > FltCols
Data columns of floating point attributes.
Definition: table.h:559
TIntV Next
A vector describing the logical order of the rows.
Definition: table.h:555
static const TInt Invalid
Special value for Next vector entry - logically removed row.
Definition: table.h:487
TInt NumValidRows
Number of valid rows in the table (i.e. rows that were not logically removed).
Definition: table.h:552
void Reserve(const TSizeTy &_MxVals)
Reserves enough memory for the vector to store _MxVals elements.
Definition: ds.h:543
void Trunc(const TSizeTy &_Vals=-1)
Truncates the vector's length and capacity to _Vals elements.
Definition: ds.h:1033
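The per-column logic of ResizeTable maps onto `std::vector` as follows. This is a sequential sketch under stated assumptions: growing uses `reserve` plus `resize` (new slots are value-initialized here, which may differ from TVec::Reserve), and shrinking uses `resize` plus `shrink_to_fit` to mirror TVec::Trunc releasing capacity. Unlike the listing, which decides once from Next's length, this sketch checks each column separately. `ResizeColumnsSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Brings every column in Cols to exactly RowCount entries. Each column is
// independent, which is why the listing can run one OpenMP iteration per column.
template <typename Col>
void ResizeColumnsSketch(std::vector<Col>& Cols, std::size_t RowCount) {
  for (Col& C : Cols) {
    if (C.size() < RowCount) {
      C.reserve(RowCount);
      C.resize(RowCount);
    } else if (C.size() > RowCount) {
      C.resize(RowCount);
      C.shrink_to_fit();  // a non-binding request in standard C++
    }
  }
}
```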
void TTable::Save(TSOut &SOut)

Saves table schema and content to a binary format.

Note that TTableContext must be saved separately as it can be shared among multiple tables.

Definition at line 854 of file table.cpp.

854  {
855  NumRows.Save(SOut);
856  NumValidRows.Save(SOut);
857  FirstValidRow.Save(SOut);
858  LastValidRow.Save(SOut);
859  Next.Save(SOut);
860  IntCols.Save(SOut);
861  FltCols.Save(SOut);
862  StrColMaps.Save(SOut);
863 
864  THash<TStr,TPair<TInt,TInt> > ColTypeIntMap;
865  TInt atIntVal = TInt(0);
866  TInt atFltVal = TInt(1);
867  TInt atStrVal = TInt(2);
868  for (THash<TStr,TPair<TAttrType,TInt> >::TIter it = ColTypeMap.BegI(); it < ColTypeMap.EndI(); it++) {
869  TPair<TAttrType,TInt> dat = it.GetDat();
870  TStr DColName = DenormalizeColName(it.GetKey());
871  switch (dat.GetVal1()) {
872  case atInt:
873  ColTypeIntMap.AddDat(DColName, TPair<TInt,TInt>(atIntVal, dat.GetVal2()));
874  break;
875  case atFlt:
876  ColTypeIntMap.AddDat(DColName, TPair<TInt,TInt>(atFltVal, dat.GetVal2()));
877  break;
878  case atStr:
879  ColTypeIntMap.AddDat(DColName, TPair<TInt,TInt>(atStrVal, dat.GetVal2()));
880  break;
881  }
882  }
883  ColTypeIntMap.Save(SOut);
884  SOut.Flush();
885 }
TInt FirstValidRow
Physical index of first valid row.
Definition: table.h:553
TStr DenormalizeColName(const TStr &ColName) const
Removes the suffix from the column name, if it has one.
Definition: table.cpp:4648
void Save(TSOut &SOut) const
Definition: dt.h:1150
THash< TStr, TPair< TAttrType, TInt > > ColTypeMap
A mapping from column name to column type and column index among columns of the same type.
Definition: table.h:564
const TVal1 & GetVal1() const
Definition: ds.h:60
TIter BegI() const
Definition: hash.h:213
void Save(TSOut &SOut) const
Definition: hash.h:183
TInt LastValidRow
Physical index of last valid row.
Definition: table.h:554
const TVal2 & GetVal2() const
Definition: ds.h:61
TIter EndI() const
Definition: hash.h:218
TVec< TIntV > IntCols
Data columns of integer attributes.
Definition: table.h:558
void Save(TSOut &SOut) const
Definition: ds.h:954
TVec< TIntV > StrColMaps
Data columns of integer mappings of string attributes.
Definition: table.h:560
virtual void Flush()=0
TVec< TFltV > FltCols
Data columns of floating point attributes.
Definition: table.h:559
TIntV Next
A vector describing the logical order of the rows.
Definition: table.h:555
TInt NumRows
Number of rows in the table (valid and invalid).
Definition: table.h:551
TInt NumValidRows
Number of valid rows in the table (i.e. rows that were not logically removed).
Definition: table.h:552
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
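Save writes the row bookkeeping and each column vector in sequence, then the column-type map with the type enum replaced by stable integer tags (atInt -> 0, atFlt -> 1, atStr -> 2). String columns are stored as integer mappings (StrColMaps), which is why the shared TTableContext has to be saved on its own. The sketch below shows the same two ideas, length-prefixed vectors and explicit type tags, with standard streams. The byte layout (a 64-bit length followed by raw native-endian values) is an assumption for illustration, not SNAP's TSOut format, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

enum class AttrType { Int, Flt, Str };

// Stable on-disk tags, matching atIntVal/atFltVal/atStrVal in the listing.
std::int32_t TypeTagSketch(AttrType T) {
  switch (T) {
    case AttrType::Int: return 0;
    case AttrType::Flt: return 1;
    case AttrType::Str: return 2;
  }
  return -1;  // not reached for valid enumerators
}

// Writes the element count, then the raw values.
void SaveIntVecSketch(std::ostream& Out, const std::vector<std::int32_t>& V) {
  const std::uint64_t Len = V.size();
  Out.write(reinterpret_cast<const char*>(&Len), sizeof(Len));
  if (Len > 0) {
    Out.write(reinterpret_cast<const char*>(V.data()),
              static_cast<std::streamsize>(Len * sizeof(std::int32_t)));
  }
}

// Reads back a vector written by SaveIntVecSketch.
std::vector<std::int32_t> LoadIntVecSketch(std::istream& In) {
  std::uint64_t Len = 0;
  In.read(reinterpret_cast<char*>(&Len), sizeof(Len));
  std::vector<std::int32_t> V(static_cast<std::size_t>(Len));
  if (Len > 0) {
    In.read(reinterpret_cast<char*>(V.data()),
            static_cast<std::streamsize>(Len * sizeof(std::int32_t)));
  }
  return V;
}
```

Writing tags instead of raw enum values keeps the file readable even if the enum's declaration order changes later.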
void TTable::SaveBin(const TStr &OutFNm)

Saves table schema and content to a binary file.

Definition at line 849 of file table.cpp.

849  {
850  TFOut SOut(OutFNm);
851  Save(SOut);
852 }
void Save(TSOut &SOut)
Saves table schema and content to a binary format.
Definition: table.cpp:854
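void TTable::SaveSS(const TStr &OutFNm)

Saves table schema and content to a TSV file by calling Dump on the opened file. If the table has no valid rows, prints "Table is empty" and writes no file; if the file cannot be opened, prints an error and returns.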

Definition at line 800 of file table.cpp.

800  {
801  if (NumValidRows == 0) {
802  printf("Table is empty");
803  return;
804  }
805  FILE* F = fopen(OutFNm.CStr(), "w");
806  // debug
807  if (F == NULL) {
808  printf("failed to open file %s\n", OutFNm.CStr());
809  perror("fail ");
810  return;
811  }
812 
813  Dump(F);
814 
815 #if 0
816  Schema DSch = DenormalizeSchema();
817 
818  TInt L = Sch.Len();
819  // print title (schema)
820  fprintf(F, "# ");
821  for (TInt i = 0; i < L-1; i++) {
822  fprintf(F, "%s\t", DSch[i].Val1.CStr());
823  }
824  fprintf(F, "%s\n", DSch[L-1].Val1.CStr());
825  // print table contents
826  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
827  for (TInt i = 0; i < L; i++) {
828  char C = (i == L-1) ? '\n' : '\t';
829  switch (GetSchemaColType(i)) {
830  case atInt: {
831  fprintf(F, "%d%c", RowI.GetIntAttr(GetSchemaColName(i)).Val, C);
832  break;
833  }
834  case atFlt: {
835  fprintf(F, "%f%c", RowI.GetFltAttr(GetSchemaColName(i)).Val, C);
836  break;
837  }
838  case atStr: {
839  fprintf(F, "%s%c", RowI.GetStrAttr(GetSchemaColName(i)).CStr(), C);
840  break;
841  }
842  }
843  }
844  }
845 #endif
846  fclose(F);
847 }
Schema Sch
Table Schema.
Definition: table.h:549
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
void Dump(FILE *OutF=stdout) const
Prints table contents to a text file.
Definition: table.cpp:887
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows.
Definition: table.h:330
TAttrType GetSchemaColType(TInt Idx) const
Gets type of the column with index Idx in the schema.
Definition: table.h:640
Schema DenormalizeSchema() const
Removes the suffix from every column name in the schema.
Definition: table.cpp:4665
TStr GetSchemaColName(TInt Idx) const
Gets name of the column with index Idx in the schema.
Definition: table.h:638
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
TInt NumValidRows
Number of valid rows in the table (i.e. rows that were not logically removed).
Definition: table.h:552
char * CStr()
Definition: dt.h:476
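The disabled (`#if 0`) block in SaveSS shows the intended text layout: a header line starting with "# " that lists the column names separated by tabs, then one line per row with tab-separated values. The sketch reproduces that layout on an `std::ostream`, with values passed as already formatted strings. The live code delegates to Dump, whose exact output may differ, and `WriteTsvSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

void WriteTsvSketch(std::ostream& Out, const std::vector<std::string>& ColNames,
                    const std::vector<std::vector<std::string>>& Rows) {
  // Header: "# name1<TAB>name2...<NEWLINE>"
  Out << "# ";
  for (std::size_t i = 0; i < ColNames.size(); i++) {
    Out << ColNames[i] << (i + 1 == ColNames.size() ? '\n' : '\t');
  }
  // Body: one line per row, tab between values, newline after the last one.
  for (const auto& Row : Rows) {
    for (std::size_t i = 0; i < Row.size(); i++) {
      Out << Row[i] << (i + 1 == Row.size() ? '\n' : '\t');
    }
  }
}
```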
void TTable::Select(TPredicate &Predicate, TIntV &SelectedRows, TBool Remove = true)

Selects rows that satisfy given Predicate.

Select has two modes of operation:

  1. If Remove == true, (logically) removes the rows for which the predicate does not hold.
  2. If Remove == false, appends the physical indices of the rows for which the predicate holds to the vector SelectedRows and leaves the table unchanged.

Definition at line 2750 of file table.cpp.

2750  {
2751  TIntV Selected;
2752  TStrV RelevantCols;
2753  Predicate.GetVariables(RelevantCols);
2754  TInt NumRelevantCols = RelevantCols.Len();
2755  TVec<TAttrType> ColTypes = TVec<TAttrType>(NumRelevantCols);
2756  TIntV ColIndices = TIntV(NumRelevantCols);
2757  for (TInt i = 0; i < NumRelevantCols; i++) {
2758  ColTypes[i] = GetColType(RelevantCols[i]);
2759  ColIndices[i] = GetColIdx(RelevantCols[i]);
2760  }
2761 
2762  if (Remove) {
2763  TRowIteratorWithRemove RowI = BegRIWR();
2764  while (RowI.GetNextRowIdx() != Last) {
2765  // prepare arguments for predicate evaluation
2766  for (TInt i = 0; i < NumRelevantCols; i++) {
2767  switch (ColTypes[i]) {
2768  case atInt:
2769  Predicate.SetIntVal(RelevantCols[i], RowI.GetNextIntAttr(ColIndices[i]));
2770  break;
2771  case atFlt:
2772  Predicate.SetFltVal(RelevantCols[i], RowI.GetNextFltAttr(ColIndices[i]));
2773  break;
2774  case atStr:
2775  Predicate.SetStrVal(RelevantCols[i], RowI.GetNextStrAttr(ColIndices[i]));
2776  break;
2777  }
2778  }
2779  if (!Predicate.Eval()) {
2780  RowI.RemoveNext();
2781  } else {
2782  RowI++;
2783  }
2784  }
2785  } else {
2786  for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
2787  for (TInt i = 0; i < NumRelevantCols; i++) {
2788  switch (ColTypes[i]) {
2789  case atInt:
2790  Predicate.SetIntVal(RelevantCols[i], RowI.GetIntAttr(RelevantCols[i]));
2791  break;
2792  case atFlt:
2793  Predicate.SetFltVal(RelevantCols[i], RowI.GetFltAttr(RelevantCols[i]));
2794  break;
2795  case atStr:
2796  Predicate.SetStrVal(RelevantCols[i], RowI.GetStrAttr(RelevantCols[i]));
2797  break;
2798  }
2799  }
2800  if (Predicate.Eval()) { SelectedRows.Add(RowI.GetRowIdx()); }
2801  }
2802  }
2803 }
void SetFltVal(TStr VarName, TFlt VarVal)
Set flt variable value in the predicate or all the children that use it.
Definition: table.h:100
TInt GetColIdx(const TStr &ColName) const
Gets index of column ColName among columns of the same type in the schema.
Definition: table.h:1013
static const TInt Last
Special value for Next vector entry - last row in table.
Definition: table.h:486
TFlt GetNextFltAttr(TInt ColIdx) const
Returns value of float attribute specified by float column index for next row.
Definition: table.cpp:252
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
TStr GetNextStrAttr(TInt ColIdx) const
Returns value of string attribute specified by string column index for next row.
Definition: table.cpp:256
TRowIteratorWithRemove BegRIWR()
Gets an iterator that supports row removal, positioned at the first valid row.
Definition: table.h:1245
TRowIterator BegRI() const
Gets iterator to the first valid row of the table.
Definition: table.h:1241
Iterator class for TTable rows, that allows logical row removal while iterating.
Definition: table.h:374
Iterator class for TTable rows.
Definition: table.h:330
TInt GetNextRowIdx() const
Gets physical index of next row.
Definition: table.cpp:243
void SetIntVal(TStr VarName, TInt VarVal)
Set int variable value in the predicate or all the children that use it.
Definition: table.h:98
void SetStrVal(TStr VarName, TStr VarVal)
Set str variable value in the predicate or all the children that use it.
Definition: table.h:102
void RemoveNext()
Removes next row.
Definition: table.cpp:278
TAttrType GetColType(const TStr &ColName) const
Gets type of column ColName.
Definition: table.h:1227
TRowIterator EndRI() const
Gets the end iterator, positioned one past the last valid row of the table.
Definition: table.h:1243
TInt GetRowIdx() const
Gets physical index of current row.
Definition: table.cpp:239
TVec< TInt > TIntV
Definition: ds.h:1594
TInt GetNextIntAttr(TInt ColIdx) const
Returns value of integer attribute specified by integer column index for next row.
Definition: table.cpp:248
void GetVariables(TStrV &Variables)
Get variables in current predicate.
Definition: table.cpp:10
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
TBool Eval()
Return the result of evaluating current predicate.
Definition: table.cpp:14
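Select's two modes can be shown on a hypothetical linked row list, with a `std::function` over the physical row index standing in for TPredicate. With Remove == true, rows that fail the predicate are unlinked and marked invalid, the effect of RowI.RemoveNext(). With Remove == false, the chain is left intact and passing rows are appended to SelectedRows. LastValidRow maintenance is omitted, and the names and sentinel values are illustrative.

```cpp
#include <cassert>
#include <functional>
#include <vector>

constexpr int kLast = -1;
constexpr int kInvalid = -2;

void SelectSketch(std::vector<int>& Next, int& FirstValidRow,
                  const std::function<bool(int)>& Pred,
                  std::vector<int>& SelectedRows, bool Remove) {
  if (Remove) {
    int Prev = kLast;  // kLast here means "no kept row seen yet"
    int Row = FirstValidRow;
    while (Row != kLast) {
      const int Succ = Next[Row];
      if (!Pred(Row)) {
        // Unlink the failing row from its predecessor (or from the head).
        if (Prev == kLast) { FirstValidRow = Succ; } else { Next[Prev] = Succ; }
        Next[Row] = kInvalid;
      } else {
        Prev = Row;
      }
      Row = Succ;
    }
  } else {
    for (int Row = FirstValidRow; Row != kLast; Row = Next[Row]) {
      if (Pred(Row)) { SelectedRows.push_back(Row); }
    }
  }
}
```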
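void TTable::Select(TPredicate &Predicate)
inline

Logically removes the rows for which Predicate does not hold; equivalent to Select(Predicate, SelectedRows, true) with a discarded SelectedRows vector.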

Definition at line 1266 of file table.h.

1266  {
1267  TIntV SelectedRows;
1268  Select(Predicate, SelectedRows, true);
1269  }
void Select(TPredicate &Predicate, TIntV &SelectedRows, TBool Remove=true)
Selects rows that satisfy given Predicate.
Definition: table.cpp:2750
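void TTable::SelectAtomic(const TStr &Col1, const TStr &Col2, TPredComp Cmp, TIntV &SelectedRows, TBool Remove = true)

Selects rows using an atomic compare operation.

Optimized special case of Select for predicates of atomic form (an attribute compared with another attribute, or with a constant). This overload compares the values of columns Col1 and Col2 row by row using Cmp. Both columns must have the same type, otherwise an exception is thrown, and SUBSTR and SUPERSTR are only valid on string columns (checked by Assert). Remove has the same meaning as in Select: true removes the rows for which the comparison fails, false appends the physical indices of the rows for which it holds to SelectedRows.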

Definition at line 2813 of file table.cpp.

{
  const TAttrType Ty1 = GetColType(Col1);
  const TAttrType Ty2 = GetColType(Col2);
  const TInt ColIdx1 = GetColIdx(Col1);
  const TInt ColIdx2 = GetColIdx(Col2);
  if (Ty1 != Ty2) {
    TExcept::Throw("SelectAtomic: diff types");
  }
  if (Cmp == SUBSTR || Cmp == SUPERSTR) { Assert(Ty1 == atStr); }

  if (Remove) {
    TRowIteratorWithRemove RowI = BegRIWR();
    while (RowI.GetNextRowIdx() != Last) {

      TBool Result;
      switch (Ty1) {
        case atInt:
          Result = TPredicate::EvalAtom(RowI.GetNextIntAttr(ColIdx1), RowI.GetNextIntAttr(ColIdx2), Cmp);
          break;
        case atFlt:
          Result = TPredicate::EvalAtom(RowI.GetNextFltAttr(ColIdx1), RowI.GetNextFltAttr(ColIdx2), Cmp);
          break;
        case atStr:
          Result = TPredicate::EvalStrAtom(RowI.GetNextStrAttr(ColIdx1), RowI.GetNextStrAttr(ColIdx2), Cmp);
          break;
      }

      if (!Result) {
        RowI.RemoveNext();
      } else {
        RowI++;
      }

    }
  } else {
    for (TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++) {
      TBool Result;
      switch (Ty1) {
        case atInt:
          Result = TPredicate::EvalAtom(RowI.GetIntAttr(Col1), RowI.GetIntAttr(Col2), Cmp);
          break;
        case atFlt:
          Result = TPredicate::EvalAtom(RowI.GetFltAttr(Col1), RowI.GetFltAttr(Col2), Cmp);
          break;
        case atStr:
          Result = TPredicate::EvalStrAtom(RowI.GetStrAttr(Col1), RowI.GetStrAttr(Col2), Cmp);
          break;
      }
      if (Result) { SelectedRows.Add(RowI.GetRowIdx()); }
    }
  }
}
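The in-place (Remove = true) mode can be pictured on a toy table. The standalone C++ sketch below stores the logical row order in a successor vector, in the spirit of TTable::Next, and unlinks every row that fails an attribute-to-attribute comparison. MiniTable, Cmp, the free function EvalAtom, MakeTable, SelectAtomicInPlace, LogicalOrder and the sentinel values kLast and kInvalid are hypothetical stand-ins, not SNAP API. The sketch also splices rows out through a predecessor index rather than through SNAP's remove-next iterator.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sentinels standing in for TTable::Last and TTable::Invalid;
// the actual values defined in table.h may differ.
const int kLast = -1;     // end of the logical row chain
const int kInvalid = -2;  // logically removed row

// Hypothetical subset of comparison operators, modeled on TPredComp.
enum class Cmp { LT, LTE, EQ, NEQ, GTE, GT };

template <class T>
bool EvalAtom(const T& a, const T& b, Cmp cmp) {
  switch (cmp) {
    case Cmp::LT:  return a < b;
    case Cmp::LTE: return a <= b;
    case Cmp::EQ:  return a == b;
    case Cmp::NEQ: return a != b;
    case Cmp::GTE: return a >= b;
    case Cmp::GT:  return a > b;
  }
  return false;
}

// Toy columnar table: two int columns plus a successor vector that defines
// the logical row order over physical row ids.
struct MiniTable {
  std::vector<int> col1, col2;
  std::vector<int> next;  // next[r]: successor of row r, kLast, or kInvalid
  int first = kLast;      // first valid row, or kLast if the table is empty
  int numValid = 0;
};

MiniTable MakeTable(const std::vector<int>& a, const std::vector<int>& b) {
  MiniTable t;
  t.col1 = a;
  t.col2 = b;
  const int n = static_cast<int>(a.size());
  t.next.resize(n);
  for (int r = 0; r < n; ++r) t.next[r] = (r + 1 < n) ? r + 1 : kLast;
  t.first = (n > 0) ? 0 : kLast;
  t.numValid = n;
  return t;
}

// Keeps rows where EvalAtom(col1[r], col2[r], cmp) holds; every other row is
// unlinked from the chain and marked kInvalid.
void SelectAtomicInPlace(MiniTable& t, Cmp cmp) {
  int prev = -1;  // last kept row so far; -1 means no row kept yet
  int cur = t.first;
  while (cur != kLast) {
    const int succ = t.next[cur];
    if (EvalAtom(t.col1[cur], t.col2[cur], cmp)) {
      prev = cur;
    } else {
      if (prev == -1) t.first = succ; else t.next[prev] = succ;
      t.next[cur] = kInvalid;
      --t.numValid;
    }
    cur = succ;
  }
}

// Physical ids of the valid rows, in logical order.
std::vector<int> LogicalOrder(const MiniTable& t) {
  std::vector<int> ids;
  for (int r = t.first; r != kLast; r = t.next[r]) ids.push_back(r);
  return ids;
}
```

For instance, filtering columns {1, 5, 3, 7} and {2, 2, 3, 9} with Cmp::LT keeps rows 0 and 3 and marks rows 1 and 2 as kInvalid.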
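void TTable::SelectAtomic (const TStr &Col1, const TStr &Col2, TPredComp Cmp)
inline

Filters the table in place, logically removing rows in which the values of Col1 and Col2 do not satisfy Cmp.
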
Definition at line 1278 of file table.h.

{
  TIntV SelectedRows;
  SelectAtomic(Col1, Col2, Cmp, SelectedRows, true);
}
void TTable::SelectAtomicConst (const TStr &Col, const TPrimitive &Val, TPredComp Cmp, TIntV &SelectedRows, PTable &SelectedTable, TBool Remove = true, TBool Table = true)

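Selects rows where the value of column Col satisfies the comparison Cmp against the constant Val.

The type of Val must match the type of Col; otherwise an exception is thrown. The flags determine where the result goes. With Remove = true, rows that fail the comparison are logically removed from this table in place. With Remove = false and Table = true, matching rows are added to SelectedTable. With both flags false, the physical indices of matching rows are appended to SelectedRows. When the code is compiled with USE_OPENMP and GetMP() is true, the first two modes process the table in parallel partitions.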

Definition at line 2873 of file table.cpp.

{
  //double startFn = omp_get_wtime();
  TStr ValTStr(Val.GetStr());
  TAttrType Type = GetColType(Col);
  TInt ColIdx = GetColIdx(Col);

  if (Type != Val.GetType()) {
    TExcept::Throw("SelectAtomicConst: coltype does not match const type");
  }

  if(Remove){
#ifdef USE_OPENMP
    if (GetMP()) {
      //double endInit = omp_get_wtime();
      //printf("Init time = %f\n", endInit-startFn);
      TIntPrV Partitions;
      GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
      TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
      int RemoveCount = 0;
      //double endPart = omp_get_wtime();
      //printf("Partition time = %f\n", endPart-endInit);

      TIntPrV Bounds(Partitions.Len());

      // #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD) reduction(+:RemoveCount) shared(Val)
      #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD) reduction(+:RemoveCount)
      for (int i = 0; i < Partitions.Len(); i++){
        //TPrimitive ThreadLocalVal(Val);
        TRowIterator RowI(Partitions[i].GetVal1(), this);
        TRowIterator EndI(Partitions[i].GetVal2(), this);
        TInt FirstRowIdx = TTable::Invalid;
        TInt LastRowIdx = TTable::Invalid;
        TBool First = true;
        while (RowI < EndI) {
          TInt CurrRowIdx = RowI.GetRowIdx();
          TBool Result;
          if (Type != atStr) {
            Result = RowI.CompareAtomicConst(ColIdx, Val, Cmp);
          } else {
            Result = RowI.CompareAtomicConstTStr(ColIdx, ValTStr, Cmp);
          }
          RowI++;
          if(!Result) {
            Next[CurrRowIdx] = TTable::Invalid;
            RemoveCount++;
          } else {
            if (First) { FirstRowIdx = CurrRowIdx; First = false; }
            else { Next[LastRowIdx] = CurrRowIdx; }
            LastRowIdx = CurrRowIdx;
          }
        }
        Bounds[i] = TIntPr(FirstRowIdx, LastRowIdx);
        //printf("Thread %d: i = %d, start = %d, end = %d\n", omp_get_thread_num(), i,
        //  Partitions[i].GetVal1().Val, Partitions[i].GetVal2().Val);
      }
      //double endIter = omp_get_wtime();
      //printf("Iter time = %f\n", endIter-endPart);

      // repair the next vector
      TInt CurrBound = 0;
      while (CurrBound < Bounds.Len() && Bounds[CurrBound].Val1 == TTable::Invalid) {
        CurrBound++;
      }
      if (CurrBound == Bounds.Len()) {
        // selected table is empty
        Assert(NumValidRows == RemoveCount);
        NumValidRows = 0;
        // ...
      } else {
        NumValidRows -= RemoveCount;
        FirstValidRow = Bounds[CurrBound].Val1;
        LastValidRow = Bounds[CurrBound].Val2;
        TInt PrevBound = CurrBound;
        CurrBound++;
        while (CurrBound < Bounds.Len()) {
          if (Bounds[CurrBound].Val1 == TTable::Invalid) { CurrBound++; continue; }
          Next[Bounds[PrevBound].Val2] = Bounds[CurrBound].Val1;
          LastValidRow = Bounds[CurrBound].Val2;
          PrevBound = CurrBound;
          CurrBound++;
        }
        Next[Bounds[PrevBound].Val2] = TTable::Last;
      }
      IsNextDirty = 1;
      //double endRepair = omp_get_wtime();
      //printf("Repair time = %f\n", endRepair-endIter);
    } else {
#endif
      TRowIteratorWithRemove RowI = BegRIWR();
      while(RowI.GetNextRowIdx() != Last){
        if (!RowI.CompareAtomicConst(ColIdx, Val, Cmp)) {
          RowI.RemoveNext();
        } else {
          RowI++;
        }
      }
      IsNextDirty = 1;
#ifdef USE_OPENMP
    }
#endif
  } else if (Table) {
#ifdef USE_OPENMP
    if (GetMP()) {
      //double endInit = omp_get_wtime();
      //printf("Init time = %f\n", endInit-startFn);
      TIntPrV Partitions;
      GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
      TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
      //double endPart = omp_get_wtime();
      //printf("Partition time = %f\n", endPart-endInit);

      int TotalSelectedRows = 0;
      #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD) reduction(+:TotalSelectedRows)
      for (int i = 0; i < Partitions.Len(); i++){
        TRowIterator RowI(Partitions[i].GetVal1(), this);
        TRowIterator EndI(Partitions[i].GetVal2(), this);
        while (RowI < EndI) {
          if (Type != atStr) {
            if (RowI.CompareAtomicConst(ColIdx, Val, Cmp)) {
              TotalSelectedRows++;
            }
          } else {
            if (RowI.CompareAtomicConstTStr(ColIdx, ValTStr, Cmp)) {
              TotalSelectedRows++;
            }
          }
          RowI++;
        }
      }
      //double endCount = omp_get_wtime();
      //printf("Count time = %f\n", endCount-endPart);

      SelectedTable->ResizeTable(TotalSelectedRows);
      //double endResize = omp_get_wtime();
      //printf("Resize time = %f\n", endResize-endCount);

      if (TotalSelectedRows == 0) {
        // printf("Select: Empty output!\n");
        return;
      }

      #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD)
      for (int i = 0; i < Partitions.Len(); i++){
        TIntV LocalSelectedRows;
        LocalSelectedRows.Reserve(PartitionSize);
        TRowIterator RowI(Partitions[i].GetVal1(), this);
        TRowIterator EndI(Partitions[i].GetVal2(), this);
        while (RowI < EndI) {
          if (Type != atStr) {
            if (RowI.CompareAtomicConst(ColIdx, Val, Cmp)) {
              LocalSelectedRows.Add(RowI.GetRowIdx());
            }
          } else {
            if (RowI.CompareAtomicConstTStr(ColIdx, ValTStr, Cmp)) {
              LocalSelectedRows.Add(RowI.GetRowIdx());
            }
          }
          RowI++;
        }
        SelectedTable->AddSelectedRows(*this, LocalSelectedRows);
        //printf("Thread %d: i = %d, start = %d, end = %d\n", omp_get_thread_num(), i,
        //  Partitions[i].GetVal1().Val, Partitions[i].GetVal2().Val);
      }
      //double endIter = omp_get_wtime();
      //printf("Iter time = %f\n", endIter-endResize);

      //SelectedTable->ResizeTable(SelectedTable->GetNumValidRows());
      //double endResize2 = omp_get_wtime();
      //printf("Resize2 time = %f\n", endResize2-endIter);
      SelectedTable->SetFirstValidRow();
    } else {
#endif
      for(TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++){
        if (RowI.CompareAtomicConst(ColIdx, Val, Cmp)) {
          SelectedTable->AddRow(RowI);
        }
      }
#ifdef USE_OPENMP
    }
#endif
  } else {
    for(TRowIterator RowI = BegRI(); RowI < EndRI(); RowI++){
      if (RowI.CompareAtomicConst(ColIdx, Val, Cmp)) {
        SelectedRows.Add(RowI.GetRowIdx());
      }
    }
  }
}
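The OpenMP branch of the Remove mode works in two phases. First, each partition links its own surviving rows and records the first and last of them in Bounds. Second, a sequential pass chains the non-empty partitions together and terminates the chain with Last. The sketch below reproduces that idea over plain std::vector data. It makes several hypothetical assumptions: all rows start valid and in physical order, rows are split into equal contiguous ranges of physical ids (SNAP's GetPartitionRanges may partition differently), and the per-partition loop runs sequentially. Its iterations write disjoint parts of next and bounds. FilterAndStitch, FilterResult, kLast and kInvalid are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

const int kLast = -1;     // hypothetical end-of-chain sentinel
const int kInvalid = -2;  // hypothetical "logically removed" sentinel

struct FilterResult {
  int first = kLast;  // first surviving row, or kLast if none
  int last = kLast;   // last surviving row, or kLast if none
  int numValid = 0;
};

// keep[r] tells whether physical row r passes the predicate. Rebuilds `next`
// so that it links only the kept rows, in physical order. numPartitions >= 1.
FilterResult FilterAndStitch(const std::vector<bool>& keep,
                             std::vector<int>& next, int numPartitions) {
  const int n = static_cast<int>(keep.size());
  next.assign(n, kInvalid);
  const int chunk = (n + numPartitions - 1) / numPartitions;
  std::vector<std::pair<int, int>> bounds(numPartitions,
                                          std::make_pair(kInvalid, kInvalid));
  int removed = 0;

  // Phase 1: each partition links its own kept rows and records its bounds.
  // Iterations touch disjoint parts of `next` and `bounds`.
  for (int p = 0; p < numPartitions; ++p) {
    const int lo = p * chunk;
    const int hi = std::min(n, lo + chunk);
    int firstKept = kInvalid, lastKept = kInvalid;
    for (int r = lo; r < hi; ++r) {
      if (!keep[r]) { ++removed; continue; }  // next[r] stays kInvalid
      if (firstKept == kInvalid) firstKept = r; else next[lastKept] = r;
      lastKept = r;
    }
    bounds[p] = std::make_pair(firstKept, lastKept);
  }

  // Phase 2: link the last kept row of each non-empty partition to the first
  // kept row of the next non-empty one, then terminate the chain.
  FilterResult res;
  res.numValid = n - removed;
  int prevLast = kInvalid;
  for (const auto& b : bounds) {
    if (b.first == kInvalid) continue;  // this partition lost all its rows
    if (prevLast == kInvalid) res.first = b.first; else next[prevLast] = b.first;
    prevLast = b.second;
  }
  if (prevLast != kInvalid) {
    next[prevLast] = kLast;
    res.last = prevLast;
  }
  return res;
}
```

With keep = {false, true, true, false, false, true, false} and three partitions, the kept rows 1, 2 and 5 end up linked in that order, although row 5 lives in a different partition from rows 1 and 2.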
template<class T >
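void TTable::SelectAtomicConst (const TStr &Col, const T &Val, TPredComp Cmp)
inline

Wraps Val in a TPrimitive and filters the table in place, logically removing rows whose value in Col does not satisfy Cmp against Val.
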
Definition at line 1290 of file table.h.

{
  TIntV SelectedRows;
  PTable SelectedTable;
  SelectAtomicConst(Col, TPrimitive(Val), Cmp, SelectedRows, SelectedTable, true, false);
}
template<class T >
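void TTable::SelectAtomicConst (const TStr &Col, const T &Val, TPredComp Cmp, PTable &SelectedTable)
inline

Wraps Val in a TPrimitive and adds the rows whose value in Col satisfies Cmp against Val to SelectedTable, leaving this table unchanged.
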
Definition at line 1296 of file table.h.

{
  TIntV SelectedRows;
  SelectAtomicConst(Col, TPrimitive(Val), Cmp, SelectedRows, SelectedTable, false, true);
}
void TTable::SelectAtomicFltConst (const TStr &Col, const TFlt &Val, TPredComp Cmp)
inline

Definition at line 1323 of file table.h.

{
  SelectAtomicConst(Col, Val, Cmp);
}
void TTable::SelectAtomicFltConst (const TStr &Col, const TFlt &Val, TPredComp Cmp, PTable &SelectedTable)
inline

Definition at line 1326 of file table.h.

{
  SelectAtomicConst(Col, Val, Cmp, SelectedTable);
}
void TTable::SelectAtomicIntConst (const TStr &Col, const TInt &Val, TPredComp Cmp)
inline

Definition at line 1309 of file table.h.

{
  SelectAtomicConst(Col, Val, Cmp);
}
void TTable::SelectAtomicIntConst (const TStr &Col, const TInt &Val, TPredComp Cmp, PTable &SelectedTable)
inline

Definition at line 1312 of file table.h.

{
  SelectAtomicConst(Col, Val, Cmp, SelectedTable);
}
void TTable::SelectAtomicStrConst (const TStr &Col, const TStr &Val, TPredComp Cmp)
inline

Definition at line 1316 of file table.h.

{
  SelectAtomicConst(Col, Val, Cmp);
}
void TTable::SelectAtomicStrConst (const TStr &Col, const TStr &Val, TPredComp Cmp, PTable &SelectedTable)
inline

Definition at line 1319 of file table.h.

{
  SelectAtomicConst(Col, Val, Cmp, SelectedTable);
}
void TTable::SelectFirstNRows (const TInt &N)

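Keeps only the first N rows of the table in logical order and logically removes the remaining rows. If the table has fewer than N rows, it is left unchanged.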

Definition at line 3357 of file table.cpp.

{
  if (N == 0) {
    LastValidRow = -1;
    return;
  }
  TRowIterator RowI = BegRI();
  TInt count = 1;
  while (count < N) {
    if (!(RowI < EndRI())) {
      return; // The table contains less than N rows
    }
    RowI++;
    count++;
  }
  NumValidRows = N;
  TInt LastId = RowI.GetRowIdx();
  if (Next[LastId] == Last) {
    return; // The table contains exactly N rows
  }
  // The table contains more than N rows
  TInt CurrId = LastId;
  while (Next[CurrId] != Last) {
    Assert(Next[CurrId] != Invalid);
    TInt NextId = Next[CurrId];
    Next[CurrId] = Invalid;
    CurrId = NextId;
  }
  Next[LastId] = Last;
  LastValidRow = LastId;
}
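The truncation step can be sketched on its own. Walk N - 1 links from the first row to find the N-th row, mark every row after it as removed, and make the N-th row the end of the chain. The sketch below is a hypothetical standalone version over a plain successor vector; KeepFirstN, Chain, kLast and kInvalid are illustrative names, not SNAP names. It marks every trailing row as kInvalid, including the final one whose successor is the end marker, and it treats n <= 0 as removing every row.

```cpp
#include <cassert>
#include <vector>

const int kLast = -1;     // hypothetical end-of-chain sentinel
const int kInvalid = -2;  // hypothetical "logically removed" sentinel

struct Chain {
  int first;     // first valid row, or kLast if empty
  int last;      // last valid row, or kLast if empty
  int numValid;  // number of rows in the chain
};

// Truncates the chain to its first n rows in logical order. Rows after the
// n-th are marked kInvalid; a chain with fewer than n rows is left as is.
void KeepFirstN(std::vector<int>& next, Chain& c, int n) {
  if (n <= 0) {
    for (int r = c.first; r != kLast;) {
      const int s = next[r];
      next[r] = kInvalid;
      r = s;
    }
    c = Chain{kLast, kLast, 0};
    return;
  }
  int nth = c.first;
  for (int k = 1; k < n && nth != kLast; ++k) nth = next[nth];
  if (nth == kLast) return;  // fewer than n rows: nothing to remove
  for (int r = next[nth]; r != kLast;) {
    const int s = next[r];
    next[r] = kInvalid;
    r = s;
  }
  next[nth] = kLast;
  c.last = nth;
  c.numValid = n;
}
```

For a chain visiting rows 3, 4, 2, 0, 1 and n = 3, rows 0 and 1 are marked kInvalid and row 2 becomes the last row.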
PTable TTable::SelfJoin (const TStr &Col)
inline

Joins the table with itself on the values of column Col (an equijoin, equivalent to Join(Col, *this, Col)).

Definition at line 1366 of file table.h.

{ return Join(Col, *this, Col); }
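PTable TTable::SelfSimJoin (const TStrV &Cols, const TStr &DistanceColName, const TSimType &SimType, const TFlt &Threshold)
inline

Performs a similarity join of the table with itself on the columns Cols. This is equivalent to SimJoin(Cols, *this, Cols, DistanceColName, SimType, Threshold), which joins two rows if the distance between them is less than Threshold.
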
Definition at line 1367 of file table.h.

{ return SimJoin(Cols, *this, Cols, DistanceColName, SimType, Threshold); }
PTable TTable::SelfSimJoinPerGroup (const TStr &GroupAttr, const TStr &SimCol, const TStr &DistanceColName, const TSimType &SimType, const TFlt &Threshold)

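Performs a similarity join between the groups of rows defined by the integer column GroupAttr.

For each group, the distinct values of SimCol are collected into a set; SimCol must be an integer or string column, and other types cause an exception. Then, for every ordered pair of groups, including each group paired with itself, the Jaccard distance 1 - |A ∩ B| / |A ∪ B| between the two sets is computed, and the pair is kept if the distance is at most Threshold. This implementation uses the Jaccard distance regardless of SimType. An exception is thrown if either column does not exist. Returns a table with schema (GroupId_1, GroupId_2, DistanceColName), where the last column holds the distance.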

Definition at line 2094 of file table.cpp.

{
  if(!IsColName(SimCol) || !IsColName(GroupAttr)){
    TExcept::Throw("No such column found in table");
  }

  PTable JointTable = New(Context);
  // Initialize the joint table - (GroupId1, GroupId2, Similarity)
  JointTable->IntCols = TVec<TIntV>(2);
  JointTable->FltCols = TVec<TFltV>(1);

  for(TInt i=0;i<2;i++){
    TInt Suffix = i+1;
    TStr CName = "GroupId_" + Suffix.GetStr();
    TPair<TAttrType, TInt> Group(atInt, i);
    JointTable->AddColType(CName, Group);
    JointTable->AddSchemaCol(CName, atInt);
  }

  TPair<TAttrType, TInt> Group(atFlt, 0);
  JointTable->AddColType(DistanceColName, Group);
  JointTable->AddSchemaCol(DistanceColName, atFlt);

  THash<TInt, THash<TInt, TInt> > TIntHH;

  TAttrType attrType = GetColType(SimCol);
  TInt GroupColIdx = GetColIdx(GroupAttr);
  TInt SimColIdx = GetColIdx(SimCol);

  for (TRowIterator RowI = this->BegRI(); RowI < this->EndRI(); RowI++) {
    TInt GroupId = IntCols[GroupColIdx][RowI.GetRowIdx()];

    if(attrType==atInt || attrType==atStr)
    {
      if(!TIntHH.IsKey(GroupId)){
        THash<TInt, TInt> TIntH;
        TIntHH.AddDat(GroupId, TIntH);
      }

      THash<TInt, TInt>& TIntH = TIntHH.GetDat(GroupId);
      TInt SimAttrVal = (attrType==atInt ? IntCols[SimColIdx][RowI.GetRowIdx()] : StrColMaps[SimColIdx][RowI.GetRowIdx()]);
      TIntH.AddDat(SimAttrVal, 0);
    }
    else
    {
      TExcept::Throw("Attribute type not supported.");
    }
  }

  // Iterate through every pair of groups and calculate the distance
  for (THash<TInt, THash<TInt, TInt> >::TIter it1 = TIntHH.BegI(); it1 < TIntHH.EndI(); it1++) {
    THash<TInt, TInt> Vals1H = it1.GetDat();
    TInt GroupId1 = it1.GetKey();

    for (THash<TInt, THash<TInt, TInt> >::TIter it2 = TIntHH.BegI(); it2 < TIntHH.EndI(); it2++) {
      int intersectionCount = 0;
      TInt GroupId2 = it2.GetKey();
      THash<TInt, TInt> Vals2H = it2.GetDat();

      for(THash<TInt, TInt>::TIter it = Vals1H.BegI(); it < Vals1H.EndI(); it++)
      {
        TInt Val = it.GetKey();
        if(Vals2H.IsKey(Val)){
          intersectionCount+=1;
        }
      }

      int unionCount = Vals1H.Len() + Vals2H.Len() - intersectionCount;
      float distance = 1.0f - (float)intersectionCount/unionCount;

      // Add a new row to the JointTable
      if(distance<=Threshold){
        JointTable->IntCols[0].Add(GroupId1);
        JointTable->IntCols[1].Add(GroupId2);
        JointTable->FltCols[0].Add(distance);
        JointTable->IncrementNext();
      }
    }
  }

  JointTable->InitIds();
  return JointTable;
}
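Without the table bookkeeping, the core of this overload is a Jaccard-distance join over per-group value sets. The C++ sketch below shows that computation on plain (group id, value) pairs. GroupJaccardJoin and GroupPair are hypothetical names. The sketch uses double where the listing uses float. Like the listing, it compares all ordered pairs of groups in a nested loop, so it performs on the order of G squared set intersections for G groups.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

struct GroupPair {
  int group1;
  int group2;
  double distance;  // Jaccard distance between the groups' value sets
};

// rows holds (group id, value) pairs. Collects the distinct values of each
// group, then emits every ordered pair of groups (a group paired with itself
// included) whose Jaccard distance 1 - |A intersect B| / |A union B| is at
// most threshold.
std::vector<GroupPair> GroupJaccardJoin(
    const std::vector<std::pair<int, int> >& rows, double threshold) {
  std::map<int, std::set<int> > valuesByGroup;
  for (const auto& row : rows) valuesByGroup[row.first].insert(row.second);

  std::vector<GroupPair> out;
  for (const auto& g1 : valuesByGroup) {
    for (const auto& g2 : valuesByGroup) {
      int intersection = 0;
      for (int v : g1.second) intersection += static_cast<int>(g2.second.count(v));
      // Every group has at least one value, so the union is never empty.
      const int unionSize =
          static_cast<int>(g1.second.size() + g2.second.size()) - intersection;
      const double distance = 1.0 - static_cast<double>(intersection) / unionSize;
      if (distance <= threshold) out.push_back(GroupPair{g1.first, g2.first, distance});
    }
  }
  return out;
}
```

With groups {1, 2, 3}, {2, 3, 4} and {7} and a threshold of 0.5, the first two groups match each other at distance 0.5 (two shared values out of four distinct ones), and every group matches itself at distance 0.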
PTable TTable::SelfSimJoinPerGroup (const TStrV &GroupBy, const TStr &SimCol, const TStr &DistanceColName, const TSimType &SimType, const TFlt &Threshold)

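Performs a similarity join between the groups of rows defined by a set of GroupBy columns.

The table is first projected in place onto the GroupBy columns and SimCol, so this table is modified. It is then grouped internally by GroupBy, and SelfSimJoinPerGroup(GroupAttr, ...) is run on the resulting group ids. Because all rows of a group share the same GroupBy values, one arbitrary row per group represents the group in the output. The returned table contains DistanceColName together with the joined GroupBy columns, which are found by name because the join renames columns.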

Definition at line 2180 of file table.cpp.

{
  TStrV NGroupBy = NormalizeColNameV(GroupBy);
  TStrV ProjectionV;

  // Only keep the GroupBy cols and the SimCol
  for(TInt i=0; i<GroupBy.Len(); i++)
  {
    ProjectionV.Add(GroupBy[i]);
  }

  ProjectionV.Add(SimCol);
  ProjectInPlace(ProjectionV);

  TStr CName = "Group";
  TIntV UniqueVec;
  THash<TGroupKey, TPair<TInt, TIntV> > Grouping;
  GroupAux(NGroupBy, Grouping, false, CName, false, UniqueVec);
  PTable GroupJointTable = SelfSimJoinPerGroup(CName, SimCol, DistanceColName, SimType, Threshold);
  PTable JointTable = InitializeJointTable(*this);

  // Hash of groupid to any arbitrary row of that group. Arbitrary because the GroupBy
  // columns within that group are the same, so we can choose any one.
  THash<TInt, TInt> GroupIdH;

  for(THash<TGroupKey, TPair<TInt, TIntV> >::TIter it=Grouping.BegI(); it<Grouping.EndI(); it++)
  {
    TPair<TInt, TIntV> group = it.GetDat();
    TInt GroupNum = group.Val1;
    TIntV RowIds = group.Val2;

    if(!GroupIdH.IsKey(GroupNum))
    {
      TInt RandomRowId = RowIds[0]; // Arbitrarily select the 1st row.
      GroupIdH.AddDat(GroupNum, RandomRowId);
    }
  }

  for(TRowIterator RowI = GroupJointTable->BegRI(); RowI < GroupJointTable->EndRI(); RowI++)
  {
    // The GroupJointTable has a well defined structure - columns 0 and 1 are GroupIds
    TInt GroupId1 = GroupJointTable->IntCols[0][RowI.GetRowIdx()];
    TInt GroupId2 = GroupJointTable->IntCols[1][RowI.GetRowIdx()];

    // Get the rows for groupid1 and groupid2 and arbitrarily select one row
    TInt RowId1 = GroupIdH.GetDat(GroupId1);
    TInt RowId2 = GroupIdH.GetDat(GroupId2);
    JointTable->AddJointRow(*this, *this, RowId1, RowId2);
  }

  // Add the similarity column from the GroupJointTable - GroupJointTable has a
  // well defined structure - The first float column is the similarity;
  JointTable->StoreFltCol(DistanceColName, GroupJointTable->FltCols[0]);
  ProjectionV.Clr();
  ProjectionV.Add(DistanceColName);

  // Find the GroupBy columns in the JointTable by matching the Suffix of the Schema
  // columns with the original GroupBy columns - Note that Join renames columns.
  for(TInt i=0; i<GroupBy.Len(); i++){
    for(TInt j=0; j<JointTable->Sch.Len(); j++)
    {
      TStr ColName = JointTable->Sch[j].Val1;
      if(ColName.IsStrIn(GroupBy[i]))
      {
        ProjectionV.Add(ColName);
      }
    }
  }

  JointTable->ProjectInPlace(ProjectionV);
  JointTable->InitIds();
  return JointTable;
}
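The GroupBy overload relies on one observation: every row of a group has the same GroupBy values, so any row can stand in for its group when the group-level result is joined back to rows. Below is a hypothetical sketch of that mapping step. RepresentativeRows and PairsToRows are illustrative names. Picking the first row of each group mirrors the listing's choice of RowIds[0]. Numbering groups in order of first appearance is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Assigns a group id to each distinct key (numbered in order of first
// appearance) and fills groupOfRow. Returns rep, where rep[g] is the first
// row of group g, used as that group's representative.
std::vector<int> RepresentativeRows(const std::vector<std::vector<int> >& keys,
                                    std::vector<int>& groupOfRow) {
  std::map<std::vector<int>, int> groupOfKey;
  std::vector<int> rep;
  groupOfRow.assign(keys.size(), -1);
  for (int r = 0; r < static_cast<int>(keys.size()); ++r) {
    auto it = groupOfKey.find(keys[r]);
    if (it == groupOfKey.end()) {
      it = groupOfKey.insert(std::make_pair(keys[r], static_cast<int>(rep.size()))).first;
      rep.push_back(r);
    }
    groupOfRow[r] = it->second;
  }
  return rep;
}

// Maps (group1, group2) pairs produced by a group-level join to pairs of
// representative rows, which can then be joined row by row.
std::vector<std::pair<int, int> > PairsToRows(
    const std::vector<std::pair<int, int> >& groupPairs, const std::vector<int>& rep) {
  std::vector<std::pair<int, int> > rows;
  for (const auto& gp : groupPairs) {
    rows.push_back(std::make_pair(rep[gp.first], rep[gp.second]));
  }
  return rows;
}
```

For keys {1, 1}, {2, 5}, {1, 1}, {2, 5}, {3, 0}, the representatives are rows 0, 1 and 4, so the group pair (0, 2) maps to the row pair (0, 4).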
void TTable::SetCommonNodeAttrs (const TStr &SrcAttr, const TStr &DstAttr, const TStr &CommonAttrName)
inline

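Registers a pair of columns that hold the same node attribute: SrcAttr for source nodes and DstAttr for destination nodes. The shared attribute is given the common name CommonAttrName. Each call appends one normalized (SrcAttr, DstAttr, CommonAttrName) triple to CommonNodeAttrs.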

Definition at line 1188 of file table.h.

{
  CommonNodeAttrs.Add(TStrTr(NormalizeColName(SrcAttr), NormalizeColName(DstAttr), NormalizeColName(CommonAttrName)));
}
void TTable::SetDstCol (const TStr &Dst)
inline

Sets the column whose values serve as destination nodes when a graph is constructed from the table. Throws an exception if Dst is not a column of the table.

Definition at line 1167 of file table.h.

{
  if (!IsColName(Dst)) { TExcept::Throw(Dst + ": no such column"); }
  DstCol = NormalizeColName(Dst);
}
void TTable::SetFirstValidRow ( )
inline protected

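Sets FirstValidRow to the physical index of the first row whose Next entry is not TTable::Invalid, i.e. the first row that has not been logically removed. Throws an exception if the table has no such row.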

Definition at line 811 of file table.h.

{
  for (int i = 0; i < Next.Len(); i++) {
    if (Next[i] != TTable::Invalid) { FirstValidRow = i; return; }
  }
  TExcept::Throw("SetFirstValidRow: Table is empty");
}
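The scan above can be sketched outside SNAP as follows. This is a minimal illustration, not SNAP code: the Next vector is modeled as a `std::vector<int>`, and `kInvalid` is a hypothetical stand-in for `TTable::Invalid` whose value (-2) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for TTable::Invalid (value assumed): marks a row
// that has been logically removed.
constexpr int kInvalid = -2;

// Returns the physical index of the first row whose Next entry is not
// kInvalid, scanning rows in physical order. Throws if every row is removed.
int FirstValidRow(const std::vector<int>& next) {
  for (std::size_t i = 0; i < next.size(); ++i) {
    if (next[i] != kInvalid) {
      return static_cast<int>(i);
    }
  }
  throw std::runtime_error("FirstValidRow: table is empty");
}
```

For example, `FirstValidRow({kInvalid, 2, -1})` returns 1, because physical row 0 was removed.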
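void TTable::SetFltColToConstMP ( TInt  UpdateColIdx,
TFlt  DefaultFltVal
)

Sets every row of the float column with index UpdateColIdx (an index into FltCols) to DefaultFltVal. The table is split into partitions with GetPartitionRanges, and the partitions are processed in parallel with OpenMP. Throws an exception if multi-threading is disabled (GetMP() returns 0).
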
Definition at line 4152 of file table.cpp.

{
  if (!GetMP()) { TExcept::Throw("Not Using MP!"); }
  TIntPrV Partitions;
  GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
  #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD)
  for (int i = 0; i < Partitions.Len(); i++) {
    TRowIterator RowI(Partitions[i].GetVal1(), this);
    TRowIterator EndI(Partitions[i].GetVal2(), this);
    while (RowI < EndI) {
      FltCols[UpdateColIdx][RowI.GetRowIdx()] = DefaultFltVal;
      RowI++;
    }
  }
}
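The partition-then-fill pattern can be sketched with the standard library. This is an illustrative assumption, not SNAP's implementation: the hypothetical `PartitionRanges` splits physical rows into half-open ranges, which may differ from what `GetPartitionRanges` produces, and row validity is modeled as a boolean mask instead of walking the Next list with a `TRowIterator`. The `#pragma omp` line parallelizes the loop when compiled with OpenMP and is ignored otherwise.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits physical rows [0, numRows) into at most numParts contiguous half-open
// ranges [begin, end). Assumed scheme; TTable::GetPartitionRanges may differ.
std::vector<std::pair<int, int>> PartitionRanges(int numRows, int numParts) {
  std::vector<std::pair<int, int>> parts;
  if (numRows <= 0 || numParts <= 0) return parts;
  const int chunk = (numRows + numParts - 1) / numParts;  // ceiling division
  for (int begin = 0; begin < numRows; begin += chunk) {
    const int end = begin + chunk < numRows ? begin + chunk : numRows;
    parts.emplace_back(begin, end);
  }
  return parts;
}

// Sets col[r] = value for every valid row r. Partitions are disjoint, so
// iterations write to different cells; with OpenMP enabled they run in parallel.
void SetColToConst(std::vector<double>& col, const std::vector<bool>& valid,
                   double value, int numParts) {
  const std::vector<std::pair<int, int>> parts =
      PartitionRanges(static_cast<int>(col.size()), numParts);
  #pragma omp parallel for schedule(dynamic)
  for (int p = 0; p < static_cast<int>(parts.size()); ++p) {
    for (int r = parts[p].first; r < parts[p].second; ++r) {
      if (valid[r]) col[r] = value;
    }
  }
}
```

Because no two partitions share a row, the threads never write the same cell, which is what makes the per-partition loop safe to parallelize.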
static void TTable::SetMP ( TInt  Value)
inline static

Sets UseMP, the global switch for choosing multi-threaded versions of TTable functions.

Definition at line 526 of file table.h.

{ UseMP = Value; }
void TTable::SetSrcCol ( const TStr & Src)
inline

Sets the name of the column to be used as src nodes in the graph. Throws an exception if Src is not a column of the table.

Definition at line 1160 of file table.h.

{
  if (!IsColName(Src)) { TExcept::Throw(Src + ": no such column"); }
  SrcCol = NormalizeColName(Src);
}
PTable TTable::SimJoin ( const TStrV & Cols1,
const TTable & Table,
const TStrV & Cols2,
const TStr & DistanceColName,
const TSimType & SimType,
const TFlt & Threshold
)

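Performs a join in which two rows are paired if the distance between them is at most the specified threshold.

Returns a similarity-based join of two tables, computed with the given distance metric and threshold. Every returned record pair (r1, r2) satisfies d(r1, r2) <= Threshold, and its distance is stored in a float column named DistanceColName. Only Int and Flt columns are supported. SimType may be L2Norm, or Haversine, which expects exactly two columns (latitude and longitude in degrees, in that order) and yields kilometers; L1Norm and Jaccard throw an exception.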

Definition at line 1994 of file table.cpp.

{
  Assert(Cols1.Len() == Cols2.Len());

  if (Cols1.Len() != Cols2.Len()) {
    TExcept::Throw("Column vectors must match in type and length");
  }

  for (TInt i = 0; i < Cols1.Len(); i++) {
    if (!IsColName(Cols1[i]) || !Table.IsColName(Cols2[i])) {
      TExcept::Throw("Column not found in Table");
    }

    TAttrType Type1 = GetColType(Cols1[i]);
    TAttrType Type2 = GetColType(Cols2[i]);

    if (Type1 != Type2) {
      TExcept::Throw("Column types on the two tables must match.");
    }

    // When supporting more distance metrics, check if the types are supported for given metric.
    if ((Type1 != atInt && Type1 != atFlt) || (Type2 != atInt && Type2 != atFlt)) {
      TExcept::Throw("Column type not supported. Only Flt and Int column types are supported.");
    }
  }

  // Initialize Join table and add the similarity column
  PTable JointTable = InitializeJointTable(Table);
  TFltV DistanceV;

  // O(n^2): Parallelize
  for (TRowIterator RowI = this->BegRI(); RowI < this->EndRI(); RowI++) {
    for (TRowIterator RowI2 = Table.BegRI(); RowI2 < Table.EndRI(); RowI2++) {
      float distance = 0;

      switch (SimType)
      {
        // Calculate the distance metric
        case L2Norm:
          for (TInt i = 0; i < Cols1.Len(); i++) {
            float attrVal1, attrVal2;
            attrVal1 = GetColType(Cols1[i])==atInt ? (float)RowI.GetIntAttr(Cols1[i]) : (float)RowI.GetFltAttr(Cols1[i]);
            attrVal2 = Table.GetColType(Cols2[i])==atInt ? (float)RowI2.GetIntAttr(Cols2[i]) : (float)RowI2.GetFltAttr(Cols2[i]);
            distance += pow(attrVal1 - attrVal2, 2);
          }

          distance = sqrt(distance);

          // Add row to the joint table if distance <= Threshold
          if (distance <= Threshold) {
            JointTable->AddJointRow(*this, Table, RowI.GetRowIdx(), RowI2.GetRowIdx());
            DistanceV.Add(distance);
          }
          break;
        // Haversine distance to calculate the distance between two points on Earth from latitude/longitude
        case Haversine:
          {
            // Block to prevent cross-initialization error from compiler
            if (Cols1.Len() != 2) {
              TExcept::Throw("Haversine disance expects exactly two attributes - latitude and longitude - in that order.");
            }

            TFlt Radius = 6373; // km
            float Latitude1 = GetColType(Cols1[0])==atInt ? (float)RowI.GetIntAttr(Cols1[0]) : (float)RowI.GetFltAttr(Cols1[0]);
            float Latitude2 = Table.GetColType(Cols2[0])==atInt ? (float)RowI2.GetIntAttr(Cols2[0]) : (float)RowI2.GetFltAttr(Cols2[0]);

            float Longitude1 = GetColType(Cols1[1])==atInt ? (float)RowI.GetIntAttr(Cols1[1]) : (float)RowI.GetFltAttr(Cols1[1]);
            float Longitude2 = Table.GetColType(Cols2[1])==atInt ? (float)RowI2.GetIntAttr(Cols2[1]) : (float)RowI2.GetFltAttr(Cols2[1]);

            Latitude1 *= static_cast<float>(M_PI/180.0);
            Latitude2 *= static_cast<float>(M_PI/180.0);
            Longitude1 *= static_cast<float>(M_PI/180.0);
            Longitude2 *= static_cast<float>(M_PI/180.0);

            float dlon = Longitude2 - Longitude1;
            float dlat = Latitude2 - Latitude1;
            float a = pow(sin(dlat/2), 2) + cos(Latitude1)*cos(Latitude2)*pow(sin(dlon/2), 2);
            float c = 2*atan2(sqrt(a), sqrt(1-a));
            distance = (static_cast<float>(Radius.Val))*c;

            if (distance <= Threshold) {
              JointTable->AddJointRow(*this, Table, RowI.GetRowIdx(), RowI2.GetRowIdx());
              DistanceV.Add(distance);
            }
          }
          break;
        case L1Norm:
        case Jaccard:
          TExcept::Throw("This distance metric is not supported");
      }
    }
  }

  // Add the value for the similarity column
  JointTable->StoreFltCol(DistanceColName, DistanceV);
  JointTable->InitIds();
  return JointTable;
}
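For illustration, the two supported metrics and the nested-loop join can be sketched in plain C++. The function names `L2Distance`, `HaversineKm`, and `SimJoinL2` are hypothetical; the sketch computes in `double` rather than the listing's `float`, takes latitude and longitude in degrees, and reuses the listing's Earth radius of 6373 km. Like the listing, the join compares every pair of rows, so it costs O(n*m) distance evaluations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <tuple>
#include <vector>

// Euclidean (L2) distance between two attribute vectors of equal length.
double L2Distance(const std::vector<double>& a, const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Great-circle (haversine) distance in km between two (lat, lon) points given
// in degrees, using the 6373 km Earth radius from the listing.
double HaversineKm(double lat1, double lon1, double lat2, double lon2) {
  const double kDegToRad = 3.14159265358979323846 / 180.0;
  const double kRadiusKm = 6373.0;
  lat1 *= kDegToRad;
  lon1 *= kDegToRad;
  lat2 *= kDegToRad;
  lon2 *= kDegToRad;
  const double dlat = lat2 - lat1;
  const double dlon = lon2 - lon1;
  const double a = std::pow(std::sin(dlat / 2), 2) +
                   std::cos(lat1) * std::cos(lat2) * std::pow(std::sin(dlon / 2), 2);
  const double c = 2 * std::atan2(std::sqrt(a), std::sqrt(1 - a));
  return kRadiusKm * c;
}

// Nested-loop similarity join on L2 distance: returns (row in rows1,
// row in rows2, distance) for every pair with distance <= threshold.
std::vector<std::tuple<std::size_t, std::size_t, double>> SimJoinL2(
    const std::vector<std::vector<double>>& rows1,
    const std::vector<std::vector<double>>& rows2, double threshold) {
  std::vector<std::tuple<std::size_t, std::size_t, double>> out;
  for (std::size_t i = 0; i < rows1.size(); ++i) {
    for (std::size_t j = 0; j < rows2.size(); ++j) {
      const double d = L2Distance(rows1[i], rows2[j]);
      if (d <= threshold) out.emplace_back(i, j, d);
    }
  }
  return out;
}
```

As a sanity check, one degree of longitude on the equator comes out to 6373 * pi / 180, roughly 111.2 km.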
TVec< PTable > TTable::SpliceByGroup ( const TStrV & GroupByAttrs,
TBool  Ordered = true 
)

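Splices the table into subtables, one for each group of rows that share the same values in the GroupByAttrs columns. Each subtable uses this table's schema without the id column, and its row ids are initialized afresh (InitIds).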

Definition at line 1808 of file table.cpp.

{
  // GroupBy is the name this definition gives to the GroupByAttrs parameter.
  TStrV NGroupBy = NormalizeColNameV(GroupBy);
  TIntV UniqueVec;
  THash<TGroupKey, TPair<TInt, TIntV> > Grouping;
  TVec<PTable> Result;

  Schema NewSchema;
  for (TInt c = 0; c < Sch.Len(); c++) {
    if (Sch[c].Val1 != GetIdColName()) {
      NewSchema.Add(Sch[c]);
    }
  }

  GroupAux(NGroupBy, Grouping, Ordered, "", false, UniqueVec);

  TInt cnt = 0;
  // iterate over groups
  for (THash<TGroupKey, TPair<TInt, TIntV> >::TIter it = Grouping.BegI(); it != Grouping.EndI(); it++) {
    PTable GroupTable = TTable::New(NewSchema, Context);

    TVec<TPair<TAttrType, TInt> > ColInfo;
    TIntV V;
    for (TInt i = 0; i < Sch.Len(); i++) {
      ColInfo.Add(GroupTable->GetColTypeMap(Sch[i].Val1));
      if (Sch[i].Val1 == IdColName) {
        ColInfo[i].Val2 = -1;
      }
      V.Add(GetColIdx(Sch[i].Val1));
    }

    TIntV& Rows = it.GetDat().Val2;

    // iterate over rows in group
    for (TInt i = 0; i < Rows.Len(); i++) {
      // convert from permanent ID to row ID
      TInt RowIdx = RowIdMap.GetDat(Rows[i]);

      // iterate over schema
      for (TInt c = 0; c < Sch.Len(); c++) {
        TPair<TAttrType, TInt> Info = ColInfo[c];
        TInt ColIdx = Info.Val2;

        if (ColIdx == -1) { continue; }

        // add row to new group
        switch (Info.Val1) {
          case atInt:
            GroupTable->IntCols[ColIdx].Add(IntCols[V[c]][RowIdx]);
            break;
          case atFlt:
            GroupTable->FltCols[ColIdx].Add(FltCols[V[c]][RowIdx]);
            break;
          case atStr:
            GroupTable->StrColMaps[ColIdx].Add(StrColMaps[V[c]][RowIdx]);
            break;
        }
      }
      if (GroupTable->LastValidRow >= 0) {
        GroupTable->Next[GroupTable->LastValidRow] = GroupTable->NumRows;
      }
      GroupTable->Next.Add(GroupTable->Last);
      GroupTable->LastValidRow = GroupTable->NumRows;

      GroupTable->NumRows++;
      GroupTable->NumValidRows++;
    }
    GroupTable->InitIds();
    Result.Add(GroupTable);

    cnt += 1;
  }
  return Result;
}
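The core idea, bucketing rows by their group-by value and emitting one sub-table per bucket, can be sketched with standard containers. `Row` and `SpliceByKey` are hypothetical; the sketch groups on a single string key and keeps each group's rows in input order. Because it uses `std::map`, its groups come out sorted by key; no claim is made about the group order `TTable` produces.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row: one string group-by attribute and one integer payload.
struct Row {
  std::string key;
  int value;
};

// Splits rows into one sub-table per distinct key. Rows keep their input order
// within a group; std::map makes the groups come out sorted by key.
std::vector<std::vector<Row>> SpliceByKey(const std::vector<Row>& rows) {
  std::map<std::string, std::vector<Row>> groups;
  for (const Row& r : rows) {
    groups[r.key].push_back(r);
  }
  std::vector<std::vector<Row>> result;
  result.reserve(groups.size());
  for (auto& entry : groups) {
    result.push_back(std::move(entry.second));
  }
  return result;
}
```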
void TTable::StoreFltCol ( const TStr & ColName,
const TFltV & ColVals
)

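Adds an entire float column named ColName to the table. ColVals must have exactly NumRows entries (NumRows counts both valid and logically removed rows); otherwise an error message is printed and the table is left unchanged. Values are assigned to the valid rows in iteration order.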

Definition at line 4104 of file table.cpp.

{
  if (ColVals.Len() != NumRows) {
    printf("new column dimension must agree with number of rows\n");
    return;
  }
  AddSchemaCol(ColName, atFlt);
  FltCols.Add(TFltV(NumRows));
  TInt ColIdx = FltCols.Len()-1;
  TInt i = 0;
  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
    FltCols[ColIdx][RI.GetRowIdx()] = ColVals[i];
    i++;
  }
  TInt L = FltCols.Len();
  AddColType(ColName, atFlt, L-1);
}
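A small model of the store-in-iteration-order behavior, under assumptions: `MiniTable` is a hypothetical stand-in for `TTable` in which `next[i]` gives the successor of physical row i, -1 ends the list, and logically removed rows are simply not on the list. As in the listing, the length check compares against all physical rows, so values beyond the number of valid rows are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical columnar table: next[i] is the physical row that follows row i
// in iteration order; -1 ends the list (assumed sentinel).
struct MiniTable {
  int firstValidRow = -1;
  std::vector<int> next;  // one entry per physical row, valid or removed
  std::vector<std::vector<double>> fltCols;
};

// Appends a float column. vals must have one entry per physical row; vals[i]
// is written to the i-th valid row in iteration order. Returns false instead
// of printing when the sizes disagree.
bool StoreFltCol(MiniTable& t, const std::vector<double>& vals) {
  if (vals.size() != t.next.size()) return false;
  t.fltCols.emplace_back(t.next.size(), 0.0);
  std::vector<double>& col = t.fltCols.back();
  std::size_t i = 0;
  for (int r = t.firstValidRow; r != -1; r = t.next[r]) {
    col[r] = vals[i++];
  }
  return true;
}
```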
void TTable::StoreGroupCol ( const TStr & GroupColName,
const TVec< TPair< TInt, TInt > > &  GroupAndRowIds 
)
protected

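Helper function for grouping. Parallel grouping by complex keys is currently not supported.

Stores a column of group ids. Physical row ids have to be passed: each pair in GroupAndRowIds is (group id, physical row id), and the new integer column GroupColName receives the group id at that physical row.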

Definition at line 1310 of file table.cpp.

{
  // Add a column where the value of the i'th row is the group id of row i.
  IntCols.Add(TIntV(NumRows));
  TInt L = IntCols.Len();
  AddColType(GroupColName, atInt, L-1);
  // Store group id for each row.
  for (TInt i = 0; i < GroupAndRowIds.Len(); i++) {
    IntCols[L-1][GroupAndRowIds[i].Val2] = GroupAndRowIds[i].Val1;
  }
}
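The scatter step can be sketched as follows. `StoreGroupCol` here is a hypothetical free function over `std::vector` columns, not the SNAP member. Each (group id, physical row id) pair writes one cell, and rows that appear in no pair keep the 0 that the sketch fills in.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Appends an int column with numRows cells (all 0) and writes each group id
// at its physical row. Each pair is (group id, physical row id).
void StoreGroupCol(std::vector<std::vector<int>>& intCols, std::size_t numRows,
                   const std::vector<std::pair<int, int>>& groupAndRowIds) {
  intCols.emplace_back(numRows, 0);
  std::vector<int>& col = intCols.back();
  for (const std::pair<int, int>& gr : groupAndRowIds) {
    col[gr.second] = gr.first;  // the row id must be < numRows
  }
}
```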
void TTable::StoreIntCol ( const TStr & ColName,
const TIntV & ColVals
)

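Adds an entire integer column named ColName to the table. ColVals must have exactly NumRows entries (valid and logically removed rows); otherwise an error message is printed and the table is left unchanged. Values are assigned to the valid rows in iteration order.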

Definition at line 4087 of file table.cpp.

{
  if (ColVals.Len() != NumRows) {
    printf("new column dimension must agree with number of rows\n");
    return;
  }
  AddSchemaCol(ColName, atInt);
  IntCols.Add(TIntV(NumRows));
  TInt ColIdx = IntCols.Len()-1;
  TInt i = 0;
  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
    IntCols[ColIdx][RI.GetRowIdx()] = ColVals[i];
    i++;
  }
  TInt L = IntCols.Len();
  AddColType(ColName, atInt, L-1);
}
void TTable::StoreStrCol ( const TStr & ColName,
const TStrV & ColVals
)

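Adds an entire string column named ColName to the table, storing each value as its integer id in the context's string pool (Context->StringVals). ColVals must have exactly NumRows entries (valid and logically removed rows); otherwise an error message is printed and the table is left unchanged.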

Definition at line 4121 of file table.cpp.

{
  if (ColVals.Len() != NumRows) {
    printf("new column dimension must agree with number of rows\n");
    return;
  }
  AddSchemaCol(ColName, atStr);
  StrColMaps.Add(TIntV(NumRows));
  TInt ColIdx = FltCols.Len()-1;
  TInt i = 0;
  for (TRowIterator RI = BegRI(); RI < EndRI(); RI++) {
    TInt Key = Context->StringVals.GetKeyId(ColVals[i]);
    if (Key == -1) { Context->StringVals.AddKey(ColVals[i]); }
    StrColMaps[ColIdx][RI.GetRowIdx()] = Key;
    i++;
  }
  TInt L = StrColMaps.Len();
  AddColType(ColName, atStr, L-1);
}
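String columns are stored as integer ids into a shared string pool. In the listing above, the id returned by `AddKey` is not assigned back to `Key`, so a string that is new to the pool appears to be stored as -1, and `ColIdx` is computed from `FltCols.Len()` rather than `StrColMaps.Len()`. The sketch below shows the interning step as it was presumably intended; `StringPool` and `EncodeStrCol` are hypothetical names, and `std::unordered_map` stands in for `TStrHash`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the context's string pool: maps each distinct
// string to a dense integer id and keeps the reverse lookup.
struct StringPool {
  std::unordered_map<std::string, int> ids;
  std::vector<std::string> strs;

  // Returns the id of s, adding s to the pool first if it is new.
  int GetOrAdd(const std::string& s) {
    auto it = ids.find(s);
    if (it != ids.end()) return it->second;
    const int id = static_cast<int>(strs.size());
    ids.emplace(s, id);
    strs.push_back(s);
    return id;  // the new id is what gets stored in the column
  }
};

// Encodes a string column as one pool id per value.
std::vector<int> EncodeStrCol(StringPool& pool, const std::vector<std::string>& vals) {
  std::vector<int> col;
  col.reserve(vals.size());
  for (const std::string& v : vals) {
    col.push_back(pool.GetOrAdd(v));
  }
  return col;
}
```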
static PTable TTable::TableFromHashMap ( const THash< TInt, TInt > &  H,
const TStr & Col1,
const TStr & Col2,
TTableContext * Context,
const TBool  IsStrKeys = false
)
inline static

Builds a table from a hash table of int->int and initializes its row ids (InitIds).

Definition at line 988 of file table.h.

{
  PTable T = New(H, Col1, Col2, Context, IsStrKeys);
  T->InitIds();
  return T;
}
static PTable TTable::TableFromHashMap ( const THash< TInt, TFlt > &  H,
const TStr & Col1,
const TStr & Col2,
TTableContext * Context,
const TBool  IsStrKeys = false
)
inline static

Builds a table from a hash table of int->float and initializes its row ids (InitIds).

Definition at line 995 of file table.h.

{
  PTable T = New(H, Col1, Col2, Context, IsStrKeys);
  T->InitIds();
  return T;
}
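Conceptually, each hash-map entry becomes one row holding the key in one column and the value in the other. A standalone sketch for the int->float case, using the hypothetical type `TwoColTable`; the `IsStrKeys` option is not modeled, and the row order simply follows the map's iteration order.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical two-column table: col1 holds the keys, col2 the values.
struct TwoColTable {
  std::vector<int> col1;
  std::vector<double> col2;
};

// Emits one row per map entry; row order follows the map's iteration order.
TwoColTable FromHashMap(const std::unordered_map<int, double>& h) {
  TwoColTable t;
  t.col1.reserve(h.size());
  t.col2.reserve(h.size());
  for (const auto& kv : h) {
    t.col1.push_back(kv.first);
    t.col2.push_back(kv.second);
  }
  return t;
}
```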
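PTable TTable::ThresholdJoin ( const TStr & KeyCol1,
const TStr & JoinCol1,
const TTable & Table,
const TStr & KeyCol2,
const TStr & JoinCol2,
TInt  Threshold,
TBool  PerJoinKey = false
)

Joins this table with Table on JoinCol1 == JoinCol2 and counts, for each pair of key values (KeyCol1, KeyCol2), the joint records that share a join value; when PerJoinKey is true, the counts are kept separately for each join value. The counters and Threshold are then passed to ThresholdJoinOutputTable (or ThresholdJoinPerJoinKeyOutputTable) to build the result. Key and join columns must be of integer or string type.
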
Definition at line 2644 of file table.cpp.

{
  // test input correctness
  ThresholdJoinInputCorrectness(KeyCol1, JoinCol1, Table, KeyCol2, JoinCol2);
  //printf("verified input correctness\n");
  // type of column on which we join (integer or string)
  TAttrType JoinColType = GetColType(JoinCol1);
  // type of key column (integer or string)
  TAttrType KeyType = GetColType(KeyCol1);
  // Determine which table is smaller
  TBool ThisIsSmaller = (NumValidRows <= Table.NumValidRows);
  const TTable& TS = ThisIsSmaller ? *this : Table;
  const TTable& TB = ThisIsSmaller ? Table : *this;
  TStr JoinColS = JoinCol1;
  TInt JoinColIdxB = GetColIdx(JoinCol2);
  TInt KeyColIdxS = GetColIdx(KeyCol1);
  TInt KeyColIdxB = GetColIdx(KeyCol2);
  if (!ThisIsSmaller) {
    JoinColS = JoinCol2;
    JoinColIdxB = GetColIdx(JoinCol1);
    KeyColIdxS = GetColIdx(KeyCol2);
    KeyColIdxB = GetColIdx(KeyCol1);
  }

  // debug print
  //printf("JoinColS = %d, JoinColIdxB = %d, KeyColIdxS = %d, KeyColIdxB = %d\n",
  //GetColIdx(JoinColS).Val, JoinColIdxB.Val, KeyColIdxS.Val, KeyColIdxB.Val);
  //printf("starting switch-case\n");

  if (KeyType != atInt && KeyType != atStr) {
    printf("ThresholdJoin only supports integer or string key attributes\n");
    TExcept::Throw("ThresholdJoin only supports integer or string key attributes");
  }
  if (JoinColType != atInt && JoinColType != atStr) {
    printf("ThresholdJoin only supports integer or string join attributes\n");
    TExcept::Throw("ThresholdJoin only supports integer or string join attributes");
  }
  //printf("starting the real stuff!\n");
  // hash the smaller table T: join col value --> physical row ids of rows with that value
  TIntIntVH T;
  if (JoinColType == atInt) {
    TS.GroupByIntCol(JoinColS, T, TIntV(), true);
  } else if (JoinColType == atStr) {
    TS.GroupByStrCol(JoinColS, T, TIntV(), true);
  } else {
    TExcept::Throw("ThresholdJoin only supports integer or string join attributes");
  }

  /*
  for (THash<TInt,TIntV>::TIter it = T.BegI(); it < T.EndI(); it++) {
    if (JoinColType == atStr) {
      printf("%s -->", Context.StringVals.GetKey(it.GetKey().Val));
    } else {
      printf("%d -->", it.GetKey().Val);
    }
    const TIntV& V = it.GetDat();
    for (int sr = 0; sr < V.Len(); sr++) {
      printf(" %d", V[sr].Val);
    }
    printf("\n");
  }
  */

  // Counters: (K1,K2) --> (RowIdx1,RowIdx2, count) where K1 is a key from KeyCol1,
  // K2 is a key from Table's KeyCol2; RowIdx1 and RowIdx2 are the physical row ids
  // of the first joint tuple found for the key pair (rows whose join values match).
  // count is the number of joint records for the key pair.
  // In case of string attributes - the integer mappings of the key attribute values are used.
  if (PerJoinKey) {
    //printf("PerJoinKey\n");
    THash<TIntTr,TIntTr> Counters;
    ThresholdJoinCountPerJoinKeyCollisions(TB, TS, T, JoinColIdxB, KeyColIdxB, KeyColIdxS, Counters, ThisIsSmaller, JoinColType, KeyType);
    /*
    for (THash<TIntTr,TIntTr>::TIter it = Counters.BegI(); it < Counters.EndI(); it++) {
      const TIntTr& K = it.GetKey();
      const TIntTr& V = it.GetDat();
      if (KeyType == atStr) {
        printf("%s %s --> %d %d %d\n", Context->StringVals.GetKey(K.Val1), Context->StringVals.GetKey(K.Val2), V.Val1.Val, V.Val2.Val, V.Val3.Val);
      } else {
        printf("%d %d --> %d %d %d\n", K.Val1.Val, K.Val2.Val, V.Val1.Val, V.Val2.Val, V.Val3.Val);
      }
    }
    */
    //printf("found collisions\n");
    return ThresholdJoinPerJoinKeyOutputTable(Counters, Threshold, Table);
  } else {
    //printf("not PerJoinKey\n");
    THash<TIntPr,TIntTr> Counters;
    ThresholdJoinCountCollisions(TB, TS, T, JoinColIdxB, KeyColIdxB, KeyColIdxS, Counters, ThisIsSmaller, JoinColType, KeyType);
    /*
    for (THash<TIntPr,TIntTr>::TIter it = Counters.BegI(); it < Counters.EndI(); it++) {
      const TIntPr& K = it.GetKey();
      const TIntTr& V = it.GetDat();
      if (KeyType == atStr) {
        printf("%s %s --> %d %d %d\n", Context->StringVals.GetKey(K.Val1), Context->StringVals.GetKey(K.Val2), V.Val1.Val, V.Val2.Val, V.Val3.Val);
      } else {
        printf("%d %d --> %d %d %d\n", K.Val1.Val, K.Val2.Val, V.Val1.Val, V.Val2.Val, V.Val3.Val);
      }
    }
    */
    //printf("found collisions\n");
    return ThresholdJoinOutputTable(Counters, Threshold, Table);
  }
}
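The counting phase can be sketched with standard containers: hash the smaller input on the join value, probe it with every row of the larger input, and count collisions per key pair. Assumptions: `KJRow`, `CountCollisions`, and `KeepAtLeast` are hypothetical; both inputs hold integer keys and join values; the first argument is taken to be the smaller input, so the sketch skips the key-order swap and the recorded row ids. `KeepAtLeast` is only a guess at how `Threshold` is applied (keep key pairs with at least `Threshold` collisions); the output-table helpers are not shown here, so treat that filter as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical row: an integer key attribute and an integer join attribute.
struct KJRow {
  int key;
  int join;
};

// (smaller-input key, larger-input key, join value or 0)
using CounterKey = std::tuple<int, int, int>;

// Counts, for each key pair, how many row pairs share a join value. With
// perJoinKey the join value is part of the counter key; otherwise it is 0.
std::map<CounterKey, int> CountCollisions(const std::vector<KJRow>& smaller,
                                          const std::vector<KJRow>& larger,
                                          bool perJoinKey) {
  // Hash the smaller input: join value -> indices of rows with that value.
  std::unordered_map<int, std::vector<std::size_t>> index;
  for (std::size_t i = 0; i < smaller.size(); ++i) {
    index[smaller[i].join].push_back(i);
  }
  std::map<CounterKey, int> counters;
  for (const KJRow& b : larger) {  // probe with every row of the larger input
    auto it = index.find(b.join);
    if (it == index.end()) continue;
    for (std::size_t s : it->second) {
      const int jv = perJoinKey ? b.join : 0;
      ++counters[CounterKey(smaller[s].key, b.key, jv)];
    }
  }
  return counters;
}

// Assumed post-filter: keep the counter keys with at least `threshold` hits.
std::vector<CounterKey> KeepAtLeast(const std::map<CounterKey, int>& counters,
                                    int threshold) {
  std::vector<CounterKey> kept;
  for (const auto& kv : counters) {
    if (kv.second >= threshold) kept.push_back(kv.first);
  }
  return kept;
}
```

Hashing the smaller input keeps the hash table small, while the larger input is only streamed once, which mirrors the `ThisIsSmaller` choice in the listing.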
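void TTable::ThresholdJoinCountCollisions ( const TTable & TB,
const TTable & TS,
const TIntIntVH & T,
TInt  JoinColIdxB,
TInt  KeyColIdxB,
TInt  KeyColIdxS,
THash< TIntPr, TIntTr > &  Counters,
TBool  ThisIsSmaller,
TAttrType  JoinColType,
TAttrType  KeyType
)
protected

Counts joint records per key pair. Each row of the bigger table TB probes T, which maps join values to physical row ids of the smaller table TS. For every match, the counter of the key pair (ordered as this table's key, then Table's key) is incremented; the first matching pair of physical row ids is recorded with it.
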
Definition at line 2506 of file table.cpp.

{
  // iterate over big table and count / record joint tuples
  for (TRowIterator RowI = TB.BegRI(); RowI < TB.EndRI(); RowI++) {
    // value to join on from big table
    TInt JVal = 0;
    if (JoinColType == atStr) {
      JVal = RowI.GetStrMapById(JoinColIdxB);
    } else {
      JVal = RowI.GetIntAttr(JoinColIdxB);
    }
    //printf("JVal: %d\n", JVal.Val);
    if (T.IsKey(JVal)) {
      // read key attribute of big table row
      TInt KeyB = 0;
      if (KeyType == atStr) {
        KeyB = RowI.GetStrMapById(KeyColIdxB);
      } else {
        KeyB = RowI.GetIntAttr(KeyColIdxB);
      }
      // read row ids from small table with join attribute value of JVal
      const TIntV& RelevantRows = T.GetDat(JVal);
      for (int i = 0; i < RelevantRows.Len(); i++) {
        // read key attribute of relevant row from small table
        TInt KeyS = 0;
        if (KeyType == atStr) {
          KeyS = TS.StrColMaps[KeyColIdxS][RelevantRows[i]];
        } else {
          KeyS = TS.IntCols[KeyColIdxS][RelevantRows[i]];
        }
        // create a pair of keys - serves as a key in Counters
        TIntPr Keys = ThisIsSmaller ? TIntPr(KeyS, KeyB) : TIntPr(KeyB, KeyS);
        if (Counters.IsKey(Keys)) {
          // if the key pair has been seen before - increment its counter by 1
          TIntTr& V = Counters.GetDat(Keys);
          V.Val3 = V.Val3 + 1;
        } else {
          // if the key pair hasn't been seen before - add it with value of
          // row indices that create a joint record with this key pair
          if (ThisIsSmaller) {
            Counters.AddDat(Keys, TIntTr(RelevantRows[i], RowI.GetRowIdx(), 1));
          } else {
            Counters.AddDat(Keys, TIntTr(RowI.GetRowIdx(), RelevantRows[i], 1));
          }
        }
      } // end of for loop
    } // end of if statement
  } // end of for loop
}
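void TTable::ThresholdJoinCountPerJoinKeyCollisions ( const TTable & TB,
const TTable & TS,
const TIntIntVH & T,
TInt  JoinColIdxB,
TInt  KeyColIdxB,
TInt  KeyColIdxS,
THash< TIntTr, TIntTr > &  Counters,
TBool  ThisIsSmaller,
TAttrType  JoinColType,
TAttrType  KeyType
)
protected

Works like ThresholdJoinCountCollisions, except that the counter key also includes the join value, so counts are kept separately for each (key, key, join value) triple.
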
Definition at line 2557 of file table.cpp.

{
  for (TRowIterator RowI = TB.BegRI(); RowI < TB.EndRI(); RowI++) {
    // value to join on from big table
    TInt JVal = 0;
    if(JoinColType == atStr){
      JVal = RowI.GetStrMapById(JoinColIdxB);
    } else{
      JVal = RowI.GetIntAttr(JoinColIdxB);
    }
    //printf("JVal: %d\n", JVal.Val);
    if(T.IsKey(JVal)){
      // read key attribute of big table row
      TInt KeyB = 0;
      if(KeyType == atStr){
        KeyB = RowI.GetStrMapById(KeyColIdxB);
      } else{
        KeyB = RowI.GetIntAttr(KeyColIdxB);
      }
      // read row ids from small table with join attribute value of JVal
      const TIntV& RelevantRows = T.GetDat(JVal);
      for(int i = 0; i < RelevantRows.Len(); i++){
        // read key attribute of relevant row from small table
        TInt KeyS = 0;
        if(KeyType == atStr){
          KeyS = TS.StrColMaps[KeyColIdxS][RelevantRows[i]];
        } else{
          KeyS = TS.IntCols[KeyColIdxS][RelevantRows[i]];
        }
        // create a pair of keys - serves as a key in Counters
        TIntPr Keys = ThisIsSmaller ? TIntPr(KeyS, KeyB) : TIntPr(KeyB, KeyS);
        TIntTr K(Keys.Val1,Keys.Val2,JVal);
        if(Counters.IsKey(K)){
          // if the key pair has been seen before - increment its counter by 1
          TIntTr& V = Counters.GetDat(K);
          V.Val3 = V.Val3 + 1;
        } else{
          // if the key pair hasn't been seen before - add it with value of
          // row indices that create a joint record with this key pair
          if(ThisIsSmaller){
            Counters.AddDat(K, TIntTr(RelevantRows[i], RowI.GetRowIdx(),1));
          } else{
            Counters.AddDat(K, TIntTr(RowI.GetRowIdx(), RelevantRows[i],1));
          }
        }
      } // end of for loop
    } // end of if statement
  } // end of for loop
}
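The counting pass above is essentially a hash join that tallies matches per (key, key, join value) triple. The following C++17 sketch reproduces that idea with standard containers only. `MiniTable`, `Counter`, and `CountPerJoinKey` are hypothetical names; each table is reduced to one integer join column and one integer key column, and the small-table index (the role of `T`) is built inside the function. Key pairs are always ordered as (small, big), which corresponds to the `ThisIsSmaller == true` branch.

```cpp
#include <map>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a table: one join column and one key column.
struct MiniTable {
  std::vector<int> Join;  // join attribute per row
  std::vector<int> Key;   // key attribute per row
};

// First matching row pair and number of matches for one (KeyS, KeyB, JVal).
struct Counter {
  int RowS;
  int RowB;
  int Count;
};

using CounterMap = std::map<std::tuple<int, int, int>, Counter>;

// For every (small key, big key, join value) triple, counts how many row
// pairs of the two tables share that join value.
CounterMap CountPerJoinKey(const MiniTable& Big, const MiniTable& Small) {
  // Index the small table: join value -> row indices (plays the role of T).
  std::unordered_map<int, std::vector<int>> Index;
  for (int r = 0; r < static_cast<int>(Small.Join.size()); r++) {
    Index[Small.Join[r]].push_back(r);
  }
  CounterMap Counters;
  for (int b = 0; b < static_cast<int>(Big.Join.size()); b++) {
    auto It = Index.find(Big.Join[b]);
    if (It == Index.end()) continue;  // no partner rows for this join value
    for (int s : It->second) {
      std::tuple<int, int, int> K(Small.Key[s], Big.Key[b], Big.Join[b]);
      auto Found = Counters.find(K);
      if (Found != Counters.end()) {
        Found->second.Count++;  // seen before: increment the counter
      } else {
        Counters.emplace(K, Counter{s, b, 1});  // remember the first row pair
      }
    }
  }
  return Counters;
}
```

For example, two big-table rows with join value 1 and key 10 that both match one small-table row with key 5 give the triple (5, 10, 1) a count of 2.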
void TTable::ThresholdJoinInputCorrectness ( const TStr &  KeyCol1,
const TStr &  JoinCol1,
const TTable &  Table,
const TStr &  KeyCol2,
const TStr &  JoinCol2 
)
protected

Definition at line 2478 of file table.cpp.

{
  if (!IsColName(KeyCol1)) {
    printf("no such column %s\n", KeyCol1.CStr());
    TExcept::Throw("no such column " + KeyCol1);
  }
  if (!Table.IsColName(KeyCol2)) {
    printf("no such column %s\n", KeyCol2.CStr());
    TExcept::Throw("no such column " + KeyCol2);
  }
  if (!IsColName(JoinCol1)) {
    printf("no such column %s\n", JoinCol1.CStr());
    TExcept::Throw("no such column " + JoinCol1);
  }
  if (!Table.IsColName(JoinCol2)) {
    printf("no such column %s\n", JoinCol2.CStr());
    TExcept::Throw("no such column " + JoinCol2);
  }
  if (GetColType(JoinCol1) != Table.GetColType(JoinCol2)) {
    printf("Trying to Join on columns of different type\n");
    TExcept::Throw("Trying to Join on columns of different type");
  }
  if (GetColType(KeyCol1) != Table.GetColType(KeyCol2)) {
    printf("Key type mismatch\n");
    TExcept::Throw("Key type mismatch");
  }
}
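The validation amounts to four existence checks followed by two pairwise type comparisons. A minimal sketch under stated assumptions: a schema is modeled as a hypothetical `std::map` from column name to an `AttrType` enum, and `std::invalid_argument` stands in for `TExcept::Throw`.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical schema model: column name -> attribute type.
enum class AttrType { Int, Flt, Str };
using Schema = std::map<std::string, AttrType>;

// Returns the type of Col in S, or throws if the column does not exist.
AttrType ColType(const Schema& S, const std::string& Col) {
  auto It = S.find(Col);
  if (It == S.end()) throw std::invalid_argument("no such column " + Col);
  return It->second;
}

// Same order of checks as ThresholdJoinInputCorrectness: KeyCol1, KeyCol2,
// JoinCol1, JoinCol2 must exist; then join types and key types must match.
void CheckThresholdJoinInput(const Schema& S1, const std::string& KeyCol1,
                             const std::string& JoinCol1, const Schema& S2,
                             const std::string& KeyCol2,
                             const std::string& JoinCol2) {
  AttrType K1 = ColType(S1, KeyCol1);
  AttrType K2 = ColType(S2, KeyCol2);
  AttrType J1 = ColType(S1, JoinCol1);
  AttrType J2 = ColType(S2, JoinCol2);
  if (J1 != J2) throw std::invalid_argument("Trying to Join on columns of different type");
  if (K1 != K2) throw std::invalid_argument("Key type mismatch");
}
```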
PTable TTable::ThresholdJoinOutputTable ( const THash< TIntPr, TIntTr > &  Counters,
TInt  Threshold,
const TTable &  Table 
)
protected

Definition at line 2608 of file table.cpp.

{
  // initialize result table
  PTable JointTable = InitializeJointTable(Table);
  for(THash<TIntPr,TIntTr>::TIter iter = Counters.BegI(); iter < Counters.EndI(); iter++){
    TIntTr& Counter = iter.GetDat();
    //printf("keys: %d, %d\n", iter.GetKey().Val1.Val, iter.GetKey().Val2.Val);
    //printf("selected rows: %d,%d, counter: %d\n", Counter.Val1.Val, Counter.Val2.Val, Counter.Val3.Val);
    if(Counter.Val3 >= Threshold){
      JointTable->AddJointRow(*this, Table, Counter.Val1, Counter.Val2);
    }
  }
  return JointTable;
}
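The output step keeps every key pair whose collision count reaches `Threshold` and emits one joint row built from the representative row pair stored in the counter. A hedged sketch with standard containers; `PairCounter` and `SelectJointRows` are hypothetical, and the function returns row-index pairs instead of building a table.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical counter value: representative rows and a collision count.
struct PairCounter {
  int Row1;   // representative row in this table
  int Row2;   // representative row in the other table
  int Count;  // number of joined row pairs sharing the key pair
};

// Returns (Row1, Row2) for every key pair whose count reaches Threshold;
// each returned pair would become one row of the joint table.
std::vector<std::pair<int, int>> SelectJointRows(
    const std::map<std::pair<int, int>, PairCounter>& Counters, int Threshold) {
  std::vector<std::pair<int, int>> Rows;
  for (const auto& Entry : Counters) {
    const PairCounter& C = Entry.second;
    if (C.Count >= Threshold) Rows.emplace_back(C.Row1, C.Row2);
  }
  return Rows;
}
```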
PTable TTable::ThresholdJoinPerJoinKeyOutputTable ( const THash< TIntTr, TIntTr > &  Counters,
TInt  Threshold,
const TTable &  Table 
)
protected

Definition at line 2622 of file table.cpp.

{
  PTable JointTable = InitializeJointTable(Table);
  for(THash<TIntTr,TIntTr>::TIter iter = Counters.BegI(); iter < Counters.EndI(); iter++){
    const TIntTr& Counter = iter.GetDat();
    const TIntTr& Keys = iter.GetKey();
    THashSet<TIntPr> Pairs;
    if(Counter.Val3 >= Threshold){
      TIntPr K(Keys.Val1,Keys.Val2);
      if(!Pairs.IsKey(K)){
        Pairs.AddKey(K);
        JointTable->AddJointRow(*this, Table, Counter.Val1, Counter.Val2);
      }
    }
  }
  return JointTable;
}
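The per-join-key variant thresholds each (key, key, join value) counter and is meant to emit at most one joint row per key pair. Below is a minimal sketch with hypothetical names (`TripleCounter`, `SelectPerJoinKeyRows`). In this sketch the set of already-emitted key pairs is declared once, before the loop, so it persists across iterations; in the listing above `Pairs` is constructed inside the loop body, so it is empty each time it is checked.

```cpp
#include <map>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical counter value: representative rows and a collision count.
struct TripleCounter {
  int Row1;
  int Row2;
  int Count;
};

// Counters are keyed by (Key1, Key2, JoinVal). A key pair is emitted once,
// the first time (in map order) one of its join values reaches Threshold.
std::vector<std::pair<int, int>> SelectPerJoinKeyRows(
    const std::map<std::tuple<int, int, int>, TripleCounter>& Counters,
    int Threshold) {
  std::vector<std::pair<int, int>> Rows;
  std::set<std::pair<int, int>> Emitted;  // lives across all iterations
  for (const auto& Entry : Counters) {
    const TripleCounter& C = Entry.second;
    if (C.Count < Threshold) continue;
    std::pair<int, int> KeyPair(std::get<0>(Entry.first), std::get<1>(Entry.first));
    // insert().second is false when the key pair was already emitted.
    if (Emitted.insert(KeyPair).second) Rows.emplace_back(C.Row1, C.Row2);
  }
  return Rows;
}
```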
TVec< PNEANet > TTable::ToGraphPerGroup ( TStr  GroupAttr,
TAttrAggr  AggrPolicy 
)

Creates a sequence of graphs based on grouping specified by GroupAttr.

Definition at line 3662 of file table.cpp.

{
  return ToGraphSequence(GroupAttr, AggrPolicy, TInt(1), TInt(1), TInt::Mn, TInt::Mx);
}
PNEANet TTable::ToGraphPerGroupIterator ( TStr  GroupAttr,
TAttrAggr  AggrPolicy 
)

Creates the graph sequence one at a time.

This call returns the first graph of the sequence; it must be followed by calls to NextGraphIterator() to obtain the remaining graphs. Building the graphs one at a time keeps memory usage low.

Definition at line 3676 of file table.cpp.

{
  return ToGraphSequenceIterator(GroupAttr, AggrPolicy, TInt(1), TInt(1), TInt::Mn, TInt::Mx);
}
TVec< PNEANet > TTable::ToGraphSequence ( TStr  SplitAttr,
TAttrAggr  AggrPolicy,
TInt  WindowSize,
TInt  JumpSize,
TInt  StartVal = TInt::Mn,
TInt  EndVal = TInt::Mx 
)

Creates a sequence of graphs based on values of column SplitAttr and windows specified by JumpSize and WindowSize.

Definition at line 3651 of file table.cpp.

{
  FillBucketsByWindow(SplitAttr, JumpSize, WindowSize, StartVal, EndVal);
  printf("buckets filled\n");
  return GetGraphsFromSequence(AggrPolicy);
}
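The windowing step can be pictured as follows. This is a sketch under an explicit assumption about the boundaries (half-open windows `[Start + k*Jump, Start + k*Jump + Window)`, generated while the window start does not exceed `End`); SNAP's `FillBucketsByWindow` may treat the edges differently. `BucketsByWindow` is a hypothetical name, and each returned bucket corresponds to one graph of the sequence.

```cpp
#include <vector>

// Assumed bucketing rule (illustrative, not SNAP's exact boundaries):
// bucket k holds the rows whose value v satisfies
//   Start + k*Jump <= v < Start + k*Jump + Window,
// for every window start not exceeding End. Jump < Window gives overlapping
// buckets; Jump > Window leaves gaps between them.
std::vector<std::vector<int>> BucketsByWindow(const std::vector<int>& Col,
                                              int Jump, int Window,
                                              int Start, int End) {
  std::vector<std::vector<int>> Buckets;
  if (Jump <= 0) return Buckets;  // a non-positive jump would never advance
  for (long long Lo = Start; Lo <= End; Lo += Jump) {
    const long long Hi = Lo + Window;  // exclusive upper bound
    std::vector<int> Rows;
    for (int r = 0; r < static_cast<int>(Col.size()); r++) {
      if (Col[r] >= Lo && Col[r] < Hi) Rows.push_back(r);
    }
    Buckets.push_back(Rows);  // one bucket -> one graph in the sequence
  }
  return Buckets;
}
```

With `Jump = Window = 1` each bucket covers a single value, which is the setting `ToGraphPerGroup` passes. Unlike the `TInt::Mn`/`TInt::Mx` defaults, the sketch needs explicit bounds, since stepping through the whole `int` range one value at a time would be impractical.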
PNEANet TTable::ToGraphSequenceIterator ( TStr  SplitAttr,
TAttrAggr  AggrPolicy,
TInt  WindowSize,
TInt  JumpSize,
TInt  StartVal = TInt::Mn,
TInt  EndVal = TInt::Mx 
)

Creates the graph sequence one at a time.

This call returns the first graph of the sequence; it must be followed by calls to NextGraphIterator() to obtain the remaining graphs. Building the graphs one at a time keeps memory usage low.

Definition at line 3666 of file table.cpp.

{
  FillBucketsByWindow(SplitAttr, JumpSize, WindowSize, StartVal, EndVal);
  return GetFirstGraphFromSequence(AggrPolicy);
}
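The one-graph-at-a-time pattern can be sketched as an iterator that keeps the row-id buckets and a cursor (the role `CurrBucket` plays) and builds each graph only on request. Everything here is hypothetical: graphs are reduced to edge lists, and `First()`/`Next()` merely stand in for the first call and for NextGraphIterator().

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using EdgeList = std::vector<std::pair<int, int>>;

// Row-id buckets are computed up front (cheap), but each bucket's edge list
// (standing in for a PNEANet) is built only when requested, so only the
// current graph needs to be alive at any time.
class GraphSequenceIterator {
 public:
  GraphSequenceIterator(std::vector<int> Src, std::vector<int> Dst,
                        std::vector<std::vector<int>> Buckets)
      : Src_(std::move(Src)), Dst_(std::move(Dst)), Buckets_(std::move(Buckets)) {}

  // Builds the graph of the first bucket.
  std::optional<EdgeList> First() {
    Curr_ = 0;
    return Build();
  }
  // Advances the cursor and builds the next graph; empty when exhausted.
  std::optional<EdgeList> Next() {
    ++Curr_;
    return Build();
  }

 private:
  std::optional<EdgeList> Build() const {
    if (Curr_ >= Buckets_.size()) return std::nullopt;  // sequence exhausted
    EdgeList Edges;
    for (int r : Buckets_[Curr_]) Edges.emplace_back(Src_[r], Dst_[r]);
    return Edges;
  }

  std::vector<int> Src_, Dst_;
  std::vector<std::vector<int>> Buckets_;
  std::size_t Curr_ = 0;  // plays the role of CurrBucket
};
```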
TVec< PNEANet > TTable::ToVarGraphSequence ( TStr  SplitAttr,
TAttrAggr  AggrPolicy,
TIntPrV  SplitIntervals 
)

Creates a sequence of graphs based on values of column SplitAttr and intervals specified by SplitIntervals.

Definition at line 3657 of file table.cpp.

{
  FillBucketsByInterval(SplitAttr, SplitIntervals);
  return GetGraphsFromSequence(AggrPolicy);
}
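With explicit intervals, each (Lo, Hi) pair in `SplitIntervals` yields one bucket. The sketch below assumes closed intervals (`Lo <= v <= Hi`); whether SNAP includes the upper endpoint is not stated here, so treat that as an assumption. `BucketsByInterval` is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One bucket per interval, assuming each interval (Lo, Hi) is closed.
// Intervals may overlap (a row lands in several buckets) or leave gaps.
std::vector<std::vector<int>> BucketsByInterval(
    const std::vector<int>& Col, const std::vector<std::pair<int, int>>& Intervals) {
  std::vector<std::vector<int>> Buckets(Intervals.size());
  for (std::size_t b = 0; b < Intervals.size(); b++) {
    for (int r = 0; r < static_cast<int>(Col.size()); r++) {
      if (Col[r] >= Intervals[b].first && Col[r] <= Intervals[b].second) {
        Buckets[b].push_back(r);
      }
    }
  }
  return Buckets;
}
```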
PNEANet TTable::ToVarGraphSequenceIterator ( TStr  SplitAttr,
TAttrAggr  AggrPolicy,
TIntPrV  SplitIntervals 
)

Creates the graph sequence one at a time.

Create the graph sequence one at a time, to allow efficient use of memory. A call to this function must be followed by subsequent calls to NextGraphIterator().

Definition at line 3671 of file table.cpp.

{
  FillBucketsByInterval(SplitAttr, SplitIntervals);
  return GetFirstGraphFromSequence(AggrPolicy);
}
PTable TTable::Union ( const TTable &  Table)

Returns union of this table with given Table.

Definition at line 4531 of file table.cpp.

{
  Schema NewSchema;
  THashSet<TInt> Collisions;
  TStrV ColNames;

  for (TInt c = 0; c < Sch.Len(); c++) {
    if (Sch[c].Val1 != GetIdColName()) {
      NewSchema.Add(TPair<TStr,TAttrType>(Sch[c].Val1, Sch[c].Val2));
      ColNames.Add(Sch[c].Val1);
    }
  }
  PTable result = TTable::New(NewSchema, Context);

  GetCollidingRows(Table, Collisions);

  result->AddTable(*this);

  result->Unique(ColNames);

  // this part should be made faster by adding all the rows in one go
  for (TRowIterator it = Table.BegRI(); it < Table.EndRI(); it++) {
    if (!Collisions.IsKey(it.GetRowIdx())) {
      result->AddRow(it);
    }
  }

  // printf("this: %d %d, table: %d %d, result: %d %d\n",
  //   this->GetNumRows().Val, this->GetNumValidRows().Val,
  //   Table.GetNumRows().Val, Table.GetNumValidRows().Val,
  //   result->GetNumRows().Val, result->GetNumValidRows().Val);

  result->InitIds();
  return result;
}
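The steps of `Union` can be mirrored on plain row tuples: copy this table, drop its duplicate rows, then append each row of the other table that does not also occur in this table. A hedged sketch where `Row` and `UnionRows` are hypothetical and rows are integer vectors; the sketch keeps the first occurrence of each duplicate, which is an assumption about which copy `Unique` retains.

```cpp
#include <set>
#include <vector>

using Row = std::vector<int>;

// Mirrors the listing: (1) copy this table and deduplicate it, as
// Unique(ColNames) does; (2) append rows of Other that do not collide with
// this table. Only this table's rows are checked in step (2), so rows
// repeated within Other are appended each time.
std::vector<Row> UnionRows(const std::vector<Row>& This, const std::vector<Row>& Other) {
  std::vector<Row> Result;
  std::set<Row> Seen;
  for (const Row& R : This) {
    if (Seen.insert(R).second) Result.push_back(R);  // keep first occurrence
  }
  for (const Row& R : Other) {
    if (Seen.count(R) == 0) Result.push_back(R);  // like the Collisions check
  }
  return Result;
}
```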
PTable TTable::Union ( const PTable &  Table)
inline

Definition at line 1413 of file table.h.

{ return Union(*Table); };
PTable TTable::UnionAll ( const TTable &  Table)

Returns union of this table with given Table, preserving duplicates.

Definition at line 4511 of file table.cpp.

{
  Schema NewSchema;
  for (TInt c = 0; c < Sch.Len(); c++) {
    if (Sch[c].Val1 != GetIdColName()) {
      NewSchema.Add(TPair<TStr,TAttrType>(Sch[c].Val1, Sch[c].Val2));
    }
  }
  PTable result = TTable::New(NewSchema, Context);
  result->AddTable(*this);
  result->UnionAllInPlace(Table);
  return result;
}
PTable TTable::UnionAll ( const PTable &  Table)
inline

Definition at line 1416 of file table.h.

{ return UnionAll(*Table); };
void TTable::UnionAllInPlace ( const TTable &  Table)

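Appends all rows of Table to this table, preserving duplicates; the id column is not reinitialized. Same as TTable::ConcatTable.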

Definition at line 4524 of file table.cpp.

{
  AddTable(Table);
  // TODO: For the moment, IDs are not initialized (to avoid having too many ID columns)
  //result->InitIds();
}
void TTable::UnionAllInPlace ( const PTable &  Table)
inline

Definition at line 1419 of file table.h.

{ return UnionAllInPlace(*Table); };
void TTable::Unique ( const TStr &  Col)

Removes rows with duplicate values in given column.

Definition at line 1266 of file table.cpp.

{
  TIntV RemainingRows;
  TStr NCol = NormalizeColName(Col);
  switch (GetColType(NCol)) {
    case atInt: {
      TIntIntVH Grouping;
      GroupByIntCol(NCol, Grouping, TIntV(), true, true);
      for (TIntIntVH::TIter it = Grouping.BegI(); it < Grouping.EndI(); it++) {
        RemainingRows.Add(it->Dat[0]);
      }
      break;
    }
    case atFlt: {
      THash<TFlt,TIntV> Grouping;
      GroupByFltCol(NCol, Grouping, TIntV(), true, true);
      for (THash<TFlt,TIntV>::TIter it = Grouping.BegI(); it < Grouping.EndI(); it++) {
        RemainingRows.Add(it->Dat[0]);
      }
      break;
    }
    case atStr: {
      TIntIntVH Grouping;
      GroupByStrCol(NCol, Grouping, TIntV(), true, true);
      for (TIntIntVH::TIter it = Grouping.BegI(); it < Grouping.EndI(); it++) {
        RemainingRows.Add(it->Dat[0]);
      }
      break;
    }
  }
  KeepSortedRows(RemainingRows);
}
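The single-column `Unique` keeps one row per distinct value (the first row id of each group, `it->Dat[0]`) and then passes the kept ids to `KeepSortedRows`, which expects a sorted vector. A sketch with standard containers, assuming the first row of each group is the one kept: scanning rows in order and recording first occurrences yields an ascending vector directly. `UniqueRows` is a hypothetical name.

```cpp
#include <unordered_set>
#include <vector>

// Returns the row indices kept by a single-column Unique: the first row
// carrying each distinct value, in ascending order.
template <class T>
std::vector<int> UniqueRows(const std::vector<T>& Col) {
  std::unordered_set<T> Seen;
  std::vector<int> Keep;
  for (int r = 0; r < static_cast<int>(Col.size()); r++) {
    if (Seen.insert(Col[r]).second) Keep.push_back(r);  // first occurrence
  }
  return Keep;
}
```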
void TTable::Unique ( const TStrV &  Cols,
TBool  Ordered = true 
)

Removes rows with duplicate values in given columns.

Definition at line 1298 of file table.cpp.

{
  if(Cols.Len() == 1){
    Unique(Cols[0]);
    return;
  }
  TStrV NCols = NormalizeColNameV(Cols);
  THash<TGroupKey, TPair<TInt, TIntV> > Grouping;
  TIntV UniqueVec;
  GroupAux(NCols, Grouping, Ordered, "", true, UniqueVec, true);
  KeepSortedRows(UniqueVec);
}
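For several columns, the group key becomes the tuple of values in the selected columns (SNAP uses `TGroupKey`). A hedged sketch with a `std::vector<int>` key and hypothetical names; it compares values position by position and does not model the `Ordered` flag.

```cpp
#include <set>
#include <vector>

// Cols[c][r] is the value of column c in row r; Selected lists the columns
// forming the key. Rows are kept when their key is seen for the first time,
// which yields ascending row indices.
std::vector<int> UniqueRowsMulti(const std::vector<std::vector<int>>& Cols,
                                 const std::vector<int>& Selected) {
  std::vector<int> Keep;
  if (Cols.empty()) return Keep;
  std::set<std::vector<int>> Seen;
  const int NumRows = static_cast<int>(Cols[0].size());
  for (int r = 0; r < NumRows; r++) {
    std::vector<int> Key;
    for (int c : Selected) Key.push_back(Cols[c][r]);
    if (Seen.insert(Key).second) Keep.push_back(r);
  }
  return Keep;
}
```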
void TTable::UpdateFltFromTable ( const TStr &  KeyAttr,
const TStr &  UpdateAttr,
const TTable &  Table,
const TStr &  FKeyAttr,
const TStr &  ReadAttr,
TFlt  DefaultFltVal = 0.0 
)

Definition at line 4242 of file table.cpp.

{
  if(!IsColName(KeyAttr)){ TExcept::Throw("Bad KeyAttr parameter");}
  if(!IsColName(UpdateAttr)){ TExcept::Throw("Bad UpdateAttr parameter");}
  if(!Table.IsColName(FKeyAttr)){ TExcept::Throw("Bad FKeyAttr parameter");}
  if(!Table.IsColName(ReadAttr)){ TExcept::Throw("Bad ReadAttr parameter");}

#ifdef GCC_ATOMIC
  if(GetMP()){
    UpdateFltFromTableMP(KeyAttr, UpdateAttr,Table, FKeyAttr, ReadAttr, DefaultFltVal);
    return;
  }
#endif // GCC_ATOMIC

  TAttrType KeyType = GetColType(KeyAttr);
  TAttrType FKeyType = Table.GetColType(FKeyAttr);
  if(KeyType != FKeyType){TExcept::Throw("Key Type Mismatch");}
  if(GetColType(UpdateAttr) != atFlt || Table.GetColType(ReadAttr) != atFlt){
    TExcept::Throw("Expecting Float values");
  }
  TStr NKeyAttr = NormalizeColName(KeyAttr);
  TStr NUpdateAttr = NormalizeColName(UpdateAttr);
  TStr NFKeyAttr = Table.NormalizeColName(FKeyAttr);
  TStr NReadAttr = Table.NormalizeColName(ReadAttr);
  TInt UpdateColIdx = GetColIdx(UpdateAttr);

  for(TRowIterator iter = BegRI(); iter < EndRI(); iter++){
    FltCols[UpdateColIdx][iter.GetRowIdx()] = DefaultFltVal;
  }

  switch(KeyType) {
    // TODO: add support for other cases of KeyType
    case atInt: {
      TIntIntVH Grouping;
      GroupByIntCol(NKeyAttr, Grouping, TIntV(), true, true);
      for (TRowIterator RI = Table.BegRI(); RI < Table.EndRI(); RI++) {
        TInt K = RI.GetIntAttr(NFKeyAttr);
        if (Grouping.IsKey(K)) {
          TIntV& UpdateRows = Grouping.GetDat(K);
          for (int i = 0; i < UpdateRows.Len(); i++) {
            FltCols[UpdateColIdx][UpdateRows[i]] = RI.GetFltAttr(NReadAttr);
          } // end of for loop
        } // end of if statement
      } // end of for loop
    } // end of case atInt
    break;
    default:
      break;
  } // end of outer switch statement
}
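The sequential path resets the update column to the default, groups this table's rows by key, and then copies the read value of every matching row of the other table into all rows that share its key. A minimal sketch with standard containers, assuming integer keys as in the `atInt` case; `UpdateFromTable` is a hypothetical name and columns are passed as plain vectors.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Every row of Update is first reset to Default; then, for each row of the
// other table whose foreign key matches, its Read value is copied into all
// rows of this table with that key. If several rows of the other table share
// a key, the one scanned last wins.
void UpdateFromTable(const std::vector<int>& Key, std::vector<double>& Update,
                     const std::vector<int>& FKey, const std::vector<double>& Read,
                     double Default = 0.0) {
  Update.assign(Key.size(), Default);
  std::unordered_map<int, std::vector<int>> Grouping;  // key -> row indices
  for (int r = 0; r < static_cast<int>(Key.size()); r++) Grouping[Key[r]].push_back(r);
  for (std::size_t r = 0; r < FKey.size(); r++) {
    auto It = Grouping.find(FKey[r]);
    if (It == Grouping.end()) continue;  // no row of this table has this key
    for (int Row : It->second) Update[Row] = Read[r];
  }
}
```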
void TTable::UpdateFltFromTableMP ( const TStr &  KeyAttr,
const TStr &  UpdateAttr,
const TTable &  Table,
const TStr &  FKeyAttr,
const TStr &  ReadAttr,
TFlt  DefaultFltVal = 0.0 
)

Definition at line 4174 of file table.cpp.

{
  if (!GetMP()) {
    TExcept::Throw("Not Using MP!");
  }

  TAttrType KeyType = GetColType(KeyAttr);
  TAttrType FKeyType = Table.GetColType(FKeyAttr);
  if(KeyType != FKeyType){TExcept::Throw("Key Type Mismatch");}
  if(GetColType(UpdateAttr) != atFlt || Table.GetColType(ReadAttr) != atFlt){
    TExcept::Throw("Expecting Float values");
  }
  TStr NKeyAttr = NormalizeColName(KeyAttr);
  //TStr NUpdateAttr = NormalizeColName(UpdateAttr);
  //TStr NFKeyAttr = Table.NormalizeColName(FKeyAttr);
  //TStr NReadAttr = Table.NormalizeColName(ReadAttr);
  TInt UpdateColIdx = GetColIdx(UpdateAttr);
  TInt FKeyColIdx = GetColIdx(FKeyAttr);
  TInt ReadColIdx = GetColIdx(ReadAttr);

  // TODO: this should be a generic vector operation
  SetFltColToConstMP(UpdateColIdx, DefaultFltVal);

  TIntPrV Partitions;
  Table.GetPartitionRanges(Partitions, omp_get_max_threads()*CHUNKS_PER_THREAD);
  TInt PartitionSize = Partitions[0].GetVal2()-Partitions[0].GetVal1()+1;
  TIntV Locks(NumRows);
  Locks.PutAll(0); // need to parallelize this...

  switch (KeyType) {
    // TODO: add support for other cases of KeyType
    case atInt: {
      THashMP<TInt,TIntV> Grouping;
      // must use physical row ids
      GroupByIntColMP(NKeyAttr, Grouping, true);
      #pragma omp parallel for schedule(dynamic, CHUNKS_PER_THREAD) // num_threads(1)
      for (int i = 0; i < Partitions.Len(); i++) {
        TRowIterator RowI(Partitions[i].GetVal1(), &Table);
        TRowIterator EndI(Partitions[i].GetVal2(), &Table);
        while (RowI < EndI) {
          TInt K = RowI.GetIntAttr(FKeyColIdx);
          if (Grouping.IsKey(K)) {
            TIntV& UpdateRows = Grouping.GetDat(K);
            for (int j = 0; j < UpdateRows.Len(); j++) {
              int* lock = &Locks[UpdateRows[j]].Val;
              // OP RS 2016/06/30: needed to define a wrapper function
              // for the code to compile on Mac OS X gcc 4.2.1
              //if (!__sync_bool_compare_and_swap(lock, 0, 1)) {
              if (!sync_bool_compare_and_swap(lock)) {
                continue;
              }
              //printf("key = %d, row = %d, old_score = %f\n", K.Val, j, UpdateRows[j].Val, FltCols[UpdateColIdx][UpdateRows[j]].Val);
              FltCols[UpdateColIdx][UpdateRows[j]] = RowI.GetFltAttr(ReadColIdx);
              //printf("key = %d, new_score = %f\n", K.Val, j, FltCols[UpdateColIdx][UpdateRows[j]].Val);
            } // end of for loop
          } // end of if statement
          RowI++;
        } // end of while loop
      } // end of for loop
    } // end of case atInt
    break;
    default:
      break;
  } // end of outer switch statement
}
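The parallel path partitions the other table's rows across threads and guards each target row with a compare-and-swap lock (0 to 1), so exactly one thread writes a given row and later writers skip it. The sketch below shows that technique with `std::thread` and `std::atomic` in place of OpenMP and the GCC builtin; `UpdateFromTableParallel` is a hypothetical name, and the even contiguous partitioning is an assumption standing in for `GetPartitionRanges`. One detail worth checking in the listing: `FKeyColIdx` and `ReadColIdx` are computed with this table's `GetColIdx`, although `FKeyAttr` and `ReadAttr` are columns of `Table`; the sketch reads the other table's columns directly.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <unordered_map>
#include <vector>

// Each worker scans one contiguous range of the other table's rows. A lock
// per target row, taken with compare-and-swap, lets exactly one worker write
// each row. When a key occurs more than once in the other table, which
// value wins depends on thread timing.
void UpdateFromTableParallel(const std::vector<int>& Key, std::vector<double>& Update,
                             const std::vector<int>& FKey, const std::vector<double>& Read,
                             double Default, unsigned NumThreads) {
  Update.assign(Key.size(), Default);
  std::unordered_map<int, std::vector<int>> Grouping;  // read-only once built
  for (int r = 0; r < static_cast<int>(Key.size()); r++) Grouping[Key[r]].push_back(r);

  std::vector<std::atomic<int>> Locks(Key.size());
  for (auto& L : Locks) L.store(0);

  NumThreads = std::max(1u, NumThreads);
  const std::size_t Chunk = (FKey.size() + NumThreads - 1) / NumThreads;
  std::vector<std::thread> Workers;
  for (unsigned t = 0; t < NumThreads; t++) {
    const std::size_t Begin = t * Chunk;
    const std::size_t End = std::min(FKey.size(), Begin + Chunk);
    if (Begin >= End) break;  // no rows left for this worker
    Workers.emplace_back([&, Begin, End] {
      for (std::size_t r = Begin; r < End; r++) {
        auto It = Grouping.find(FKey[r]);  // concurrent lookups are safe
        if (It == Grouping.end()) continue;
        for (int Row : It->second) {
          int Expected = 0;
          if (!Locks[Row].compare_exchange_strong(Expected, 1)) continue;
          Update[Row] = Read[r];  // only the lock winner writes this row
        }
      }
    });
  }
  for (auto& W : Workers) W.join();
}
```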
template<class T >
void TTable::UpdateGrouping ( THash< T, TIntV > &  Grouping,
T  Key,
TInt  Val 
) const
protected

Template for utility function to update a grouping hash map.

Definition at line 1680 of file table.h.

{
  if (Grouping.IsKey(Key)) {
    Grouping.GetDat(Key).Add(Val);
  } else {
    TIntV NewGroup;
    NewGroup.Add(Val);
    Grouping.AddDat(Key, NewGroup);
  }
}
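With `std::unordered_map` the IsKey/AddDat branching collapses into one statement, because `operator[]` inserts an empty vector for a missing key before the append. A single-threaded sketch with a hypothetical free function; the THashMP overload targets concurrent use, which this sketch does not attempt.

```cpp
#include <unordered_map>
#include <vector>

// Appends Val to the group of Key, creating the group on first use.
template <class T>
void UpdateGrouping(std::unordered_map<T, std::vector<int>>& Grouping, const T& Key, int Val) {
  Grouping[Key].push_back(Val);
}
```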
template<class T >
void TTable::UpdateGrouping ( THashMP< T, TIntV > &  Grouping,
T  Key,
TInt  Val 
) const
protected

Template for utility function to update a parallel grouping hash map.

Definition at line 1692 of file table.h.

{
  if (Grouping.IsKey(Key)) {
    //printf("y\n");
    Grouping.GetDat(Key).Add(Val);
  } else {
    //printf("n\n");
    TIntV NewGroup;
    NewGroup.Add(Val);
    Grouping.AddDat(Key, NewGroup);
  }
}
void TTable::UpdateTableForNewRow ( )
protected

Updates table state after adding one or more rows.

Definition at line 4140 of file table.cpp.

{
  if (LastValidRow >= 0) {
    Next[LastValidRow] = NumRows;
  }
  Next.Add(Last);
  LastValidRow = NumRows;

  NumRows++;
  NumValidRows++;
}
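The row-order bookkeeping treats `Next` as a singly linked list over physical row indices, which is what lets rows be removed logically without moving column data. The model below is a hypothetical, self-contained sketch of that idea: the struct `RowOrder`, its `Order()` helper, and the sentinel value `-1` for `Last` are assumptions, not SNAP's definitions.

```cpp
#include <vector>

// Next[i] is the physical index of the row after row i; Last ends the chain.
struct RowOrder {
  static constexpr int Last = -1;  // assumed sentinel value
  std::vector<int> Next;
  int FirstValidRow = 0;
  int LastValidRow = -1;  // -1 while the table is empty
  int NumRows = 0;
  int NumValidRows = 0;

  // Appends one physical row and links it after the current last valid row.
  void AddRow() {
    if (LastValidRow >= 0) Next[LastValidRow] = NumRows;
    Next.push_back(Last);
    LastValidRow = NumRows;
    NumRows++;
    NumValidRows++;
  }

  // Physical indices of valid rows in logical (iteration) order.
  std::vector<int> Order() const {
    std::vector<int> Out;
    if (NumValidRows == 0) return Out;
    for (int i = FirstValidRow; i != Last; i = Next[i]) Out.push_back(i);
    return Out;
  }
};
```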

Friends And Related Function Documentation

friend class TPt< TTable >
friend

Definition at line 1525 of file table.h.

friend class TRowIterator
friend

Definition at line 1526 of file table.h.

friend class TRowIteratorWithRemove
friend

Definition at line 1527 of file table.h.

int TSnap::LoadCrossNet ( TCrossNet &  Graph,
PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  EdgeAttrV 
)
friend
int TSnap::LoadMode ( TModeNet &  Graph,
PTable  Table,
const TStr &  NCol,
TStrV &  NodeAttrV 
)
friend
template<class PGraph >
PGraph TSnap::ToGraph ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraphMP >
PGraphMP TSnap::ToGraphMP ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol 
)
friend
template<class PGraphMP >
PGraphMP TSnap::ToGraphMP3 ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol 
)
friend
template<class PGraph >
PGraph TSnap::ToNetwork ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  SrcAttrs,
TStrV &  DstAttrs,
TStrV &  EdgeAttrs,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraph >
PGraph TSnap::ToNetwork ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraph >
PGraph TSnap::ToNetwork ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  EdgeAttrV,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraph >
PGraph TSnap::ToNetwork ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  EdgeAttrV,
PTable  NodeTable,
const TStr &  NodeCol,
TStrV &  NodeAttrV,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  SrcAttrs,
TStrV &  DstAttrs,
TStrV &  EdgeAttrs,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  EdgeAttrV,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  EdgeAttrV,
PTable  NodeTable,
const TStr &  NodeCol,
TStrV &  NodeAttrV,
TAttrAggr  AggrPolicy 
)
friend
template<class PGraphMP >
PGraphMP TSnap::ToNetworkMP2 ( PTable  Table,
const TStr &  SrcCol,
const TStr &  DstCol,
TStrV &  SrcAttrs,
TStrV &  DstAttrs,
TStrV &  EdgeAttrs,
TAttrAggr  AggrPolicy 
)
friend

Member Data Documentation

TAttrAggr TTable::AggrPolicy
protected

Aggregation policy used for solving conflicts between different values of an attribute of the same node.

Definition at line 601 of file table.h.

THash<TStr,TPair<TAttrType,TInt> > TTable::ColTypeMap
protected

Maps each column name to its attribute type and its index among columns of that type. String columns are implemented using a string pool to reduce memory fragmentation: the value of string column c in row r is Context->StringVals.GetKey(StrColMaps[c][r]).

Definition at line 564 of file table.h.
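As its type THash<TStr,TPair<TAttrType,TInt> > suggests, ColTypeMap resolves a column name to a (type, index) pair. The index selects a column within the storage vector for that type (IntCols, FltCols, or StrColMaps). The sketch below mirrors that layout with standard containers. ColType, ColumnStore, and AddCol are hypothetical names, and the assumption is that a new column simply takes the next free slot for its type.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class ColType { Int, Flt, Str };

// Hypothetical columnar store: one vector of columns per type,
// plus a name -> (type, per-type index) map.
struct ColumnStore {
  std::map<std::string, std::pair<ColType, int>> colTypeMap;
  std::vector<std::vector<int>> intCols;
  std::vector<std::vector<double>> fltCols;
  std::vector<std::vector<int>> strColMaps;  // integer ids into a string pool

  // A new column gets the next free index among columns of its own type.
  void AddCol(const std::string& name, ColType t) {
    int idx = 0;
    switch (t) {
      case ColType::Int: idx = static_cast<int>(intCols.size()); intCols.emplace_back(); break;
      case ColType::Flt: idx = static_cast<int>(fltCols.size()); fltCols.emplace_back(); break;
      case ColType::Str: idx = static_cast<int>(strColMaps.size()); strColMaps.emplace_back(); break;
    }
    colTypeMap[name] = {t, idx};
  }
};
```

With this layout, reading a cell takes two steps. First look up the name, then index into the vector of the matching type.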

TStrTrV TTable::CommonNodeAttrs
protected

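List of (source attribute, destination attribute, common name) triples for attributes whose values are shared by source and destination nodes.

Example: <T_1.age, T_2.age, age>: T_1.age is a source node attribute and T_2.age is a destination node attribute. Since all nodes refer to the same universe of entities (users), age is assigned only once per node, under the common name 'age'. This list is expected to be very small.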

Definition at line 594 of file table.h.

TTableContext* TTable::Context
protected

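Execution context.

The execution context includes a global string pool for all string values of tables in the current session. The pool is accessed via Context->StringVals.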

Definition at line 545 of file table.h.

TCRef TTable::CRef
protected

Definition at line 550 of file table.h.

TInt TTable::CurrBucket
protected

Current row id bucket; used when generating a sequence of graphs with an iterator.

Definition at line 600 of file table.h.

TStr TTable::DstCol
protected

Column (attribute) to serve as dst nodes when constructing the graph.

Definition at line 590 of file table.h.

TStrV TTable::DstNodeAttrV
protected

List of columns (attributes) to serve as destination node attributes.

Definition at line 593 of file table.h.

TStrV TTable::EdgeAttrV
protected

List of columns (attributes) to serve as edge attributes.

Definition at line 591 of file table.h.

TInt TTable::FirstValidRow
protected

Physical index of first valid row.

Definition at line 553 of file table.h.

THash<TStr, THash<TFlt, TIntV> > TTable::FltColIndexes
protected

Indexes for Float Columns.

Definition at line 570 of file table.h.

TVec<TFltV> TTable::FltCols
protected

Data columns of floating point attributes.

Definition at line 559 of file table.h.

THash<GroupStmt, THash<TInt, TGroupKey> > TTable::GroupIDMapping
protected

Maps grouping statements to their (group id -> group-by key) mapping.

A mapping from a grouping statement (the group-by attribute names and an 'Ordered' flag) to a hash map from each group id to its corresponding group-by key.

Definition at line 577 of file table.h.

THash<GroupStmt, THash<TGroupKey, TIntV> > TTable::GroupMapping
protected

Maps grouping statements to their (group-by key -> group id) mapping.

Definition at line 581 of file table.h.

THash<TStr, GroupStmt > TTable::GroupStmtNames
protected

Maps user-given grouping statement names to their group-by attributes.

A mapping from the name of the group id column added by a grouping statement to a vector of the group-by attribute names and a flag specifying whether those attributes are ordered.

Definition at line 573 of file table.h.
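The grouping members above maintain mappings in both directions, from group-by key to group and from group id back to group-by key. The sketch below shows the core of grouping on a single integer column. Rows are bucketed by key, and dense group ids are assigned in order of first appearance. The Grouping struct and GroupByIntCol are hypothetical. The choice to number groups by first appearance is an assumption of this sketch, not a statement about how TTable numbers its groups.

```cpp
#include <map>
#include <vector>

// Hypothetical result of grouping one integer column.
struct Grouping {
  std::map<int, std::vector<int>> keyToRows;  // group-by key -> row ids in that group
  std::map<int, int> keyToGroupId;            // group-by key -> group id
  std::vector<int> groupIdToKey;              // group id -> group-by key
};

Grouping GroupByIntCol(const std::vector<int>& col) {
  Grouping g;
  for (int row = 0; row < static_cast<int>(col.size()); ++row) {
    int key = col[row];
    // First time this key is seen: give it the next dense group id.
    if (g.keyToGroupId.find(key) == g.keyToGroupId.end()) {
      g.keyToGroupId[key] = static_cast<int>(g.groupIdToKey.size());
      g.groupIdToKey.push_back(key);
    }
    g.keyToRows[key].push_back(row);
  }
  return g;
}
```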

TStr TTable::IdColName
protected

Name of column associated with (optional) permanent row identifiers.

Definition at line 565 of file table.h.

THash<TStr, THash<TInt, TIntV> > TTable::IntColIndexes
protected

Indexes for Int Columns.

Definition at line 568 of file table.h.
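IntColIndexes, FltColIndexes, and StrMapColIndexes all have the shape "column name -> (value -> vector of row ids)". The sketch below shows one such per-column index for an integer column and an equality lookup that avoids scanning the whole column. IntColIndex, BuildIntColIndex, and LookupRows are hypothetical names, and std::unordered_map stands in for SNAP's THash.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical per-column index: each distinct value -> physical ids of the rows holding it.
using IntColIndex = std::unordered_map<int, std::vector<int>>;

IntColIndex BuildIntColIndex(const std::vector<int>& col) {
  IntColIndex index;
  for (int row = 0; row < static_cast<int>(col.size()); ++row)
    index[col[row]].push_back(row);
  return index;
}

// Equality lookup: the matching row ids, or an empty vector when nothing matches.
std::vector<int> LookupRows(const IntColIndex& index, int value) {
  auto it = index.find(value);
  return it == index.end() ? std::vector<int>{} : it->second;
}
```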

TVec<TIntV> TTable::IntCols
protected

Data columns of integer attributes.

Definition at line 558 of file table.h.

TInt const TTable::Invalid = -2
static protected

Special value for a Next vector entry: logically removed row.

Definition at line 487 of file table.h.

TInt TTable::IsNextDirty
protected

Flag to signify whether the rows are stored in logical sequence or reordered. Used for optimizing GetPartitionRanges.

Definition at line 603 of file table.h.

TInt const TTable::Last = -1
static protected

Special value for a Next vector entry: last row in the table.

Definition at line 486 of file table.h.

TInt TTable::LastValidRow
protected

Physical index of last valid row.

Definition at line 554 of file table.h.

TIntV TTable::Next
protected

A vector describing the logical order of the rows.

Next[i] is the successor of row i. Table iterators follow the order dictated by Next.

Definition at line 555 of file table.h.

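Together, Next, FirstValidRow, and the sentinels Last (-1) and Invalid (-2) form a linked list over physical row indices. Iteration starts at the first valid row and follows Next until it reaches Last. A removed row is unlinked and marked Invalid. The sketch below is a simplified model of that scheme, not TTable's implementation. RowOrder, LogicalOrder, and Remove are hypothetical names. Remove finds the predecessor by walking the chain and assumes the row is currently linked.

```cpp
#include <vector>

// Sentinel values with the same meaning as TTable::Last and TTable::Invalid.
constexpr int kLast = -1;
constexpr int kInvalid = -2;

// Hypothetical row-order model: next[i] is the successor of physical row i.
struct RowOrder {
  std::vector<int> next;
  int firstValid = kLast;

  // Physical row ids in logical order, found by following next[] from firstValid.
  std::vector<int> LogicalOrder() const {
    std::vector<int> out;
    for (int r = firstValid; r != kLast; r = next[r]) out.push_back(r);
    return out;
  }

  // Logically remove `row`: unlink it from the chain and mark it Invalid.
  // Assumes `row` is either already Invalid or currently reachable from firstValid.
  void Remove(int row) {
    if (next[row] == kInvalid) return;  // already removed
    if (firstValid == row) {
      firstValid = next[row];
    } else {
      int prev = firstValid;
      while (next[prev] != row) prev = next[prev];
      next[prev] = next[row];
    }
    next[row] = kInvalid;
  }
};
```

Because removal only rewrites entries of next, the data columns are never shifted. Physical row indices therefore stay stable until the table is compacted.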
TInt TTable::NumRows
protected

Number of rows in the table (valid and invalid).

Definition at line 551 of file table.h.

TInt TTable::NumValidRows
protected

Number of valid rows in the table (i.e. rows that were not logically removed).

Definition at line 552 of file table.h.

TVec<TIntV> TTable::RowIdBuckets
protected

Partitioning of row ids into buckets corresponding to different graph objects when generating a sequence of graphs.

Definition at line 599 of file table.h.
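RowIdBuckets partitions row ids so that each bucket can become one graph in a sequence, and CurrBucket tracks iteration over those buckets. The sketch below shows one plausible partitioning rule, fixed-width windows over a non-negative integer split value such as a timestamp. The windowing rule and the BucketRows name are assumptions made for illustration. The actual split criteria used by TTable are not described in this section.

```cpp
#include <vector>

// Hypothetical partitioning: row `row` goes to bucket splitVals[row] / windowSize.
// Assumes every splitVals entry is >= 0 and windowSize > 0.
std::vector<std::vector<int>> BucketRows(const std::vector<int>& splitVals,
                                         int windowSize) {
  std::vector<std::vector<int>> buckets;
  for (int row = 0; row < static_cast<int>(splitVals.size()); ++row) {
    int b = splitVals[row] / windowSize;
    if (b >= static_cast<int>(buckets.size())) buckets.resize(b + 1);
    buckets[b].push_back(row);  // each bucket would later yield one graph
  }
  return buckets;
}
```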

TIntIntH TTable::RowIdMap
protected

Mapping of permanent row ids to physical row ids.

Definition at line 566 of file table.h.
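Permanent row ids (see IdColName) stay fixed even when rows move physically. RowIdMap translates a permanent id into the row's current physical index. The sketch below models that translation and rebuilds the map after rows are reordered. RowIds and Rebuild are hypothetical names. A full rebuild is a simplification chosen for clarity, not necessarily how TTable keeps the map current.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical model of permanent ids versus physical positions.
struct RowIds {
  std::vector<int> permIdOfRow;           // physical index -> permanent id
  std::unordered_map<int, int> rowIdMap;  // permanent id -> physical index

  // Recompute the permanent -> physical mapping after rows have moved.
  void Rebuild() {
    rowIdMap.clear();
    for (int i = 0; i < static_cast<int>(permIdOfRow.size()); ++i)
      rowIdMap[permIdOfRow[i]] = i;
  }
};
```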

Schema TTable::Sch
protected

Table Schema.

Definition at line 549 of file table.h.

TStr TTable::SrcCol
protected

Column (attribute) to serve as src nodes when constructing the graph.

Definition at line 589 of file table.h.

TStrV TTable::SrcNodeAttrV
protected

List of columns (attributes) to serve as source node attributes.

Definition at line 592 of file table.h.

TVec<TIntV> TTable::StrColMaps
protected

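Data columns of integer mappings of string attributes.

String columns are implemented using a string pool to reduce memory fragmentation. The value of string column c in row r is Context->StringVals.GetKey(StrColMaps[c][r]).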

Definition at line 560 of file table.h.
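String columns store only integer keys, and each distinct string is kept once in a shared pool. The sketch below models that interning scheme with standard containers. StringPool, Intern, and GetKey are hypothetical names. GetKey echoes the lookup described above but is not SNAP's StringVals type.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string pool: each distinct string is stored once under a dense integer key.
class StringPool {
 public:
  // Return the existing key for s, or assign the next free key.
  int Intern(const std::string& s) {
    auto it = ids_.find(s);
    if (it != ids_.end()) return it->second;
    int id = static_cast<int>(keys_.size());
    keys_.push_back(s);
    ids_.emplace(s, id);
    return id;
  }
  const std::string& GetKey(int id) const { return keys_.at(id); }
  std::size_t Size() const { return keys_.size(); }

 private:
  std::vector<std::string> keys_;            // key -> string
  std::unordered_map<std::string, int> ids_; // string -> key
};
```

A string column then becomes a std::vector<int> of keys, and repeated values such as city names cost one integer per row.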

THash<TStr, THash<TInt, TIntV> > TTable::StrMapColIndexes
protected

Indexes for String Columns.

Definition at line 569 of file table.h.

TInt TTable::UseMP = 1
static protected

Global switch for choosing multi-threaded versions of TTable functions.

Definition at line 489 of file table.h.
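UseMP acts as a global switch between sequential and multi-threaded code paths. The sketch below shows that dispatch pattern on a simple column sum. The multi-threaded variant uses an OpenMP reduction, in line with the OpenMP-based functions such as GroupByIntColMP. The global UseMP and the SumCol* functions are hypothetical. Without OpenMP enabled at compile time, the pragma is ignored and both paths run sequentially with identical results.

```cpp
#include <vector>

// Hypothetical global switch playing the role of TTable::UseMP (nonzero = multi-threaded).
static int UseMP = 1;

long long SumColSeq(const std::vector<long long>& col) {
  long long total = 0;
  for (long long x : col) total += x;
  return total;
}

long long SumColMP(const std::vector<long long>& col) {
  long long total = 0;
  const long n = static_cast<long>(col.size());
  // Each thread sums a chunk; the reduction clause combines the partial sums.
  #pragma omp parallel for reduction(+:total)
  for (long i = 0; i < n; ++i) total += col[i];
  return total;
}

// Dispatch on the global switch.
long long SumCol(const std::vector<long long>& col) {
  return UseMP ? SumColMP(col) : SumColSeq(col);
}
```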


The documentation for this class was generated from the following files: