Class CompressionSettings
- java.lang.Object
  - org.apache.sysds.runtime.compress.CompressionSettings
public class CompressionSettings extends Object
Compression settings class, used as a bundle of parameters inside the compression framework. See CompressionSettingsBuilder for the default values of the non-static parameters.
Field Summary
Fields
Modifier and Type | Field | Description
boolean | allowSharedDictionary | Share DDC dictionaries between ColGroups.
static int | BITMAP_BLOCK_SZ | Size of the blocks used in a blocked bitmap representation.
double | coCodePercentage | A co-coding parameter whose behavior differs based on the compression method; in general, it reflects how aggressively co-coding is applied.
CoCoderFactory.PartitionerType | columnPartitioner | The selected method for column partitioning used when co-coding compressed columns.
CostEstimatorFactory.CostType | costComputationType | The cost computation type for the compression.
SampleEstimatorFactory.EstimationType | estimationType | The estimation type used for sampling.
boolean | isInSparkInstruction | Whether the compression is executed as part of a Spark instruction.
boolean | lossy | True if lossy compression is enabled.
int | maxColGroupCoCode | The maximum number of columns allowed to be co-coded together.
int | maxSampleSize | The maximum size of the extracted sample.
double | minimumCompressionRatio | The minimum compression ratio to achieve.
int | minimumSampleSize | The minimum size of the extracted sample.
static int | PAR_DDC_THRESHOLD | Parallelization threshold for DDC compression.
double | samplePower | The sampling ratio power used when choosing the sample size.
double | samplingRatio | The sampling ratio used when choosing ColGroups.
InsertionSorterFactory.SORT_TYPE | sdcSortType | The sorting type used when sorting/joining offsets to create SDC groups.
int | seed | If the seed is -1, the system uses the system time in milliseconds and the class hash for seeding.
boolean | sortTuplesByFrequency | Sorting values by physical length helps by 10-20%, especially for serial execution, while slightly decreasing performance for parallel execution, including multi-threaded; hence, it is not applied for distributed operations (also because compression time and garbage collection increase).
boolean | transposed | Transpose the input matrix to optimize access when extracting bitmaps.
String | transposeInput | Specifies which transpose setting is used; can be "auto", "true", or "false".
EnumSet<AColGroup.CompressionType> | validCompressions | Valid compressions list, containing the ColGroup CompressionTypes that are allowed to be used for the compression. By default, the Uncompressed ColGroup is always allowed.
Field Detail
PAR_DDC_THRESHOLD
public static int PAR_DDC_THRESHOLD
Parallelization threshold for DDC compression.
BITMAP_BLOCK_SZ
public static final int BITMAP_BLOCK_SZ
Size of the blocks used in a blocked bitmap representation. Note that it is exactly Character.MAX_VALUE. It is not Character.MAX_VALUE + 1 because that breaks the offsets in cases with fully dense values.
See Also: Constant Field Values
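To make the block-size constraint concrete, here is a minimal sketch of a blocked bitmap under an assumed layout: for each block, a char holding the number of set rows, followed by each row's offset within the block as a char. The class `BlockedBitmapSketch`, its `encode` method, and this layout are hypothetical illustrations, not the SystemDS encoding. Under this layout, a block size of exactly Character.MAX_VALUE keeps every in-block offset (at most 65534) and every per-block count (at most 65535) within a char, whereas a fully dense block of Character.MAX_VALUE + 1 rows would overflow the count. That is one plausible way fully dense values could break the offsets.

```java
import java.util.ArrayList;
import java.util.List;

class BlockedBitmapSketch {
    // Assumed to mirror BITMAP_BLOCK_SZ: exactly Character.MAX_VALUE (65535).
    static final int BLOCK_SZ = Character.MAX_VALUE;

    /**
     * Hypothetical encoding of the sorted row indices where one value occurs.
     * Per block: a char count of set rows, then each row's char offset in the block.
     */
    static char[] encode(int[] sortedRows, int nRows) {
        List<Character> out = new ArrayList<>();
        int pos = 0;
        for (int start = 0; start < nRows; start += BLOCK_SZ) {
            int end = Math.min(start + BLOCK_SZ, nRows);
            int first = pos;
            while (pos < sortedRows.length && sortedRows[pos] < end)
                pos++;
            int count = pos - first;
            // count <= BLOCK_SZ == Character.MAX_VALUE, so this cast is lossless.
            out.add((char) count);
            for (int i = first; i < pos; i++)
                out.add((char) (sortedRows[i] - start)); // offset in [0, BLOCK_SZ - 1]
        }
        char[] res = new char[out.size()];
        for (int i = 0; i < res.length; i++)
            res[i] = out.get(i);
        return res;
    }
}
```

For a fully dense column of 2 * 65535 rows, each block count equals Character.MAX_VALUE; with one more row per block, the count 65536 cast to char would wrap to 0.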
sortTuplesByFrequency
public final boolean sortTuplesByFrequency
Sorting values by physical length helps by 10-20%, especially for serial execution, while slightly decreasing performance for parallel execution, including multi-threaded; hence, it is not applied for distributed operations (also because compression time and garbage collection increase).
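As an illustration only, ordering distinct values by their physical length, taken here to mean the number of rows in which each value occurs, can be sketched as below. The class `FrequencySortSketch` and method `sortByFrequency` are hypothetical, and the sketch works on a plain int column rather than on SystemDS dictionaries or tuples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencySortSketch {
    /** Returns the distinct values of col, most frequent first (ties: ascending value). */
    static List<Integer> sortByFrequency(int[] col) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : col)
            counts.merge(v, 1, Integer::sum); // count occurrences of each value
        List<Integer> distinct = new ArrayList<>(counts.keySet());
        distinct.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // descending count
            return byCount != 0 ? byCount : Integer.compare(a, b);      // deterministic tie-break
        });
        return distinct;
    }
}
```

The ascending-value tie-break is a choice made for this sketch so that the output is deterministic; HashMap iteration order alone would not guarantee that.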
samplingRatio
public final double samplingRatio
The sampling ratio used when choosing ColGroups. Note that the default behavior is to use the exact estimator if the number of elements is below 1000. Deprecated.
samplePower
public final double samplePower
The sampling ratio power used when choosing the sample size. It is used in the function sampleSize += nRows^samplePower. The value is bounded to the range 0 to 1, where 1 gives a sample size of all rows and 0 adds 1.
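The sample-size rule above can be sketched as follows. The helper `SampleSizeSketch.sampleSize` is hypothetical, and it assumes that the sample starts at minimumSampleSize, that nRows^samplePower is added to it, and that the result is capped at min(maxSampleSize, nRows); the exact order in which SystemDS applies these bounds may differ.

```java
class SampleSizeSketch {
    /** Hypothetical sample-size rule: sampleSize += nRows^samplePower, then capped. */
    static int sampleSize(int nRows, double samplePower, int minimumSampleSize, int maxSampleSize) {
        if (samplePower < 0 || samplePower > 1)
            throw new IllegalArgumentException("samplePower must be in [0, 1]");
        int sampleSize = minimumSampleSize;
        // Compound assignment converts the double sum back to int (truncation, saturating).
        sampleSize += Math.pow(nRows, samplePower);
        // Never sample more rows than exist or than the configured maximum.
        return Math.min(sampleSize, Math.min(maxSampleSize, nRows));
    }
}
```

With samplePower = 0 the term nRows^0 is 1, so the sample grows by exactly one row; with samplePower = 1 the sum exceeds nRows and the cap selects every row, provided maxSampleSize is not smaller.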
allowSharedDictionary
public final boolean allowSharedDictionary
Share DDC dictionaries between ColGroups.
transposeInput
public final String transposeInput
Specifies which transpose setting is used; can be "auto", "true", or "false".
seed
public final int seed
If the seed is -1, the system uses the system time in milliseconds and the class hash for seeding.
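A minimal sketch of the seeding rule, assuming the millisecond time and the class hash are combined by addition; how SystemDS actually combines them is not stated here. `SeedSketch`, `resolveSeed`, and `newRandom` are hypothetical helpers built on java.util.Random.

```java
import java.util.Random;

class SeedSketch {
    /** Returns the seed unchanged, or a time- and class-derived seed when it is -1. */
    static long resolveSeed(int seed) {
        if (seed != -1)
            return seed; // fixed seed: reproducible sampling
        // Assumption: combine wall-clock milliseconds with the class hash by addition.
        return System.currentTimeMillis() + SeedSketch.class.hashCode();
    }

    /** Creates a random generator from the resolved seed. */
    static Random newRandom(int seed) {
        return new Random(resolveSeed(seed));
    }
}
```

Any seed other than -1 yields reproducible samples, which is useful when comparing compression plans across runs.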
lossy
public final boolean lossy
True if lossy compression is enabled.
columnPartitioner
public final CoCoderFactory.PartitionerType columnPartitioner
The selected method for column partitioning used when co-coding compressed columns.
costComputationType
public final CostEstimatorFactory.CostType costComputationType
The cost computation type for the compression.
maxColGroupCoCode
public final int maxColGroupCoCode
The maximum number of columns allowed to be co-coded together.
coCodePercentage
public final double coCodePercentage
A co-coding parameter whose behavior differs based on the compression method; in general, it reflects how aggressively co-coding is applied.
validCompressions
public final EnumSet<AColGroup.CompressionType> validCompressions
Valid compressions list, containing the ColGroup CompressionTypes that are allowed to be used for the compression. By default, the Uncompressed ColGroup is always allowed.
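One way to build such a set with java.util.EnumSet, while always keeping the uncompressed type allowed, is sketched below. The nested enum `CompressionTypeSketch` is a hypothetical stand-in listing only a few type names; it is not the full AColGroup.CompressionType, and `validCompressions` here is an illustrative helper rather than SystemDS code.

```java
import java.util.Collection;
import java.util.EnumSet;

class ValidCompressionsSketch {
    // Hypothetical subset of column group compression types.
    enum CompressionTypeSketch { UNCOMPRESSED, DDC, SDC, RLE, OLE }

    /** Copies the requested types and always adds UNCOMPRESSED as a fallback. */
    static EnumSet<CompressionTypeSketch> validCompressions(Collection<CompressionTypeSketch> requested) {
        // noneOf + addAll also works for an empty input, unlike EnumSet.copyOf on an empty list.
        EnumSet<CompressionTypeSketch> valid = EnumSet.noneOf(CompressionTypeSketch.class);
        valid.addAll(requested);
        valid.add(CompressionTypeSketch.UNCOMPRESSED);
        return valid;
    }
}
```

Keeping the uncompressed type in the set guarantees that every column can still be represented when no compressed encoding qualifies.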
minimumSampleSize
public final int minimumSampleSize
The minimum size of the sample extracted.
maxSampleSize
public final int maxSampleSize
The maximum size of the sample extracted.
estimationType
public final SampleEstimatorFactory.EstimationType estimationType
The estimation type used for sampling.
transposed
public boolean transposed
Transpose the input matrix to optimize access when extracting bitmaps. This setting is changed inside the script based on the transposeInput setting. It is intentionally left mutable, since the transposition of the input matrix is decided in phase 3.
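A sketch of how the three-valued transposeInput string could be resolved into the mutable transposed flag. The class and method names are hypothetical, and since the heuristic behind "auto" is not described here, it is passed in as a java.util.function.BooleanSupplier instead of being guessed.

```java
import java.util.function.BooleanSupplier;

class TransposeSketch {
    /**
     * Resolves "auto", "true" or "false" to a boolean. For "auto", the decision is
     * delegated to a caller-supplied heuristic (hypothetical; not the SystemDS rule).
     */
    static boolean resolveTranspose(String transposeInput, BooleanSupplier autoHeuristic) {
        switch (transposeInput) {
            case "true":
                return true;
            case "false":
                return false;
            case "auto":
                return autoHeuristic.getAsBoolean();
            default:
                throw new IllegalArgumentException("Invalid transposeInput: " + transposeInput);
        }
    }
}
```

Deferring "auto" to a supplier mirrors why transposed is mutable: the final value is only known once the later phase has enough information to decide.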
minimumCompressionRatio
public final double minimumCompressionRatio
The minimum compression ratio to achieve.
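A minimal sketch of how such a threshold is typically applied, assuming the compression ratio is the uncompressed size divided by the compressed size and that a ratio below the minimum means the input is kept uncompressed. The helper `CompressionRatioSketch.meetsMinimumRatio` is hypothetical.

```java
class CompressionRatioSketch {
    /** Returns true if the achieved ratio reaches the configured minimum. */
    static boolean meetsMinimumRatio(long uncompressedBytes, long compressedBytes, double minimumCompressionRatio) {
        if (compressedBytes <= 0)
            throw new IllegalArgumentException("compressedBytes must be positive");
        // Ratio > 1 means the compressed form is smaller than the original.
        double ratio = (double) uncompressedBytes / compressedBytes;
        return ratio >= minimumCompressionRatio;
    }
}
```

For example, 8000 bytes compressed to 4000 bytes gives a ratio of 2.0, which passes a minimum of 2.0 but an equal-size result (ratio 1.0) fails a minimum of 1.5.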
isInSparkInstruction
public final boolean isInSparkInstruction
Whether the compression is executed as part of a Spark instruction.
sdcSortType
public final InsertionSorterFactory.SORT_TYPE sdcSortType
The sorting type used when sorting/joining offsets to create SDC groups.
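To illustrate what joining offsets can mean, assume an SDC group stores the sorted offsets of rows that do not hold the default value, together with each row's value index. The sketch below merges several sorted per-value offset lists into that form with a priority-queue k-way merge. This generic merge is chosen for illustration and is not one of the InsertionSorterFactory sort types; `OffsetJoinSketch`, `Joined`, and `join` are hypothetical names.

```java
import java.util.PriorityQueue;

class OffsetJoinSketch {
    /** Result of the join: sorted row offsets and, per offset, the index of its value. */
    static final class Joined {
        final int[] offsets;
        final int[] valueIndex;

        Joined(int[] offsets, int[] valueIndex) {
            this.offsets = offsets;
            this.valueIndex = valueIndex;
        }
    }

    /** Merges sorted, mutually disjoint per-value offset lists with a k-way heap merge. */
    static Joined join(int[][] offsetsPerValue) {
        int total = 0;
        for (int[] o : offsetsPerValue)
            total += o.length;
        // Heap entries: {current offset, value index, position within that value's list}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int v = 0; v < offsetsPerValue.length; v++)
            if (offsetsPerValue[v].length > 0)
                heap.add(new int[] {offsetsPerValue[v][0], v, 0});
        int[] offsets = new int[total];
        int[] valueIndex = new int[total];
        int k = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll(); // smallest remaining offset across all values
            offsets[k] = top[0];
            valueIndex[k] = top[1];
            k++;
            int next = top[2] + 1;
            int[] list = offsetsPerValue[top[1]];
            if (next < list.length)
                heap.add(new int[] {list[next], top[1], next});
        }
        return new Joined(offsets, valueIndex);
    }
}
```

For per-value lists {1, 5}, {0, 7}, and {3}, the join yields offsets 0, 1, 3, 5, 7 with value indices 1, 0, 2, 0, 1.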