Class ColumnEncoderBagOfWords
- java.lang.Object
-
- org.apache.sysds.runtime.transform.encode.ColumnEncoder
-
- org.apache.sysds.runtime.transform.encode.ColumnEncoderBagOfWords
-
- All Implemented Interfaces:
Externalizable, Serializable, Comparable<ColumnEncoder>, Encoder
public class ColumnEncoderBagOfWords extends ColumnEncoder
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.sysds.runtime.transform.encode.ColumnEncoder
ColumnEncoder.EncoderType
-
-
Field Summary
Fields
Modifier and Type    Field
static int    NUM_SAMPLES_MAP_ESTIMATION
-
Fields inherited from class org.apache.sysds.runtime.transform.encode.ColumnEncoder
APPLY_ROW_BLOCKS_PER_COLUMN, BUILD_ROW_BLOCKS_PER_COLUMN
-
-
Constructor Summary
Constructors
ColumnEncoderBagOfWords()
ColumnEncoderBagOfWords(ColumnEncoderBagOfWords enc)
-
Method Summary
Modifier and Type    Method    Description
void    allocateMetaData(FrameBlock meta)    Pre-allocate a FrameBlock for metadata collection.
void    build(CacheBlock<?> in)    Build the transform meta data for the given block input.
void    buildPartial(FrameBlock in)    Partial build of internal data structures (e.g., in distributed Spark operations).
void    computeMapSizeEstimate(CacheBlock<?> in, int[] sampleIndices)
double    computeNnzEstimate(CacheBlock<?> in, int[] sampleIndices)
void    computeNnzPerRow(CacheBlock<?> in, int start, int end)
Callable<Object>    getBuildTask(CacheBlock<?> in)
int    getDomainSize()
FrameBlock    getMetaData(FrameBlock out)    Construct a frame block out of the transform meta data.
Callable<Object>    getPartialBuildTask(CacheBlock<?> in, int startRow, int blockSize, HashMap<Integer,Object> ret, int pos)
Callable<Object>    getPartialMergeBuildTask(HashMap<Integer,?> ret)
HashSet<Object>    getPartialTokenDictionary()
Map<Object,Integer>    getTokenDictionary()
void    initMetaData(FrameBlock meta)    Sets up the required meta data for a subsequent call to apply.
void    prepareBuildPartial()    Allocates internal data structures for partial build.
void    readExternal(ObjectInput in)    Redirects the default Java serialization via Externalizable to our default Hadoop writable serialization for efficient broadcast/RDD deserialization.
void    setTokenDictionary(HashMap<Object,Integer> dict)
static String[]    tokenize(String current, boolean caseSensitive, String seperatorRegex)
void    writeExternal(ObjectOutput out)    Redirects the default Java serialization via Externalizable to our default Hadoop writable serialization for efficient broadcast/RDD serialization.
-
Methods inherited from class org.apache.sysds.runtime.transform.encode.ColumnEncoder
apply, apply, build, build, compareTo, getApplyTasks, getBuildTasks, getColID, getColMapping, getEstMetaSize, getEstNumDistincts, initEmbeddings, isApplicable, isApplicable, mergeAt, setColID, setEstMetaSize, setEstNumDistincts, shiftCol, updateIndexRanges
-
-
-
-
Constructor Detail
-
ColumnEncoderBagOfWords
public ColumnEncoderBagOfWords()
-
ColumnEncoderBagOfWords
public ColumnEncoderBagOfWords(ColumnEncoderBagOfWords enc)
-
-
Method Detail
-
computeNnzEstimate
public double computeNnzEstimate(CacheBlock<?> in, int[] sampleIndices)
-
computeMapSizeEstimate
public void computeMapSizeEstimate(CacheBlock<?> in, int[] sampleIndices)
- Overrides:
computeMapSizeEstimate in class ColumnEncoder
-
computeNnzPerRow
public void computeNnzPerRow(CacheBlock<?> in, int start, int end)
-
tokenize
public static String[] tokenize(String current, boolean caseSensitive, String seperatorRegex)
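The page gives only the signature of tokenize, so the following is a minimal sketch of one plausible reading, not the SystemDS implementation. It assumes that a false caseSensitive flag folds the input to lower case and that seperatorRegex is applied with String.split; the class name TokenizeSketch is hypothetical.

```java
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the SystemDS code. */
class TokenizeSketch {
    // Assumption: caseSensitive == false lower-cases the text before splitting,
    // and seperatorRegex (spelled as in the documented signature) is a Java regex.
    static String[] tokenize(String current, boolean caseSensitive, String seperatorRegex) {
        String text = caseSensitive ? current : current.toLowerCase(Locale.ROOT);
        return text.split(seperatorRegex);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("The cat saw the Cat", false, "\\s+");
        System.out.println(String.join("|", tokens)); // prints "the|cat|saw|the|cat"
    }
}
```

With caseSensitive set to true, "The" and "the" would remain distinct tokens under this reading.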
-
getDomainSize
public int getDomainSize()
- Overrides:
getDomainSize in class ColumnEncoder
-
getBuildTask
public Callable<Object> getBuildTask(CacheBlock<?> in)
- Overrides:
getBuildTask in class ColumnEncoder
-
build
public void build(CacheBlock<?> in)
Description copied from interface: Encoder
Build the transform meta data for the given block input. This call modifies and keeps meta data as encoder state.
- Parameters:
in - input frame block
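To make the build step concrete, here is a minimal sketch of how a bag-of-words encoder could derive its meta data (a token dictionary) from a column and later use it to count tokens per row. It relies only on the JDK; the class BagOfWordsSketch, the whitespace separator, the lower-casing, and the 0-based positions are assumptions for illustration and are not taken from SystemDS.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of build/apply for a bag-of-words column. */
class BagOfWordsSketch {
    // Build: assign every distinct token a position in first-seen order
    // (0-based here, which is an assumption of this sketch).
    static Map<String, Integer> buildDictionary(String[] rows) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String row : rows)
            for (String token : row.toLowerCase(Locale.ROOT).split("\\s+"))
                dict.putIfAbsent(token, dict.size());
        return dict;
    }

    // Apply: count how often each dictionary token occurs in one row;
    // tokens missing from the dictionary are ignored.
    static int[] encode(String row, Map<String, Integer> dict) {
        int[] counts = new int[dict.size()];
        for (String token : row.toLowerCase(Locale.ROOT).split("\\s+")) {
            Integer pos = dict.get(token);
            if (pos != null)
                counts[pos]++;
        }
        return counts;
    }
}
```

For the rows "a b a" and "b c", this sketch yields the dictionary a=0, b=1, c=2, and the first row encodes to the counts [2, 1, 0]. The dictionary plays the role of the encoder state that build keeps for later calls.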
-
getPartialBuildTask
public Callable<Object> getPartialBuildTask(CacheBlock<?> in, int startRow, int blockSize, HashMap<Integer,Object> ret, int pos)
- Overrides:
getPartialBuildTask in class ColumnEncoder
-
getPartialMergeBuildTask
public Callable<Object> getPartialMergeBuildTask(HashMap<Integer,?> ret)
- Overrides:
getPartialMergeBuildTask in class ColumnEncoder
-
prepareBuildPartial
public void prepareBuildPartial()
Description copied from class: ColumnEncoder
Allocates internal data structures for partial build.
- Specified by:
prepareBuildPartial in interface Encoder
- Overrides:
prepareBuildPartial in class ColumnEncoder
-
buildPartial
public void buildPartial(FrameBlock in)
Description copied from class: ColumnEncoder
Partial build of internal data structures (e.g., in distributed Spark operations).
- Specified by:
buildPartial in interface Encoder
- Overrides:
buildPartial in class ColumnEncoder
- Parameters:
in - input frame block
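The partial-build methods suggest a two-phase pattern: each row block contributes a partial token set, and a merge step combines them into one dictionary. The following is a hedged sketch of that pattern using only the JDK. The class PartialBuildSketch, its method names, and the sorted position assignment are hypothetical and do not reflect how SystemDS merges partial results.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of partial build followed by a merge. */
class PartialBuildSketch {
    // Partial build: collect the distinct tokens of rows [start, end).
    static Set<String> partialBuild(String[] rows, int start, int end) {
        Set<String> tokens = new HashSet<>();
        for (int i = start; i < end; i++)
            for (String token : rows[i].split("\\s+"))
                tokens.add(token);
        return tokens;
    }

    // Merge: union all partial sets, then assign positions. Sorting makes the
    // result independent of block order (a choice of this sketch only).
    static Map<String, Integer> merge(List<Set<String>> partials) {
        Set<String> all = new TreeSet<>();
        for (Set<String> partial : partials)
            all.addAll(partial);
        Map<String, Integer> dict = new HashMap<>();
        for (String token : all)
            dict.put(token, dict.size());
        return dict;
    }
}
```

Because the merge only needs the partial sets, the partial builds can run independently on separate row blocks or workers before the merge.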
-
allocateMetaData
public void allocateMetaData(FrameBlock meta)
Description copied from interface: Encoder
Pre-allocate a FrameBlock for metadata collection.
- Parameters:
meta - frame block
-
getMetaData
public FrameBlock getMetaData(FrameBlock out)
Description copied from interface: Encoder
Construct a frame block out of the transform meta data.
- Parameters:
out - output frame block
- Returns:
output frame block?
-
initMetaData
public void initMetaData(FrameBlock meta)
Description copied from interface: Encoder
Sets up the required meta data for a subsequent call to apply.
- Parameters:
meta - frame block
-
writeExternal
public void writeExternal(ObjectOutput out) throws IOException
Description copied from class: ColumnEncoder
Redirects the default Java serialization via Externalizable to our default Hadoop writable serialization for efficient broadcast/RDD serialization.
- Specified by:
writeExternal in interface Externalizable
- Overrides:
writeExternal in class ColumnEncoder
- Parameters:
out - object output
- Throws:
IOException - if an IOException occurs
-
readExternal
public void readExternal(ObjectInput in) throws IOException
Description copied from class: ColumnEncoder
Redirects the default Java serialization via Externalizable to our default Hadoop writable serialization for efficient broadcast/RDD deserialization.
- Specified by:
readExternal in interface Externalizable
- Overrides:
readExternal in class ColumnEncoder
- Parameters:
in - object input
- Throws:
IOException - if an IOException occurs
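The redirection described for writeExternal and readExternal can be illustrated with a small JDK-only sketch. Here the class ExternalizableSketch forwards both Externalizable callbacks to hand-written write and readFields methods, which imitate the shape of a Hadoop-style writable without depending on Hadoop. All names and the serialized layout are assumptions for illustration, not the SystemDS format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder of a token dictionary with custom serialization. */
class ExternalizableSketch implements Externalizable {
    Map<String, Integer> dict = new HashMap<>();

    // Externalizable deserialization requires a public no-arg constructor.
    public ExternalizableSketch() {}

    // Writable-style serialization: entry count, then (token, position) pairs.
    void write(DataOutput out) throws IOException {
        out.writeInt(dict.size());
        for (Map.Entry<String, Integer> e : dict.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
    }

    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        dict = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String token = in.readUTF();
            dict.put(token, in.readInt());
        }
    }

    // ObjectOutput extends DataOutput and ObjectInput extends DataInput,
    // so both callbacks can simply delegate.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        write(out);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        readFields(in);
    }

    // Round trip through standard Java object streams to exercise the callbacks.
    static ExternalizableSketch roundTrip(ExternalizableSketch source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(source);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ExternalizableSketch) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Delegating in this way keeps a single serialization code path, so the same bytes are produced whether the object is written through Java object streams or directly through a DataOutput.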
-
-