org.canova.image.recordreader

Class MNISTRecordReader

  • All Implemented Interfaces:
    Closeable, Serializable, AutoCloseable, org.canova.api.conf.Configurable, org.canova.api.records.reader.RecordReader


    public class MNISTRecordReader
    extends Object
    implements org.canova.api.records.reader.RecordReader
    Record reader that understands the MNIST file format as described here: http://yann.lecun.com/exdb/mnist/

    Not built to handle splits of the file; for now it forces a single worker to process the file. Why?
    - The MNIST training file is 47MB unzipped.
    - Right now (June 2015) Canova only runs in local/serial mode.
    - When we add MapReduce as a runtime engine, the training file size (47MB) is still smaller than the lowest production block size in HDFS these days (64MB, 128MB), so normally MapReduce's scheduling system would not split the file (unless you manually set the file's block size lower).
    - This input format's main purpose is to read MNIST raw data into Canova to be written out as another format (SVMLight most likely) for DL4J's input formats to read.

    Assumes that the file exists locally and has been unzipped. Why?
    - When we do port this input format to be HDFS-aware, these mechanics will be incompatible (we don't want N workers all trying to download files or coordinate who is downloading the file).
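
As an illustration of the file format referenced above, the following is a minimal sketch of parsing the header of an unzipped MNIST image file. The class MnistImageHeader and its members are hypothetical names, not part of Canova; the layout (four big-endian 32-bit integers: magic number 2051, image count, rows, columns) is the one given on the MNIST format page.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of Canova): parses the 16-byte header of an MNIST image file. */
final class MnistImageHeader {
    static final int HEADER_BYTES = 16;
    static final int IMAGE_MAGIC = 2051; // 0x00000803, the magic number for image files

    final int count; // number of images in the file
    final int rows;  // pixel rows per image
    final int cols;  // pixel columns per image

    private MnistImageHeader(int count, int rows, int cols) {
        this.count = count;
        this.rows = rows;
        this.cols = cols;
    }

    /** Parses the first 16 bytes of an unzipped image file. */
    static MnistImageHeader parse(byte[] header) {
        if (header.length < HEADER_BYTES) {
            throw new IllegalArgumentException("Need at least " + HEADER_BYTES + " bytes");
        }
        // All header fields are stored most-significant byte first.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.BIG_ENDIAN);
        int magic = buf.getInt();
        if (magic != IMAGE_MAGIC) {
            throw new IllegalArgumentException("Not an MNIST image file, magic=" + magic);
        }
        int count = buf.getInt();
        int rows = buf.getInt();
        int cols = buf.getInt();
        return new MnistImageHeader(count, rows, cols);
    }

    /** Header plus one unsigned byte per pixel. */
    long expectedFileSize() {
        return HEADER_BYTES + (long) count * rows * cols;
    }
}
```

For the training set (60,000 images of 28 x 28 pixels) expectedFileSize() gives 47,040,016 bytes, which matches the 47MB figure quoted above and is below a 64MB HDFS block.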
    Author:
    Josh Patterson
    See Also:
    Serialized Form
    • Field Detail

      • curr

        protected org.nd4j.linalg.dataset.DataSet curr
    • Method Detail

      • next

        public Collection<org.canova.api.writable.Writable> next()
        Basic logic:
        1. Hit fetch() and have the MNISTManager class go pull a batch from the file (converting them over to matrices).
        2. Loop through the DataSet(s) and convert them into a list of Writables.

        Criticism of current design:
        - Makes two passes over the result set for conversion.

        Reasoning for current design:
        - Want to re-use older existing code, and need to get this out the door.
        - Dataset is bounded to 60k records at most, thus input is always constant.
        - This input format never gets used beyond converting a demo dataset that's bounded.

        LABEL - added as last value.
        Specified by:
        next in interface org.canova.api.records.reader.RecordReader
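
The conversion step described for next() can be sketched as follows. This is a simplified illustration, not the Canova implementation: MnistRecordSketch and toRecord are hypothetical names, and plain Double values stand in for org.canova.api.writable.Writable instances. It shows only the record layout stated above, with the pixel values first and the label added as the last value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Canova code): flattens one example into a record. */
final class MnistRecordSketch {

    /**
     * Builds a flat record from one example.
     * Double is an assumed stand-in for Canova's Writable type.
     */
    static List<Double> toRecord(double[] pixels, int label) {
        List<Double> record = new ArrayList<>(pixels.length + 1);
        for (double pixel : pixels) {
            record.add(pixel); // feature values, in their original order
        }
        record.add((double) label); // label is appended as the last value
        return record;
    }
}
```

A 28 x 28 MNIST image would therefore yield a record of 785 values: 784 pixels followed by the digit label.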
      • hasNext

        public boolean hasNext()
        Specified by:
        hasNext in interface org.canova.api.records.reader.RecordReader
      • setConf

        public void setConf(org.canova.api.conf.Configuration conf)
        Specified by:
        setConf in interface org.canova.api.conf.Configurable
      • getConf

        public org.canova.api.conf.Configuration getConf()
        Specified by:
        getConf in interface org.canova.api.conf.Configurable
      • fetchNext

        public boolean fetchNext()
        Based on logic from fetcher: https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-core/src/main/java/org/deeplearning4j/datasets/fetchers/MnistDataFetcher.java

        The cursor logic here is broken:
        - We need to make sure we can get one example and advance the cursor.
        Parameters:
        numExamples -
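
The behaviour the note on fetchNext asks for (get one example, then advance the cursor) might look like the sketch below. It is an assumption about the intended contract, not the actual fix: CursorSketch and its fields are hypothetical, and an integer index stands in for loading a real example from the file.

```java
/** Hypothetical sketch (not Canova code) of a one-example-at-a-time cursor. */
final class CursorSketch {
    private final int totalExamples;
    private int cursor = 0;   // index of the next example to fetch
    private int current = -1; // index of the most recently fetched example, -1 if none

    CursorSketch(int totalExamples) {
        this.totalExamples = totalExamples;
    }

    boolean hasNext() {
        return cursor < totalExamples;
    }

    /** Fetches exactly one example and advances the cursor; returns false when exhausted. */
    boolean fetchNext() {
        if (!hasNext()) {
            return false;
        }
        current = cursor; // stand-in for reading one image and label from the file
        cursor++;
        return true;
    }

    int current() {
        return current;
    }

    /** Rewinds to the first example. */
    void reset() {
        cursor = 0;
        current = -1;
    }
}
```

Keeping the bounds check in hasNext() and reusing it from fetchNext() means the two methods cannot disagree about when the data is exhausted.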
      • createOutputVector

        protected org.nd4j.linalg.api.ndarray.INDArray createOutputVector(int outcomeLabel)
        Creates an output label matrix
        Parameters:
        outcomeLabel - the outcome label to use
        Returns:
        a binary vector where 1 is set at the index specified by outcomeLabel
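
The binary (one-hot) label vector described for createOutputVector can be sketched with a plain array. This is an illustration only: OneHotSketch and oneHot are hypothetical names, a double[] stands in for INDArray, and the numOutcomes parameter is an assumption (the documented method takes only outcomeLabel).

```java
/** Hypothetical sketch (not Canova code): builds a one-hot label vector. */
final class OneHotSketch {

    /** Returns a vector of length numOutcomes with 1.0 at outcomeLabel and 0.0 elsewhere. */
    static double[] oneHot(int outcomeLabel, int numOutcomes) {
        if (outcomeLabel < 0 || outcomeLabel >= numOutcomes) {
            throw new IllegalArgumentException(
                "Label " + outcomeLabel + " is outside [0, " + numOutcomes + ")");
        }
        double[] vector = new double[numOutcomes]; // Java zero-initializes arrays
        vector[outcomeLabel] = 1.0;
        return vector;
    }
}
```

For MNIST, which has ten digit classes, oneHot(3, 10) yields a length-10 vector with a single 1.0 at index 3.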
      • createInputMatrix

        protected org.nd4j.linalg.api.ndarray.INDArray createInputMatrix(int numRows)
        Creates a feature vector
        Parameters:
        numRows - the number of examples
        Returns:
        a feature vector
      • createOutputMatrix

        protected org.nd4j.linalg.api.ndarray.INDArray createOutputMatrix(int numRows)
      • initializeCurrFromList

        protected void initializeCurrFromList(List<org.nd4j.linalg.dataset.DataSet> examples)
        Initializes this data transform fetcher from the passed in datasets
        Parameters:
        examples - the examples to use
      • getLabels

        public List<String> getLabels()
        Specified by:
        getLabels in interface org.canova.api.records.reader.RecordReader
      • reset

        public void reset()
        Specified by:
        reset in interface org.canova.api.records.reader.RecordReader

Copyright © 2015. All rights reserved.


